Designing and building web-based infrastructures for data sharing and collaboration

1
Designing and building web-based infrastructures
for data sharing and collaboration
  • Presentation at
  • New Approaches to Software for Statistical
    Processing
  • RSS / ASC Joint Event
  • London, 21st January 2004
  • Jostein Ryssevik
  • Managing Director, Nesstar Ltd

2
What if...
  • statistical data could be published as easily on
    the Web as documents and pictures?
  • users of statistical data could search for
    relevant sources across the Web more or less in
    the same way as they are googling for relevant
    documents?
  • users of statistical data had a "give me more
    data like this" function at hand, allowing them to
    locate data from disparate sources in order to
    create a time series or do a comparison?
  • published statistical data could be described in
    such a way that human users as well as software
    clients would know exactly what they mean and how
    they can be used?
  • agreed languages were at hand, allowing one
    statistical software system to talk to and
    exchange information with another?
  • we had a Data Web allowing all of this to happen?

3
Essentials
  • Open standards
  • Shared protocols
  • Partial agreements

4
Starting from the human end of the equation
  • The majority of users of statistical data have
    not been involved in creating the datasets they use.
  • Statistical data will frequently be used for
    research purposes other than those intended by the
    creators (secondary analysis).
  • Statistical data will frequently be used many
    years after they were created.
  • Users of statistical data are often comparing and
    combining data from a broad range of sources
    (across time and space).
  • In sum: statistical data are travelling. The
    distance between the end users of statistical
    data and the production process is normally long.

5
Metadata: data about...
6
The functions of (human-readable) metadata
  • Finding
  • Understanding
  • Assessing
7
Towards the Semantic Web
  • From documents to data
  • From brainware to software
  • From machine readable to machine understandable
    information
  • Metadata: the glue of the Semantic Web
  • A framework for knowledge representation: RDF
  • The introduction of namespaces (allowing
    different systems of terms and concepts to
    coexist in a single information system)
  • Partial understanding/agreement
  • The vision: the creation of a dynamic framework
    facilitating cooperation/interoperability across
    domains and communities, gradually expanding the
    web of understanding.

"The Semantic Web is an extension of the current
Web in which information is given well-defined
meaning, better enabling computers and people to
work in cooperation." (Tim Berners-Lee)
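To make the RDF and namespace points concrete, here is a minimal Python sketch (not Nesstar code). It assumes a hypothetical `stat:` vocabulary at `http://example.org/stat#` alongside the real Dublin Core element namespace, writes three statements about a hypothetical dataset URL as N-Triples, and illustrates partial understanding: a client that only knows Dublin Core can still use the statements from that vocabulary and ignore the rest.

```python
# Real Dublin Core element namespace plus a hypothetical statistics vocabulary.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "stat": "http://example.org/stat#",  # hypothetical
}


def expand(curie):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


# Statements about one (hypothetical) dataset URL: (subject, predicate, literal).
dataset = "http://example.org/data/survey-2003"
triples = [
    (dataset, expand("dc:title"), "Example household survey 2003"),
    (dataset, expand("dc:date"), "2003"),
    (dataset, expand("stat:unitOfAnalysis"), "Household"),
]


def to_ntriples(statements):
    """Serialise triples whose objects are literals as N-Triples lines."""
    lines = []
    for s, p, o in statements:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def understood(statements, known_prefixes):
    """Keep only statements whose predicate comes from a vocabulary the client knows."""
    bases = tuple(NAMESPACES[p] for p in known_prefixes)
    return [t for t in statements if t[1].startswith(bases)]


print(to_ntriples(triples))
# A client that only knows Dublin Core still understands two of the three statements.
print(len(understood(triples, ["dc"])))
```

Because each term carries its full namespace URI, the two vocabularies can sit in the same description without clashing.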
8
What are Web services adding to the party?
  • A simple down to earth architecture for
    distributed computing and cross system
    interoperability based on XML messaging and HTTP.
  • A mechanism (directory service) allowing clients
    to dynamically locate relevant services on the
    Web (Universal Description, Discovery and
    Integration, UDDI)
  • A way of describing (the interface of) remote
    objects that will allow clients to recognise and
    talk to them (Web Services Description Language,
    WSDL)
  • A communication protocol supporting calls to
    remote objects (SOAP).
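As a rough illustration of XML messaging, the sketch below builds a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree` and decodes it again on the receiving side. The service namespace `urn:example:stats` and the `getTable` method with its parameters are hypothetical; the code does not send anything over HTTP, and WSDL and UDDI are not modelled.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "urn:example:stats"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC)


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def build_request(method, params):
    """Client side: wrap a remote method call in a SOAP 1.1 envelope."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC}}}{name}").text = str(value)
    return ET.tostring(env, encoding="utf-8")


def parse_request(payload):
    """Server side: recover the method name and its parameters."""
    env = ET.fromstring(payload)
    call = env.find(f"{{{SOAP_ENV}}}Body")[0]
    params = {_local_name(child.tag): child.text for child in call}
    return _local_name(call.tag), params


payload = build_request("getTable", {"dataset": "survey-2003", "row": "region", "column": "sex"})
print(payload.decode("utf-8"))
print(parse_request(payload))
```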

9
Requirements for metadata standards in global
information systems
  • Modularity: the Lego block principle
  • Extensibility: allowing domains or application
    providers to add metadata elements without
    compromising the interoperability offered by the
    base schema
  • Refinement: allowing domains or application
    providers to refine the use of a universal
    standard (making elements obligatory, restricting
    value domains, requiring the use of specific
    controlled vocabularies etc.)
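The difference between extension and refinement can be sketched in a few lines of Python. The profile structure and the helper names (`make_profile`, `extend`, `refine`, `validate`) are hypothetical, as is the `samplingMethod` element. The point is that extension only adds elements, refinement only tightens rules, and a client that knows only the base profile can still validate a record written against the refined one.

```python
def make_profile(elements, required=(), domains=None):
    """A metadata profile: declared elements, obligatory ones, and value domains."""
    return {
        "elements": set(elements),
        "required": set(required),
        "domains": {k: set(v) for k, v in (domains or {}).items()},
    }


def extend(profile, *new_elements):
    """Extension: add elements while leaving every base rule untouched."""
    return make_profile(profile["elements"] | set(new_elements),
                        profile["required"], profile["domains"])


def refine(profile, required=(), domains=None):
    """Refinement: only tighten rules (more obligatory elements, narrower domains)."""
    new_domains = dict(profile["domains"])
    for element, allowed in (domains or {}).items():
        if element not in profile["elements"]:
            raise ValueError(f"unknown element {element!r}")
        base = new_domains.get(element)
        if base is not None and not set(allowed) <= base:
            raise ValueError(f"refinement would widen the domain of {element!r}")
        new_domains[element] = set(allowed)
    unknown = set(required) - profile["elements"]
    if unknown:
        raise ValueError(f"cannot require undeclared elements {sorted(unknown)}")
    return make_profile(profile["elements"], profile["required"] | set(required), new_domains)


def validate(profile, record):
    """List problems with a record; elements the profile does not declare are ignored."""
    problems = [f"missing {e}" for e in sorted(profile["required"] - set(record))]
    for element, allowed in sorted(profile["domains"].items()):
        if element in record and record[element] not in allowed:
            problems.append(f"{element}={record[element]!r} not allowed")
    return problems


base = make_profile({"title", "creator", "date", "language"}, required={"title"})
survey = extend(base, "samplingMethod")  # hypothetical domain-specific element
profile = refine(survey, required={"creator"}, domains={"language": {"en", "no", "fr"}})

record = {"title": "Example survey", "creator": "Example archive",
          "language": "en", "samplingMethod": "stratified"}
print(validate(profile, record))  # []
print(validate(base, record))     # []: a base-only client still accepts the record
print(validate(profile, {"title": "Another survey", "language": "de"}))
# ['missing creator', "language='de' not allowed"]
```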

10
The tension between the makers of standards and
the implementers
  • The main concern of standards makers is
    interoperability across domains, communities and
    systems (creating "empires of understanding")
  • The main concern of implementers is efficiency
    and relevance within an application
  • ...even in situations where interoperability is
    high on the agenda there will be plenty of
    reasons for breaking the standards
  • Therefore, metadata standards aspiring to
    universal acceptance cannot insist on regulating
    every little detail: the wider the acceptance,
    the thinner the standard.

11
Metadata for travelling statistics: where are we?
  • The Common Warehouse Metamodel (CWM) from OMG: a
    model and syntax for the exchange of metadata
    for data warehousing and business intelligence
  • ISO 11179: a universal standard for describing
    data elements in a metadata repository
  • GESMES (and GESMES CB): a metadata model for the
    exchange of multidimensional data and
    time series
  • IQML, AskXML and Triple-S: metadata for the
    exchange of questionnaire data
  • SPSS MR data model
  • The Data Documentation Initiative (DDI): a
    general metadata standard for statistical data
    (micro as well as aggregated)

12
  • Established in 1995 to create a universally
    supported metadata standard for the social
    science community
  • Initiated and organised by the
    Inter-University Consortium for Political and
    Social Research (ICPSR), Michigan, USA
  • Members coming from social science data archives
    and libraries in USA, Canada and Europe and from
    major producers of statistical data
  • First version of the standard expressed as an
    SGML DTD
  • Translated to XML in 1997
  • DDI 1.0 published spring 2000
  • Extended to cover multidimensional data
    (cubes/tables) in 2001
  • Architectural reform process initiated 2003
  • Fast take-up in the core community and beyond

13
Characteristics
  • End-user perspective
      • provide the end-user with the information needed
        to locate relevant data sources and to use
        them in a sound way
  • Initial emphasis on survey data
      • developed to describe independent surveys at
        study, file and variable level (rudimentary
        support for other types of data)
  • Emphasis on codebooks (survey data dictionaries)
      • metadata seen as a complete book or document
  • Library orientation
      • strong on catalogue information
      • mapping to Dublin Core
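To illustrate the study-, file- and variable-level structure, the sketch below parses a small codebook with Python's `xml.etree.ElementTree`. The element names are modelled on DDI Codebook (`codeBook`, `stdyDscr`, `fileDscr`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`), but the fragment is simplified, has no namespace and is not schema-valid; the survey, file and variables are invented for the example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment using DDI-Codebook-style element names. It omits the
# DDI namespace and most mandatory content, so it is not schema-valid.
CODEBOOK = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example household survey 2003</titl></titlStmt></citation>
  </stdyDscr>
  <fileDscr ID="F1"><fileTxt><fileName>households.sav</fileName></fileTxt></fileDscr>
  <dataDscr>
    <var ID="V1" name="sex" files="F1">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
    <var ID="V2" name="age" files="F1"><labl>Age in completed years</labl></var>
  </dataDscr>
</codeBook>
"""


def summarise(xml_text):
    """Pull study-, file- and variable-level information out of the codebook."""
    root = ET.fromstring(xml_text.strip())
    study = root.findtext("stdyDscr/citation/titlStmt/titl")
    files = [f.findtext("fileTxt/fileName") for f in root.findall("fileDscr")]
    variables = {}
    for var in root.findall("dataDscr/var"):
        # Value labels: code -> label, as a printed codebook would list them.
        codes = {c.findtext("catValu"): c.findtext("labl") for c in var.findall("catgry")}
        variables[var.get("name")] = (var.findtext("labl"), codes)
    return study, files, variables


study, files, variables = summarise(CODEBOOK)
print(study)
print(files)
for name, (label, codes) in variables.items():
    print(name, "-", label, codes)
```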

14
Achievements
  • Acceptance
      • fast take-up in the community of data archives
        and data libraries worldwide
  • Community building
      • revitalised the co-operation and sharing of
        know-how and technologies among the archives and
        libraries
  • Strengthening of the ties between the data
    archiving and data producing communities
  • Software development

15
Nesstar - vision
To develop a truly distributed platform for
electronic publishing of statistical data,
building on object technology, open (metadata-)
standards and lightweight Internet protocols.
...or simply
To bring the models, technologies and collective
energy of the Web to the world of statistics.
16
Nesstar: an overview
  • An architecture for a totally distributed virtual
    data library
  • The ability to locate multiple data sources
    across national boundaries
  • The ability to browse detailed information about
    these data sources
  • ...and to do data analysis and visualisation over
    the net
  • ...or to download the appropriate subset of data
    in one of a number of formats
  • Supporting standard micro-data as well as
    aggregated tables/cubes
  • Allowing the user to bookmark/hyperlink resources
    in the data and metadata repositories:
      • searches
      • datasets
      • analyses (tables, models etc.)
  • ...and to hyperlink these resources from external
    Web objects (like texts)
  • A system for imposing a variety of access control
    policies, including statistical disclosure
    control (SDC)
  • Powerful data preparation tools, including a
    system for remote publishing of data to Nesstar
    servers
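The slides do not say which disclosure-control rules Nesstar applies. As one common example of SDC for aggregated tables, the sketch below applies a minimum-cell-count (threshold) rule: any count below a chosen threshold is suppressed before the table is released. The threshold of 5 and the table are hypothetical. Primary suppression alone is usually not enough when margins are published, because a hidden cell can then be recovered by subtraction; secondary suppression is not shown here.

```python
def suppress_small_cells(counts, threshold=5):
    """Primary suppression: replace every count below `threshold` with None."""
    return {cell: (None if n < threshold else n) for cell, n in counts.items()}


def render(table, rows, cols):
    """Tab-separated table, showing suppressed cells as 'x'."""
    lines = ["\t".join([""] + cols)]
    for r in rows:
        cells = ["x" if table[(r, c)] is None else str(table[(r, c)]) for c in cols]
        lines.append("\t".join([r] + cells))
    return "\n".join(lines)


# Hypothetical frequency table: (region, tenure) -> number of respondents.
counts = {
    ("North", "Owner"): 120, ("North", "Renter"): 3,
    ("South", "Owner"): 87, ("South", "Renter"): 41,
}
safe = suppress_small_cells(counts, threshold=5)  # threshold chosen for illustration
print(render(safe, ["North", "South"], ["Owner", "Renter"]))
```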

17
Nesstar - a fully distributed Data Web
18
Ongoing development
  • Integrated multilingual thesaurus support
  • Trend Wizard: locating and harmonizing
    potentially comparable variables from disparate
    sources/sites in order to create trends and
    comparisons
  • Geo-referencing of statistical data, facilitating
    geo-oriented resource location
  • Directory service (web service registry) allowing
    clients to automatically discover Nesstar servers
    and get a description of the services that they
    are providing.
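The Trend Wizard's actual matching method is not described here. The sketch below only shows the basic idea of harmonisation, assuming the hard part (deciding that `SEX` coded 1/2 in one source and `gender` coded M/F in another measure the same concept) has already been done and recorded as code maps. Sources, codes and frequencies are hypothetical.

```python
# Hypothetical metadata: two sources code the same concept differently.
CODE_MAPS = {
    "survey_1998": {1: "male", 2: "female"},      # variable SEX
    "survey_2002": {"M": "male", "F": "female"},  # variable gender
}

# Hypothetical published frequencies, each in its source's own coding.
FREQUENCIES = {
    "survey_1998": {1: 480, 2: 520},
    "survey_2002": {"M": 610, "F": 590},
}


def harmonise(source):
    """Recode one source's frequencies into the common category labels."""
    out = {}
    for code, n in FREQUENCIES[source].items():
        label = CODE_MAPS[source][code]
        out[label] = out.get(label, 0) + n
    return out


def trend(category, sources):
    """Share of `category` in each source, in the order given."""
    series = []
    for source in sources:
        table = harmonise(source)
        share = table.get(category, 0) / sum(table.values())
        series.append((source, round(share, 3)))
    return series


print(trend("female", ["survey_1998", "survey_2002"]))
# [('survey_1998', 0.52), ('survey_2002', 0.492)]
```

Once both sources are expressed in the same categories, comparing them over time is simple arithmetic; the metadata that drives the recoding is what makes the comparison possible.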

19
Examples of use: The World Bank
  • Using Nesstar to provide access to large amounts
    of survey data collected by the World Bank in
    various developing countries
  • Data to be used by internal researchers to
    evaluate the effects of the Bank's investments
  • A customised version of Nesstar fully integrated
    with the Bank's intranet

Demo
20
Examples of use: Black Country Regional
Observatory
  • Using Nesstar to build a regional observatory
    providing access to local data and knowledge
  • One of a series of observatories set up to serve
    the community, local industry and the general
    public
  • Fully based on the UK e-government standards
    (e.g. e-GMS)

Demo
21
Characteristics of the architecture
  • Fully distributed architecture with no "central
    server", creating a resilient and scalable system
    with no single point of failure.
  • Well-defined namespace. Every resource or
    operation has a corresponding URL. Operations can
    consequently be "bookmarked" and reapplied at a
    later time. The Nesstar URLs can be embedded in
    normal Web pages to provide Nesstar/Web
    integration.
  • Programming-language-independent protocol.
    Nesstar is implemented in Java and C/C++, but the
    protocol is XML- and RDF-based and fully language
    independent.
  • Integration by hyperlinking. The Nesstar objects
    can link to each other across server boundaries.
  • Object orientation. The Nesstar system is built
    according to an object-oriented principle,
    consisting of an extensible set of
    self-describing components.
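The sketch below shows, in Python rather than Nesstar's own implementation languages, one way a component can describe itself: it reports its type, its state and its public methods (found by introspection) as simple (subject, predicate, object) triples. Nesstar expresses such descriptions in RDF; plain tuples are used here to keep the example short. The class names, the URLs and the `study` link to an object on another server are hypothetical, the last one standing in for integration by hyperlinking.

```python
import inspect


class StatObject:
    """Hypothetical base class: every object lives at a URL and can describe itself."""

    def __init__(self, url, **state):
        self.url = url
        self.state = state

    def describe(self):
        """Return (subject, predicate, object) triples for state and public methods."""
        triples = [(self.url, "type", type(self).__name__)]
        triples += [(self.url, key, value) for key, value in sorted(self.state.items())]
        for name, func in inspect.getmembers(type(self), inspect.isfunction):
            if name.startswith("_") or name == "describe":
                continue
            params = list(inspect.signature(func).parameters)[1:]  # drop 'self'
            triples.append((self.url, "method", f"{name}({', '.join(params)})"))
        return triples


class Dataset(StatObject):
    """Hypothetical component; the method bodies are left out of this sketch."""

    def tabulate(self, row, column):
        raise NotImplementedError

    def subset(self, variables, condition=None):
        raise NotImplementedError


# The 'study' link points at an object on a different (hypothetical) server.
ds = Dataset("http://server-a.example.org/obj/dataset-1",
             title="Example survey",
             study="http://server-b.example.org/obj/study-7")
for triple in ds.describe():
    print(triple)
```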

22
NEOOM: Nesstar Object-Oriented Middleware
  • All statistical objects live at a URL.
  • Objects are self-describing: when a client
    accesses the URL of an object, the object returns
    a description of its current state (and its
    available methods) in RDF (using RDF as an
    Interface Description Language).
  • Remote object-oriented calls are performed by a
    simple protocol running on top of HTTP. The calls
    can be stored as URLs, specifying the location
    of the relevant object as well as the method
    parameters.
  • This allows for client-side storage of
    statistical operations that can easily be rerun
    at a later stage, thereby creating a simple batch
    language for operations on remote statistical
    objects.
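A minimal sketch of calls stored as URLs, using only Python's `urllib.parse`. The query layout (a `method` parameter plus one parameter per argument), the host `stats.example.org` and the `FakeDataset.count` method are hypothetical and not Nesstar's actual URL syntax; remote resolution is replaced by a local dictionary so that the stored "batch" can be rerun without a network.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def call_to_url(object_url, method, **params):
    """Encode a method call on a remote object as a bookmarkable URL.
    Assumes no parameter is itself called 'method'."""
    query = urlencode([("method", method)] + sorted(params.items()))
    return f"{object_url}?{query}"


def url_to_call(url):
    """Decode a stored call into (object URL, method name, parameters)."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    method = params.pop("method")
    return f"{parts.scheme}://{parts.netloc}{parts.path}", method, params


def run_batch(urls, resolve):
    """Re-run stored calls; `resolve` maps an object URL to a live object."""
    results = []
    for url in urls:
        object_url, method, params = url_to_call(url)
        results.append(getattr(resolve(object_url), method)(**params))
    return results


class FakeDataset:
    """Local stand-in for a remote dataset object (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows

    def count(self, variable, value):
        # Parameters arrive from the URL as strings, so compare as strings.
        return sum(1 for row in self.rows if str(row[variable]) == value)


objects = {"http://stats.example.org/obj/dataset-1":
           FakeDataset([{"sex": 1}, {"sex": 2}, {"sex": 2}])}
saved = call_to_url("http://stats.example.org/obj/dataset-1", "count", variable="sex", value=2)
print(saved)  # http://stats.example.org/obj/dataset-1?method=count&value=2&variable=sex
print(run_batch([saved], objects.__getitem__))  # [2]
```

Sorting the parameters gives each call a single canonical URL, so the same operation always bookmarks to the same address.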

23
Software architecture
  • End-user client: a standard JavaScript-enabled Web
    browser (MS Internet Explorer 5.0 or
    Netscape/Mozilla 5.0). Runs on any PC/workstation
    with any operating system that can run a modern
    web browser with standard JavaScript support.
    Recommended minimum hardware configuration:
    256 MB RAM, 800 MHz processor.
  • Nesstar Web engine (Web Client Application),
    using Cocoon 2.1 and Velocity 1.3. The Web engine
    can run on top of the Nesstar server or reside on
    another server machine with the same recommended
    minimum configuration.
  • Nesstar server: dedicated server running under
    the MS Windows 2000 or XP operating system.
    Recommended minimum hardware configuration:
    2 GB RAM, 2 x 2.0 GHz processors, 60 GB hard disk.
  • Components shown in the architecture diagram:
      • Object cache and proxy objects
      • HTTP interface servlet
      • RDF class interface definitions
      • Web server/container: Tomcat 4.1.24
      • Bridge, Remote Bean and Local Bean
      • Persistence manager
      • J2EE-compliant EJB container: JBoss 3.2.1
      • MVCSoft 1.1
      • Metadata database: Oracle/MySQL/MS SQL Server
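The diagram names an object cache and proxy objects without describing them. One common reading of that pattern is sketched below, as an assumption rather than a description of Nesstar's implementation: a proxy stands in for a remote object and reads its state from a shared cache, so repeated access to the same URL triggers only one fetch. `fake_fetch` replaces the real HTTP request, and cache invalidation is not handled.

```python
class ObjectCache:
    """Keeps one description per object URL so repeated reads avoid a round trip."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical transport: URL -> description dict
        self._entries = {}
        self.fetches = 0         # how many times the transport was actually used

    def get(self, url):
        if url not in self._entries:
            self._entries[url] = self._fetch(url)
            self.fetches += 1
        return self._entries[url]


class Proxy:
    """Local stand-in for a remote object; attribute reads use the cached description."""

    def __init__(self, url, cache):
        self._url = url
        self._cache = cache

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. the remote state.
        description = self._cache.get(self._url)
        if name in description:
            return description[name]
        raise AttributeError(name)


def fake_fetch(url):
    # Stands in for an HTTP request to the object's URL.
    return {"title": "Example survey", "cases": 1200}


cache = ObjectCache(fake_fetch)
a = Proxy("http://stats.example.org/obj/dataset-1", cache)
b = Proxy("http://stats.example.org/obj/dataset-1", cache)
print(a.title, b.cases, cache.fetches)  # Example survey 1200 1
```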