Turning information into knowledge: the challenges of integrating diverse information sources


1
Turning information into knowledge: the
challenges of integrating diverse information
sources
Alex Poulovassilis, Birkbeck, U. of London
Co-Director of the London Knowledge Lab
2
The London Knowledge Lab
  • Institute of Education, University of London
  • Birkbeck College, University of London
  • Purpose-designed building: Science Research
    Infrastructure Fund, 6m
  • Research staff and students: 50
  • Location: Bloomsbury
  • Open: June 2004
  • Social scientists: experts in education,
    sociology, culture and media, semiotics,
    philosophy, knowledge management ...
  • Computer scientists: experts in information
    systems, information management, web
    technologies, personalisation, ubiquitous
    technologies
3
LKL mission
  • to understand how digital technologies and media
    are transforming people's relationships to
    information, learning and culture at home, work
    and play
  • to design, build and evaluate systems,
    processes and interfaces which enhance learning
    throughout life
  • to examine critically the assumptions about
    knowledge and learning that underlie the
    different uses of digital technologies
The starting point for our mission is that
digital technologies and new media will change
how we learn, work, collaborate and communicate
4
LKL research themes
  • Our research is funded by projects from the EU,
    EPSRC, ESRC, BBSRC, JISC and the Wellcome Trust;
    currently about 25 projects.
  • Four broad themes guide our work and inform our
    research:
      • new forms of knowledge
      • turning information into knowledge
      • the changing cultures of new media
      • creating empowering technologies for formal and
        informal learning

5
New forms of knowledge
  • What do children and adults of the twenty-first
    century need to know?
  • How can we learn in new and more effective ways?
  • What kinds of knowledge are emerging in the
    knowledge economy?
  • How can this knowledge be made more accessible to
    more people?

6
Turning information into knowledge
  • The need to cope with ubiquitous, complex,
    incomplete and inconsistent information is
    pervasive in our societies
  • How can people benefit from this information in
    their learning, working and social lives?
  • What new techniques are necessary for managing,
    accessing, integrating and personalising such
    information?
  • How to design and build tools that help people to
    understand such information and generate new
    knowledge from it?

7
The changing cultures of new media
  • What are the differences and continuities between
    old media (books, film, TV) and new media
    (internet, computer games, mobile phones)?
  • How do children and adults use these media in
    different contexts, both as consumers and
    producers?
  • How are they learning in, and from, this
    convergent media environment?
  • What are the implications of these developments
    for formal and informal learning?

8
Creating empowering technologies for learning
  • How are equity, participation, learner autonomy,
    and the structuring of learning impacted by
    digital technologies and new media?
  • Which media-enhanced approaches can help people
    to learn and collaborate?
  • How can the Internet, and ambient and mobile
    technologies create new learning opportunities?

9
Turning information into knowledge: information
integration
  • AutoMed (EPSRC)
      • developing tools for semi-automatic integration
        of heterogeneous information sources
      • can handle both structured and semi-structured
        (RDF/S, XML) data
      • can handle virtual, materialised and hybrid
        integration scenarios
      • application in biological data integration,
        e-learning, p2p data integration
  • ISPIDER (BBSRC e-Science programme)
      • developing an integrated platform of proteomic
        data sources, enabled as Grid and Web services
      • collaboration with groups at EBI, Manchester,
        UCL

10
The AutoMed Project
  • Partners: Birkbeck and Imperial Colleges
  • Data integration based on schema
    equivalence/subsumption
  • Low-level metamodel, the Hypergraph Data Model
    (HDM), in terms of which higher-level data
    modelling languages are defined, so AutoMed is
    extensible with new modelling languages
  • Provides a set of primitive equivalence-preserving
    schema transformations for higher-level
    modelling languages:
    addT(c,q), deleteT(c,q), renameT(c,n,n')
  • Also two more primitive transformations for
    imprecise integration scenarios:
    extendT(c,Range q1 q2), contractT(c,Range q1 q2)

11
Features of the AutoMed toolkit
  • Schema transformations are automatically
    reversible:
      • addT/deleteT(c,q) by deleteT/addT(c,q)
      • extendT(c,Range q1 q2) by contractT(c,Range q1
        q2)
      • renameT(c,n,n') by renameT(c,n',n)
  • Hence bi-directional transformation pathways
    (more generally transformation networks) are
    defined between schemas
  • The queries within transformations allow
    automatic data and query translation
  • Schemas may be expressed in a variety of
    modelling languages
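
The reversibility rules on this slide can be illustrated with a small sketch. It is not the AutoMed toolkit's API: the `Step` class and the `reverse_step` and `reverse_pathway` functions are hypothetical names, and queries are kept as opaque strings. It assumes a pathway is simply an ordered list of primitive steps, that add/delete and extend/contract undo each other with the same arguments, and that a rename is undone by a rename with the two names swapped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One primitive transformation (hypothetical representation)."""
    kind: str               # add, delete, extend, contract or rename
    construct: str          # the modelling construct the step acts on
    args: Tuple[str, ...]   # query text(s), or (old_name, new_name) for rename


# add/delete and extend/contract undo each other with the same arguments.
INVERSE_KIND = {"add": "delete", "delete": "add",
                "extend": "contract", "contract": "extend"}


def reverse_step(step: Step) -> Step:
    if step.kind == "rename":
        old, new = step.args
        return Step("rename", step.construct, (new, old))
    return Step(INVERSE_KIND[step.kind], step.construct, step.args)


def reverse_pathway(pathway: List[Step]) -> List[Step]:
    """Invert every step and apply them in the opposite order."""
    return [reverse_step(s) for s in reversed(pathway)]


pathway = [
    Step("add", "table", ("q_person",)),
    Step("rename", "column", ("surname", "family_name")),
    Step("contract", "column", ("q_lower", "q_upper")),
]
backwards = reverse_pathway(pathway)
```

Reversing the reversed pathway gives back the original list, which is the sense in which a single pathway definition yields a bi-directional link between two schemas.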

12
Schema transformation/integration networks
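[Figure: a global schema GS; union-compatible schemas US1, US2, ..., USi, ..., USn linked by id transformations; local schemas LS1, LS2, ..., LSi, ..., LSn.]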
13
Schema transformation/integration networks (contd)
  • On the previous slide:
      • GS is a global schema
      • LS1, ..., LSn are local schemas
      • US1, ..., USn are union-compatible schemas
  • the transformation pathway between each pair LSi
    and USi may consist of add, delete, rename,
    extend and contract primitive transformations,
    operating on any modelling construct defined in
    the AutoMed Model Definitions Repository
  • the transformation pathway between USi and GS is
    similar
  • the transformation pathway between each pair of
    union-compatible schemas consists of id
    transformation steps

14
AutoMed architecture
  • Schema and Transformations Repository (STR)
  • Wrapper
  • Schema Transformation and Integration Tools
  • Global Query Processor
  • Model Definitions Repository (MDR)
  • Global Query Optimiser
  • Model Definition Tool
  • Schema Evolution Tool
15
Other data integration approaches: GAV and LAV
  • Global-As-View (GAV) approach: specify GS
    constructs by view definitions over LS constructs
  • Local-As-View (LAV) approach: specify LS
    constructs by view definitions over GS constructs
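
As a toy illustration of the two directions, the sketch below writes one GAV view and one LAV view as plain Python functions over lists of rows. Everything in it is assumed for the example: the sources `ls1_staff` and `ls2_students`, the global construct `person(name, role)` and the function names are hypothetical, and real view definitions would be written in a query language rather than in Python.

```python
# Hypothetical local sources (LS constructs), each a list of rows.
ls1_staff = [{"name": "Ada", "dept": "CS"}]
ls2_students = [{"name": "Bob", "course": "MSc"}]


def gav_person(staff, students):
    """GAV: the global construct person is a view over LS constructs."""
    names = {row["name"] for row in staff} | {row["name"] for row in students}
    return sorted(names)


def lav_staff_names(gs_person):
    """LAV: the local construct staff is a view over GS constructs."""
    return sorted(row["name"] for row in gs_person if row["role"] == "staff")


# GAV direction: compute the global extent from the sources.
global_people = gav_person(ls1_staff, ls2_students)

# LAV direction: describe what a source holds in terms of the global schema.
gs_person = [{"name": "Ada", "role": "staff"},
             {"name": "Bob", "role": "student"}]
staff_view = lav_staff_names(gs_person)
```

In the GAV function the local sources are the inputs, so a change to a source's shape touches the global view. In the LAV function the global schema is the input, so a change to the global schema touches every source description.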

16
Evolution problems of GAV and LAV
  • GAV does not readily support evolution of local
    schemas e.g. adding a new attribute to a source
    table may invalidate some of the global view
    definitions
  • In LAV, changes to a local schema impact only the
    derivation rules defined for that schema
  • But conversely LAV has problems if one wants to
    evolve the global schema since all the view
    definitions defining local schema constructs in
    terms of the global schema would need to be
    reviewed
  • These evolution problems are exacerbated in P2P
    data integration scenarios where there is no
    distinction between local and global schemas

17
AutoMed vs GAV/LAV/GLAV
  • AutoMed's both-as-view (BAV) schema transformation
    pathways capture at least the information
    available from GAV and LAV rules:
      • add/extend transformations correspond to GAV
        rules
      • delete/contract transformations correspond to LAV
        rules
  • Thus, GAV and LAV view definitions can be derived
    from a BAV network
  • GLAV rules e - e are also captured, by BAV
    transformations of the form add(T,e) del(T,e)
  • Thus, any reasoning or processing that is
    possible using GAV, LAV or GLAV is also possible
    using BAV
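
The correspondence between step kinds and rule kinds can be sketched as a single pass over a pathway. This is a simplified illustration, not AutoMed's derivation algorithm: steps are assumed to be `(kind, construct, query)` tuples, queries are opaque placeholder strings, and `derive_views` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, str]  # (kind, construct, query), all hypothetical


def derive_views(pathway: List[Step]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a source-to-global pathway into GAV-style and LAV-style rules."""
    gav: Dict[str, str] = {}
    lav: Dict[str, str] = {}
    for kind, construct, query in pathway:
        if kind in ("add", "extend"):
            # A construct introduced on the way to the global schema is
            # defined by a query over constructs that already exist.
            gav[construct] = query
        elif kind in ("delete", "contract"):
            # A construct removed from the source side is defined by a
            # query over the constructs that remain.
            lav[construct] = query
    return gav, lav


pathway = [
    ("add", "GS.person", "q_union"),
    ("delete", "LS.staff", "q_filter"),
    ("contract", "LS.audit", "q_range"),
]
gav_rules, lav_rules = derive_views(pathway)
```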

18
Schema Evolution in BAV
  • Unlike GAV/LAV/GLAV, BAV readily supports the
    evolution of both local and global schemas.
  • The evolution of a global or local schema is
    specified by a schema transformation pathway T
    from the old schema S to the new schema S'
  • The transformation network and schemas can then
    be systematically repaired (rather than having to
    be redefined)

[Figure: a pathway T from Global Schema S to New Global Schema S', and a pathway T from Local Schema S to New Local Schema S'.]
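
One way to picture the repair step is as pathway composition. The sketch below rests on a strong simplifying assumption that the slide does not state: that a repaired pathway is obtained by concatenating the existing pathway with T (for global evolution) or with the reverse of T (for local evolution). Steps are hypothetical `(kind, construct, args)` tuples and all function names are invented for the example.

```python
from typing import List, Tuple

Step = Tuple[str, str, Tuple[str, ...]]  # (kind, construct, arguments)

INVERSE = {"add": "delete", "delete": "add",
           "extend": "contract", "contract": "extend",
           "rename": "rename"}


def reverse(pathway: List[Step]) -> List[Step]:
    """Invert each step and traverse the pathway in the opposite order."""
    out: List[Step] = []
    for kind, construct, args in reversed(pathway):
        new_args = args[::-1] if kind == "rename" else args
        out.append((INVERSE[kind], construct, new_args))
    return out


def repair_after_global_evolution(ls_to_gs: List[Step], t: List[Step]) -> List[Step]:
    # LS -> S followed by S -> S' gives LS -> S'.
    return ls_to_gs + t


def repair_after_local_evolution(ls_to_gs: List[Step], t: List[Step]) -> List[Step]:
    # S' -> S (the reverse of T) followed by the old pathway gives S' -> GS.
    return reverse(t) + ls_to_gs


ls_to_gs = [("add", "person", ("q1",))]
t_global = [("rename", "person", ("person", "individual"))]
t_local = [("add", "staff.email", ("q2",))]

new_global_pathway = repair_after_global_evolution(ls_to_gs, t_global)
new_local_pathway = repair_after_local_evolution(ls_to_gs, t_local)
```

In both cases the existing pathway is reused unchanged, which is what distinguishes repair from redefinition.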
19
Global Query Processing
  • We handle query language heterogeneity by
    translation into/from a functional intermediate
    query language IQL
  • A query Q expressed in a high-level query
    language on a global schema S is first
    translated into IQL (this functionality is not
    yet supported in the AutoMed toolkit)
  • View definitions are derived from the
    transformation pathways between S and the
    requested data source schemas
  • These view definitions are substituted into Q,
    reformulating it into an IQL query over source
    schema constructs
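
The substitution step can be sketched as view unfolding over a small expression tree. This is not IQL: queries are represented here as nested tuples `(operator, operand, ...)` with construct names as strings, the `views` dictionary stands in for the view definitions derived from the pathways, and `unfold` is a hypothetical name. View definitions are assumed to be acyclic.

```python
from typing import Dict, Tuple, Union

Query = Union[str, Tuple]  # a construct name, or (operator, operand, ...)


def unfold(query: Query, views: Dict[str, Query]) -> Query:
    """Replace every global construct in the query by its view definition."""
    if isinstance(query, str):
        # Source constructs have no view definition and are left as they are.
        return unfold(views[query], views) if query in views else query
    op, *operands = query
    return (op, *(unfold(operand, views) for operand in operands))


# Hypothetical view definition: the global person construct is the union
# of two source constructs.
views = {"GS.person": ("union", "LS1.staff", "LS2.student")}

global_query = ("count", "GS.person")
reformulated = unfold(global_query, views)
```

After unfolding, the query mentions only source schema constructs, so it can be handed on for optimisation and evaluation.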

20
Global Query Processing (contd)
  • Query optimisation and query evaluation then
    occur
  • During query evaluation, the evaluator submits to
    wrappers sub-queries that they are able to
    translate into the local query language.
    Currently, AutoMed supports wrappers for SQL,
    OQL, XPath, XQuery and flat-file data sources
  • The wrappers translate sub-query results back
    into the IQL type system
  • Further query post-processing then occurs in the
    IQL evaluator
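
The wrapper's role can be sketched with an in-memory SQLite database standing in for an SQL data source. The `SqlWrapper` class, its two methods and the `("project", table, column)` sub-query form are hypothetical and much simpler than a real wrapper; plain Python lists stand in for values of the IQL type system.

```python
import sqlite3
from typing import List, Tuple


class SqlWrapper:
    """Hypothetical wrapper for one SQL source."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def can_translate(self, subquery: Tuple[str, str, str]) -> bool:
        # This sketch only understands single-column projections.
        return subquery[0] == "project"

    def evaluate(self, subquery: Tuple[str, str, str]) -> List[str]:
        _, table, column = subquery
        # Translate into the local query language (SQL). Identifiers cannot
        # be bound as parameters, so they are quoted instead.
        sql = f'SELECT "{column}" FROM "{table}" ORDER BY "{column}"'
        rows = self.conn.execute(sql).fetchall()
        # Translate the result back into the evaluator's representation.
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT)")
conn.executemany("INSERT INTO staff VALUES (?)", [("Alan",), ("Ada",)])

wrapper = SqlWrapper(conn)
subquery = ("project", "staff", "name")
result = wrapper.evaluate(subquery) if wrapper.can_translate(subquery) else None
```

The evaluator only needs the two methods; wrappers for other source types would translate the same sub-query form into their own local language.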

21
Other AutoMed research at BBK
  • As well as virtual integration of data sources,
    we have investigated using AutoMed for
    materialised data integration i.e. a data
    warehousing approach
  • In particular, Hao Fan has worked on incremental
    view maintenance, data lineage tracing and schema
    evolution over AutoMed schema transformation
    pathways
  • Lucas Zamboulis has developed semi-automatic
    techniques for transforming and integrating
    heterogeneous XML data
  • In recent work he is investigating using
    correspondences to ontologies to enhance these
    techniques
  • Sandeep Mittal is working on update translation
    and update propagation along AutoMed pathways,
    e.g. in P2P environments

22
Other AutoMed research at BBK (contd)
  • Dean Williams has been working on extracting
    structure from unstructured text sources
  • The aim here is to integrate information
    extracted from unstructured text with structured
    information available from other sources
  • Dean is using existing technology (the GATE tool)
    for the text annotation and IE part of this work
  • The information extracted from the text is
    matched with existing structured information to
    derive new instance data and perhaps also new
    schema fragments
  • AutoMed is being used for the schema and data
    integration aspects of this project

23
ISPIDER Project
  • Partners: Birkbeck, EBI, Manchester, UCL
  • Aims:
  • Vast, heterogeneous biological data
  • Need for interoperability
  • Need for efficient processing
  • Development of Proteomics Grid Infrastructure,
    use existing proteomics resources and develop new
    ones, develop new proteomics clients for
    querying, visualisation, workflow etc.

24-28
Project Aims
[Figures: five slides titled "Project Aims"; no text survives in the transcript.]
29
myGrid / DQP / AutoMed
  • myGrid: collection of services/components
    allowing high-level integration of
    data/applications for in-silico experiments in
    biology
  • DQP
      • OGSA-DAI (Open Grid Services Architecture Data
        Access and Integration)
      • Distributed query processing over OGSA-DAI
        enabled resources
  • Ongoing research
      • AutoMed / DQP interoperability
      • AutoMed / myGrid interoperability

30
DQP / AutoMed interoperability
  • Data sources wrapped with OGSA-DAI
  • AutoMed OGSA-DAI wrappers extract the data
    sources' metadata
  • Semantic integration of data sources using
    AutoMed transformation pathways into an
    integrated AutoMed schema
  • IQL queries submitted to this integrated schema
    are:
      • Reformulated to IQL queries on the data sources,
        using the AutoMed transformation pathways
      • Submitted to DQP for evaluation

31
Ongoing and future research
  • Heterogeneous data integration in Grid and P2P
    environments, with bioinformatics and e-learning
    as example application domains
  • Flexible combinations of virtual, materialised or
    hybrid integration
  • Flexible query processing in imprecise
    integration scenarios
  • P2P query processing over BAV pathways
  • P2P update processing over BAV pathways
  • Use of ECA rules and a P2P ECA rule execution
    engine for flexible update processing and data
    sharing
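
For readers unfamiliar with event-condition-action (ECA) rules, the sketch below shows the general idea in a single process. It does not describe the P2P rule execution engine mentioned on the slide: the `EcaRule` class, the `dispatch` function, and the event and table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EcaRule:
    event: str                           # the event type that triggers the rule
    condition: Callable[[Dict], bool]    # checked against the event payload
    action: Callable[[Dict], None]       # run only if the condition holds


def dispatch(rules: List[EcaRule], event: str, payload: Dict) -> int:
    """Run every rule whose event matches and whose condition holds."""
    fired = 0
    for rule in rules:
        if rule.event == event and rule.condition(payload):
            rule.action(payload)
            fired += 1
    return fired


# Example: propagate inserts on one table to a replica held elsewhere.
replica: List[str] = []
rules = [
    EcaRule(
        event="insert",
        condition=lambda p: p["table"] == "protein",
        action=lambda p: replica.append(p["row"]),
    )
]

fired_matching = dispatch(rules, "insert", {"table": "protein", "row": "P12345"})
fired_other = dispatch(rules, "insert", {"table": "other", "row": "x"})
```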