Data and Knowledge Evolution. Giorgos Flouris (fgeo_at_ics.forth.gr). Open Data Tutorials, May 2013. Slides available at: http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt

1

Data and Knowledge Evolution
Giorgos Flouris, fgeo_at_ics.forth.gr
Open Data Tutorials, May 2013
Slides available at http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt
2
World Wide Web
  • WWW (and HTML) focus on human readability
  • Page presentation (fonts, colors, images, ...)
  • Human understanding
  • Presentation ≠ Semantical content
  • Content is not formally described (for a machine
    to understand)
  • WWW contains documents, not data

3
Problems with the Current Web
  • Search and access becomes difficult
  • Software ignorant of the semantical content of a
    web page
  • Keyword search
  • High recall, low precision
  • Terminological issues
  • Synonyms (heart disease = cardiac disease)
  • Hyponyms/hypernyms (parliament members are
    politicians)
  • Queries on the semantical content cannot be made
  • "Fetch articles that support B. Obama's foreign
    policy"
  • "Fetch the home pages of all members of the Greek
    Parliament"

4
Semantic Web
  • The Semantic Web is an extension of the current
    web in which information is given well-defined
    meaning, better enabling computers and people to
    work in cooperation [BLHL01]
  • The Semantic Web provides a common framework that
    allows data to be shared and reused across
    application, enterprise, and community boundaries
    (http://www.w3.org/2001/sw/)
  • Semantic Web is a collaborative effort led by
    W3C with participation from a large number of
    researchers and industrial partners
    (http://www.w3.org/2001/sw/)

5
Semantic Web in Practice
  • Web of data, rather than documents
  • HTML for presentation
  • Semantical languages for semantical content
  • Readable and understandable by humans and
    machines
  • Semantic Web languages, protocols, etc
  • Web page annotation (metadata descriptions etc)
  • Publication of data on the Internet
  • Efficient communication and manipulation of data
    over the Internet
  • Different applications
  • Efficient searching
  • Sharing of data (e-science, e-government, remote
    learning, ...)
  • Linked Open Data (more on that later)

6
Ontologies and Data (Datasets)
  • An ontology is an explicit specification of a
    shared conceptualization of a domain [Gru93]
  • Precise, logical account of the intended meaning
    of terms
  • Common (shared) interpretation of terms
  • Formal vocabulary for information exchange
    (humans/machines)
  • Ontologies (vocabularies) allow the description
    of data
  • Terminology
  • Ontology: vocabulary, schema
  • Data: instances
  • Dataset: data and the related ontology (i.e., a
    dataset may contain schema and/or data)

7
Dataset Dynamics
  • Datasets change constantly
  • World changes (dynamic models)
  • View on the world changes (new knowledge,
    measurements, etc)
  • Perspective and usage changes
  • Examples
  • Gene Ontology (information about gene products):
    daily versions
  • DBPedia: 1.4 updates/second
    (http://live.dbpedia.org/LiveStats/) [MLA12]
  • Need methodologies to cope with the problems
    related to dynamicity
  • Evolution (modify a dataset in response to a
    change)
  • Versioning (keep track of versions and their
    relations)
  • Debugging, cleaning, repairing, quality (maintain
    consistency and quality in a dynamic environment)
  • Change monitoring, detection and propagation
    (identify changes and use them to synchronize
    remote datasets)

8
Linked (Open) Data
  • Datasets can be interlinked
  • Sharing knowledge
  • Reusing knowledge
  • Modular development
  • Reuse of schemas
  • Linked Open Data (LOD) movement
  • Constantly growing
  • 31 billion triples and 295 datasets as of
    September 2011

9
Linked Open Data Cloud Diagram
10
Linked Open Data Challenges
  • Both a blessing and a curse
  • Added-value benefits
  • Discovery of unknown correlations, connections,
    relationships
  • Vast amount of interrelated knowledge
  • No central control, everyone can publish and
    relate to others
  • Quality of datasets lies/depends on different
    providers
  • A change in one dataset affects all related ones
  • Several new problems related to dynamics
  • Propagation of changes among interrelated
    datasets
  • Maintaining the quality of local datasets
  • Co-evolution

11
Scope: Dynamic Linked Datasets
[Figure: diagram relating "Dynamic Datasets" and "Linked Datasets", with a "You are here" marker]
12
Purpose of This Talk
  • To survey different research areas related to
    dynamic LOD
  • Remote Change Management
  • Repair
  • Data and Knowledge Evolution
  • Categorize and classify works in each field
  • Broad but shallow description
  • Several references for more in-depth study
  • No claims of completeness (references are just
    indicative)
  • Two relevant surveys [FMK08, ZAA13]
  • Emphasis on some related work done in FORTH
  • Will avoid technical discussion
  • References will be given for further details

13
Defining Remote Change Management
  • Managing the effects of remote changes on
    interlinked datasets
  • Remote changes have profound effects on local
    datasets
  • Good practices are important
  • Proper versioning, change logging, adaptation to
    remote changes, ...
  • Attention exploded after the success of the LOD
    paradigm
  • Related research questions
  • How should I version my data?
  • How can I efficiently monitor changes in my
    dataset?
  • How can I detect changes in remote datasets?
  • How does the evolution of remote datasets affect
    my data?
  • How can I efficiently propagate changes from one
    dataset to another?

14
Remote Change Management Visualization
[Figure labels: Remote Site; Versioning, Change Monitoring; Change Detection; Local Site; Change Propagation]
15
Remote Change Management Structure
  • Three subfields
  • Versioning
  • Change monitoring and detection
  • Change propagation
  • Structure
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]

16
Defining Repair
  • Assessing and improving the quality and the
    semantical or structural integrity of the data
  • Maintaining consistency, coherency, validity
  • Restoring consistency, coherency, validity, when
    violated
  • Assessing and improving quality
  • Preserve quality/integrity in the face of remote
    changes
  • Related research questions
  • How can I preserve the integrity and quality of
    my data in a dynamic and interlinked environment?
  • How can I guarantee consistency and validity?
  • How can I restore consistency and validity, if
    violated?

17
Repair Visualization
[Figure labels: Repair Process (Cleaning, Debugging, Repairing, Quality Enhancement); Assessment Module (Diagnosis, Quality Assessment)]
18
Repair Structure
  • Four subfields
  • Cleaning
  • Debugging
  • Validity repair
  • Quality enhancement
  • Structure
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]

19
Defining Evolution
  • Modifying a dataset in response to a change in
    the domain or its conceptualization
  • Identify the result of applying new information
    on the dataset
  • Determine the result of change propagation from
    remote datasets
  • Understand the process of change
  • Related research questions
  • What is the semantics of evolution and change?
  • How can I efficiently compute the ideal evolution
    result?

20
Evolution Visualization
[Figure labels: Real World; Evolution Algorithm; Dataset; example changes Delete_Class(), Pull_Up_Class(), Rename_Class()]
21
Evolution Summary
  • Evolution topics
  • Understanding the evolution challenges
  • Understanding the process of change
  • Balancing between philosophical and practical
    considerations
  • Cross-fertilization with belief change
  • Structure
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review

22
General Structure of this Talk
  • Introduction to RDF/S, DLs, OWL
  • Remote change management
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]
  • Repair
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]
  • Data and Knowledge Evolution
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review
  • The final few slides contain citations for the
    references in this talk

Part I (2 hours)
Part II (1 hour)
23
Talk Structure (A)
  • Introduction to RDF/S, DLs, OWL
  • Remote change management
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]
  • Repair
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]
  • Data and Knowledge Evolution
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review

24
Datasets
  • Basic structures
  • Classes (or concepts): collections of objects
    (e.g., Actor, Politician)
  • Properties (or roles): binary relationships
    between objects (e.g., started_on, member_of)
  • Instances (or individuals): objects (e.g.,
    Giorgos, B. Obama)
  • Relations between them
  • Subsumption (Parliament_Member subclass of
    Politician), instantiation (B. Obama instance of
    Politician), ...
  • The allowed relations and their semantics depend
    on the language
  • Different representation languages for LOD
  • RDF/S, OWL

25
Visualization, Triples, Serialization
Visualization
[Figure: graph with classes Period, Event, Onset, Birth, Actor, Existing, Stuff; properties participants, started_on; instances G_Birth, Giorgos]
Triple Representation
  • Define classes: Period type Class
  • Define properties: participants type Property,
    participants domain Onset, participants range Actor
  • Instantiate/define individuals: G_Birth type Birth,
    Giorgos type Actor, G_Birth participants Giorgos
  • Define hierarchies: Event subClass Period
Serialization (RDF/XML)
<rdfs:Class rdf:ID="Period"> </rdfs:Class>
<rdf:Property rdf:ID="participants">
  <rdfs:domain rdf:resource="#Onset"/>
  <rdfs:range rdf:resource="#Actor"/>
</rdf:Property>
<Birth rdf:about="#G_Birth">
  <participants> <Actor rdf:about="#Giorgos"/> </participants>
</Birth>
<rdfs:Class rdf:ID="Event">
  <rdfs:subClassOf rdf:resource="#Period"/>
</rdfs:Class>
26
RDF and RDFS
  • An RDF dataset consists of triples
  • RDFS adds semantics
  • Subsumption hierarchies (classes and properties)
  • Transitive
  • Instantiation
  • Inheritance, implicit instantiation
  • Sometimes more than subsumption/instantiation is
    needed
  • Combining concepts, roles to form more complex
    relations
  • Concept definitions: a mother is a female who has
    a child
  • Other knowledge: all items stored in warehouse X
    are flammable
  • Constraints on data
  • Each person must have one mother
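
The RDFS semantics listed above (transitive subsumption, inheritance, implicit instantiation) can be illustrated with a naive fixpoint computation. This is a minimal sketch, not an RDFS reasoner: it assumes triples are plain Python tuples, borrows the informal predicate names `subclass` and `type` used in these slides, and implements only two inference rules. The function name `rdfs_closure` is hypothetical.

```python
def rdfs_closure(triples):
    """Naive fixpoint over two RDFS-style rules (illustrative only)."""
    closed = set(triples)
    while True:
        new = set()
        for (a, p, b) in closed:
            if p != "subclass":
                continue
            for (c, q, d) in closed:
                # Transitivity: a subclass b, b subclass d  =>  a subclass d
                if q == "subclass" and c == b:
                    new.add((a, "subclass", d))
                # Inheritance: c type a, a subclass b  =>  c type b
                if q == "type" and d == a:
                    new.add((c, "type", b))
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new


# Explicit triples, taken from the running example in these slides.
triples = {
    ("Birth", "subclass", "Onset"),
    ("Onset", "subclass", "Event"),
    ("Event", "subclass", "Period"),
    ("G_Birth", "type", "Birth"),
}

# Implicit triples: everything in the closure that was not stated explicitly.
inferred = rdfs_closure(triples) - triples
for t in sorted(inferred):
    print(t)
```

The explicit dataset never says that G_Birth is a Period; that triple appears only in the closure, which is what "implicit instantiation" refers to.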

27
Extensions of RDF/S: DLs (1/2)
  • Description Logics (DLs)
  • http://dl.kr.org/
  • Formal underpinning of web representation
    languages
  • Family of logical formalisms
  • Well-defined semantics
  • Model-theoretic reasoning based on
    interpretations
  • Formally studied
  • Expressiveness, reasoning tools, computational
    complexity, ...
  • Components
  • Individuals: specific objects (instances), e.g.,
    Giorgos
  • Concepts: sets of individuals (classes), e.g., Parent
  • Roles: sets of pairs of individuals (properties),
    e.g., has_child
  • Operators: ¬, ⊓, ⊔, ∃, ∀, ...
  • Connectives: ⊑, ≡, ...

28
Extensions of RDF/S: DLs (2/2)
  • Definitions, partial definitions, constraints,
    subsumptions, ...
  • A mother is a female who has a child
  • Mother ≡ ∃has_child ⊓ Female
  • Each person must have one mother
  • Person ⊑ ∃has_child⁻¹.Mother
  • A great variety of DLs (trade-off involved)
  • Different properties
  • Different expressive power
  • Different reasoning complexity

29
Extensions of RDF/S: OWL
  • OWL (Web Ontology Language)
  • http://www.w3.org/2004/OWL/
  • General-purpose representation language
  • Compatible with the architecture of the Semantic
    Web
  • A family of languages
  • Flavors: OWL Lite, OWL DL, OWL Full
  • Profiles: OWL 2 EL, OWL 2 QL, OWL 2 RL
  • Different expressiveness (and complexity)
  • Each corresponds to a specific DL
  • Useful from a modeling perspective
  • Expressive but not too complex
  • Appealing computationally

30
Representation Languages in LOD
  • Mostly RDF
  • With RDFS semantics
  • Instantiations
  • Class subsumption
  • Property subsumption is rare
  • Some OWL
  • Mostly OWL Lite
  • Extensive use of owl:sameAs
  • Often abusing it [HHM10]
  • OWL 2 profiles are gaining ground

31
Talk Structure (B1)
  • Introduction to RDF/S, DLs, OWL
  • Remote change management
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]
  • Repair
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]
  • Data and Knowledge Evolution
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review

32
Motivation for Remote Change Management
  • Crucial problem for dynamic linked datasets
  • Linking: datasets are linked to other datasets
    (e.g., vocabularies)
  • Dynamics: changes cause problems to linked
    datasets
  • No central curation or control
  • No control over (or knowledge of) other datasets'
    evolution process
  • Curators don't bother annotating and logging
    changes
  • Temporal and versioning information is usually
    missing [RPH12]
  • Remote change management seeks solutions to
    allow
  • Keeping track of versions
  • Restoring previous versions
  • Assessing compatibility of versions
  • Monitoring and detecting changes
  • Tracing back the evolution history (of datasets,
    concepts, ...)
  • For visualization and understanding
  • Propagating changes to synchronize linked datasets

33
Subfields of Remote Change Management
  • Remote Change Management
  • Versioning
  • Keep track of versions
  • Change monitoring and detection
  • Monitoring: record changes as they happen
  • Detection: identify changes after they happen
  • Change propagation
  • Propagate changes across linked datasets for
    synchronization purposes

34
Versioning
  • Versioning
  • Keep track of versions
  • Identify different versions of a dataset
  • Enable transparent access to the correct
    version (smooth interoperation)
  • Issues involved
  • Identification
  • Determine which versions to store and how to
    identify them
  • Manually or automatically (syntactical,
    semantical considerations)
  • Packaging of changes
  • Relation between versions
  • A sequence or a tree
  • Compatibility information
  • Backwards/forwards compatibility and how to
    determine it (often manually)
  • Dataset-wide compatibility or fine-grained
    compatibility (e.g., at resource level)
  • Metadata on the different versions
  • Transparent access
  • Relate versions with (compatible) data sources,
    applications etc

35
Change Monitoring and Detection
  • Change monitoring
  • Record changes as they happen
  • Manual (error-prone and often incorrect)
  • Automatic (not used in practice)
  • In the good will of the dataset owner
  • Sometimes change logs are inaccessible
  • Change detection
  • Identify changes after they happen
  • Based on the previous and current versions
  • In both cases, a change language is required
  • Supported set of changes, along with their
    semantics
  • Can be low-level or high-level

36
Change Propagation
  • Change propagation
  • Communicate changes to linked datasets for
    synchronization
  • Push-based or pull-based propagation
  • Push-based: locally-initiated, via registration
    or via monitoring and versioning
  • Pull-based: consumer-initiated
  • Communication based on deltas (rather than
    versions)
  • Reduce communication overhead
  • Reduce storage requirements
  • On average, 2-3% of a dataset changes between
    versions [OK02]
  • Deltas are based on a language of changes

37
Talk Structure (B2)
  • Introduction to RDF/S, DLs, OWL
  • Remote change management
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]
  • Repair
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]
  • Data and Knowledge Evolution
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review

38
Versioning Approaches (1/3)
  • Capture different aspects of versioning, such as
  • Detecting versions
  • Storing versions efficiently
  • Allow cross-snapshot queries
  • Find gene products whose functions have not
    changed in the last 50 versions
  • Determine price fluctuation for x along different
    versions of the product catalog
  • Early versioning approaches inspired by SVN
  • Good for files, not directly adaptable to
    semantical languages
  • SHOE language [HH00]
  • Machine-readable version information (e.g.,
    compatibility)
  • Provided by curator as SHOE statements
  • Memento [SSN10]
  • Fine-grained versioning at URI level (resources,
    web pages)
  • Machine-readable version information, in the HTTP
    header
  • Timestamps, traversal information (prior/current
    versions) etc

39
Versioning Approaches (2/3)
  • Theoretical foundations for versioning [HP04]
  • Formal definitions to capture notions such as
  • Compatibility (between versions)
  • Commitment (resources committing to a certain
    ontology)
  • Ontology perspectives (the part of the web
    committing to an ontology)
  • Temporal approaches [HS05, PTC05, KLGE07]
  • For capturing temporal relations between versions
  • For allowing cross-snapshot queries
  • Versioning in multi-editor environments [RSDT08]
  • Via change monitoring

40
Versioning Approaches (3/3)
  • Automatically detecting version relationships
    [AAM09]
  • Using heuristics based on URIs
  • Study of relatedness between versions [CQ13]
  • A model of relatedness between vocabularies
    from various sources
  • Similar to links in web pages
  • POI: Partial Order Index [TTA08]
  • Efficient method for storing versions and their
    differences
  • Stores several versions, exploiting their common
    triples for efficient storage

41
Change Languages (1/2)
  • Change languages necessary for monitoring,
    detection, propagation
  • Granularity
  • Low-level (or atomic, or elementary)
  • Simple add/remove operations
  • Add(s,p,o), Delete(s,p,o)
  • Simple to detect and define
  • Focus on machine-readability: determinism,
    well-defined semantics
  • High-level (or complex, or composite)
  • More coarse-grained, compact, closer to editors'
    perception and intuition
  • Generalize_Domain(P,A), Delete_Class(A)
  • More interesting, harder to detect and define
  • Focus on human-understandability: often unclear
    and/or informal semantics

42
Change Languages (2/2)
  • Many different high-level languages (no standard)
  • [HGR12, JAP09, PFF13, SK03, AH06, DA09, PTC07, ...]
  • Some are domain-specific (e.g., [HGR12])
  • Some are dynamic (e.g., [AH06, DA09, PTC07])
  • Allow custom, user-defined changes
  • Some allow terminological changes (e.g.,
    [PFF13])
  • Rename, merge, split
  • Common, but tough to detect (easily confused with
    add/delete)

43
Representation Issues
  • Deltas are just sets of changes from the change
    language
  • Changes usually represented using a change
    ontology
  • Ontology represents changes
  • A specific change is an instance of such an
    ontology
  • Deltas associated with sets of such instances
  • Different proposals [NCLM06, KFKO02, KN03, PT05]
  • Allows the manipulation and communication of
    deltas/changes using standard Semantic Web
    technologies

44
Change Monitoring Approaches
  • Using a version log [PT05]
  • Logging actions on the dataset
  • Use it for change detection, as well as proper
    versioning
  • Good quality, high-level change monitoring
  • Based on a dynamic language of changes
  • Using migration specifications [ZZL03]
  • Similar to logs, but with a more formal structure
  • DBPedia change monitoring [MLA12]
  • http://live.dbpedia.org/
  • Live versions, as opposed to standard versions

45
Low-Level Change Detection (1/2)
  • SemVersion [VWS05]
  • Developed in Karlsruhe (FZI, AIFB)
  • Low-level change detection tool for RDF
  • Provides also versioning functionalities
  • Allows cross-snapshot queries
  • For RDF [ILK12]
  • Low-level change detection based on set
    difference
  • Aggregating and compressing deltas
  • Also dealing with versioning issues
  • For RDF/S [ZTC11]
  • Takes into account semantics (RDFS inference)
  • Four different methods to compute deltas (all
    based on set difference)
  • Formal analysis of these methods' properties and
    semantics
  • Extension: effect of blank nodes on change
    detection [TLZ12]

46
Low-Level Change Detection (2/2)
  • Bubastis
    (http://www.ebi.ac.uk/fgpt/sw/bubastis/index.html)
  • Simple diff tool (triple-based comparison)
  • Basically RDF, but also supports OWL
  • For DL-Lite [KWZ08]
  • Formal, semantical approach
  • For EL [KWW08]
  • Uses a concept-based description of changes
  • For propositional knowledge bases [FMV10]
  • Propositional, but generic: it can be applied to
    DLs
  • Formal analysis of the problem
  • Also dealing with propagation semantics

47
High-Level Change Detection (1/2)
  • For OWL: PromptDiff [NKKM04], OntoView [KFKO02]
  • Employ heuristics and probabilistic methods
  • Evaluation using precision/recall metrics against
    a gold standard
  • Integrated into tools that also provide
    versioning functionalities
  • For RDF/S [PFF13]
  • Dealing with both machine-readability and
    human-understandability
  • Also dealing with propagation (applying changes)
  • To be discussed in detail later
  • COnto-Diff [HGR12]
  • Rule-based approach
  • Also dealing with propagation

48
Change Propagation Approaches
  • Usually part of other tools [SMMS02, MMS03]
  • Versioning, monitoring tools (push-based
    propagation)
  • Detection tools (pull-based propagation)
  • Evolution and repair tools (pull-based
    propagation)
  • Adapt your data to be compatible with the new
    remote version
  • SparqlPush [PM10]
  • Push-based propagation of changes on SPARQL
    views
  • PRISM, PRISM++ [CMZ08, CMDZ10]
  • High-level language of schema changes for
    relational data
  • Also supports changes on the integrity
    constraints
  • Identifies and propagates the changes required in
    the data for abiding to the new schema
  • Query and update rewriting
  • For applications that try to access the old schema

49
Other Change Management Approaches
  • Complete approach for XML [SP10]
  • Representing changes inline with the data using a
    graph (evograph)
  • Supports different change representation
    languages (both low-level and high-level)
  • Timestamps changes
  • Monitoring: evograph can be used to log the
    changes
  • Propagation: changes can be accessed and
    propagated
  • Versioning: timestamps in changes can be used to
    generate snapshots (versions) at different times
  • Allows cross-snapshot queries
  • Fairly generic, can be adapted for RDF

50
Talk Structure (B3)
  • Introduction to RDF/S, DLs, OWL
  • Remote change management
  • Introduction, definition of subfields
  • Literature review
  • An approach for change detection [PFF13]
  • Repair
  • Introduction, definition of subfields
  • Literature review
  • An approach for validity repair [RFC11]
  • Data and Knowledge Evolution
  • Introduction, connection with belief change
  • Understanding the process of change
  • Literature review

51
Our Approach on Change Detection
  • Purpose of this work: change detection [PFF13]
  • A posteriori: detect the differences (delta or
    diff) between versions in a concise, intuitive
    and correct way
  • Main design choices
  • Change detection based on a general-purpose
    high-level language
  • Human-understandable, but also machine-readable
  • Clear, formal semantics
  • Provable formal properties and functionality
    guarantees
  • Detection and application (propagation) semantics

52
Sample Evolution
[Figure: Version 1 (V1) evolves into Version 2 (V2). V1 shows Period, Event, Onset, Birth, Actor, Existing, Stuff, participants, started_on, G_Birth, Giorgos; V2 shows Event, Onset, Birth, Actor, Persistent, Stuff, participants, started_on, G_Birth, Giorgos]
53
Analyzing the Evolution (Using Triples)
  • Triples in V1 (partial list)
  • Event type Class
  • Period type Class
  • Event subclass Period
  • participants type Property
  • participants domain Onset
  • participants range Actor
  • Giorgos type Actor
  • Existing type Class
  • Stuff subclass Existing
  • started_on domain Existing
  • Onset subclass Event
  • Birth subclass Onset
  • Triples in V2 (partial list)
  • Event type Class
  • participants type Property
  • participants domain Event
  • participants range Actor
  • Giorgos type Actor
  • Persistent type Class
  • Stuff subclass Persistent
  • started_on domain Persistent
  • Onset subclass Event
  • Birth subclass Event

54
Low-Level Delta
  • Triples in V2 but not in V1(added triples)
  • participants domain Event
  • Persistent type Class
  • Stuff subclass Persistent
  • started_on domain Persistent
  • Birth subclass Event
  • Triples in V1 but not in V2(deleted triples)
  • Period type Class
  • Event subclass Period
  • participants domain Onset
  • Existing type Class
  • Stuff subclass Existing
  • started_on domain Existing
  • Birth subclass Onset

Low-Level Delta: Add(participants domain Event), Add(Persistent type Class), ..., Del(Period type Class), ...
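
The low-level delta above is a plain set difference, which can be sketched in a few lines. This is an illustrative sketch only: it assumes each version is a set of (subject, predicate, object) tuples, uses the partial V1/V2 triple lists from the previous slide, and writes the new domain triple as `participants domain Event`, as the associations slide does. The helper names `parse` and `low_level_delta` are hypothetical.

```python
def parse(text):
    """Turn 'subject predicate object' lines into a set of triples."""
    return {tuple(line.split()) for line in text.strip().splitlines()}


def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2, triples only in v1."""
    return v2 - v1, v1 - v2


V1 = parse("""
Event type Class
Period type Class
Event subclass Period
participants type Property
participants domain Onset
participants range Actor
Giorgos type Actor
Existing type Class
Stuff subclass Existing
started_on domain Existing
Onset subclass Event
Birth subclass Onset
""")

V2 = parse("""
Event type Class
participants type Property
participants domain Event
participants range Actor
Giorgos type Actor
Persistent type Class
Stuff subclass Persistent
started_on domain Persistent
Onset subclass Event
Birth subclass Event
""")

added, deleted = low_level_delta(V1, V2)
for t in sorted(added):
    print("Add(%s %s %s)" % t)
for t in sorted(deleted):
    print("Del(%s %s %s)" % t)
```

On these partial lists the sketch yields five added and seven deleted triples, the same ones enumerated on the slide.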
55
Analyzing the Evolution (Visually)
[Figure: Version 1 (V1) and Version 2 (V2) of the sample evolution, annotated with the detected high-level changes]
High-Level Delta:
  • Generalize_Domain(participants, Onset, Event)
  • Pull_Up_Class(Birth, Onset, Event)
  • Delete_Class(Period, Ψ, Event, Ψ, Ψ, Ψ, Ψ)
  • Rename_Class(Existing, Persistent)
56
Comparing the Deltas
[Figure: Version 1 (V1) and Version 2 (V2) of the sample evolution, shown with the low-level delta and the high-level delta]
57
Associations (Partitioning)
Low-Level Change | Associated High-Level Change
Del(participants domain Onset) | Generalize_Domain(participants, Onset, Event)
Add(participants domain Event) | Generalize_Domain(participants, Onset, Event)
Del(Birth subclass Onset) | Pull_Up_Class(Birth, Onset, Event)
Add(Birth subclass Event) | Pull_Up_Class(Birth, Onset, Event)
Del(Period type Class) | Delete_Class(Period, Ψ, Event, Ψ, Ψ, Ψ, Ψ)
Del(Event subclass Period) | Delete_Class(Period, Ψ, Event, Ψ, Ψ, Ψ, Ψ)
Del(Existing type Class) | Rename_Class(Existing, Persistent)
Del(Stuff subclass Existing) | Rename_Class(Existing, Persistent)
Del(started_on domain Existing) | Rename_Class(Existing, Persistent)
Add(Persistent type Class) | Rename_Class(Existing, Persistent)
Add(Stuff subclass Persistent) | Rename_Class(Existing, Persistent)
Add(started_on domain Persistent) | Rename_Class(Existing, Persistent)
58
Challenges for High-Level Languages
  • High-level deltas are superior
  • More concise (e.g., Rename_Class)
  • More intuitive (e.g., Pull_Up_Class)
  • Carry additional information (e.g.,
    Generalize_Domain)
  • Challenges for high-level languages
  • Must be deterministic (exactly one high-level
    delta)
  • Must be fine-grained enough to capture subtle
    changes
  • Must be coarse-grained enough to be concise
  • Must be intuitive and close to editors'
    perception of the changes
  • Compatible detection and application algorithms
  • Intuitive results
  • Efficient

59
Proposed Language L
  • The formal definition of a change consists of
  • Changes required in the low-level delta
    (added/deleted triples)
  • Conditions that should hold in V1 and/or V2
  • Generalize_Domain(P, X, Y)
  • Del(P domain X)
  • Add(P domain Y)
  • P: existing property in both V1, V2
  • X, Y: existing classes in both V1, V2
  • X: subclass of Y in both V1, V2
  • Generalize_Domain(participants, Onset, Event) is
    detectable
  • Similarly for the other changes in L (132
    high-level ones)
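
The definition of Generalize_Domain above (required low-level changes plus conditions on V1 and V2) translates fairly directly into a check over the two versions. This is a hedged sketch, not the detection algorithm of [PFF13]: it assumes versions are sets of (subject, predicate, object) tuples, tests the conditions against explicitly stated triples only (no RDFS inference), and uses a hypothetical function name. The triple `Onset type Class` does not appear in the partial lists on the slides and is assumed here.

```python
def detect_generalize_domain(v1, v2):
    """Find Generalize_Domain(P, X, Y) instances between two versions."""
    added, deleted = v2 - v1, v1 - v2
    both = v1 & v2  # triples present in both versions
    found = []
    for (p, pred, x) in deleted:
        if pred != "domain":
            continue  # required change: Del(P domain X)
        for (p2, pred2, y) in added:
            if pred2 != "domain" or p2 != p:
                continue  # required change: Add(P domain Y)
            if ((p, "type", "Property") in both        # P is a property in V1, V2
                    and (x, "type", "Class") in both   # X is a class in V1, V2
                    and (y, "type", "Class") in both   # Y is a class in V1, V2
                    and (x, "subclass", y) in both):   # X subclass of Y in V1, V2
                found.append(("Generalize_Domain", p, x, y))
    return sorted(found)


common = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),  # assumed: not in the partial slide lists
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
v1 = common | {
    ("participants", "domain", "Onset"),
    ("Existing", "type", "Class"),
    ("started_on", "domain", "Existing"),
}
v2 = common | {
    ("participants", "domain", "Event"),
    ("Persistent", "type", "Class"),
    ("started_on", "domain", "Persistent"),
}

changes = detect_generalize_domain(v1, v2)
print(changes)
```

The domain change of started_on is not reported as a Generalize_Domain because its conditions fail (Existing and Persistent do not both exist in both versions); on the slides that change is covered by Rename_Class instead.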

60
Types and Number of Defined Changes
Changes (134)
  • Low-Level (2): Add, Del
  • High-Level (132)
  • Basic (54), e.g., Delete_Subclass, Delete_Domain
  • Composite (51), e.g., Pull_Up_Class, Change_Domain
  • Heuristic (27), e.g., Rename_Class, Split_Class
61
Results on L: Granularity
  • Granularity problem solved by defining levels of
    changes
  • Basic Changes: fine-grained, roughly correspond
    to low-level
  • Composite Changes: coarse-grained, group several
    basic changes together
  • Heuristic Changes: based on heuristics, necessary
    for Rename, Merge, Split etc.; require mappings
    between URIs
  • Problems with determinism
  • One evolution could correspond to different sets
    of basic/composite changes
  • Priorities in detection
  • Heuristic > Composite > Basic

62
Results on L: Determinism
  • Each low-level change is associated with exactly
    one detectable high-level change
  • Full partitioning of low-level changes into
    high-level ones
  • Each pair of versions (V1, V2) is associated
    with
  • Exactly one low-level delta
  • Exactly one high-level delta
  • Determinism is necessary
  • More than one would lead to ambiguities
  • Less than one would make some inputs (V1, V2)
    irresolvable
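
The partitioning property stated above can be checked mechanically for the running example. The following is a small sketch under stated assumptions: low-level changes are written as plain strings, the grouping is copied from the associations slide, and `is_partition` is a hypothetical helper, not part of any published tool.

```python
def is_partition(groups, universe):
    """True if the groups are non-empty, pairwise disjoint, and cover universe."""
    seen = set()
    for members in groups:
        if not members or seen & members:
            return False  # empty group, or a change claimed by two groups
        seen |= members
    return seen == universe


# High-level change -> the low-level changes it is associated with.
associations = {
    "Generalize_Domain(participants, Onset, Event)": {
        "Del(participants domain Onset)",
        "Add(participants domain Event)",
    },
    "Pull_Up_Class(Birth, Onset, Event)": {
        "Del(Birth subclass Onset)",
        "Add(Birth subclass Event)",
    },
    "Delete_Class(Period, ...)": {
        "Del(Period type Class)",
        "Del(Event subclass Period)",
    },
    "Rename_Class(Existing, Persistent)": {
        "Del(Existing type Class)",
        "Del(Stuff subclass Existing)",
        "Del(started_on domain Existing)",
        "Add(Persistent type Class)",
        "Add(Stuff subclass Persistent)",
        "Add(started_on domain Persistent)",
    },
}

# The full low-level delta of the example (5 additions, 7 deletions).
low_level = {
    "Add(participants domain Event)",
    "Add(Persistent type Class)",
    "Add(Stuff subclass Persistent)",
    "Add(started_on domain Persistent)",
    "Add(Birth subclass Event)",
    "Del(Period type Class)",
    "Del(Event subclass Period)",
    "Del(participants domain Onset)",
    "Del(Existing type Class)",
    "Del(Stuff subclass Existing)",
    "Del(started_on domain Existing)",
    "Del(Birth subclass Onset)",
}

ok = is_partition(associations.values(), low_level)
print(ok)
```

If a high-level change were dropped from the delta, some low-level changes would be left uncovered and the check would fail, which is the "less than one" case on the slide.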

63
Results on L: Propagation
[Figure: Version 1 (V1) and Version 2 (V2) of the sample evolution, with arrows labeled "Detect C", "Apply C" and "Apply C-1"]
64
Results on L: Deltas Keep Version History
  • Can reproduce all versions as long as you keep
    (any) one version and the deltas
  • Deltas are more concise than the versions
    themselves
  • Storage and communication efficiency
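
The claim that any one version plus the deltas suffices to reproduce the others can be illustrated with low-level deltas. This is a minimal sketch under assumptions: a delta is represented as a pair (added, deleted) of triple sets, and the names `apply_delta` and `invert` are hypothetical. The high-level language of the talk has its own application semantics, which this sketch does not reproduce.

```python
def apply_delta(version, delta):
    """Apply a low-level delta (added, deleted) to a set of triples."""
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    """Inverse delta: what was added gets deleted, and vice versa."""
    added, deleted = delta
    return (deleted, added)


v1 = {
    ("Event", "type", "Class"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
}
v2 = {
    ("Event", "type", "Class"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
}

# Detect the delta C between the two versions.
delta = (v2 - v1, v1 - v2)

# Keep only v2 and the delta; v1 is recovered by applying the inverse.
restored_v1 = apply_delta(v2, invert(delta))
rebuilt_v2 = apply_delta(v1, delta)
print(restored_v1 == v1, rebuilt_v2 == v2)
```

Only the delta (here four triples) needs to be stored or transmitted alongside a single full version, which is where the storage and communication savings come from.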

65
Change Detection Evaluation
  • Detection and application algorithms implemented
    for evaluation
  • Performance
  • Complexity: O(max{N1, N2, N²})
  • Performance depends on the detected changes
    (type, number)
  • Bottleneck: calculating the low-level delta (>80%
    of total time)
  • Intuitiveness
  • Changes in our language are used in practice
  • Results confirmed by literature/editor notes
    (CIDOC, GO)
  • Better than CIDOC's manually recorded changes (18
    changes missed)
  • Conciseness
  • Basic