Title: Simplifying Family History Research for the Naïve User: Building an Ontology and Expert Logic for Searching Danish Genealogical Primary Records
1 Simplifying Family History Research for the Naïve User: Building an Ontology and Expert Logic for Searching Danish Genealogical Primary Records
- By
- Charla Woodbury
- June 13, 2005
2 Real User Problem
- A person decides to do family history research for the first time on their Danish family lines.
- Where do they go?
- What records do they look for?
- How do they handle records in Danish?
- How can they tell when the records they have match their search family?
3 Problem
- Semantic web tools
- Expanded to specialized domain expertise
- SMART websites
- Automatically link to best information
- Make the user an expert
- HELP
- ANTICIPATE
- GUIDE
- TRAIN
4 Solution
- Use an ontology with lexicons and description logic to
- Extract the correct matching primary records
- Compute calendar dates from feast dates, and birth dates from age at death
- Match names and families
5 Methods
- Preparing for the records extraction
- Producing results listing
- Evaluating the methodology
6 Preparing for Records Extraction
- Ontology Building at the Entity Level
- Annotating Primary Record Websites
- Building Research Tools Inside the Ontology
- Logic and Reasoning inside the Ontology
7 1. Ontology Entity Level
8 ONTOLOGY ENTITIES
- FIND and MARK UP relevant web pages by
- NAME
- DATE
- PLACE
- RELATIONSHIP
- OCCUPATION
- RECORD_TYPE
- SOURCE
9 Danish GIVEN NAME LEXICON: Add synonyms and thesaurus
- MALE
- Anders And.
- Andreas
- Christen Kristen
- Christian Kristian
- Erik Eric
- Gregers
- Hans
- Ib Jep Jeppe
- Jacob
- Jens
- Johan Johannes Joh.
- Jorgen Jørgen
- Knud
- Lars Laurs Laurids Lauritz
- Mads Mats
- FEMALE
- Ane Anna Anne
- Birthe Birte
- Bodil
- Caroline
- Dorthe Dorte
- Ellen Helene Elene
- Elisabeth Elsbeth Lisbeth
- Else Ilse
- Ingeborg
- Inger
- Karen
- Kirsten Christen Kirstine Christine Chirstine
- Malene
- Maren
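A lexicon of this kind can be sketched as a lookup table that maps every variant spelling to one canonical form. The snippet below is a minimal illustration, not the presenter's implementation: the dictionary holds only a few entries copied from the slide, and the names `GIVEN_NAME_VARIANTS`, `build_lookup`, and `canonical_given_name` are assumptions made for this example.

```python
# Hypothetical mini-lexicon: canonical given name -> variant spellings.
# Only a handful of entries from the slide are included.
GIVEN_NAME_VARIANTS = {
    "Anders": ["And."],
    "Christian": ["Kristian"],
    "Jorgen": ["Jørgen"],
    "Lars": ["Laurs", "Laurids", "Lauritz"],
    "Ane": ["Anna", "Anne"],
    "Dorthe": ["Dorte"],
}


def build_lookup(variants):
    """Invert the lexicon so every spelling points at its canonical form."""
    lookup = {}
    for canonical, forms in variants.items():
        for form in [canonical, *forms]:
            lookup[form.casefold()] = canonical
    return lookup


LOOKUP = build_lookup(GIVEN_NAME_VARIANTS)


def canonical_given_name(token):
    """Return the canonical given name for a token, or None if unknown."""
    return LOOKUP.get(token.strip().casefold())


print(canonical_given_name("Lauritz"))  # Lars
print(canonical_given_name("JØRGEN"))   # Jorgen
```

Spellings shared by more than one name (for example Christen, which the slide lists under both a male and a female entry) would need extra context such as gender to resolve; this sketch leaves them out.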
10 DATE Lexicon Adds Thesaurus of Synonyms
- MONTHS
- January Jan Januar 11br
- February Feb Februar 12br
- March Mar Marts
- April Apr Apl
- May Mai
- June Jun Juni
- July Jul Juli 5br
- August Aug Augst 6br
- September Sep Sept 7br Septembre
- October Oct 8br Octobre
- November Nov 9br Novembre
- December Dec 10br
- TIME
- Year yr aar år
- Month mo maaned m.
- Week uge ug.
- FEAST DATES
- Easter Paaske Påske Paasche Påsche P.
- Pentecost Pent Pinse Pin
- Trinity Tr Trin Trinitatis
- DAYS OF WEEK
- Sunday Sun Dominico Dom.
- Monday Mon Mondag Mond.
- Tuesday Tue Tirsdag Tirsd.
- Wednesday Wed Onsdag Onsd.
- Thursday Thur Tørsdag Tørsd.
- Friday Fri Fredag Fred.
- Saturday Sat Lørsdag Lørs
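The month thesaurus above can be used to normalize a month token found in a record to a month number. The following is a minimal sketch under stated assumptions: it covers only part of the lexicon, and `MONTH_FORMS` and `month_number` are hypothetical names introduced for illustration.

```python
# Hypothetical subset of the month lexicon: month number -> accepted spellings.
MONTH_FORMS = {
    3: ["March", "Mar", "Marts"],
    5: ["May", "Mai"],
    9: ["September", "Sep", "Sept", "7br", "Septembre"],
    10: ["October", "Oct", "8br", "Octobre"],
    11: ["November", "Nov", "9br", "Novembre"],
    12: ["December", "Dec", "10br"],
}

# Invert the lexicon once so each spelling maps to its month number.
MONTH_LOOKUP = {
    form.casefold(): number
    for number, forms in MONTH_FORMS.items()
    for form in forms
}


def month_number(token):
    """Map a month token to 1..12, or None if it is not in the lexicon."""
    cleaned = token.strip().strip(".-").casefold()
    return MONTH_LOOKUP.get(cleaned)


print(month_number("9br"))    # 11
print(month_number("Marts"))  # 3
```

Stripping surrounding periods and hyphens before the lookup lets abbreviated forms such as "Sept." resolve to the same entry as "Sept".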
11 2. Annotating Primary Record Websites
- Colors are used to represent the mark-ups
12 Web Page
- SOURCE URL: Tvilum Sogne Kirkebog
- PAGE HEADER: Fødde 1751 3
- BODY: Truust Dom. 23 p Trinit laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust
13 ONTOLOGY ENTITIES
- FIND and MARK UP relevant web pages by
- NAME
- DATE
- PLACE
- RELATIONSHIP
- OCCUPATION
- RECORD_TYPE
- SOURCE
14 Annotated Web Page
- SOURCE: Tvilum Parish Register
- PAGE HEADER: Fødde 1751 3
- BODY: Truust Dom. 23 p Trinit laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust
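The colored mark-up is not recoverable from the text, but the idea of tagging record tokens with ontology entities can be sketched as a lexicon lookup. This is only an illustration: the tiny `ENTITY_LEXICON` below is a hypothetical stand-in for the real lexicons, and which tokens belong to which entity is an assumption made for the example.

```python
# Hypothetical mini-lexicons keyed by ontology entity.
ENTITY_LEXICON = {
    "NAME": {"niels", "søren", "johannes", "peder", "jens"},
    "PLACE": {"truust", "søebyevad"},
    "DATE": {"dom.", "trinit"},
    "RELATIONSHIP": {"fadd.", "hustruer", "søn", "s."},
}


def annotate(text):
    """Tag each whitespace-separated token with an entity label or None."""
    tagged = []
    for token in text.split():
        key = token.strip(",").casefold()
        label = next(
            (entity for entity, words in ENTITY_LEXICON.items() if key in words),
            None,
        )
        tagged.append((token, label))
    return tagged


for token, label in annotate("Truust Dom. 23 p Trinit"):
    print(token, label)
```

Tokens that no lexicon recognizes are kept with a `None` label, so a later step can still use them (for example the number 23 in the feast-date phrase).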
15 3. Building Research Tools Inside the Ontology
- Conversion functions
- Matching different name forms
- Matching place names to appropriate records
16 CONVERSION FUNCTIONS inside the ontology
- Compute birth date from age at death
- Death: 22 Mar 1743
- Age: 23 yr 2 m
- → BIRTH Jan 1720
- Compute dates from feast dates
- Sunday 23rd after Trinity 1751
- → 14 Nov 1751
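Both conversions can be sketched in a few lines of Python. This is a minimal sketch, not the ontology's own functions: the function names are hypothetical, and Easter is computed with the standard Gregorian rule (the anonymous Meeus/Jones/Butcher algorithm). The Easter reckoning actually used in Danish parishes may differ in some years, so the Gregorian rule is an assumption here.

```python
from datetime import date, timedelta


def birth_year_month(death, years, months=0):
    """Subtract an age in years and months from a death date.

    Returns (year, month); the day is dropped because the age has no days.
    """
    total = death.year * 12 + (death.month - 1) - (years * 12 + months)
    return total // 12, total % 12 + 1


def gregorian_easter(year):
    """Easter Sunday by the anonymous Gregorian algorithm (an assumption)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return date(year, month, day + 1)


def sunday_after_trinity(year, n):
    """Trinity Sunday falls 56 days after Easter; add n further weeks."""
    return gregorian_easter(year) + timedelta(days=56 + 7 * n)


print(birth_year_month(date(1743, 3, 22), 23, 2))  # (1720, 1)
print(sunday_after_trinity(1751, 23))              # 1751-11-14
```

With the slide's inputs, the sketch reproduces both results shown above: a birth in January 1720 and 14 November 1751 for the 23rd Sunday after Trinity.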
17 Match different name forms as ONE PERSON
- Uses lexicon to determine different forms of the same name
- JENS PEDERSEN
- JENS PEDERSEN BACH
- JENS BACH
- JENS BACHIS
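One way to treat these forms as one person is to normalize each token through the lexicon and then test whether a recorded form is a subset of a person's full known name. The sketch below is an illustration under assumptions: `SURNAME_VARIANTS` and `is_form_of` are hypothetical names, and the subset rule is a simplification rather than the presenter's matching logic.

```python
# Hypothetical lexicon entries mapping inflected forms to a base surname.
SURNAME_VARIANTS = {"bachis": "bach", "baches": "bach"}


def normalize(name):
    """Lower-case a name and replace known variant tokens with base forms."""
    return [SURNAME_VARIANTS.get(token, token) for token in name.casefold().split()]


def is_form_of(candidate, full_name):
    """True if candidate shares the given name and adds no unknown tokens."""
    cand, full = normalize(candidate), normalize(full_name)
    return bool(cand) and cand[0] == full[0] and set(cand) <= set(full)


FULL_NAME = "Jens Pedersen Bach"
for form in ["JENS PEDERSEN", "JENS PEDERSEN BACH", "JENS BACH", "JENS BACHIS"]:
    print(form, is_form_of(form, FULL_NAME))  # each line ends with True
```

Requiring the first token (the given name) to agree keeps a relative such as Peder Jensen Bach from matching, even though he shares the surname.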
18 PLACES - County Map of DENMARK
19 Parish and District Map of SKANDERBORG
20 Matching Places to Records
21 Logic and Reasoning inside the Ontology
- Correct family placement of primary records
- This is a logic and reasoning knowledge base which applies rules to determine that
- Names of the children follow common naming practices
- A high percentage of the witnesses match individuals in the family knowledge base
22 Naming Practices
- Male children are named in this order
- Mother's previous husband (occasionally)
- Father's father
- Mother's father
- Father
23 Knowledge Base Points out deviations of naming practices
- Father: LARS Andersen
- FathersFather: ANDERS Pedersen
- Mother: Maren Jensen
- MothersFather: JENS Olesen
- MothersPrevHusband: HENRICH Sorensen
- Son1: HENRICH Larsen
- Son2: ANDERS Larsen
- Son3: JENS Larsen
- Son4: LARS Larsen
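The naming rule can be checked mechanically by deriving the expected given names from the family roles and comparing them with the sons in birth order. The code below is a minimal sketch, assuming the role labels from the slide as dictionary keys; `expected_son_names` and `naming_deviations` are hypothetical helper names.

```python
# Family roles and names as listed on the slide.
FAMILY = {
    "Father": "Lars Andersen",
    "FathersFather": "Anders Pedersen",
    "Mother": "Maren Jensen",
    "MothersFather": "Jens Olesen",
    "MothersPrevHusband": "Henrich Sorensen",
}
SONS = ["Henrich Larsen", "Anders Larsen", "Jens Larsen", "Lars Larsen"]

# Naming order from the slide; the first role is skipped when absent.
NAMING_ORDER = ["MothersPrevHusband", "FathersFather", "MothersFather", "Father"]


def expected_son_names(family):
    """Given names that sons are expected to carry, in birth order."""
    return [family[role].split()[0].casefold()
            for role in NAMING_ORDER if family.get(role)]


def naming_deviations(family, sons):
    """List (position, son, expected given name) where the pattern breaks."""
    deviations = []
    pairs = zip(sons, expected_son_names(family))
    for position, (son, expected) in enumerate(pairs, start=1):
        if son.split()[0].casefold() != expected:
            deviations.append((position, son, expected))
    return deviations


print(naming_deviations(FAMILY, SONS))  # []
```

For the family on the slide the list of deviations is empty, since all four sons follow the expected order; swapping two sons would report both positions.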
24 Witness Match Knowledge Base
- PURPOSE: Correct Family Placement
- Description logic knowledge base
- CHILD
- PARENT
- SPOUSE
- SIBLING
- Match the christening record to the family where the highest percentage of witnesses can be matched to the knowledge base load
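The witness rule can be sketched as a score: the fraction of a record's witnesses found among each candidate family's known individuals, with the record placed in the highest-scoring family. The example below is a sketch under assumptions: the witness list is a hand-made, simplified list loosely based on the sample record, the second family is invented for contrast, and plain name equality stands in for the description-logic reasoning.

```python
def witness_match_rate(witnesses, family_members):
    """Fraction of witnesses found among the family's known individuals."""
    if not witnesses:
        return 0.0
    known = {name.casefold() for name in family_members}
    hits = sum(1 for witness in witnesses if witness.casefold() in known)
    return hits / len(witnesses)


def best_family(witnesses, families):
    """Label of the family with the highest witness match rate."""
    return max(families, key=lambda label: witness_match_rate(witnesses, families[label]))


# Hypothetical knowledge-base load: family label -> known individuals.
FAMILIES = {
    "Bach family": [
        "Jens Pedersen Bach", "Inger Nielsen", "Peder Jensen Bach",
        "Niels Jensen Bach", "Johannes Michelsen", "Peder Nielsen Thylke",
    ],
    "Unrelated family (invented)": ["Rasmus Sorensen", "Karen Olesen"],
}
# Hypothetical witnesses, assumed already resolved to normalized names.
WITNESSES = [
    "Johannes Michelsen", "Peder Rasmussen",
    "Peder Jensen Bach", "Peder Nielsen Thylke",
]

print(witness_match_rate(WITNESSES, FAMILIES["Bach family"]))  # 0.75
print(best_family(WITNESSES, FAMILIES))                        # Bach family
```

A real system would first resolve phrases such as "Jens Bachis søn Peder" to a normalized name before scoring, which this sketch assumes has already happened.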
25 Sample Load
- Record: Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust
- [Family diagram. Individuals shown: Jens Pedersen Bach, Inger Nielsen, Michel Jensen, Anna Ibsen, Peder Jensen Bach, Anna Michelsen, Niels Thylke, Niels Jensen Bach, Abigael Michelsen, Peder Nielsen Thylke, Johannes Michelsen, Soren Nielsen Bach. Arrow legend: SPOUSE, PARENT, CHILD, SIBLING]
26 Producing Results Listing
- Processing the Input
- Enough information?
- Do the names, dates, places, and relationships correspond to lexicon values?
- Using ONTOS to extract records
27 RESULTS LISTING
- TARGET: Jens Pedersen Bach
- Truust, Tvilum Parish, Gjern District, Skanderborg
- born 1693, died 1778
- SOURCE: Tvilum Parish Register
- PAGE HEADER: Fødde 1751 3
- BODY: Truust Dom. 23 p Trinit laest over Niels Baches SØREN fadd. Johannes Michelsens og Niels Mollers hustruer af Søebyevad, Peder Rasmussen af Søebyevad, Jens Bachis søn Peder og Niels Thylkes s. Peder af Truust
28 Evaluating the Methodology
- Search Speed
- User Relevance Feedback
- Accuracy of the results list
- Ease or difficulty of use
- Precision and Recall
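Precision and recall for a results list can be computed from the set of extracted records and the set of records a researcher judged relevant. The snippet below is a minimal sketch using the standard definitions; the record identifiers are hypothetical placeholders, not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a results list.

    precision = relevant records retrieved / all records retrieved
    recall    = relevant records retrieved / all relevant records
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
extracted = ["rec1", "rec2", "rec3", "rec4"]
judged_relevant = ["rec1", "rec2", "rec5"]

precision, recall = precision_recall(extracted, judged_relevant)
print(precision)  # 0.5
```

Here two of the four extracted records are relevant (precision 0.5), and two of the three relevant records were found (recall of two thirds).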
29 MAJOR CONTRIBUTIONS
- A portal for family history research that could be easily expanded with
- Maps and gazetteers
- Look-ups
- Helps
- Training
- Other countries and states
- The first genealogical primary record extractor using semantic web tools, which promises
- Accuracy
- Fast response
- Ease of use
- The first use of logic and reasoning inside an ontology to add expert rules for family history
- A practical demonstration of the superiority of semantic web tools for future research