XML - PowerPoint PPT Presentation


Title: XML


1
XML
Using Extensible Markup Language for Genealogy

April 19, 2003, Tim Costin, for CAGGNI
2

"At present, there can be little doubt that the
whole of mankind is in mortal danger, not
because we are short of scientific and
technological know-how, but because we tend to
use it destructively, without wisdom."
E.F. Schumacher, Small Is Beautiful, 1973

"Try to realize that your data is far more
important than the applications that access it."
Eric Miller, Semantic Web activity lead for the
World Wide Web Consortium, InformationWeek, Oct 14, 2002
3
Too Many Forks
Henry Petroski The Evolution of Useful Things
1992
4
Data Format Tower of Babel
[Diagram labels: data loss; one format; hard to use; save as webpage]
5
Relative Portability
FTW
DOC
PDF
TXT
GED
HTML
XML
6
Round Peg in a Square Hole
7
What is the problem?
  • Software vendors using proprietary data formats
  • Operating system dependencies and differences
  • Language and character set dependencies and
    differences
  • In general, a lack of broad industry standards
  • As a result
  • Data cannot be easily reused with other
    software
  • Data cannot be easily searched
  • Data and software quickly become obsolete
  • Data requires manual effort or special
    conversion programs to achieve limited portability

8
First Family
We will be using the Kennedy family as our sample
family.
9
Family Tree Format (447K)
FTM (Family Tree Maker) stores data in a bulky, inefficient,
unreadable, untransferable, proprietary format.
10
What is a good data format?
  • Is it a widely accepted standard?
  • Is it reusable by other programs?
  • Is it portable to other OSs, languages,
    character sets?
  • Is it stored efficiently?
  • Is it accessible and searchable?

If your genealogy data format has these features,
your genealogy data will have much less chance of
becoming obsolete and much greater chance of
being readily available for your descendants.
11
(No Transcript)
12
HTML Sample (6k)
HTML uses formatting tags that carry no genealogical
meaning.
<HTML>
<HEAD>
<TITLE>I1 John Fitzgerald KENNEDY (29 May 1917 - 22 Nov 1963)</TITLE>
</HEAD>
<BODY>
<H2><A NAME="I1"></A>John Fitzgerald KENNEDY</H2>
<A HREF="2">2</A>
<H3>29 May 1917 - 22 Nov 1963</H3>
<UL>
<LI><EM>BIRTH</EM> 29 May 1917, Brookline, MA, USA
<LI><EM>DEATH</EM> 22 Nov 1963, Dallas, TX, USA <A HREF="1">1</A>
<LI><EM>REFERENCE</EM> 1
</UL>
<B>Father </B><A HREF="../d0000/g0000037.html#I8">Joseph Patrick KENNEDY </A><BR>
<B>Mother </B><A HREF="../d0000/g0000038.html#I9">Rose Elizabeth FITZGERALD </A><BR>
<BR>
<B>Family 1</B><A HREF="../d0000/g0000031.html#I2">Jaqueline Lee BOUVIER </A>
<UL>
<LI><EM>MARRIAGE</EM> 12 Sep 1953, Newport, RI, USA
</UL>
<OL>
<LI><TT>&nbsp;</TT><A HREF="../d0000/g0000034.html#I5">Caroline Bouvier KENNEDY </A>
<LI><TT>&nbsp;</TT><A HREF="../d0000/g0000035.html#I6">John Fitzgerald KENNEDY </A>
<LI><TT>&nbsp;</TT><A HREF="../d0000/g0000036.html#I7">Patrick Bouvier KENNEDY </A>
</OL>
..
13
What is GEDCOM
  • GEDCOM is an acronym for "GEnealogical Data
    COMmunication". 
  • GEDCOM is a standard for transferring genealogy
    data from one genealogy program to another.
  • Authored by The Church of Jesus Christ of
    Latter-day Saints (LDS or Mormon Church).
  • The current version is 5.5, dated 1996.

14
GEDCOM 5.5 sample (4K)
GEDCOM uses short TAGS of up to 4 characters (shown in
upper case) to label data.
Family 1 consists of I1 John Kennedy, I2
Jacqueline Bouvier, I5 Caroline Kennedy, I6
John-John Kennedy, I7 Patrick Kennedy.
Family 2 consists of I8 Joe Kennedy, I9 Rose
Kennedy, I10 Joe Kennedy, I1 John F. Kennedy, I11
Bobby Kennedy.
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I5@
1 CHIL @I6@
1 CHIL @I7@
1 MARR
2 DATE 12 SEP 1953
2 PLAC Newport, RI, USA
0 @F2@ FAM
1 HUSB @I8@
1 WIFE @I9@
1 CHIL @I10@
1 CHIL @I1@
1 CHIL @I11@
1 CHIL @I12@
1 CHIL @I13@
1 MARR
2 DATE 07 OCT 1914
2 PLAC Boston, MA, USA
0 @I1@ INDI
1 REFN 1
1 NAME John Fitzgerald/Kennedy/
1 SEX M
1 CHAN
2 DATE 13 FEB 2000
1 BIRT
2 DATE 29 MAY 1917
2 PLAC Brookline, MA, USA
1 DEAT
2 DATE 22 NOV 1963
2 PLAC Dallas, TX, USA
2 NOTE Assassinated by Lee Harvey Oswald.
3 CONT
1 NOTE Educated at Harvard University. Elected Congressman in 1945
2 CONT aged 29 served three terms in the House of Representatives.
2 CONT Elected Senator in 1952. Elected President in 1960, the
2 CONT youngest ever President of the United States.
2 CONT
2 CONT
1 FAMS @F1@
1 FAMC @F2@
0 @I2@ INDI
1 REFN 2
1 NAME Jaqueline Lee/Bouvier/
..
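
Each line of the GEDCOM sample has the same shape: a level number, an optional cross-reference such as @F1@, a tag, and an optional value. A minimal Python sketch of reading that shape follows. The regular expression and the function name parse_gedcom_line are illustrative assumptions, not part of the presentation or of any GEDCOM library, and real files need more care (CONT continuation lines, character sets, custom tags).

```python
import re

# One GEDCOM line: LEVEL [@XREF@] TAG [VALUE]
# (assumed pattern for this sketch; it is not taken from the GEDCOM specification text)
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_gedcom_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


# A few lines copied from the Family 1 record in the sample.
sample = """\
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
2 DATE 12 SEP 1953
2 PLAC Newport, RI, USA
"""

records = [parse_gedcom_line(line) for line in sample.splitlines()]
for record in records:
    print(record)
```

The level number is what carries the structure: a level 2 DATE belongs to the level 1 MARR above it, which belongs to the level 0 FAM record.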
15
GEDCOM Testbook Project
  • This was a project of the National Genealogical
    Society
  • Volunteers typed the same genealogy into 8
    commercial genealogy programs
  • The genealogies were then exported to and
    imported from GEDCOM
  • Results were less than spectacular
  • Your results may vary

16
GEDCOM import/export errors
  • Custom tags unknown to GEDCOM standard
  • Tags in the wrong position
  • Ignored tags
  • Tags converted to/from the wrong GEDCOM tag
  • Incorrect links
  • Tags in wrong format
  • Lost or corrupted source information
  • Losses affected less commonly used fields more
    often

17
Master Genealogist
  • TMG (The Master Genealogist) offers limited
    access to its data from another program
  • TMG offers limited export to a spreadsheet
  • The GenBridge feature of SuperTools improves on
    GEDCOM 5.5, and not just for Master Genealogist
    users
  • GenBridge is not free, not an industry standard,
    and not general-purpose software.

18
GenBridge
Family Tree SuperTools brings advanced project
management, the industry's most flexible charting
tools, and many other exclusive features to users
of Family Tree Maker, Personal Ancestral
File, The Master Genealogist, Family Origins,
Ultimate Family Tree, Legacy, and others. By
reading data directly from these programs with
its built-in GenBridge technology, this new
companion product avoids the many problems
normally associated with GEDCOM transfers.
(GEDCOM imports are also supported, however, for
users of other programs.)
19
Computer trade mags are full of XML headlines
20
XML a weapon in Office Suite battle
21
There are even whole magazines on XML.
22
What is XML?
  • XML stands for Extensible Markup Language
  • XML is a new standard created by the World Wide
    Web Consortium (W3C) in the late 1990s for the
    exchange of annotated (tagged) text data between
    programs
  • XML is a metalanguage: a language for defining
    other languages. Its tags are metadata (data
    about data), so XML is self-describing.
  • XML is a grammar for constructing custom
    tag (label) languages for different applications.

23
XML Usage Diversity
  • XML: text documents and databases
  • MusicML: music notation
  • GEDCOM 6.0: genealogy
  • ebXML: B2B e-commerce
  • MathML: mathematics data
  • VoiceXML: voice applications

24
Taxonomy Application
25
Universal Language for Data
  • XML is meant for storing or transporting data
    between programs. This is ideal for genealogy
    data and very diverse types of data.


26
From Geography Markup to Rendering
<?xml version="1.0" encoding="iso-8859-1"?>
<rs>
<r><name>Horton Plaza</name><URL></URL><labelpos>41.46,77.51</labelpos>
<c>5076,1540 4986,1540 4895,1539 4803,1539 4715,1539 4622,1539 4534,1538
4534,1641 4534,1745 4534,1856 4622,1856 4711,1856 4800,1856 4893,1855
4984,1855 5075,1854 5075,1749 5076,1646 </c></r>
<r><name>Gaslamp</name><URL></URL><labelpos>44.60,83.00</labelpos>
<c>5162,1013 5084,1057 5083,1116 5081,1222 5079,1326 5079,1433 5076,1540
5076,1646 5075,1749 5075,1854 5167,1854 5257,1855 5257,1750 5259,1647
5260,1541 5262,1434 5262,1328 5263,1222 5263,1013 </c></r>
. . .
XML encoding of geographic features (such as GML)
27
Universal Computer Language
Java is a programming language that enables
portability of programs across different computers,
operating systems, languages, and character
sets. Java was originally designed for small
appliances. Java is highly successful and is now
one of the dominant programming languages. Java
works very well with XML.
28
XML is a Standard file format
  • All that is needed is to define a set of tags
    for each application
  • XML is so extensible, it can replace many
    proprietary file formats
  • Won't replace all file types
  • Best suited to text formats, but can link to
    non-text data

29
XML Solves Problems
  • Common grammar allows transmission of data
    between programs instead of reentry
  • Hardware and software independence instead of
    locking data into proprietary formats good only
    in one operating system
  • Reuse of data in many formats instead of reentry
  • Self-describing data allows targeted search
    instead of searching heterogeneous data
  • Unicode character set allows any language instead
    of just Western languages

30
XML is a Markup language
<NAME>John Fitzgerald<S>Kennedy</S></NAME>
<SEX>M</SEX>
<BIRT>
  <DATE>29 MAY 1917</DATE>
  <PLAC>Brookline, MA, USA</PLAC>
</BIRT>
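
Because every value sits inside a named tag, a program can ask for a field by name instead of guessing from position. A minimal sketch using Python's standard xml.etree.ElementTree module follows. The INDI wrapper element is added here only so the fragment has a single root, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, wrapped in an INDI element so it has one root.
xml_text = """
<INDI ID="I1">
  <NAME>John Fitzgerald<S>Kennedy</S></NAME>
  <SEX>M</SEX>
  <BIRT>
    <DATE>29 MAY 1917</DATE>
    <PLAC>Brookline, MA, USA</PLAC>
  </BIRT>
</INDI>
"""

person = ET.fromstring(xml_text)

# Text that comes before the <S> child is the given name; <S> holds the surname.
given = person.find("NAME").text.strip()
surname = person.findtext("NAME/S")

# Paths such as BIRT/DATE address a field by its meaning.
birth_date = person.findtext("BIRT/DATE")
birth_place = person.findtext("BIRT/PLAC")

print(f"{given} {surname}, born {birth_date} in {birth_place}")
```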
31
XML is Extensible
You can make up your own XML tags. You cannot do
that with HTML.
(On the slide, tags are shown in red.)
XML FORMAT
<INDI ID="I1">
  <NAME>John Fitzgerald<S>Kennedy</S></NAME>
  <SEX>M</SEX>
  <BIRT>
    <DATE>29 MAY 1917</DATE>
    <PLAC>Brookline, MA, USA</PLAC>
  </BIRT>
  <DEAT>
    <DATE>22 NOV 1963</DATE>
    <PLAC>Dallas, TX, USA</PLAC>
    <NOTE>Assassinated by Lee Harvey Oswald.<BR/></NOTE>
  </DEAT>
</INDI>
Tags describe the meaning (semantics) of the data.
Formatting is done separately with style sheets.
HTML FORMAT
<P>John Fitzgerald Kennedy
<BR>M
<BR>BORN 29 MAY 1917 in Brookline, MA, USA
<BR>DIED 22 NOV 1963 in Dallas Texas
<P><B>NOTE </B> Assassinated By Lee Harvey Oswald.<BR/>
Tags describe the format of the data.
32
XML sample (7K) from GEDCOM
XML for genealogy will be much like GEDCOM.
<INDI ID="I1">
  <REFN>1</REFN>
  <NAME>John Fitzgerald<S>Kennedy</S></NAME>
  <SEX>M</SEX>
  <BIRT>
    <DATE>29 MAY 1917</DATE>
    <PLAC>Brookline, MA, USA</PLAC>
  </BIRT>
  <DEAT>
    <DATE>22 NOV 1963</DATE>
    <PLAC>Dallas, TX, USA</PLAC>
    <NOTE>Assassinated by Lee Harvey Oswald.<BR/></NOTE>
  </DEAT>
  <NOTE>Educated at Harvard University. Elected Congressman in 1945<BR/>
aged 29 served three terms in the House of Representatives.<BR/>
Elected Senator in 1952. Elected President in 1960, the<BR/>
youngest ever President of the United States.<BR/>
<BR/>
  </NOTE>
  <FAMS REF="F1"/>
  <FAMC REF="F2"/>
</INDI>
<INDI ID="I2">
  <REFN>2</REFN>
  <NAME>Jaqueline Lee<S>Bouvier</S></NAME>
..
<FAM ID="F1">
  <HUSB REF="I1"/>
  <WIFE REF="I2"/>
  <CHIL REF="I5"/>
  <CHIL REF="I6"/>
  <CHIL REF="I7"/>
  <MARR>
    <DATE>12 SEP 1953</DATE>
    <PLAC>Newport, RI, USA</PLAC>
  </MARR>
</FAM>
<FAM ID="F2">
  <HUSB REF="I8"/>
  <WIFE REF="I9"/>
  <CHIL REF="I10"/>
  <CHIL REF="I1"/>
  <CHIL REF="I11"/>
  <CHIL REF="I12"/>
  <CHIL REF="I13"/>
  <MARR>
    <DATE>07 OCT 1914</DATE>
    <PLAC>Boston, MA, USA</PLAC>
  </MARR>
</FAM>
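
The XML sample mirrors the GEDCOM sample closely: GEDCOM level numbers become nesting, record identifiers become ID attributes, and pointers become REF attributes. The following Python sketch shows that mapping under simplifying assumptions. The function gedcom_to_xml and the GED root element are hypothetical names chosen for illustration, this is not the converter used for the slide, and it does not handle CONT lines or split surnames into an S element.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(text):
    """Nest well-formed GEDCOM lines by level number under a <GED> root."""
    root = ET.Element("GED")
    stack = [root]  # stack[n] is the parent element for a line at level n
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or malformed line; skipped in this sketch
        level = int(parts[0])
        if parts[1].startswith("@"):
            if len(parts) < 3:
                continue  # record line without a tag; skipped in this sketch
            # Record line such as "0 @F1@ FAM": the cross-reference becomes ID.
            element = ET.Element(parts[2].split()[0], ID=parts[1].strip("@"))
        else:
            element = ET.Element(parts[1])
            value = parts[2].strip() if len(parts) > 2 else ""
            if value.startswith("@") and value.endswith("@"):
                element.set("REF", value.strip("@"))  # pointer to another record
            elif value:
                element.text = value
        del stack[level + 1:]  # forget elements deeper than this line's parent
        stack[level].append(element)
        stack.append(element)
    return root


sample = """\
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I5@
1 MARR
2 DATE 12 SEP 1953
2 PLAC Newport, RI, USA
"""

root = gedcom_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

The stack is the only state needed, because a line's parent is always the most recent line with a level one lower than its own.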
33
Key Standards for Genealogy in the future
  • XML standards (XML, XSLT, DTD)
  • GEDCOM standard based on XML: GEDCOM 6.0 will
    be in XML format
  • Browser standards (HTML, CSS, JavaScript)
  • Server standards (Java Servlets)
  • Related XML languages for public records and
    geography

34
Trends in the software industry
  • Toward Open Source software (Linux, Apache, Java
    Servlets)
  • Toward Freeware and Shareware (XML utilities,
    GEDCOM utilities, PAF)
  • Toward standards-based software (StarOffice,
    browser user interface)
  • Away from proprietary data formats and toward
    reusable formats (Microsoft office formats, all
    genealogy program formats)

35
GEDCOM 6.0
  • Authored by the Family History Department of
    The Church of Jesus Christ of Latter-day Saints
  • Beta released December 6, 2002
  • Uses XML format and Unicode
  • Includes a DTD (Document Type Definition) that
    defines the rules for a common vocabulary and
    grammar for genealogy data in XML files.

36
Presentation WEB Page
[Diagram of a presentation web page:
 header: meta-description, meta-keywords, scripts, comments
 body: text, links, photos and graphics, forms
 format tags: <title>, <p> paragraph, <b> bold, <table>, <ol> ordered list, <h1> header, <button>, <color>, <font>, <align>, <size>
 formatting stylesheet (CSS)]
  • A Web page contains formatted text and images
  • Looks good, is accessible, but not very searchable

37
Semantic WEB Page
[Diagram of a semantic web page:
 XML data: Kennedy.xml
 transformation stylesheets (XSL), which format XML tags by rules: Family_sheet.xsl, Ancestor_chart.xsl, Descendant_chart.xsl
 HTML pages: Kennedy_family.html, Kennedy_ancestor_chart.html, Kennedy_descendant_chart.html
 page contents: meta-description, meta-keywords, scripts, comments, text, links, graphics, photos, media, forms
 format tags: <header>, <title>, <p> paragraph, <b> bold, <table>, <ol> ordered list, <h1> header, <button>, <color>, <font>, <align>, <size>
 formatting stylesheet (CSS)]
  • A Semantic Web page separates data from the
    presentation
  • Tagged XML data is much more searchable

38
Servlet Examples
  • GedServlet
  • XALAN

http://www.kennedy.org
39
Command Line Examples
  • SAXON
  • XALAN

40
XSL Stylesheets
XSL stylesheet to create a list of names:
<xsl:transform >
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="GED">
  <html>
  <header><title>list of names and birthdays</title></header>
  <body><xsl:apply-templates select="INDI"/></body>
  </html>
</xsl:template>
<xsl:template match="INDI">
  <p/>
  <b><xsl:value-of select="NAME"/></b>
  <br/>-----BORN ON<xsl:value-of select="BIRT"/>
</xsl:template>
</xsl:transform>
Html file:
<html>
<header>
<title>list of names and birthdays</title>
</header>
<body>
<p/><b>John Fitzgerald Kennedy</b>
<br>-----BORN ON 29 May 1917
<p/><b>Joseph Patrick Kennedy</b>
<br>-----BORN ON 6 SEP 1888
next person
</body>
</html>
XML data:
<GED>
  <INDI>
    <NAME>John Fitzgerald <S>Kennedy</S></NAME>
    <BIRT>
      <DATE>29 MAY 1917</DATE>
    </BIRT>
  </INDI>
</GED>
  • XSL stylesheets have templates to control how to
    format each xml tag
  • XSL stylesheets control which tags to process
    and in what order
  • You can have as many XSL stylesheets as there
    are ways to format the data
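
The idea of one template per tag can be imitated in a few lines of Python. This is not XSLT: Python's standard library has no XSLT processor, so the sketch below only mirrors what the stylesheet above does, with one function standing in for each template. The function names are illustrative, and the second person is taken from the HTML output shown on the slide.

```python
import xml.etree.ElementTree as ET
from html import escape

xml_data = """
<GED>
  <INDI>
    <NAME>John Fitzgerald <S>Kennedy</S></NAME>
    <BIRT><DATE>29 MAY 1917</DATE></BIRT>
  </INDI>
  <INDI>
    <NAME>Joseph Patrick <S>Kennedy</S></NAME>
    <BIRT><DATE>6 SEP 1888</DATE></BIRT>
  </INDI>
</GED>
"""


def text_of(element):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def render_indi(indi):
    """Plays the role of the template that matches INDI."""
    name = escape(text_of(indi.find("NAME")))
    born = escape(text_of(indi.find("BIRT")))
    return f"<p><b>{name}</b><br>-----BORN ON {born}</p>"


def render_ged(ged):
    """Plays the role of the template that matches GED."""
    people = "\n".join(render_indi(indi) for indi in ged.findall("INDI"))
    return (
        "<html>\n<head><title>list of names and birthdays</title></head>\n"
        f"<body>\n{people}\n</body>\n</html>"
    )


page = render_ged(ET.fromstring(xml_data))
print(page)
```

A second pair of render functions could produce an ancestor chart from the same XML data, which is the point of keeping the data separate from its presentation.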

41
Searching the Presentation Web
42
Presentation Web Searching
  • Search engines search through unstructured data
  • Search engines can only try to match strings of
    characters against some of the data on some
    web pages, not everything
  • Search engines do not know what you mean
  • Search engines do not know what the data in any
    web page means
  • Most of the intelligence must be supplied behind
    the eyes of the surfer
  • Many of the search results are irrelevant and
    waste a lot of time
  • Search engines like Google make the best of it
    and have ways to score hits and sort by
    relevance, or they specialize, like Ancestry.com

43
WGA HOMEPAGE
44
KEYWORD WEB SEARCHES
  • Commonly used in business databases like Oracle,
    Sybase, and DB2
  • Search for John Carpenter in a NAME field, not
    just for the text "John Carpenter" anywhere
  • Only search NAMEs, not everything else on the
    semantic web
  • Won't get articles on carpentry or carpenter ants
    just because the words are similar
  • No need for HTML META tags; all XML data is
    already tagged
  • Add GEDCOM to the search to search only
    genealogy pages

Hypothetical search: DOCTYPE=GEDCOM AND
SURNAME=CARPENTER AND BIRTHDATE>1903
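
A field-scoped search like the hypothetical one above can be sketched in Python against tagged data. The three records below are invented purely for illustration and are not from the presentation; the function names and the rule of reading the year from the last token of the date are also assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Invented records for illustration only.
xml_data = """
<GED>
  <INDI ID="I1">
    <NAME>John <S>Carpenter</S></NAME>
    <BIRT><DATE>4 JUL 1910</DATE></BIRT>
  </INDI>
  <INDI ID="I2">
    <NAME>Mary <S>Carpenter</S></NAME>
    <BIRT><DATE>2 MAR 1899</DATE></BIRT>
  </INDI>
  <INDI ID="I3">
    <NAME>Carl <S>Mason</S></NAME>
    <BIRT><DATE>9 JAN 1920</DATE></BIRT>
    <NOTE>Worked as a carpenter.</NOTE>
  </INDI>
</GED>
"""


def birth_year(indi):
    """Year read from the last token of BIRT/DATE, or None if unavailable."""
    tokens = indi.findtext("BIRT/DATE", default="").split()
    return int(tokens[-1]) if tokens and tokens[-1].isdigit() else None


def search(root, surname, born_after):
    """IDs of people whose tagged surname matches and who were born after a year."""
    hits = []
    for indi in root.iter("INDI"):
        tagged_surname = indi.findtext("NAME/S", default="")
        year = birth_year(indi)
        if tagged_surname.upper() == surname.upper() and year is not None and year > born_after:
            hits.append(indi.get("ID"))
    return hits


root = ET.fromstring(xml_data)
matches = search(root, "CARPENTER", 1903)
print(matches)
```

Only I1 matches. I2 has the right surname but was born too early, and I3 is skipped even though the word "carpenter" appears in a note, because the search looks only at the surname field.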
45
Searching the Semantic Web
46
SEMANTIC WEB SEARCHES
  • Currently the stuff of science fiction, like HAL
    or R2-D2
  • W3C and academics are working on it
  • Based on more detailed definitions of XML tags
    and how they relate to one another; e.g., it
    will know Illinois is in the USA
  • Encodes the meaning of words in context, in
    phrases and sentences
  • Understands words with multiple meanings
    (polysemy)
  • Understands words with the same meaning
    (synonyms)
  • Computers search for what you mean, not for
    arbitrary words
  • Your personal web agent will search the web for
    you and will know that the following is looking
    for a person in a place at a certain time

Hypothetical search: Find all Ian Kennedys in
Waterford from 1800-1820
47

Only a Genealogist regards a step backwards as
progress --Unknown
48
Resources
GEDCOM 6.0: http://www.familysearch.org/GEDCOM/GedXML60.pdf
GEDCOM FAQ: http://www.familysearch.org/Eng/Home/FAQ/frameset_faq.asp?FAQ=faq_gedcom.asp
GEDCOM Testbook Project: http://www.gentech.org/ngsgentech/projects/TestBook2001/index.htm
XML and genealogy: http://www.oasis-open.org/cover/genealogy.html#gedML