Title: ISO 16642 - a tutorial Part 2: Representing data categories
1 ISO 16642 - a tutorial, Part 2: Representing data categories
- TMF - Terminological Markup Framework
- Laurent Romary - Laboratoire Loria
2 Why formalize DatCats?
- Systematizing data category description
  - Notion of Data Category Registry (DCR)
  - "I need a data category: is it there?"
  - Query by name, definition, etc.
- Automating processes
  - Format control of TMLs
  - Filters from one TML to GMT
3 Which model for DatCats?
- Using XML
  - Coherence with TMF principles
  - Using stylesheets to generate schemas and filters
- Using RDF (Resource Description Framework)
  - Intended format for representing meta-data
  - The description of a DatCat is meta-data with regard to TMF
4 RDF - a quick presentation
5 Data Categories
6 Data Category Registry
- Diagram: a DCRegistry resource (rdf:about) and its Description
  - dcsd:DataCategory: Data Category
  - dcsd:VersionNumber: VersionNumber
7 Data Category description
- Diagram: properties of a Data Category (Salt 2000-11-08/SEW)
  - dcsd:DCIdentifier: DCIdentifier
  - dcsd:DCParent: DCParent
  - dcsd:DCName: DCName
  - dcsd:DCDefinition: DCDefinition
  - dcsd:DCType: DCType (S, C)
  - dcsd:DCExample: DCExample
  - dcsd:DCAdmin: DCAdmin
  - dcsd:DCComment: DCComment
  - dcsd:Content: Content
  - dcsd:Level: Locus
8 Simple and complex DatCats
- Complex data categories
  - shall serve as field identifiers (not names) in databases and can have content. The datatype for this content shall be declared for each data category and can commonly take the form of different categories of text, defined data types (such as dates), and specified data domains, e.g., picklists comprising standardized permissible instances.
  - Example: /Part of Speech/
- Simple data categories
  - shall serve as the content of complex data categories.
  - Example: /Noun/, /Verb/, /Adjective/, etc.
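The distinction above can be sketched as a small data model. This is only an illustration, not part of ISO 16642 or the SALT tools: the class names `SimpleDatCat` and `ComplexDatCat` and the method `accepts` are hypothetical, and the data type labels "picklist" and "plainText" are borrowed from the RDF examples later in this tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleDatCat:
    """A simple data category: it only serves as content, e.g. /Noun/."""
    name: str


@dataclass(frozen=True)
class ComplexDatCat:
    """A complex data category: a field identifier with declared content."""
    name: str
    data_type: str                 # assumed labels: "plainText" or "picklist"
    picklist: tuple = ()           # permissible simple datcats, if any

    def accepts(self, value: str) -> bool:
        """Check a candidate content value against the declared data type."""
        if self.data_type == "picklist":
            return any(item.name == value for item in self.picklist)
        # Any other data type is treated as free text in this sketch.
        return isinstance(value, str)


part_of_speech = ComplexDatCat(
    name="Part of Speech",
    data_type="picklist",
    picklist=(SimpleDatCat("Noun"), SimpleDatCat("Verb"), SimpleDatCat("Adjective")),
)
definition = ComplexDatCat(name="Definition", data_type="plainText")

print(part_of_speech.accepts("Noun"))         # a listed simple datcat
print(part_of_speech.accepts("Preposition"))  # not in the picklist
```

Keeping the picklist as a tuple of simple data categories, rather than bare strings, mirrors the rule that simple data categories are themselves registered items that complex ones refer to.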
9 Levels and content
- Diagram: Content and Level/Loci of a data category
  - Content
    - dcsd:DataType: DataType
    - dcsd:TargetType: TargetType
  - Level/Loci
  - Lists of references are rdf:Alt containers whose rdf:li items are references to other datcat(s)
10 Administrative properties
- Diagram: a Data Category linked by dcsd:DCAdmin to a DCAdmin node
  - dcsd:Source: Source
  - dcsd:Status: Status
  - dcsd:StatusDate: StatusDate
  - dcsd:StatusNote: StatusNote
  - dcsd:EditionDate: EditionDate
  - dcsd:VariantNames: VariantNames
    - dcsd:ShortForm: ShortForm
    - dcsd:AdmittedName: AdmittedName
    - dcsd:ForbiddenName: ForbiddenName
11 RDF Representation
12 /term/ - RDF description (1)
<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A01"
    dcsd:DCName="term"
    dcsd:position="A.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>A verbal designation of a general concept in a specific subject field</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:sourceComment>For definition of related term, see ISO 1087-1, 3.4.3.</dcsd:sourceComment>
    <dcsd:conceptComment>Terms can consist of single words or be composed of multiword strings</dcsd:conceptComment>
    <dcsd:Example>"radix" in annex C, figure C.1.</dcsd:Example>
    <dcsd:DictionnaryID>A.1</dcsd:DictionnaryID>
  </dcsd:DCComment>
13 /term/ - RDF description (2)
  <dcsd:Content dcsd:DataType="plainText"/>
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
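As an illustration of what such a description makes possible (for instance, querying a registry by name or definition), here is a minimal sketch that reads an abridged version of the /term/ description with Python's standard `xml.etree.ElementTree`. It assumes the description is namespace-qualified RDF/XML using the `dcsd:` and `rdf:` prefixes. The `dcsd` namespace URI below is a placeholder, since the slides do not give it, and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCSD = "http://example.org/dcsd#"  # placeholder URI: not given in the slides
NS = {"rdf": RDF, "dcsd": DCSD}

# Abridged copy of the /term/ description (comments and some admin data omitted).
TERM_DESCRIPTION = f"""
<dcsd:DataCategory xmlns:dcsd="{DCSD}" xmlns:rdf="{RDF}"
    dcsd:DCIdentifier="ISO12620A01" dcsd:DCName="term"
    dcsd:position="A.01" dcsd:DCType="C">
  <dcsd:DCDefinition>A verbal designation of a general concept in a specific subject field</dcsd:DCDefinition>
  <dcsd:Content dcsd:DataType="plainText"/>
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:OrgSource="ISO TC 37" dcsd:Status="Accepted"/>
</dcsd:DataCategory>
"""


def attr(element, local_name):
    """Read a dcsd-qualified attribute (ElementTree uses {uri}name keys)."""
    return element.get(f"{{{DCSD}}}{local_name}")


def summarize(xml_text):
    """Collect the main properties of one data category description."""
    dc = ET.fromstring(xml_text)
    return {
        "identifier": attr(dc, "DCIdentifier"),
        "name": attr(dc, "DCName"),
        "type": attr(dc, "DCType"),
        "definition": dc.findtext("dcsd:DCDefinition", namespaces=NS),
        "data_type": attr(dc.find("dcsd:Content", NS), "DataType"),
        # Only the rdf:li items under dcsd:Level, not those of a picklist.
        "levels": [li.text for li in dc.findall("dcsd:Level/rdf:Alt/rdf:li", NS)],
        "status": attr(dc.find("dcsd:DCAdmin", NS), "Status"),
    }


summary = summarize(TERM_DESCRIPTION)
print(summary["name"], summary["type"], summary["levels"])
```

The levels are read through an explicit path rather than by collecting every `rdf:li`, because a picklist inside `dcsd:Content` uses the same container elements.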
14 /term type/ - RDF description (1)
<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A0201"
    dcsd:DCName="term type"
    dcsd:position="A.02.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>An attribute assigned to a term</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:DictionnaryID>A.2.1</dcsd:DictionnaryID>
  </dcsd:DCComment>
  <dcsd:Content dcsd:DataType="picklist">
    <rdf:Alt>
      <rdf:li>ISO12620A020101</rdf:li>
      <rdf:li>ISO12620A020102</rdf:li>
      <rdf:li>ISO12620A020119</rdf:li>
    </rdf:Alt>
  </dcsd:Content>
15 /term type/ - RDF description (2)
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
16 Actualizing a DatCat
17 Styling properties
- Diagram: a Data Category linked by dcsd:Style to a Style node
  - dcsd:StyleName: StyleName (Simple, Element, Attribute, TypedElement, ValuedElement, TVElement)
  - dcsd:Anchor: AnchorInfo (Anchor, Level)
  - dcsd:ElementName: ElementName
  - dcsd:AttributeName: AttributeName
  - dcsd:TypeValue: TypeValue
  - dcsd:Value: Value (for Simple)
18 Attribute style description
- dcsd:StyleName="Attribute"
- Conditions of use
  - Not valid for annotations
- Required properties
  - dcsd:AttributeName
- Example
  - dcsd:AttributeName="id"
  - <anchorElement id="xx54893">...</anchorElement>
19 Element style description
- dcsd:StyleName="Element"
- Required properties
  - dcsd:ElementName
- Example
  - dcsd:ElementName="definition"
  - <definition>...</definition>
20 TypedElement style description
- dcsd:StyleName="TypedElement"
- Required properties
  - dcsd:ElementName, dcsd:TypeValue
- Example
  - dcsd:ElementName="termNote"
  - dcsd:TypeValue="partOfSpeech"
  - <termNote type="partOfSpeech">N</termNote>
21 ValuedElement style description
- dcsd:StyleName="ValuedElement"
- Conditions of use
  - Not valid for annotations
- Required properties
  - dcsd:ElementName
- Example
  - dcsd:ElementName="pos"
  - <pos value="noun"/>
22 TVElement style description
- dcsd:StyleName="TVElement"
- Conditions of use
  - Not valid for annotations
- Required properties
  - dcsd:ElementName, dcsd:TypeValue
- Example
  - dcsd:ElementName="free"
  - dcsd:TypeValue="pos"
  - <free type="pos" value="noun"/>
23 Simple style description
- dcsd:StyleName="Simple"
- Conditions of use
  - Expresses the value of simple data categories
- Required properties
  - dcsd:Value
- Example
  - dcsd:Value="Nom"
  - <pos>Nom</pos>
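The six styles can be read as rules for turning a data category and its content into concrete markup. The sketch below is one possible reading, not the SALT implementation: the function `actualize` is hypothetical, the dictionary keys simply reuse the dcsd property names without the prefix, and the attribute names `type` and `value` are taken from the slide examples rather than from a stated rule.

```python
import xml.etree.ElementTree as ET


def actualize(style, content=None, anchor=None):
    """Render one data category according to its styling properties.

    style   -- dict with "StyleName" plus the properties that style requires
    content -- the value to express (ignored by the Simple style)
    anchor  -- existing element that receives the attribute (Attribute style)
    """
    name = style["StyleName"]
    if name == "Simple":
        # A simple datcat only contributes a value, e.g. "Nom".
        return style["Value"]
    if name == "Attribute":
        if anchor is None:
            raise ValueError("the Attribute style needs an anchor element")
        anchor.set(style["AttributeName"], content)
        return anchor
    if name == "Element":
        element = ET.Element(style["ElementName"])
        element.text = content
        return element
    if name == "TypedElement":
        element = ET.Element(style["ElementName"], {"type": style["TypeValue"]})
        element.text = content
        return element
    if name == "ValuedElement":
        return ET.Element(style["ElementName"], {"value": content})
    if name == "TVElement":
        return ET.Element(
            style["ElementName"], {"type": style["TypeValue"], "value": content}
        )
    raise ValueError(f"unknown style: {name}")


# The examples from the slides, rebuilt with the sketch.
nom = actualize({"StyleName": "Simple", "Value": "Nom"})
pos = actualize({"StyleName": "Element", "ElementName": "pos"}, nom)
term_note = actualize(
    {"StyleName": "TypedElement", "ElementName": "termNote",
     "TypeValue": "partOfSpeech"}, "N")
free = actualize(
    {"StyleName": "TVElement", "ElementName": "free", "TypeValue": "pos"}, "noun")
anchored = actualize(
    {"StyleName": "Attribute", "AttributeName": "id"}, "xx54893",
    anchor=ET.Element("anchorElement"))

for element in (pos, term_note, free, anchored):
    print(ET.tostring(element, encoding="unicode"))
```

Returning the anchor element itself for the Attribute style reflects that this style, unlike the others, adds information to an existing element instead of creating a new one.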
24 Dealing with languages
25 Two types of languages
- Working language
  - The language used at a given place in a document, along the XML hierarchy
  - Representation: xml:lang
- Object language
  - The language being described at a given place in your terminological entry (e.g. the language described at the Language Section level)
  - Representation: as a data category "language", with a narrow scope
26 Example DXLT
<langSet lang="en" xml:lang="fr">
  <descrip type="definition">Une valeur entre 0 et 1 utilisée...</descrip>
  <tig>
    <term xml:lang="en">alpha smoothing factor</term>
    <termNote type="termType">fullForm</termNote>
  </tig>
</langSet>
27 Example GMT
<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée...</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
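Comparing the two examples suggests how a filter from a TML to GMT could work. The following sketch converts the DXLT fragment into the GMT form; it is an assumption drawn only from this pair of examples, not the XSLT filters generated by the SALT tools. The function `to_gmt` and the table `STRUCT_TYPES` are hypothetical names, and only the elements that occur in the example are handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DXLT = """<langSet lang="en" xml:lang="fr">
  <descrip type="definition">Une valeur entre 0 et 1 utilisée...</descrip>
  <tig>
    <term xml:lang="en">alpha smoothing factor</term>
    <termNote type="termType">fullForm</termNote>
  </tig>
</langSet>"""

# Structural DXLT elements and the GMT level they correspond to
# (mapping inferred from the two examples above).
STRUCT_TYPES = {"langSet": "LS", "tig": "TL"}


def to_gmt(node):
    """Map one DXLT element, and its children, to GMT struct/feat elements."""
    if node.tag in STRUCT_TYPES:
        out = ET.Element("struct", {"type": STRUCT_TYPES[node.tag]})
        # The object language becomes an ordinary "language" feature.
        if node.get("lang") is not None:
            ET.SubElement(out, "feat", {"type": "language"}).text = node.get("lang")
        for child in node:
            out.append(to_gmt(child))
    else:
        # Typed elements keep their type; others use their element name.
        out = ET.Element("feat", {"type": node.get("type", node.tag)})
        out.text = node.text
    # The working language stays an xml:lang attribute at the same place.
    if node.get(XML_LANG) is not None:
        out.set(XML_LANG, node.get(XML_LANG))
    return out


gmt = to_gmt(ET.fromstring(DXLT))
gmt_text = ET.tostring(gmt, encoding="unicode")
```

The sketch keeps the two kinds of language apart: the `lang` attribute (object language) is turned into a `feat`, while `xml:lang` (working language) is copied unchanged.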
28 Conclusion
- A general model for analysing and representing terminological data collections
- An underlying formalism expressed in XML and RDF
- Associated tools (SALT project)
  - DCSEditor
  - DCSBrowser
  - Automatic generation of XSLT filters and XML schemas from a given TML specification
29 Useful pointers
- SALT project
  - http://www.loria.fr/projets/SALT
  - http://www.ttt.org/
- The TMF site
  - http://www.loria.fr/projets/TMF