A Common Standard for Data and Metadata: The ESDS Qualidata XML Schema

1
A Common Standard for Data and Metadata: The
ESDS Qualidata XML Schema
  • Libby Bishop
  • ESDS Qualidata, UK Data Archive
  • E-Research Workshop
  • Melbourne
  • 27 April 2006

2
Why another schema?
  • need a standard
  • that includes both file-level metadata and
    content-level metadata
  • enables more precise searching/browsing
  • extends to linking between sources (e.g. text,
    annotations, analysis, audio, etc.)
  • need one customised to social science research
    that
  • meets generic needs of varied data types
  • is more analytical than ones adapted from TEI
    speech schema (e.g. oral history projects)
  • is less granular than ones for conversational
    analysis (highly detailed)

3
What does a schema enable?
  • marking up data to an XML standard for data
    providers to publish to online systems, such as
    ESDS Qualidata Online
  • meet needs of researchers requesting a standard
    they can follow
  • encourage more qualitative data analysis software
    companies to pursue XML outputs (and
    import/export tools) based on this standard

4
Hybrid of two standards
  • for the metadata: the DDI Standard for study,
    file and variable level
  • Level 1 DDI: Document description
  • Level 2 DDI: Study description
  • Level 3 DDI: Data file description
  • (file contents, format, data checks, processing,
    software)
  • Level 4 DDI: Variable description
  • for study survey data (mixed methods) or numeric
    outputs from qualitative data
  • demographic profile of sample
  • other quantified responses to qualitative data
    (attributes or thematic classifications often
    assigned (coded) in CAQDAS software)
  • Level 5 DDI: Other study related materials
  • Level 6: TEI-based qualitative content
5
TEI for content mark-up
  • standard for text mark-up in humanities and
    social sciences
  • Elements for the header for a TEI-conformant
    DTD: <teiHeader type="text/corpus">
  • <fileDesc> <encodingDesc> <profileDesc>
    <revisionDesc> standard bibliographic ref to text
  • Mandatory
  • <teiHeader type="text">
  • <fileDesc>
  • <titleStmt> <!-- ... --> </titleStmt>
    <publicationStmt> <!-- ... -->
    </publicationStmt>
  • <sourceDesc> <!-- ... --> </sourceDesc>
    </fileDesc>
  • <!-- remainder of TEI Header here -->
  • </teiHeader>
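
The mandatory part of the header shown above can be generated programmatically. The following is a sketch only, using Python's standard library; the function name build_tei_header and the publisher and source strings are illustrative assumptions, and the use of a p child inside publicationStmt and sourceDesc is one permitted TEI form rather than anything the slides specify.

```python
import xml.etree.ElementTree as ET


def build_tei_header(title: str, publisher: str, source: str) -> ET.Element:
    """Build a teiHeader that contains only the mandatory fileDesc."""
    header = ET.Element("teiHeader", {"type": "text"})
    file_desc = ET.SubElement(header, "fileDesc")

    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title

    # Prose paragraphs keep the sketch short; richer structure is possible.
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = publisher

    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = source
    return header


# Placeholder values, assumed for illustration only.
header = build_tei_header(
    title="Mothers and Daughters",
    publisher="Placeholder publisher",
    source="Placeholder description of the source recording",
)
header_xml = ET.tostring(header, encoding="unicode")
print(header_xml)
```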

6
What features do we need to mark-up and why?
  • Spoken interview texts provide the clearest (and
    most common) example of the kinds of encoding
    features needed
  • Three basic groups of features
  • structural features representing basic format:
    utterance, specific turn taker, other speech tags
    e.g. defining idiosyncrasies
  • structural features representing links to other
    data types created in the course of the research
    process (e.g. audio or video referencing points,
    researcher annotations)
  • structural features representing identifying
    information such as real names, company names,
    place names, temporal information

7
Reduced set of TEI elements
  • Start with core tag set for transcription, then
    add
  • Editorial changes: <unclear>
  • Names, numbers, dates: <name>
  • Links and cross references: <ref>
  • Notes and annotations: <note>
  • Text structure: <div>
  • Unique to spoken texts: <kinesic>
  • Linking, segmentation and alignment: <anchor>
  • Advanced pointing: will use XPointer framework
  • Synchronisation
  • Contextual information (participants, setting,
    text)
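
To make the reduced tag set concrete, here is a small sketch that parses an invented two-turn transcript and lists the speaker turns and the elements used. The transcript text is made up for illustration, the u element for utterances comes from the TEI tag set for transcribed speech, and the attribute names (who, id, desc, type, resp) are assumptions that have not been checked against a particular TEI version.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; only the element names follow the list above.
SAMPLE = """<div type="interview">
  <u who="interviewer">Where did you live <anchor id="a1"/>at that time?</u>
  <u who="g24">In <name type="place">Aberdeen</name>, near my
    <unclear>mother's</unclear> house. <kinesic desc="nods"/></u>
  <note resp="researcher">Returns to this topic later.</note>
</div>"""

root = ET.fromstring(SAMPLE)

# How often each element occurs in the fragment.
tag_counts = Counter(el.tag for el in root.iter())

# One (speaker, text) pair per utterance, with whitespace normalised.
turns = [
    (u.get("who"), " ".join("".join(u.itertext()).split()))
    for u in root.iter("u")
]

for speaker, text in turns:
    print(f"{speaker}: {text}")
print(dict(tag_counts))
```

Because the note sits outside the utterances, the researcher's annotation stays out of the spoken text when turns are extracted.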

8
(No Transcript)
9
Metadata for model transcript output
  • Study Name: <titlStmt><titl>Mothers and
    Daughters</titl></titlStmt>
  • Depositor: <distStmt><depositr>Mildred
    Blaxter</depositr></distStmt>
  • Interview number: <intNum>4943int01</intNum>
  • Date of interview: <intDate>3 May 1979</intDate>
  • Interview ID: <persName>g24</persName>
  • Date of birth: <birth>1930</birth>
  • Gender: <gender>Female</gender>
  • Occupation: <occupation>pharmacy
    assistant</occupation>
  • Geo region: <geoRegion>Scotland</geoRegion>
  • Marital status: <marStat>Married</marStat>
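
The fields above can be read back out of the XML with a few lines of code, for example to drive a search-results display. This is a sketch under assumptions: the element names and values are taken from the slide, but the enclosing interview element and the FIELDS mapping are hypothetical, since the slide does not show how the fields are wrapped.

```python
import xml.etree.ElementTree as ET

# Values from the slide; the <interview> wrapper is a hypothetical container.
RECORD = """<interview>
  <titlStmt><titl>Mothers and Daughters</titl></titlStmt>
  <distStmt><depositr>Mildred Blaxter</depositr></distStmt>
  <intNum>4943int01</intNum>
  <intDate>3 May 1979</intDate>
  <persName>g24</persName>
  <birth>1930</birth>
  <gender>Female</gender>
  <occupation>pharmacy assistant</occupation>
  <geoRegion>Scotland</geoRegion>
  <marStat>Married</marStat>
</interview>"""

# Display label -> path of the element that holds the value.
FIELDS = {
    "Study name": "titlStmt/titl",
    "Depositor": "distStmt/depositr",
    "Interview number": "intNum",
    "Date of interview": "intDate",
    "Interview ID": "persName",
    "Date of birth": "birth",
    "Gender": "gender",
    "Occupation": "occupation",
    "Geo region": "geoRegion",
    "Marital status": "marStat",
}


def extract_metadata(xml_text: str) -> dict:
    """Return a label -> value dictionary for one interview record."""
    root = ET.fromstring(xml_text)
    return {label: root.findtext(path) for label, path in FIELDS.items()}


metadata = extract_metadata(RECORD)
for label, value in metadata.items():
    print(f"{label}: {value}")
```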

10
Transcript with XML mark-up
11
XML is source for .rtf download
12
Metadata used to display search results
13
XML + XSL enables online publishing
14
Some questions to resolve
  • What hierarchical elements should we use for
    collections of interview transcripts? Corpus,
    group/text, text/div?
  • What is the best XPointer scheme (or schemes) to
    handle linking and pointing to external
    resources?
  • Are there preferred standards for linking to, and
    synchronising with, audio and video?
  • We have some text requiring non-hierarchical
    coding and need to determine which of the schemes
    for multiple hierarchies best suits our texts.
  • How can we best use TEI metadata to incorporate
    several DDI elements used by the UKDA for
    cataloguing?
  • We are adapting natural language processing tools
    (NXT: NITE XML Toolkit,
    http://www.ltg.ed.ac.uk/NITE/) to automate the
    mark-up of qualitative data. We are seeking
    advice on some issues arising from the
    integration of TEI and NXT.

15
Conclusion
  • More information soon on the SQUAD website
  • http://quads.esds.ac.uk/projects/squad.asp

16
Qualitative Data Mark-up Tools (QDMT)
  • systematic preparation of digital data: to
    create formatted text documents ready for XML
    output
  • mark-up of data: to capture basic structural
    features of textual data e.g. speakers and
    selected demographic details
  • advanced annotation or mark-up of data
  • automated information extraction of basic
    semantic information: inserting tags for names
    and temporal information
  • automated anonymisation: replacing names with
    dummy forms, including co-references
  • geographic mark-up to enable data linking:
    identifying and applying geographic mark-up, and
    scoping researchers' needs for geo-linking
  • basic classification or thematic coding of
    textual data: will investigate linking into a
    domain ontology (e.g. social science thesaurus)
  • contextual documentation to capture richness of
    the research methods, data collection and
    analytic interpretation and representation: will
    look at the interrelationships between complex
    intra-project data, annotations and context
  • exposure of annotated and contextualised
    qualitative data to the web: investigating
    publishing of above QDM XML outputs to ESDS
    Qualidata Online, opportunities for exchange
    within CAQDAS tools, etc.
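
The anonymisation step in the list above could look roughly like the following once names have been tagged. This is a simplified sketch, not the QDMT implementation: the function name anonymise, the dummy form "NameN" and the sample sentence are all invented for illustration, and only exact repeats of the same tagged string share a dummy form, so true co-reference resolution (pronouns, alternative forms of a name) is not handled.

```python
import xml.etree.ElementTree as ET


def anonymise(root: ET.Element, tag: str = "name") -> dict:
    """Replace the text of each tagged name with a dummy form, in place.

    Repeated mentions of the same string receive the same dummy form.
    Returns the mapping from lower-cased original to dummy form.
    """
    pseudonyms = {}
    for el in root.iter(tag):
        original = (el.text or "").strip()
        if not original:
            continue
        key = original.lower()
        if key not in pseudonyms:
            pseudonyms[key] = f"Name{len(pseudonyms) + 1}"
        el.text = pseudonyms[key]
    return pseudonyms


# Invented sample utterance with names already marked up.
SAMPLE = (
    '<div><u who="g24">I went with <name>Jean</name> to see '
    "<name>Dr Brown</name>. <name>Jean</name> waited outside.</u></div>"
)

root = ET.fromstring(SAMPLE)
mapping = anonymise(root)
anonymised = ET.tostring(root, encoding="unicode")
print(mapping)
print(anonymised)
```

The returned mapping would need to be stored securely and separately from the published transcript, since it reverses the anonymisation.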