1. Developing a Distributed Data Dictionary Service Using LDAP and ISO 11179 to Support STEP-based Data Integration and Re-Use
- Jim U'Ren
- Jet Propulsion Laboratory
- California Institute of Technology
- January 22, 2003
2. Agenda
- Overview of STEP, ISO's family of product and engineering data standards
- Infrastructure services needed to bring STEP-based data to the desktop
- A prototype data dictionary service based on ISO 11179
3. What is STEP?
- STandard for the Exchange of Product data
- An ISO standard (ISO 10303, developed by TC184/SC4)
- Designed to cover all information through a product's life cycle
- Includes standard formats (APs), a modeling language (EXPRESS), and APIs (SDAI)
4.
- An ISO standard (10303) that consists of distributed parts:
  - Application layers
  - Logical layers
  - Physical layers
5. STEP Development Process
- A structured approach to developing standardized data models:
- Develop an activity model that scopes the development using IDEF0 formalisms (AAM, Application Activity Model)
- Develop a user-based model using EXPRESS (ARM, Application Reference Model)
- Develop a STEP model using an integrated resource library (AIM, Application Interpreted Model)
- Develop a test suite to validate the data model (ATS, Abstract Test Suite)
6. STEP in Spacecraft Development
This slide provides high-level information about how STEP and other standards can be applied to the engineering domains that are part of the spacecraft development process.
Figure: "How the family of STEP Data Standards can be applied to Spacecraft Development." De-emphasized boxes indicate data models that are IN DEVELOPMENT.
(JAU 2002-10-25, SLIDE_STEP-in-Spacecraft-Development-Ver12d.ppt)
7. High-level Diagram of Infrastructure Services to Support Information Access, Reuse and Integration
Figure: three layers of infrastructure services.
- User Level: Visualization Service(s), Information Service(s), Tool Service(s)
- Transaction Level: Directory Service(s), Translation Service(s), Validation Service(s), Modeling Service(s)
- Data Level: Repository Service(s), Part Library Service(s)
- Infrastructure services should have:
  - Standard interfaces
  - Distributed capabilities
  - Platform independence
(JAU 2002-08-30, DIAGRAM_STEP-Services-2002-08-30.ppt)
8. Problem
- 1. Data dictionaries mean different things to different people:
  - Vocabularies: human-readable collections of terms and definitions pertaining to a domain
  - Data elements: machine-interpretable parts used to build data models
  - Data models (information models): structured, machine-interpretable collections of data elements that include the structured relationships between data elements
- 2. Dictionaries do not communicate with each other
9. What is Needed
- A mechanism that can be used to access, publish, update, relate, and integrate data dictionaries (vocabularies, data elements, and data models)
- The mechanism must be able to span domains and subdomains, e.g., engineering, science, and administrative
- The mechanism must have both manual and automated interfaces
- The mechanism should follow the distributed service model (e.g., DNS, the Internet Domain Name System; the X.500 directory; etc.)
10. A Solution
- Develop a distributed data dictionary service using:
  - LDAP (Lightweight Directory Access Protocol), an Internet service protocol
  - ISO 11179, attributes for a standard set of data elements
  - DSML (Directory Services Markup Language) XML DTD/Schema
  - Dublin Core metadata
- The service will store and relate vocabulary, data elements, and data model information
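To make the DSML piece concrete, here is a minimal sketch (Python, standard library only) of serializing one dictionary entry as DSML-style XML. The element names loosely follow the DSML v1 layout (dsml:entry, dsml:attr, dsml:value) and the sample attribute values are invented; a production service would validate its output against the actual DSML DTD/Schema.

```python
import xml.etree.ElementTree as ET

DSML_NS = "http://www.dsml.org/DSML"

def entry_to_dsml(dn, attrs):
    """Serialize one directory entry (a DN plus attribute/value pairs)
    as DSML-style XML. Element names approximate the DSML v1 layout."""
    ET.register_namespace("dsml", DSML_NS)
    entry = ET.Element("{%s}entry" % DSML_NS, {"dn": dn})
    for name, values in attrs.items():
        attr = ET.SubElement(entry, "{%s}attr" % DSML_NS, {"name": name})
        for value in values:
            ET.SubElement(attr, "{%s}value" % DSML_NS).text = value
    return ET.tostring(entry, encoding="unicode")

xml_text = entry_to_dsml(
    "cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO",
    {"cn": ["behaviour"], "description": ["A vocabulary term (invented example)"]})
print(xml_text)
```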
11. Advantages of LDAP
- LDAP has many advantages, including:
  - Universal access: the Internet directory standard, widely adopted and implemented by numerous vendors and open source software solutions
  - Simple: a relatively simple, high-level protocol with a straightforward API
  - Extensible: easily extended and adapted
  - Access control and security: connections can be authenticated and secured with layered Internet security mechanisms
  - Multi-platform development: C/C++, Perl, Java, JavaScript, Python, PHP, and other APIs are available, making LDAP services accessible from virtually any language, platform, or development environment
12. What is LDAP?
- An Internet standard from an IETF working group:
  - RFC 1777: Lightweight Directory Access Protocol
  - RFC 1778: String Representation of Standard Attribute Syntaxes
  - RFC 1779: String Representation of Distinguished Names
  - RFC 1959: LDAP URL Format
  - an RFC defining the LDAP API
- A distributed, hierarchical database
- Uses a multi-part naming convention to create unique records (distinguished names):
  - cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO
  - cn=requirement_set, dc=data-element, dc=Part233, dc=10303, dc=ISO
  - cn=TBR-alpha1, dc=schema, dc=Part233, dc=10303, dc=ISO
- Includes the ability to implement multiple levels of security
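The multi-part naming convention can be sketched in a few lines. This hypothetical helper simply joins relative DN components in the order shown on the slide, most specific first:

```python
def make_dn(cn, *dcs):
    """Build an LDAP distinguished name (RFC 1779 string form) from a
    common name and domain components, most specific component first."""
    return ", ".join(["cn=" + cn] + ["dc=" + dc for dc in dcs])

# The first example entry from the slide:
dn = make_dn("behaviour", "vocabulary", "Part233", "10303", "ISO")
print(dn)  # cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO
```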
13. Example of an LDAP Tree
Figure: a directory tree rooted at ISO.
- ISO
  - 10303
    - 203
    - 209
    - 210
    - 233
      - Vocabulary
      - Schema
      - Data Elements
    - 235
    - 237
    - ...
  - 14496
  - 9000
  - ...
14. Advantages of ISO 11179
- An established international standard
- Widely supported: US Census Bureau, NIST, Defense Information Systems Agency, Environmental Security, DoE, DoJ, Bureau of Labor Statistics, DoT, EPA, etc.
- Flexible use of elements within the schema
- Easily implemented in an LDAP directory service: flexible and easily configured LDAP servers are well suited to the flexible 11179 schema
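As a sketch of how 11179-style data elements might embed in a directory: the field names below approximate commonly cited ISO 11179 attributes (the standard defines many more), and the LDAP attribute mapping is purely hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataElement:
    """A few commonly cited ISO 11179 data element attributes;
    the field names are an approximation, not the normative set."""
    name: str
    identifier: str
    version: str
    definition: str
    datatype: str
    permissible_values: list = field(default_factory=list)

    def to_ldap_attrs(self):
        # Hypothetical mapping onto directory attributes.
        return {
            "cn": self.name,
            "description": self.definition,
            "dataElementIdentifier": self.identifier,  # assumed attribute name
            "dataElementVersion": self.version,        # assumed attribute name
            "dataElementDatatype": self.datatype,      # assumed attribute name
        }

element = DataElement(
    name="Target_Name",
    identifier="PDS.Target_Name",
    version="1.0",
    definition="Name of the observation target (invented definition)",
    datatype="string")
print(element.to_ldap_attrs()["cn"])  # Target_Name
```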
15. Data Dictionary Components for a Given Namespace
16. A Distributed Data Dictionary Service
- Using standards-based technology: LDAP protocol, ISO 11179 metadata schema, DSML, Dublin Core
- Prototype service viewable at http://step.jpl.nasa.gov/ldap
- Supporting automated processes
- Supporting validation scenarios
- Supporting data modeling activities
- Supporting terminology lookups
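Since the prototype is reachable by URL, an RFC 1959-style LDAP URL (ldap://host/dn?attributes?scope?filter) can address a single dictionary entry directly. A minimal sketch; the host comes from the slide, the attribute names are assumptions, and the filter is left unescaped for readability:

```python
from urllib.parse import quote

def ldap_url(host, dn, attrs=(), scope="sub", filt="(objectClass=*)"):
    """Compose an RFC 1959-style LDAP URL:
    ldap://<host>/<dn>?<attributes>?<scope>?<filter>."""
    return "ldap://%s/%s?%s?%s?%s" % (
        host, quote(dn, safe="=,"), ",".join(attrs), scope, filt)

url = ldap_url("step.jpl.nasa.gov",
               "cn=behaviour, dc=vocabulary, dc=Part233, dc=10303, dc=ISO",
               attrs=("cn", "description"))
print(url)
```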
17. A Proposed Data Element Naming Convention
- A structured, multi-part naming system
  - Similar to IP addressing and URLs
  - Dot-delimited names
  - Follows the convention used by the Dublin Core Metadata Initiative
- Short-name aliases could be supported in the planned distributed data dictionary service, e.g., author = DC.Creator, keyword = DC.Subject, etc.
- Names would consist of domains, descriptors, and qualifiers.
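A dot-delimited name splits mechanically into the proposed parts. A minimal sketch, assuming the first segment is the domain, the second the descriptor, and the rest qualifiers; real names may nest subdomains (e.g., NBS.HR), which this sketch does not distinguish:

```python
def parse_element_name(name):
    """Split a dot-delimited data element name into domain,
    descriptor, and qualifiers, per the proposed convention."""
    domain, _, rest = name.partition(".")
    descriptor, _, qualifier_part = rest.partition(".")
    return {"domain": domain,
            "descriptor": descriptor or None,
            "qualifiers": qualifier_part.split(".") if qualifier_part else []}

print(parse_element_name("DC.Date.Created"))
# {'domain': 'DC', 'descriptor': 'Date', 'qualifiers': ['Created']}
```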
18. Examples of the Data Element Naming Convention within JPL Domains
- Dublin Core Metadata Initiative (a JPL-adopted standard)
  - DC.Date
  - DC.Date.Created
  - DC.Date.LastModified
- JPL's Planetary Data System (PDS)
  - PDS.Target_Name
  - PDS.Sampling_Factor
- JPL's Product Data Management System (PDMS)
  - PDMS.Version
  - PDMS.ReferenceDesignator
- JPL's New Business System (NBS)
  - NBS.HR.start_date
  - NBS.HR.employee_status
19. Terminology Lookup Scenarios
- Resolving ambiguous terminology: an end user, needing to clarify the use and meaning of a word in a specific context, performs a multi-domain vocabulary lookup across multiple DD services, looking for the published vocabulary of the referenced domain
- Finding the correct acronym: an end user, confronted with a number of new acronyms used in a presentation, accesses a local DD service to look up the acronyms within probable domains, thereby eliminating the alternative meanings, e.g., searching for STEP standards work versus the JPL STEP project
- Enabling improved search engine performance: as a search engine scans through a document, it discovers a keyword list and finds a reserved word; the document includes a reference to a domain-specific vocabulary list in a DD service; the search engine uses this vocabulary to be certain it is indexing the keywords in the right context
- Building glossaries for technical papers: an engineer or scientist writing a technical paper needs to include a glossary of relevant terms; by performing a multi-service search, terms and definitions that relate to the topic of the paper are quickly found and inserted into the paper with the corresponding attributions
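The acronym-disambiguation scenario reduces to a filtered search across service nodes. A toy sketch with in-memory stand-ins for two DD nodes (the domain names and definitions are invented for illustration; a real client would issue LDAP searches against each server instead):

```python
# Invented in-memory stand-ins for two data dictionary service nodes.
DD_SERVICES = {
    "ISO-10303": {"STEP": "STandard for the Exchange of Product data"},
    "JPL": {"STEP": "A JPL project applying product data standards"},
}

def lookup(term, domains=None):
    """Return (domain, definition) pairs for a term across all
    services, optionally restricted to probable domains so that
    alternative meanings of an acronym are eliminated."""
    return [(domain, vocab[term])
            for domain, vocab in DD_SERVICES.items()
            if term in vocab and (domains is None or domain in domains)]

print(lookup("STEP", domains={"ISO-10303"}))
# [('ISO-10303', 'STandard for the Exchange of Product data')]
```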
20. Validation Scenarios
- Validating units of measure: a system integrator receives an MCAD geometry model (e.g., a STEP AP203 Part 21 file) of a component to be integrated into an assembly; automatically, a standard validation routine is performed against the schema located in a referenced data dictionary, checking for use of the units of measure called for in the contract and identified in the exchange file
- Enabling automated repository check-in: as a STEP model is checked into a PDM system, an automated validation routine checks the model using the schema (located in the DD service) that is identified in the Part 21 data file
- Improving the quality of data handoffs: an MCAD geometry model is sent from design to thermal analysis, and validation is performed using the correct schema version as referenced in the model; validation is an automated process that occurs before any work is done with the model as it is transferred between domains
- Validating for adequacy and range: the central node of the PDS (NASA's Planetary Data System) receives a dataset description in template format to be ingested into the dataset catalogue database; automatically, a standard validation routine checks the template for required keywords, keyword values, and value types against a corresponding structure stored in the PDS domain of the data dictionary service
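The adequacy-and-range scenario can be sketched as a keyword and type check against a schema fragment fetched from the DD service. The keyword names and types below are hypothetical stand-ins for a PDS-style structure:

```python
# Hypothetical schema fragment, standing in for a structure retrieved
# from the PDS domain of the data dictionary service.
PDS_SCHEMA = {
    "TARGET_NAME": str,
    "SAMPLING_FACTOR": int,
}

def validate(record, schema=PDS_SCHEMA):
    """Check a template-format record for required keywords and
    value types; returns a list of error messages (empty = valid)."""
    errors = []
    for keyword, expected_type in schema.items():
        if keyword not in record:
            errors.append("missing keyword: " + keyword)
        elif not isinstance(record[keyword], expected_type):
            errors.append("wrong value type for: " + keyword)
    return errors

print(validate({"TARGET_NAME": "Mars", "SAMPLING_FACTOR": 2}))  # []
```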
21. Data Modeling Scenarios
- Data reuse in modeling activities: a data modeler, charged with developing an information model for a new application, uses data elements published in several DD services (much like a parts library), ensuring that the new information model will have compatible interfaces with data sets that share the same data elements or collections of elements
- Creating a TDP (technical data package): an application performs a schema check against objects about to be wrapped into a TDP (e.g., a STEP AP232 or PDM Schema TDP) to ensure their correct structure and metadata content
- Enabling data integration: an analyst, charged with integrating data from two or more data sets, accesses the correct version of each schema as referenced in the data sets from the DD service space, allowing them to identify and map interfaces between the data sets, e.g., MCAD-ECAD-cost data
- Extending a schema: to solve a "local" problem, a data modeler uses data elements from a published collection of data items to extend an existing official schema; the new schema is published in the DD service with traces/links back to the official schema
22. Data Archive Scenarios
- Inputting information into an archive: a project in a post-launch phase would like to archive data to an institutional archive service; using a translation service, data in proprietary formats is translated into a standard, neutral format based on an open data model
- Retrieving information from an archive: an engineer retrieves a dataset from an archive and would like to validate the well-formedness of the data before attempting to pull the assembly into a design
- Maintaining/updating an archive: a standard data model is updated to a new version level; a portion of the data in an archive service is in the older format; the decision is made to update the data in the archive service to the new format; an application checks the data out of the archive service, updates it using the new data model, and checks it back into the archive
23. What's Next? (Completing the Prototype)
- Architecture development
  - UML model (50%)
  - Naming convention (50%)
  - Linking ontology (25%)
- Server configuration
  - 2nd and 3rd DD test nodes (33%)
  - Wrapping existing DD DBs (10%)
- Client configurations
  - LDAP URL (75%), Java (33%)
  - Python (33%), Perl (33%)
  - C/C++ (75%), Unix shell (25%)
  - PHP (25%), native clients (25%)
- Security configuration
  - Government (25%)
  - Commercial (25%)