Title: Mapping Between SIC and NAICS: An Illustrative Example for XMDR
1 Mapping Between SIC and NAICS: An Illustrative Example for XMDR
- Fredric Gey
- UC Berkeley
- XMDR Project Meeting
- January, 2007
- UC Berkeley Faculty Club
2 SIC to NAICS Mappings (outline)
- What are SIC and NAICS?
- How have they been loaded into the XMDR prototype?
- Kinds of mappings
  - Semantic mappings
  - Statistical mappings
- A strawman mapping syntax for XMDR
3 SIC to NAICS mappings
- What are SIC and NAICS?
- SIC: US Standard Industrial Classification
  - Hierarchical classification of US industry (2-, 3-, 4-digit)
  - Specified by the Office of Management and Budget
  - Used to report economic summary statistics until 1997
- NAICS: North American Industry Classification System
  - Replaced the SIC for uniform classification of industry in the US, Canada, and Mexico, beginning in 1997
  - 2-, 3-, 4-, 5-, 6-digit hierarchy
  - 5-digit comparability between the three countries
  - 6th digit available for country-specific detail (zero used by the USA where no additional detail is utilized)
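Because both classifications encode their hierarchy in the digits of the code, the broader categories of any code can be read off its prefixes. A minimal sketch of that idea follows; the function names are hypothetical and not part of SIC, NAICS, or XMDR.

```python
def ancestors(code, min_len=2):
    """Return the broader codes implied by the digit prefixes, broadest first.

    Both SIC (2-4 digits) and NAICS (2-6 digits) start at the 2-digit level.
    """
    return [code[:n] for n in range(min_len, len(code))]


def comparable_code(naics6):
    """Return the 5-digit prefix, the level comparable across the three countries."""
    if len(naics6) != 6 or not naics6.isdigit():
        raise ValueError("expected a 6-digit NAICS code")
    return naics6[:5]


if __name__ == "__main__":
    print(ancestors("1041"))          # SIC 1041 (gold ores) sits under 10 and 104
    print(comparable_code("211111"))  # 5-digit level of a 6-digit NAICS code
```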
4 Why Might NAICS Matter to EPA?
5 SIC to NAICS Matching (example for Mineral Industries)
6 SIC loading into XMDR prototype
- Experimented with Protégé for Mining sector
7 SIC loading into XMDR prototype 2
- Load inputter for hierarchical classification list for LexGrid (thanks Harold)
- Uses concept code construct
- Uses NT relationship to specify hierarchy

    <concept conceptCode="10">
      <lgCommon:entityDescription>METAL MINING</lgCommon:entityDescription>
    </concept>
    <concept conceptCode="104">
      <lgCommon:entityDescription>GOLD AND SILVER ORES</lgCommon:entityDescription>
    </concept>
    <concept conceptCode="1041">
      <lgCommon:entityDescription>GOLD ORES</lgCommon:entityDescription>
    </concept>
    <concept conceptCode="1044">
      <lgCommon:entityDescription>SILVER ORES</lgCommon:entityDescription>
    </concept>
8 SIC loading into XMDR prototype 3b
- Load inputter for hierarchical classification list for LexGrid
- Uses NT relationship to specify hierarchy

    <lgRel:association association="NT" forwardName="narrowerTerm" isTransitive="true" reverseName="broaderTerm">
      <lgCommon:entityDescription>A generic broader/Narrower relationship</lgCommon:entityDescription>
      <lgRel:sourceConcept sourceCodingScheme="SIC" sourceConcept="104">
        <lgRel:targetConcept targetCodingScheme="SIC" targetConcept="1041" />
        <lgRel:targetConcept targetCodingScheme="SIC" targetConcept="1044" />
      </lgRel:sourceConcept>
    </lgRel:association>
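The NT (narrowerTerm) pairs shown above follow mechanically from the code list: each code's broader term is its longest proper prefix that is itself a code. The sketch below illustrates one way a loader might derive them; it is an assumption about the approach, not the actual LexGrid load inputter, and `narrower_terms` is a hypothetical name.

```python
def narrower_terms(codes):
    """Map each broader code to the list of its immediate narrower codes.

    The parent of a code is taken to be its longest proper prefix that also
    appears in the code list (an assumption that holds for SIC-style codes).
    """
    code_set = set(codes)
    nt = {}
    for code in sorted(code_set):
        for n in range(len(code) - 1, 0, -1):
            parent = code[:n]
            if parent in code_set:
                nt.setdefault(parent, []).append(code)
                break
    return nt


if __name__ == "__main__":
    sic_mining = ["10", "104", "1041", "1044"]
    for broader, narrower in narrower_terms(sic_mining).items():
        print(broader, "NT", narrower)
```

Each dictionary entry corresponds to one source concept element with its target concept children.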
9 NAICS to SIC Matching
- Important for historical data comparison and development of time series
10 NAICS to SIC Matching
- Mappings are
  - One to one (rarely)
  - Many to one (sometimes)
  - Many to many (usually)
- Census Bureau supplies
  - Comparable statistics for the 1997 Economic Census
  - Downloadable code files with bridge
- Open issue
  - Can statistical allocation between code sets be captured?
  - I.e., NAICS1 → SIC1 (.35), → SIC2 (.65)
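To make the open issue concrete, the sketch below applies allocation shares of the kind shown above (.35 and .65) to restate a NAICS-based figure on an SIC basis. The codes are the slide's placeholders, the value 200.0 is invented for illustration, and `reallocate` is a hypothetical helper rather than anything supplied by the Census Bureau.

```python
from collections import defaultdict


def reallocate(values, shares):
    """Distribute each source-code value across target codes by its shares.

    values: {source_code: number}
    shares: {source_code: {target_code: fraction}}, fractions summing to 1.
    """
    out = defaultdict(float)
    for source, value in values.items():
        targets = shares[source]
        if abs(sum(targets.values()) - 1.0) > 1e-9:
            raise ValueError("shares for %s do not sum to 1" % source)
        for target, fraction in targets.items():
            out[target] += value * fraction
    return dict(out)


if __name__ == "__main__":
    shares = {"NAICS1": {"SIC1": 0.35, "SIC2": 0.65}}
    # A NAICS1 total of 200.0 splits into roughly 70 for SIC1 and 130 for SIC2.
    print(reallocate({"NAICS1": 200.0}, shares))
```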
11 NAICS to SIC Matching (example 1-1 for oil and gas extraction)
12 A Strawman Syntax for XMDR mapping from an Old Classification to a New Classification

    <lgRel:association association="mapsTo" forwardName="mapsTo" reverseName="mappedFrom" targetCodingScheme="coding_scheme_name">
      <lgRel:sourceConcept sourceConcept="source_concept_code">
        <lgRel:targetConcept targetConcept="target_concept_code">
          <lgRel:associationQualification associationQualifier="exact | almost_exact | approx" />
          <lgRel:associationQualification MappingType="semantic | statistical" />
          <lgRel:associationQualification MapsToDegree="fraction" />
          <lgRel:associationQualification MapsFromDegree="fraction" />
          <lgRel:associationQualification MapsToThreshold="percent" />
        </lgRel:targetConcept>
      </lgRel:sourceConcept>
    </lgRel:association>
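One way to read the strawman is as a record with a handful of constrained fields. The sketch below models it as a Python dataclass so the allowed qualifier values can be checked; the class and field names are hypothetical translations of the strawman attributes and are not part of XMDR or LexGrid.

```python
from dataclasses import dataclass
from typing import Optional

QUALIFIERS = {"exact", "almost_exact", "approx"}
MAPPING_TYPES = {"semantic", "statistical"}


@dataclass(frozen=True)
class Mapping:
    source_concept: str
    target_concept: str
    target_coding_scheme: str
    association_qualifier: str                  # exact | almost_exact | approx
    mapping_type: str                           # semantic | statistical
    maps_to_degree: Optional[float] = None      # fraction in [0, 1]
    maps_from_degree: Optional[float] = None    # fraction in [0, 1]
    maps_to_threshold: Optional[float] = None   # percent

    def __post_init__(self):
        if self.association_qualifier not in QUALIFIERS:
            raise ValueError("unknown associationQualifier")
        if self.mapping_type not in MAPPING_TYPES:
            raise ValueError("unknown MappingType")
        for degree in (self.maps_to_degree, self.maps_from_degree):
            if degree is not None and not 0.0 <= degree <= 1.0:
                raise ValueError("degree must be a fraction between 0 and 1")


if __name__ == "__main__":
    # Placeholder codes, as in the open-issue example (NAICS1 to SIC1 at .35).
    m = Mapping("NAICS1", "SIC1", "SIC", "approx", "statistical",
                maps_from_degree=0.35)
    print(m)
```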
13 Definitions of Qualifiers
- MappingType
  - Semantic: meaning of the source concept is aligned with the meaning of the target concept
  - Statistical: mapping is done using a common database indexed/classified by both source and target concept codes
- <associationQualifier="exact | almost_exact | approx" />
  - exact: mapping is 1-1
  - almost_exact: for statistical mapping, mapping is almost exact within some ε (epsilon) percent difference, given as a MapsToThreshold
  - approx: mapping is inexact, with overlaps between concepts
- Degree of (statistical) mapping
  - <lgRel:associationQualification MapsToDegree="fraction1" />
  - <lgRel:associationQualification MapsFromDegree="fraction2" />
  - i.e., a fraction2 of the source concept is represented by fraction1 of the target concept
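The degree qualifiers could be computed from a database whose records carry both codes. The sketch below is one possible reading, stated as an assumption: MapsFromDegree is the share of the source concept's total that also falls under the target, MapsToDegree is the share of the target's total that comes from the source, and "almost_exact" means both shares are within the threshold percent of 1. The record layout and function names are hypothetical, and the numbers are invented.

```python
def mapping_degrees(records, source, target):
    """Return (maps_to_degree, maps_from_degree) for one source/target pair.

    records: iterable of (source_code, target_code, weight) tuples, e.g. one
    per dual-coded establishment, weighted by count or by some statistic.
    """
    source_total = target_total = shared = 0.0
    for s, t, weight in records:
        if s == source:
            source_total += weight
        if t == target:
            target_total += weight
        if s == source and t == target:
            shared += weight
    maps_from = shared / source_total if source_total else 0.0
    maps_to = shared / target_total if target_total else 0.0
    return maps_to, maps_from


def qualifier(maps_to, maps_from, threshold_percent):
    """Classify a statistical mapping as exact, almost_exact, or approx."""
    if maps_to == 1.0 and maps_from == 1.0:
        return "exact"
    eps = threshold_percent / 100.0
    if 1.0 - maps_to <= eps and 1.0 - maps_from <= eps:
        return "almost_exact"
    return "approx"


if __name__ == "__main__":
    records = [("N1", "S1", 35), ("N1", "S2", 65), ("N2", "S2", 65)]
    to_deg, from_deg = mapping_degrees(records, "N1", "S2")
    print(to_deg, from_deg, qualifier(to_deg, from_deg, threshold_percent=2.0))
```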
14 Additional Qualifiers?
- MappingMode
  - Automatic: meaning of the source concept is automatically mapped between source and target concepts by some software
  - Manual: mapping is done with human editors
  - Observation: semantic mapping could be automatic if done by NLP software which compares the textual definitions of source and target concept codes
- ReferenceDatabase: the detailed database which has been utilized for the statistical mapping (e.g. 1997 Economic Census company establishment for SIC and NAICS)
- Your qualifier goes here
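As a toy illustration of the observation about automatic semantic mapping, the sketch below compares textual descriptions by word overlap (Jaccard similarity). This is a deliberately crude stand-in for real NLP software; the function names are hypothetical, and the NAICS codes and titles are given for illustration only and should be checked against the official tables.

```python
def tokens(text):
    """Lower-cased word set with a crude plural strip (ORES becomes ore)."""
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in text.lower().split()}


def jaccard(a, b):
    """Share of distinct words the two descriptions have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_match(source_title, candidates):
    """Return the candidate code whose title overlaps most with the source."""
    return max(candidates, key=lambda code: jaccard(source_title, candidates[code]))


if __name__ == "__main__":
    naics = {"212221": "Gold Ore Mining", "212222": "Silver Ore Mining"}
    print(best_match("GOLD ORES", naics))    # SIC 1041 description
    print(best_match("SILVER ORES", naics))  # SIC 1044 description
```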
15 Discussion of XMDR standard implications
- XMDR does not currently have qualifiers for relationships (associations between concepts)
- Should we extend the model in this direction?
- Should we look elsewhere in the standard (e.g. the ontology region)?
16 NAICS to SIC Matching (example for hazardous waste treatment/disposal)