Dissertation Defense - PowerPoint PPT Presentation

About This Presentation
Title:

Dissertation Defense

Slides: 48
Provided by: timothym5
Transcript and Presenter's Notes

1
Dissertation Defense
  • Turning Information into Action: Assessing and
    Reporting GIS Metadata Integrity Using Integrated
    Computing Technologies

Timothy Mulrooney
2
Outline
  • GIS Metadata Background
  • The Problem
  • Research Questions
  • Methodology
  • Testing and Results
  • Conclusion
  • Discussion

3
GIS Data Facts
  • Most data have a spatial element
  • Most expensive component of GIS is data
    development

4
What is Metadata?
DUI in Winston-Salem
How Did You Create These Data?
What Do These Data Represent?
How Can I Get in Touch with the Person Who
Manages the Data?
When Were These Data Published?
5
The Problem
  • Within a Single Metadata File
  • More than 400 Elements
  • 7 FGDC (as per CSDGM) Required Elements
  • 15 FGDC Suggested Elements
  • 21 Other Interesting Elements
  • My previous work
  • 120 Databases
  • 50 to 110 layers per database

6
Research Questions
  • How can mathematical methods be applied to GIS
    metadata to support the decision making process?

7
Methodology
  • Idea arose from mass population and extraction of
    metadata values
  • Used ArcObjects/VBA
  • Previous metadata extraction limited to
  • software package
  • operating system
  • How can this be done on a regular basis?

8
Metadata Assessment and Reporting Tool (MART)
9
Data Preparation
  • Perl (Practical Extraction and Report
    Language)
  • Perl extracts elements from various XML metadata
    files and puts them into a single CSV file
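
The extraction step can be sketched as follows. This is a minimal illustration in Python, not the Perl used in the presentation. The `SEARCH_LIST` mapping, the function names and the sample record are assumptions made for the example; only the element short names (`title`, `pubdate`, `origin`) follow the CSDGM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping of CSV column names to CSDGM element paths.
SEARCH_LIST = {
    "title": "idinfo/citation/citeinfo/title",
    "pubdate": "idinfo/citation/citeinfo/pubdate",
    "origin": "idinfo/citation/citeinfo/origin",
}

def extract_row(xml_text, search_list=SEARCH_LIST):
    """Return one dict per metadata file; missing elements become ''."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in search_list.items():
        node = root.find(path)
        row[column] = (node.text or "").strip() if node is not None else ""
    return row

def rows_to_csv(rows, search_list=SEARCH_LIST):
    """Write all extracted rows into a single CSV document (as a string)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(search_list),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample record, for illustration only.
SAMPLE = """<metadata><idinfo><citation><citeinfo>
<title>Street Centerlines</title><pubdate>20071002</pubdate>
</citeinfo></citation></idinfo></metadata>"""

if __name__ == "__main__":
    print(rows_to_csv([extract_row(SAMPLE)]))
```

Elements that are absent from a file come back as empty strings, so every layer yields a row with the same columns.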

10
FGDC Compliancy
  • Test FGDC Compliancy
  • Used Perl to check if appropriate metadata
    elements were populated
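
A compliance check of this kind might look like the sketch below. It is written in Python rather than the Perl used in the presentation, and the list of required column names is a placeholder: the slides do not enumerate the seven required elements, so the names here are assumptions.

```python
# Placeholder names only; the presentation does not list the required elements.
REQUIRED = ["title", "abstract", "purpose"]

def missing_elements(row, required=REQUIRED):
    """Return the required elements that are absent or blank in one layer's row."""
    return [name for name in required if not str(row.get(name, "")).strip()]

def compliance_rate(rows, required=REQUIRED):
    """Fraction of layers whose required elements are all populated."""
    if not rows:
        return 0.0
    compliant = sum(1 for row in rows if not missing_elements(row, required))
    return compliant / len(rows)
```

For example, a row whose `purpose` value is blank is reported as missing `purpose`, and a database with one compliant layer out of two has a rate of 0.5.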

11
Data Analysis
  • What descriptive metrics can best be applied to
    GIS metadata?
  • Temporal Mean
  • Temporal Median
  • Converted big-endian dates (ISO 8601, YYYYMMDD) to
    a ratio number, performed the calculations and
    then converted the result back to a date
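
The date conversion can be illustrated with a short Python sketch (the presentation itself used Perl and R). It assumes dates are stored as YYYYMMDD strings and uses the proleptic Gregorian day count as the ratio number; the function names are hypothetical.

```python
from datetime import date, datetime
from statistics import mean, median

def to_ordinal(yyyymmdd):
    """Convert a YYYYMMDD string to a day count (a ratio-scale number)."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date().toordinal()

def to_yyyymmdd(ordinal):
    """Convert a (possibly fractional) day count back to a YYYYMMDD string."""
    return date.fromordinal(int(round(ordinal))).strftime("%Y%m%d")

def temporal_mean(dates):
    return to_yyyymmdd(mean(to_ordinal(d) for d in dates))

def temporal_median(dates):
    return to_yyyymmdd(median(to_ordinal(d) for d in dates))
```

Averaging the raw YYYYMMDD integers directly would give invalid results across month and year boundaries, which is why the round trip through a day count is needed.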

12
Data Analysis
  • Use Perl and R programming language to
    dynamically assess quantitative and qualitative
    metadata fields
  • R is a language and environment for statistical
    computing and graphics

13
Results from this Analysis
Average Horizontal Accuracy: 104.3 meters; Temporal Mean: 20020402
Average Horizontal Accuracy: 31.7 meters; Temporal Mean: 20071002
14
Supervised Techniques
  • Custom application used to query database
    containing metadata information
  • PHP language and MySQL database to query
    information about all data layers using web
    interface
  • Helps to guide the decision making process

15
Unsupervised Data Mining Techniques
  • Process of sorting through large amounts of data
    in order to pick out relevant information
  • Rule Induction (Association Rule Mining)
  • Discover interesting relationships between
    variables
  • Beer and Diapers example
  • Integrate existing Perl ARM module with custom
    application to create transaction table from
    which rules are derived.

16
Transaction Table Creation
  • Decomposition of a 2-dimensional array into
    1-dimension. For 890 layers and 43 attributes,
    there will be 38,270 transactions
  • How to express quantitative values in a nominal or
    ordinal environment (low, medium, high, location)
  • How to express categorical data within
    transaction table
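
The decomposition and the binning of quantitative values can be sketched as below, in Python rather than the Perl used in the presentation. The cut points, the "Attribute=Value" item format and the function names are assumptions for illustration.

```python
def bin_value(value, low_cut, high_cut):
    """Express a quantitative value as an ordinal label (cut points are hypothetical)."""
    if value is None:
        return "Unknown"
    if value < low_cut:
        return "Low"
    if value < high_cut:
        return "Medium"
    return "High"

def to_transactions(table):
    """Flatten a layers-by-attributes table into (layer_id, item) transactions.

    `table` is a list of dicts, one per layer; each attribute/value pair
    becomes one transaction item of the form 'Attribute=Value'.
    """
    transactions = []
    for layer_id, row in enumerate(table):
        for attribute, value in row.items():
            transactions.append((layer_id, f"{attribute}={value}"))
    return transactions
```

With 890 layers and 43 attributes per layer, `to_transactions` returns 890 × 43 = 38,270 entries, matching the count on the slide.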

17
MARTO-XML
  • XML Standard used to describe output from
    analysis (Metadata Assessment and Reporting Tool
    Output)

18
Data Rendering
  • Results are published in a web page
  • Modules tied together using batch files
  • New data and graphs are created on a schedule
  • Old data are archived
  • New links established
  • Saved and referenced from legacy XML files

19
Testing
  • Human Testing of 40 respondents using real GIS
    data on MART
  • GIS professionals navigate results from analysis
    in web environment
  • Technology Acceptance Model (TAM) used to assess
    the effectiveness of this technology

20
Testing Environment
  • GIS database of 890 individual data layers
  • Ran and published output from all aforementioned
    modules
  • Surveyed 40 respondents for their opinions on the
    application's ease of use, usefulness, potential
    utility and the intention to use the software

21
Testing FGDC Compliancy
22
Temporal and Horizontal Accuracy
23
Supervised Techniques
  • Database from 890 data layers was queried using a
    web interface created using PHP and dynamically
    created HTML form elements
  • Output was published as an HTML table containing
    the records that satisfied the query
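
A query of this kind can be sketched with Python's built-in `sqlite3` module standing in for the PHP and MySQL stack described on the slide. The table name, column names and sample rows are all made up for the example.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table with made-up layers (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE layers (title TEXT, data_theme TEXT, horizontal_accuracy_m REAL)"
    )
    conn.executemany(
        "INSERT INTO layers VALUES (?, ?, ?)",
        [
            ("Fire Stations", "Public_Safety", 31.7),
            ("Police Beats", "Public_Safety", 104.3),
            ("Wetlands 2002", "Wetlands", 104.3),
        ],
    )
    return conn

def query_layers(conn, theme=None, max_accuracy_m=None):
    """Build a parameterized query from whichever form fields were filled in."""
    clauses, params = [], []
    if theme is not None:
        clauses.append("data_theme = ?")
        params.append(theme)
    if max_accuracy_m is not None:
        clauses.append("horizontal_accuracy_m <= ?")
        params.append(max_accuracy_m)
    sql = "SELECT title, data_theme, horizontal_accuracy_m FROM layers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY title"
    return conn.execute(sql, params).fetchall()
```

Values from the form are passed as bound parameters rather than pasted into the SQL string, which keeps user input from altering the query itself.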

24
Unsupervised Techniques
  • Look for patterns within a large transaction
    table that meet a given support, confidence and
    strength
  • Used a support level of 2 (with a support level
    of 4, there would be more than 526,00 rules)
  • For support level 2 and confidence of .7 (1
    antecedent and 1 consequent), 6,204 rules were
    created
  • Results are published in a .txt file
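
Rule generation with one antecedent and one consequent can be sketched as follows. This is a plain Python illustration, not the Perl ARM module used in the presentation. It assumes that the support level is a minimum count of layers containing both items, which the slides do not state, and that items are written as "Attribute=Value" strings.

```python
from collections import Counter
from itertools import permutations

def mine_rules(baskets, min_support=2, min_confidence=0.7):
    """Return (count, confidence, antecedent, consequent) for 1-to-1 rules.

    `baskets` holds one collection of items per layer. Support is taken
    here as the number of layers containing both items (an assumption).
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        # Ordered pairs, so A => B and B => A are counted separately.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        confidence = count / item_counts[antecedent]
        if count >= min_support and confidence >= min_confidence:
            rules.append((count, round(confidence, 3), antecedent, consequent))
    return sorted(rules, reverse=True)
```

Confidence is the share of layers containing the antecedent that also contain the consequent, so a rule and its reverse can have very different confidence values.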

25
Sample Rules Created
  • 67 1.000 Place_Key=North_Carolina => Geoid=NOT_FOUND
  • 508 0.992 Place_Key=Forsyth_County => Publication_Date=Medium
  • 32 0.800 Location=Northwest => Publication_Date=Old
  • 353 1.000 Data_Theme=Public_Safety => Ellipsoid=NOT_FOUND
  • 14 0.824 Data_Theme=Wetlands => Publication_Date=Unknown
  • These rules combined with supervised techniques
    can dictate allocation of resources and decisions
    in the future

26
Testing
  • Technology Acceptance Model
  • Uses survey-based questions to determine the
    relationships between a technology's
  • Perceived Ease of Use (PEOU)
  • Perceived Usefulness (PU)
  • Attitude Towards Using (ATTITUDE)
  • Intention to Use (ITU)

27
TAM Model Used
  • H1: Perceived Ease of Use for MART has a
    significant effect on the Perceived Usefulness of
    MART.
  • H2: Perceived Ease of Use for MART has a
    significant effect on the Attitude Towards Using
    MART.
  • H3: Perceived Usefulness of MART has a
    significant effect on the Attitude Towards Using
    MART.
  • H4: Perceived Usefulness of MART has a
    significant effect on Intention to Use MART.
  • H5: Attitude Towards Using MART has a
    significant effect on Intention to Use MART.

28
TAM Question
29
Precursory Analysis of People Taking Test
30
Results from Respondents
  • People responded most positively to the ease of
    use
  • Least positively towards the intention to use

31
Hypothesis Testing
  • After computing Cronbach's alpha to measure
    reliability and performing PCA to explain variance,
    hypotheses were tested from groups of questions
    used in a survey.

32
Hypothesis Testing
33
Hypothesis Testing
34
Explanation of Hypothesis Testing
  • H3 not accepted: Strong correlation between
    Perceived Ease of Use and Attitude Towards Using.
    Perceived Ease of Use is more of a contributing
    factor towards the Attitude Towards Using than
    Perceived Usefulness.
  • H5 not accepted: Reflects respondents' impressions
    of the open source environment needed to implement
    MART; H5 was accepted at about a 70% CI.

35
Conclusions
  • Increasing schism between data creation and its
    assessment
  • Metadata reinforces quality control and quality
    assurance procedures employed by an organization
  • We need a means so everyone can assess various
    dimensions of metadata in a timely manner
  • MART serves as a way to quantify GIS metadata
  • MART provides a forum so users can interact with
    GIS metadata with an end goal of supporting
    business decisions which ultimately save time and
    money

36
Conclusions
  • Quantitative metadata elements such as FGDC
    compliancy, date and horizontal accuracy can be
    assessed using programming languages such as R
    and Perl
  • Users can search GIS metadata using supervised
    techniques via a web interface
  • Association Rule Mining can be applied to GIS
    metadata
  • If given a choice, users prefer to query GIS
    metadata as opposed to being given results from
    unsupervised techniques
  • Using TAM, 3 out of 5 research hypotheses were
    supported at the 95% CI
  • Based on user feedback, the need to implement
    MART and an open source environment within
    their IT was the biggest hindrance to a user's
    intention to use MART

37
Discussion
  • Integration of MART with other forms of
    geo-referenced data
  • Remotely sensed data and ortho-imagery
    (Laboratory for Advanced Information Technology
    and Standards)
  • TINs
  • Topologies
  • Relationship Classes
  • Stand-alone tables
  • Metadata and proprietary format
  • Usability with current GIS software
  • VBA / ArcObjects to convert metadata in BLOB
    format to XML for ESRI software
  • Various Accuracies within MART
  • Temporal and Horizontal
  • Attribute
  • Logical Consistency
  • Semantic

38
Discussion
  • Interestingness problem
  • 6,204 rules at support level 2 and confidence of
    .7
  • Cardinality of data
  • 43 attributes → 6,204 rules
  • 6 attributes → 73 rules
  • Attributes selected were Data Theme, Location,
    Horizontal Accuracy, Publication Date,
    Responsible Party, Metadata POC
  • TAM Methodology: different models and hypotheses
    could be proposed
  • Presentation of unsupervised techniques in a text
    file. Web environment may be more useful
  • Understanding of the open source environment

39
  • Questions or Comments?

42
Decomposition of XML File Using Perl
  • $metadata{"r01_Data_Set_Title"} => "idinfo_citation_citeinfo_title"
  • Using the following command:

    use XML::Simple;     # provides XMLin
    use File::Basename;  # provides basename

    # traverse all files
    foreach $filename (@files) {
        # change all back-slashes to forward slashes to allow for
        # proper navigation
        $filename =~ s/\\/\//g;
        print "\n\n..... Decomposing ", basename($filename), " ...........\n";
        # Create structure to traverse XML schema. Before going to the
        # next value, however, we need to reset the hash value.
        $tree = XMLin($filename);
        $metadata{"rMissing"} = '';
        $metadata{"fileName"} = $filename;
        $metadata{"sMissing"} = '';
        foreach $key (sort keys %SearchList) {
            print "$key $SearchList{$key}\n";
            $Item = FindItem($SearchList{$key});
            print "Item $Item\n\n";
        }
    }

43
National Map Accuracy Standards
For scales larger than 1:20,000: .033 inches ×
scale of map
For scales of 1:20,000 or smaller: .02 inches ×
scale of map
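
The tolerance rule on this slide can be worked through with a small Python sketch. It uses the slide's rounded values (.033 and .02 inches) and converts the result to meters; the function name is hypothetical.

```python
INCH_TO_METER = 0.0254

def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground tolerance in meters for a map at 1:scale_denominator.

    Scales larger than 1:20,000 (denominator below 20,000) use .033 inches
    on the map; scales of 1:20,000 or smaller use .02 inches.
    """
    inches_on_map = 0.033 if scale_denominator < 20000 else 0.02
    return inches_on_map * scale_denominator * INCH_TO_METER
```

For a 1:24,000 map this gives .02 × 24,000 = 480 inches on the ground, or about 12.2 meters.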
44
Sample Output from Supervised Techniques
45
Cronbach's Alpha
Cronbach's alpha is computed using the number of
items in the set, the variance of the data
and mean of the covariance between all members of
the set. While there is no universal threshold
to determine data consistency, Hair et al.
(1998) suggested a minimum threshold between .6
and .7. As per Table 10, only 1 of these values
(Perceived Ease of Use) is between .6 and .7
while two of the values (Perceived Usefulness and
Attitude Towards Using) are between .7 and .8.
The Cronbach's alpha for the Intention
to Use component is .807, which is considered
excellent (Nunnally 1978). Given these values,
it can be surmised that the questions posed for
the respondent ser
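
For reference, Cronbach's alpha can be computed with the sketch below. It uses the common variance-based form, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. This is a Python illustration under that assumption and not necessarily the exact procedure used in the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

When every item moves in lockstep across respondents, the total-score variance is as large as it can be relative to the item variances and alpha reaches 1; uncorrelated items push it towards 0.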
46
Principal Components
To help understand the individual factors that
contribute to any potential inconsistency,
principal component analysis was performed on
each of the individual questions to help
determine their potential contribution to the
variability of the observed results. Four
factors were calculated, based on the different
components of the research hypotheses to be
tested. After rotation, the Perceived Ease of
Use factor accounted for 56.33% of the variance. The
Perceived Usefulness factor accounted for
11.93%, Attitude Towards Using accounted for
9.24% and the Intention to Use factor accounted
for 7.04%. Table 11 shows the items and factor
loadings for the individual factors. Finally,
some other basic correlations were run between
potentially dependent factors such as age and sex
to help determine their potential contribution to
the results. However, no significant correlation
was found between participants' age, gender or
even self-described GIS experience and the
dependent variables (Perceived Ease of
Use, Perceived Usefulness, Attitude and Intention
to Use) that will be used in the TAM analysis.
47
Potential RS Attributes