Graphical Data Mining for Computational Estimation in Materials Science Applications - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Graphical Data Mining for Computational Estimation in Materials Science Applications


1
Graphical Data Mining for Computational
Estimation in Materials Science Applications
  • Aparna Varde
  • Ph.D. Dissertation
  • August 15, 2006
  • Committee Members
  • Prof. Elke Rundensteiner (Advisor)
  • Prof. Carolina Ruiz
  • Prof. David Brown
  • Prof. Neil Heffernan
  • Prof. Richard Sisson Jr. (Head of Materials
    Science, WPI)

This work is supported by the Center for Heat
Treating Excellence and by the Department of Energy,
Award DE-FC-07-01ID14197.
2
Introduction
  • Scientific domains: experiments are conducted with given input conditions
  • Results are plotted as graphs: good visual depictions
  • Experimental results help in analysis and assist decision-making
  • Performing an experiment consumes time and resources

3
Motivating Example
  • Heat Treating of Materials
  • Controlled heating and cooling of materials to achieve desired mechanical and thermal properties
  • Performing experiments involves:
  • One-time cost: $1000s
  • Recurrent costs: $100s
  • Time: 5 to 6 hours
  • Human labor
  • Desirable to estimate:
  • Graphs, given the input conditions
  • Conditions to achieve a given graph

CHTE Experimental Setup
4
Problem Definition
  • To develop an estimation technique with the following goals:
  1. Given the input conditions of an experiment, estimate the resulting graph
  2. Given the desired graph of an experiment, estimate the conditions needed to obtain it
5
Proposed Estimation Approach: AutoDomainMine
6
Knowledge Discovery in AutoDomainMine
7
Estimation of Graph in AutoDomainMine
8
Estimation of Conditions in AutoDomainMine
9
Main Tasks
Task 1: AutoDomainMine, a Learning Strategy Integrating Clustering and Classification [AAAI-06 Poster, ACM SIGART's ICICIS-05]
Task 2: Learning Domain-Specific Distance Metrics for Graphs [ACM KDD's MDM-05, MTAP-06 Journal]
Task 3: Designing Semantics-Preserving Representatives for Clusters [ACM SIGMOD's IQIS-06, ACM CIKM-06]
10
Task 2: Learning Domain-Specific Distance Metrics for Graphs
11
Motivation
  • Various distance metrics:
  • Absolute position of points
  • Statistical observations
  • Critical features
  • Issues:
  • It is not known which metrics apply
  • Multiple metrics may be relevant
  • Need for distance metric learning over graphs

Example of a domain-specific problem
12
Proposed Distance Metric Learning Approach: LearnMet
  • Given:
  • Training set with actual clusters of graphs
  • Additional input:
  • Component distance metrics applicable to graphs
  • LearnMet metric (a sketch follows below):
  • D = Σi wi Di, where the Di are the component metrics and the wi their weights
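A minimal Python sketch of this weighted-sum metric; the two component metrics and the weight values below are illustrative placeholders, not the components or weights used in the dissertation:

  import math

  def euclidean(g1, g2):
      """Position-based component: mean point-wise distance between two curves."""
      return sum(math.hypot(x1 - x2, y1 - y2)
                 for (x1, y1), (x2, y2) in zip(g1, g2)) / len(g1)

  def mean_y_difference(g1, g2):
      """Statistical component: difference of the average y-values."""
      avg = lambda g: sum(y for _, y in g) / len(g)
      return abs(avg(g1) - avg(g2))

  components = [euclidean, mean_y_difference]  # the Di (illustrative)
  weights = [0.7, 0.3]                         # the wi, to be learned

  def learnmet_distance(ga, gb):
      """D(ga, gb) = sum over i of wi * Di(ga, gb)."""
      return sum(w * d(ga, gb) for w, d in zip(weights, components))

  # Two graphs as lists of (x, y) points sharing the same x-values (toy data).
  g1 = [(0, 850), (1, 600), (2, 420)]
  g2 = [(0, 840), (1, 650), (2, 500)]
  print(learnmet_distance(g1, g2))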

13
Evaluate Accuracy
  • Use pairs of graphs
  • A pair (ga, gb) is classified as (see the sketch below):
  • TP - same predicted, same actual cluster, e.g., (g1, g2)
  • TN - different predicted, different actual clusters, e.g., (g2, g3)
  • FP - same predicted cluster, different actual clusters, e.g., (g3, g4)
  • FN - different predicted, same actual cluster, e.g., (g4, g5)
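A small sketch of how the pairs could be labeled, assuming each graph id maps to a predicted and an actual cluster label; the dictionary layout and the toy labels are illustrative:

  from itertools import combinations

  def label_pairs(predicted, actual):
      """Classify every pair of graphs as TP / TN / FP / FN.
      predicted, actual: dicts mapping graph id -> cluster label.
      TP: same predicted and same actual cluster; TN: different in both;
      FP: same predicted, different actual; FN: different predicted, same actual."""
      counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
      for ga, gb in combinations(sorted(predicted), 2):
          same_pred = predicted[ga] == predicted[gb]
          same_act = actual[ga] == actual[gb]
          if same_pred and same_act:
              counts["TP"] += 1
          elif not same_pred and not same_act:
              counts["TN"] += 1
          elif same_pred:
              counts["FP"] += 1
          else:
              counts["FN"] += 1
      return counts

  # Toy labels chosen so that (g1,g2) is TP, (g2,g3) TN, (g3,g4) FP and (g4,g5) FN.
  predicted = {"g1": 1, "g2": 1, "g3": 2, "g4": 2, "g5": 3}
  actual    = {"g1": "A", "g2": "A", "g3": "B", "g4": "C", "g5": "C"}
  print(label_pairs(predicted, actual))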

14
Evaluate Accuracy (Contd.)
  • How do we compute the error for the whole set of graphs?
  • Consider all pairs
  • Error measure: Failure Rate (FR)
  • FR = (FP + FN) / (TP + TN + FP + FN)
  • Error Threshold (t): the extent of FR allowed
  • If FR < t, then the clustering is accurate (see the sketch below)
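A minimal sketch of the failure-rate check; the pair counts and the threshold value are illustrative:

  def failure_rate(tp, tn, fp, fn):
      """FR = (FP + FN) / (TP + TN + FP + FN)."""
      return (fp + fn) / (tp + tn + fp + fn)

  t = 0.1  # illustrative error threshold
  fr = failure_rate(tp=120, tn=150, fp=18, fn=12)
  print(fr, "clustering is accurate" if fr < t else "adjust the metric and re-cluster")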

15
Adjust the Metric
  • Weight Adjustment Heuristic for each Di [KDD's MDM-05]:
  • New wi = wi - sfi (DFNi/DFN - DFPi/DFP)
  • Intuition: each weight is decreased in proportion to its component's share of the false-negative distance (which was too large) and increased in proportion to its share of the false-positive distance (which was too small); a sketch follows below
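A sketch of one plausible reading of this heuristic, under the sign convention described above; the scale factor and all numbers are illustrative:

  def adjust_weights(weights, dfn_contrib, dfp_contrib, scale=0.1):
      """One plausible reading of the weight-adjustment heuristic:
      decrease wi by its share of the false-negative distance DFN
      (those distances were too large) and increase it by its share of
      the false-positive distance DFP (those were too small).
      dfn_contrib[i] = DFNi, dfp_contrib[i] = DFPi. All values illustrative."""
      dfn_total = sum(dfn_contrib) or 1.0
      dfp_total = sum(dfp_contrib) or 1.0
      new_weights = []
      for w, dfn_i, dfp_i in zip(weights, dfn_contrib, dfp_contrib):
          w_new = w - scale * (dfn_i / dfn_total - dfp_i / dfp_total)
          new_weights.append(max(w_new, 0.0))  # keep weights non-negative
      return new_weights

  print(adjust_weights([0.7, 0.3], dfn_contrib=[4.0, 1.0], dfp_contrib=[0.5, 1.5]))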

16
Evaluation of LearnMet
  • Details: [MTAP-06]
  • Effect of pairs per epoch (ppe)
  • G: number of graphs, e.g., 25
  • C(G, 2): total number of pairs, e.g., C(25, 2) = 300
  • Select a subset of the C(G, 2) pairs per epoch
  • Observations:
  • Highest accuracy with a middle range of ppe
  • Learning efficiency is best with low ppe
  • Average accuracy with LearnMet: 86%

Accuracy of Learned Metrics over Test Set
Learning Efficiency over Training Set
17
Task 3: Designing Semantics-Preserving Representatives for Clusters
18
Motivation
  • Different combinations of conditions could lead
    to a single cluster
  • Graphs in a cluster could have variations
  • Need for designing representatives that
  • Incorporate semantics
  • Avoid visual clutter
  • Cater to various users

19
Proposed Approach for Designing Representatives: DesRept
20
Candidates for Conditions
1. Nearest Representative
Set of conditions in Cluster A
Nearest Representative for Cluster A
  • Return the set of conditions closest to all others in the cluster (a sketch follows below)
  • Notion of distance: a domain-specific distance metric derived from decision tree paths [CIKM-06]
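A generic sketch of nearest-representative selection; the condition attributes shown and the simple count-of-differing-values distance are illustrative stand-ins for the decision-tree-path metric of CIKM-06:

  def nearest_representative(items, distance):
      """Return the item with the smallest total distance to all other items."""
      return min(items, key=lambda a: sum(distance(a, b) for b in items if b is not a))

  # Illustrative stand-in for the decision-tree-path metric:
  # count how many condition attributes differ between two experiments.
  def condition_distance(c1, c2):
      return sum(1 for attr in c1 if c1[attr] != c2.get(attr))

  cluster_a = [
      {"quenchant": "oil", "agitation": "high", "temperature": 850},
      {"quenchant": "oil", "agitation": "low", "temperature": 850},
      {"quenchant": "water", "agitation": "high", "temperature": 870},
  ]
  print(nearest_representative(cluster_a, condition_distance))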

21
Candidates for Conditions (Contd.)
2. Summarized Representative
Cluster A
  • Build sub-clusters of conditions using domain knowledge
  • Return the nearest representatives of the sub-clusters
  • Sort them

Sub-clusters within Cluster A
Summarized Representative for Cluster A
22
Candidates for Conditions (Contd.)
3. Combined Representative
Cluster A
Combined Representative for Cluster A
  • Return all sets of conditions in the cluster
  • Sort them in ascending order

23
Candidates for Graphs
1. Nearest Representative
  • Select the graph that is the nearest neighbor of all others
  • Notion of distance: the domain-specific metric learned by LearnMet

24
Candidates for Graphs (Contd.)
2. Medoid Representative
  • Select the graph closest to the average of all the graphs (a sketch follows below)
  • Average of the y-coordinate values, since the x-coordinates are the same
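A minimal sketch of medoid selection, assuming each graph is given as a list of y-values sampled at shared x-coordinates; the numbers are illustrative:

  def medoid_graph(graphs):
      """Return the graph closest to the point-wise average of all graphs.
      Each graph is a list of y-values at shared x-coordinates."""
      n = len(graphs)
      avg = [sum(ys) / n for ys in zip(*graphs)]
      def dist_to_avg(g):
          return sum((y - a) ** 2 for y, a in zip(g, avg)) ** 0.5
      return min(graphs, key=dist_to_avg)

  cluster = [[850, 600, 420], [840, 610, 430], [900, 700, 500]]
  print(medoid_graph(cluster))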

25
Candidates for Graphs (Contd.)
3. Summarized Representative
  • Construct an average graph with prediction limits (a sketch follows below)
  • Average: the centroid; prediction limits: domain-specific thresholds
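A minimal sketch of the summarized representative, with an illustrative percentage band standing in for the domain-specific prediction limits:

  def summarized_graph(graphs, band=0.05):
      """Average curve with prediction limits. Each graph is a list of
      y-values at shared x-coordinates; 'band' is an illustrative
      threshold given as a fraction of the average value."""
      n = len(graphs)
      avg = [sum(ys) / n for ys in zip(*graphs)]
      upper = [y * (1 + band) for y in avg]
      lower = [y * (1 - band) for y in avg]
      return avg, lower, upper

  cluster = [[850, 600, 420], [840, 610, 430], [900, 700, 500]]
  avg, lower, upper = summarized_graph(cluster)
  print(avg, lower, upper)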

26
Candidates for Graphs (Contd.)
4. Combined Representative
  • Construct a superimposed graph of all graphs in the cluster
  • Same x-values, so plot the y-values on a common x-axis

27
Effectiveness Measure for Candidates
  • Minimum Description Length Principle
  • Theory: the representative; Examples: all items in the cluster
  • Representative measure: Complexity (ease of interpretation)
  • Complexity = log2 N for graphs; log2 (A V) for conditions
  • N: number of points needed to store the representative graph
  • A: number of attributes for conditions
  • V: number of values in the representative set of conditions
  • Examples measure: Distance of the items from the representative (information loss)
  • Distance for graphs = log2 [ (1/G) Σi=1..G D(r, gi) ]
  • D: distance using the domain-specific metric
  • G: total number of graphs in the cluster
  • gi: each graph
  • r: the representative graph
  • Encoding details: [SIGMOD IQIS-06]
  • Effectiveness = UB_C * Complexity + UB_D * Distance, where UB_C and UB_D are user-assigned weights for complexity and distance (a sketch follows below)
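A minimal sketch of scoring a candidate graph representative with this measure; the weights, point counts, and distances are illustrative:

  import math

  def effectiveness(complexity, distance, ub_c=0.5, ub_d=0.5):
      """MDL-style score: weighted sum of representative complexity and
      information loss (distance of cluster items from the representative).
      ub_c and ub_d are user-chosen weights (illustrative defaults)."""
      return ub_c * complexity + ub_d * distance

  def graph_candidate_score(num_points, distances_to_items, ub_c=0.5, ub_d=0.5):
      """Complexity = log2(N) for a representative graph stored as N points;
      Distance = log2 of the average distance from the representative
      to the G graphs in the cluster."""
      complexity = math.log2(num_points)
      distance = math.log2(sum(distances_to_items) / len(distances_to_items))
      return effectiveness(complexity, distance, ub_c, ub_d)

  # Compare two candidate representatives for the same cluster (toy numbers).
  print(graph_candidate_score(num_points=64, distances_to_items=[2.0, 3.5, 1.5]))
  print(graph_candidate_score(num_points=256, distances_to_items=[1.0, 1.2, 0.8]))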

28
Evaluation of DesRept: Conditions
  • Details:
  • Data set size: 400; number of clusters: 20
  • Observations:
  • The overall winner is the Summarized representative
  • As the weight for complexity increases, Nearest wins
  • Designed representatives are better than Random

29
Evaluation of DesRept: Graphs
  • Details:
  • Data set size: 400; number of clusters: 20
  • Observations:
  • The overall winner is the Summarized representative
  • As the weight for complexity increases, Nearest or Medoid wins
  • Designed representatives are better than Random

30
User Evaluation of AutoDomainMine System
  • Formal user surveys in different applications
  • Evaluation process:
  • Compare the estimation with real data in the test set
  • If they match, the estimation is accurate
  • Observations:
  • Estimation accuracy: around 90% to 95%

Accuracy Estimating Conditions
Accuracy Estimating Graphs
31
Related Work
  • Similarity Search [HK-01, WF-00]
  • Non-matching conditions could be significant
  • Mathematical Modeling [M-95, S-60]
  • Existing models are not applicable in certain situations
  • Case-Based Reasoning [K-93, AP-03]
  • Adaptation of cases is not feasible with graphs
  • Learning nearest neighbors in high-dimensional spaces [HAK-00]
  • Focus is on dimensionality reduction; does not deal with graphs
  • Distance metric learning given a basic formula [XNJR-03]
  • Deals with position-based distances for points; no graphs involved
  • Similarity search in multimedia databases [KB-04]
  • Uses various metrics in different applications; does not learn a single metric
  • Image Rating [HH-01]
  • User intervention is involved in manual rating
  • Semantic Fish-Eye Views [JP-04]
  • Display multiple objects in a small space; no representatives
  • PDA Displays with Levels of Detail [BGMP-01]

32
Summary
  • Dissertation Contributions:
  • AutoDomainMine: Integrating Clustering and Classification for Estimation [AAAI-06 Poster, ACM SIGART's ICICIS-05]
  • LearnMet: Learning Domain-Specific Distance Metrics for Graphs [ACM KDD's MDM-05, MTAP-06 Journal]
  • DesRept: Designing Semantics-Preserving Representatives for Clusters [ACM SIGMOD's IQIS-06, ACM CIKM-06]
  • Trademarked tool for computational estimation in Materials Science [ASM HTS-05, ASM HTS-03]
  • Future Work:
  • Image Mining, e.g., comparing nanostructures
  • Data Stream Matching, e.g., stock market analysis
  • Visual Displays, e.g., summarizing Web information

33
Publications
Dissertation-Related Papers:
1. Designing Semantics-Preserving Representatives for Scientific Input Conditions. A. Varde, E. Rundensteiner, C. Ruiz, D. Brown, M. Maniruzzaman and R. Sisson Jr. In CIKM, Arlington, VA, Nov 2006.
2. Integrating Clustering and Classification for Estimating Process Variables in Materials Science. A. Varde, E. Rundensteiner, C. Ruiz, D. Brown, M. Maniruzzaman and R. Sisson Jr. In AAAI, Poster Track, Boston, MA, Jul 2006.
3. Effectiveness of Domain-Specific Cluster Representatives for Graphical Plots. A. Varde, E. Rundensteiner, C. Ruiz, D. Brown, M. Maniruzzaman and R. Sisson Jr. In ACM SIGMOD IQIS, Chicago, IL, Jun 2006.
4. LearnMet: Learning Domain-Specific Distance Metrics for Plots of Scientific Functions. A. Varde, E. Rundensteiner, C. Ruiz, M. Maniruzzaman and R. Sisson Jr. Accepted in the International MTAP Journal, Springer, Special Issue on Multimedia Data Mining, 2006.
5. Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data. A. Varde, E. Rundensteiner, C. Ruiz, M. Maniruzzaman and R. Sisson Jr. In ACM KDD MDM, Chicago, IL, Aug 2005, pp. 107-112.
6. Apriori Algorithm and Game-of-Life for Predictive Analysis in Materials Science. A. Varde, M. Takahashi, E. Rundensteiner, M. Ward, M. Maniruzzaman and R. Sisson Jr. In KES Journal, IOS Press, Netherlands, Vol. 8, No. 4, 2004, pp. 213-228.
7. Data Mining over Graphical Results of Experiments with Domain Semantics. A. Varde, E. Rundensteiner, C. Ruiz, M. Maniruzzaman and R. Sisson Jr. In ACM SIGART ICICIS, Cairo, Egypt, Mar 2005, pp. 603-611.
34
Publications (Contd.)
  • 8. QuenchMiner: Decision Support for Optimization of Heat Treating Processes. A. Varde, M. Takahashi, E. Rundensteiner, M. Ward, M. Maniruzzaman and R. Sisson Jr. In IEEE IICAI, Hyderabad, India, Dec 2003, pp. 993-1003.
  • 9. Estimating Heat Transfer Coefficients as a Function of Temperature by Data Mining. A. Varde, E. Rundensteiner, M. Maniruzzaman and R. Sisson Jr. In ASM HTS, Pittsburgh, PA, Sep 2005.
  • 10. The QuenchMiner Expert System for Quenching and Distortion Control. A. Varde, E. Rundensteiner, M. Maniruzzaman and R. Sisson Jr. In ASM HTS, Indianapolis, IN, Sep 2003, pp. 174-183.
  • Other Papers:
  • 11. MEDWRAP: Consistent View Maintenance over Distributed Multi-Relation Sources. A. Varde and E. Rundensteiner. In DEXA, Aix-en-Provence, France, Sep 2002, pp. 341-350.
  • 12. SWECCA for Data Warehouse Maintenance. A. Varde and E. Rundensteiner. In SCI, Orlando, FL, Jul 2002, Vol. 5, pp. 352-357.
  • 13. MatML: XML for Information Exchange with Materials Property Data. A. Varde, E. Begley and S. Fahrenholz-Mann. In ACM KDD DM-SPP, Philadelphia, PA, Aug 2006.
  • 14. Semantic Extensions to Domain-Specific Markup Languages. A. Varde, E. Rundensteiner, M. Mani, M. Maniruzzaman and R. Sisson Jr. In IEEE CCCT, Austin, TX, Aug 2004, Vol. 2, pp. 55-60.

35
Acknowledgments
  • First of all, my Advisor, Prof. Elke Rundensteiner
  • Committee: Prof. Carolina Ruiz, Prof. David Brown, Prof. Neil Heffernan
  • External Member: Prof. Richard D. Sisson Jr., Head of the Materials Program
  • Director of the Metal Processing Institute: Prof. Diran Apelian
  • Domain Expert: Dr. Mohammed Maniruzzaman
  • Members of the Center for Heat Treating Excellence
  • CS Department Head: Prof. Michael Gennert
  • Former CS Department Head: Prof. Micha Hofri
  • WPI Administration (CS, Materials), in particular Mrs. Rita Shilansky
  • Reviewers of the conferences and journals where my papers were accepted
  • Members of DSRG, AIRG, KDDRG and the Quenching Research Group
  • Colleagues and Friends: Shuhui, Sujoy, Viren, Olly, Mariana, Rimma, Maged, Bin, Lydia, Shimin and others
  • Great thanks to my Family: parents Dr. Sharad Varde and Dr. (Mrs.) Varsha Varde, grandparents Mr. D.A. Varde and Mrs. Vimal Varde, brother Ameya Varde and sister-in-law Deepa Varde
  • All the attendees of my Ph.D. Defense
  • Finally, God, for guiding me throughout my doctoral journey

36
Thank You