Chandrika Kamath and Imola K. Fodor Center for Applied Scientific Computing Lawrence Livermore National Laboratory Gatlinburg, TN March 26-27, 2002

Learn more at: https://sdm.lbl.gov
1
Chandrika Kamath and Imola K. Fodor
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Gatlinburg, TN
March 26-27, 2002
Dimension Reduction and Sampling
First SDM ISIC All-Hands Meeting
UCRL. This work was performed under the auspices
of the U.S. Department of Energy by University of
California Lawrence Livermore National Laboratory
under contract W-7405-Eng-48.
2
The SDM ISIC aims to minimize the effort
researchers spend in managing their data
  • LLNL is participating in several of the tasks,
    including
      • data mining to improve the management of data
  • Problem: data from simulations and experiments is
    high dimensional (i.e., many features)
  • Querying the features can help in understanding
    the data
      • but searching in a high-dimensional space is
        difficult
  • May want to cluster similar objects for efficient
    access
      • but clustering is expensive in high dimensions

We plan to address the problem of high
dimensionality using techniques for dimension
reduction and sampling originally developed in
data mining.
3
Our work on dimension reduction will help both
data management and mining
  • Reducing the dimensions will improve
      • searching (task 3.1, LBNL)
      • clustering (task 2.1, ORNL)
  • Dimension reduction is expensive if there are many
    data items
      • use a sample of the data items
      • techniques for sampling in the presence of rare
        events
  • We will focus on climate and high-energy-physics
    data
      • complements work at ORNL (climate), LBNL (HEP)
      • but techniques are applicable to other data as well

We only report the 0.8 FTE of work funded under
SciDAC; however, our data mining research is more
extensive. See www.llnl.gov/casc/sapphire
4
There are two different ways in which we can view
dimension reduction
  • Reduce the number of features representing a data
    item
  • Reduce the number of basis vectors used to
    describe the data: if some of the coefficients
    are small, they can be ignored
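
The two views above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the synthetic data, the choice of an SVD-derived basis, and the 5% threshold for what counts as "small" are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 items with 10 features that lie close to a 2-D subspace.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
X = latent @ basis + 0.01 * rng.normal(size=(200, 10))

# View 1: represent each item with fewer features (here, simply the first three).
X_fewer_features = X[:, :3]

# View 2: describe each item in an orthonormal basis (from the SVD of the
# centered data) and ignore basis vectors whose singular values are small.
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = int(np.sum(s > 0.05 * s[0]))     # assumed threshold for "small"
coeffs = (X - mean) @ Vt[:k].T       # k coefficients per data item
X_approx = coeffs @ Vt[:k] + mean    # reconstruction from k basis vectors

# Relative reconstruction error after dropping the small components.
rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X - mean)
print(k, rel_err)
```

In the first view the retained quantities are still original features; in the second they are coefficients in a new basis, which is the view the climate work takes.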

5
Our work on climate data focuses on reducing the
number of basis vectors
  • Domain expert: Dr. Benjamin Santer (LLNL climate)
  • Climate scientists are interested in
    understanding the change in the Earth's surface
    temperature
  • Simulated and observed data are mixtures of
    volcano, El Niño, and other effects
  • Our goal is to separate the signals corresponding
    to different effects
      • traditional approaches such as principal
        component analysis (PCA) have not worked
      • separation is difficult because the El Chichón and
        Pinatubo volcano eruptions coincided with El Niño
        events
      • our approach is to use independent component
        analysis (ICA)

Dimension reduction supporting scientific
discovery
6
The raw data consists of monthly temperatures on a
144x73 spatial grid at 17 vertical levels
January 1979 raw temperatures (Kelvin) on the
144x73 longitude by latitude grid at the 1000 hPa
pressure level. Data from NCEP.
7
Initially, we applied ICA to global monthly mean
anomaly temperatures
17 vertical levels: level 1 = 1000 hPa (lowest
altitude), level 17 = 10 hPa (highest altitude)
Time series of global monthly mean anomalies, Jan
1979 - Dec 2000
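
One way to obtain such a series from gridded data is sketched below. The slides do not say how the global means or anomalies were computed, so the cosine-of-latitude area weighting, the array layout (month, latitude, longitude), and the function name `global_monthly_anomalies` are all assumptions made for illustration. The input is synthetic, not NCEP data.

```python
import numpy as np

def global_monthly_anomalies(temps, lats_deg):
    """temps: (n_months, n_lat, n_lon), starting in January; lats_deg: (n_lat,)."""
    # Assumed area weighting: grid cells shrink toward the poles as cos(latitude).
    w = np.cos(np.deg2rad(lats_deg))
    zonal_mean = temps.mean(axis=2)                        # average over longitude
    global_mean = (zonal_mean * w).sum(axis=1) / w.sum()   # weighted average over latitude
    # Subtract each calendar month's long-term mean to remove the seasonal cycle.
    n_years = global_mean.size // 12
    by_month = global_mean[: n_years * 12].reshape(n_years, 12)
    climatology = by_month.mean(axis=0)
    return (by_month - climatology).ravel()

# Synthetic stand-in for one vertical level: seasonal cycle plus a slow trend.
n_months = 264                                   # Jan 1979 - Dec 2000
lats = np.linspace(-90.0, 90.0, 73)
months = np.arange(n_months)
series = 280.0 + 5.0 * np.sin(2 * np.pi * months / 12) + 0.002 * months
temps = np.broadcast_to(series[:, None, None], (n_months, 73, 144))
anomalies = global_monthly_anomalies(temps, lats)
```

Applying the same function to each of the 17 levels would give 17 anomaly time series, which could then serve as the mixed signals passed to ICA.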
8
Next, we ran experiments with simulated data to
understand the behavior of ICA
Figure: (i) Two original sources → mix → (ii) Two
mixed signals from the original sources → ICA →
(iii) Sources (ICs) recovered from (ii)
ICA correctly estimates the shapes of the two
independent components (ICs). With additional
processing, we can also estimate the relative
contributions of the two ICs to the two mixed
signals.
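
The mix-then-unmix experiment can be reproduced in outline with the sketch below. The slides do not name the ICA algorithm used, so a small symmetric FastICA with a tanh nonlinearity is swapped in here as an assumption. The sine source, the exponential "volcano" pulses, and the mixing weights in `A` are likewise hypothetical.

```python
import numpy as np

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity. X: (n_signals, n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: rotate and rescale so the signals are uncorrelated with unit variance.
    d, E = np.linalg.eigh(np.cov(X))
    K = (E / np.sqrt(d)).T                       # whitening matrix
    Z = K @ X
    n, n_samples = Z.shape

    def sym_decorrelate(W):
        # W <- (W W^T)^(-1/2) W keeps the rows of W orthonormal.
        s, u = np.linalg.eigh(W @ W.T)
        return (u / np.sqrt(s)) @ u.T @ W

    W = sym_decorrelate(np.random.default_rng(seed).normal(size=(n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(w.z) z] - E[g'(w.z)] w for every row w of W.
        W_new = G @ Z.T / n_samples - np.diag((1 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z          # estimated ICs: unit variance, arbitrary sign and order

# (i) Two hypothetical sources: a sine and a volcano-like cooling signal.
t = np.linspace(0.0, 1.0, 2000)
sine = np.sin(2 * np.pi * 25 * t)

def eruption(t0, tau=0.08):
    # Sudden cooling at t0 followed by exponential recovery.
    return np.where(t >= t0, -np.exp(-(t - t0) / tau), 0.0)

volcano = eruption(0.21) + eruption(0.65)
S = np.vstack([sine, volcano])

# (ii) Two mixed signals, using assumed mixing weights.
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])
X = A @ S

# (iii) Recovered ICs, compared with the sources by absolute correlation.
ics = fast_ica(X)
corr = np.abs(np.corrcoef(np.vstack([S, ics]))[:2, 2:])
print(corr.round(3))
```

ICA recovers each source only up to sign, scale, and ordering, which is why the comparison uses absolute correlations and why further processing is needed before amplitudes can be interpreted.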
9
Original decomposition of the two mixed signals
(-) into sine (--) and volcano (-.) components
(i) Signal 1
(ii) Signal 2
10
ICA decomposition of the two mixed signals (-)
into sine (--) and volcano (-.) components
  • After proper post-processing, ICA estimates the
    underlying independent components and their
    respective contributions to the mixed signals
    remarkably well

(i) Signal 1
(ii) Signal 2
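
The slides do not describe the post-processing step. One plausible approach, shown here purely as an assumption, is to regress each mixed signal on the estimated ICs by least squares, which resolves the sign and scale ambiguity of ICA. The function name `contributions` and the stand-in "estimated" ICs (true sources with flipped sign, changed scale, and swapped order) are hypothetical.

```python
import numpy as np

def contributions(X, ics):
    """Least-squares contribution of each IC to each mixed signal.

    X: (n_signals, n_samples); ics: (n_ics, n_samples).
    Returns an array of shape (n_signals, n_ics, n_samples).
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    Ic = ics - ics.mean(axis=1, keepdims=True)
    # Solve Ic.T @ B = Xc.T; B[j, i] is the weight of IC j in signal i.
    B, *_ = np.linalg.lstsq(Ic.T, Xc.T, rcond=None)
    return B.T[:, :, None] * Ic[None, :, :]

# Hypothetical sources and mixing weights.
t = np.linspace(0.0, 1.0, 2000)
sine = np.sin(2 * np.pi * 25 * t)
volcano = np.where(t >= 0.21, -np.exp(-(t - 0.21) / 0.08), 0.0)
S = np.vstack([sine, volcano])
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])
X = A @ S

# Stand-in for ICA output: right shapes, but wrong sign, scale, and order.
ics = np.vstack([-2.0 * S[1], 0.5 * S[0]])

parts = contributions(X, ics)
# parts[i, j] is the part of (centered) mixed signal i explained by IC j.
```

Because the regression absorbs any rescaling of the ICs, `parts` is the same whichever sign or amplitude the ICA step happens to return.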
11
ICA can also separate noise used as an extra
component in the mixing
Figure: 3 original sources → mix → 3 mixed signals → ICA → 3 estimated ICs
12
Original decomposition of 3 mixed signals (-) into
El Niño (--), volcano (-.), and noise (..) components
(i) Signal 1
(ii) Signal 2
(iii) Signal 3
The cooling in the global series at the arrow is in
fact a combination of ENSO warming and volcanic
cooling. Without the volcanic eruption, the El
Niño warming would dominate, resulting in warmer
global temperatures.
13
ICA decomposition of 3 mixed signals (-) into El Niño
(--), volcano (-.), and noise (..) components
(i) Signal 1
(ii) Signal 2
(iii) Signal 3
Although not perfect in terms of the exact
amplitudes, ICA clearly separates the cooling
effect of the volcano from the warming effect of
El Niño.
14
Our future plans include work with HEP data and
collaborators at ORNL and LBNL
  • Complete the work on the climate problem
      • our results with artificial data are encouraging
      • identify appropriate ICA model for climate data
  • Make the ICA software accessible to SciDAC
    scientists
  • Try ICA and other dimension reduction techniques
    in the context of the STAR high-energy-physics
    data
      • reduce number of features
      • investigate sampling to reduce computation
      • collaborate with LBNL (data, searching)
  • Investigate incremental PCA
      • monitor climate simulations using indices based
        on the principal components
      • collaborate with ORNL (data, clustering)