INTRODUCTION TO SYMBOLIC DATA ANALYSIS - PowerPoint PPT Presentation

About This Presentation
Title:

INTRODUCTION TO SYMBOLIC DATA ANALYSIS

Description:

Title: Symbolic data analysis of complex data Author: diday Last modified by: edwin diday Created Date: 10/20/2011 8:43:50 PM Document presentation format – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: INTRODUCTION TO SYMBOLIC DATA ANALYSIS


1
INTRODUCTION TO SYMBOLIC DATA ANALYSIS
  • E. Diday
  • CEREMADE, Paris-Dauphine University

TUTORIAL 13 June 2014 Activity Center, Academia
Sinica, Taipei, Taiwan
2
OUTLINE
  • PART 1 BUILDING SYMBOLIC DATA FROM
    STANDARD OR COMPLEX DATA
  • PART 2 SYMBOLIC DATA ANALYSIS
  • Is Symbolic Data Analysis a new paradigm?
  • PART 3 OPEN DIRECTIONS OF RESEARCH
  • PART 4 SDA SOFTWARE: SODAS, SYR and R
  • PART 5 INDUSTRIAL APPLICATIONS

3
PART 1
  • BUILDING SYMBOLIC DATA FROM STANDARD OR COMPLEX
    DATA

4
What is a standard Data Table?
  • It is a set of individuals (i.e. observations)
    described by a set of
  • Numerical variables (such as age, weight, ...) or
  • Categorical variables (such as nationality, club
    name, ...).
  • Example

Individuals
Players age height weight Nationality Club Team

Player 1

Messi

Ronaldo

Player n
5
  • What are Complex Data?
  • Any data which cannot be considered as a
    standard "observations x variables"
    data table.
  • Example
  • The individuals are towers of nuclear power
    plants, described by
  • Table 1) Observations: cracks.
    Variables: crack description.
  • Table 2) Observations: corrosions.
    Variables: corrosion description.
  • Table 3) Observations: vertices of a grid.
    Variables: gap depression from the ground.

6
Why consider classes of individuals as new
individuals?
  • Example
  • If we wish to know what makes a player win, we
    are interested in a standard data table where the
    individuals are the players (in rows) described
    (in columns) by their standard characteristic
    variables.
  • If we now wish to know what makes a team
    win, we are interested in a data table where the
    teams (in rows) are described by characteristic
    variables of the teams, taking into account the
    variability of the players inside each team.
  • The teams can now be considered as new
    individuals of a higher level, described by symbolic
    variables that take into account the variability of
    the individuals inside each class.

7
From standard data tables to symbolic data tables
Standard data table describing football players
(individuals): each cell contains a number (age) or a
category (nationality).
Symbolic data table describing teams (i.e.
classes of individuals): each cell contains a symbolic
data (bar chart of the ages in Messi's team).
[Figure: the standard table has rows ind1, ..., indi, ..., indn
(players) and columns X1, ..., Xj, with cell Xij. The symbolic
table has the columns Age (bar chart), Weight (interval) and
Nationalities (bar chart).]
Some columns are contingency tables.
8
SYMBOLIC DATA EXPRESS VARIABILITY INSIDE CLASSES
OF INDIVIDUALS
Here the variation (of weight, nationality, ...)
concerns the players of each team. Therefore each
cell can contain a number, an interval, a
sequence of categorical values, a sequence of
weighted values such as a bar chart, a distribution, ...
THESE NEW KINDS OF VARIABLES ARE CALLED
"SYMBOLIC" BECAUSE THEY ARE NOT PURELY
NUMERICAL, IN ORDER TO EXPRESS THE INTERNAL
VARIATION INSIDE EACH CLASS.
9
What is the current failure which has produced the
SDA paradigm?
  • The failure is that in current practice
  • only individual observations are
    considered.
  • Therefore these individual observations are
    described only by standard numerical and categorical
    variables.

10
The SDA paradigm shift
  • It is the transition
  • from individual observations described by
    standard variables with numerical or categorical
    values
  • to classes of individuals (considered as
    higher-level observations)
  • described by symbolic variables with symbolic
    values (intervals, probability distributions,
    sets of categories or numbers, random
    variables, ...)
  • taking into account the variability inside the
    classes.
  • Symbolic values cannot be treated as numbers.

11
Building Symbolic Data needs three steps
First step: we have a standard data table, Table 1,
where individuals are described by numerical or
categorical random variables Yj.
Second step: we have a Table 2 where classes of
individuals are described by variables Yj
whose values Yij are themselves random variables.
  • Third step: we have a symbolic data table, Table
    3, where the random variables Yij are represented
    by
  • Probability distributions, histograms, bar
    charts, percentiles, ...
  • Intervals [Min, Max], interquartile intervals, etc.
  • Sets of numbers or categories
  • Functions such as time series.
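
The three steps above can be sketched in a few lines of Python. This is only an illustration: the player records, team names and values are made up, and the choice of [min, max] intervals for numerical variables and relative-frequency bar charts for categorical ones is one of the options listed on the slide, not the only one.

```python
from collections import Counter, defaultdict

# Step 1: standard data table (hypothetical values, one row per player).
players = [
    {"team": "Team A", "weight": 72.0, "nationality": "Eur"},
    {"team": "Team A", "weight": 80.0, "nationality": "Eur"},
    {"team": "Team A", "weight": 68.0, "nationality": "Afr"},
    {"team": "Team B", "weight": 85.0, "nationality": "Afr"},
    {"team": "Team B", "weight": 90.0, "nationality": "Afr"},
]


def build_symbolic_table(rows, class_key, numeric_vars, categorical_vars):
    """Aggregate individuals into classes and describe each class symbolically.

    Numerical variables become (min, max) intervals; categorical variables
    become bar charts (relative frequency of each category).
    """
    # Step 2: group the individuals by class.
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row)

    # Step 3: summarise the variation inside each class.
    table = {}
    for name, members in groups.items():
        description = {}
        for var in numeric_vars:
            values = [m[var] for m in members]
            description[var] = (min(values), max(values))
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            description[var] = {c: k / len(members) for c, k in counts.items()}
        table[name] = description
    return table


symbolic = build_symbolic_table(players, "team", ["weight"], ["nationality"])
for team, description in symbolic.items():
    print(team, description)
```

Each row of the result is a class (a team) and each cell holds an interval or a bar chart rather than a single number.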

12
VARIABLES
  • Standard variable values:
  • numerical (income, profit, ...),
  • categorical (countries, stock-exchange places, ...)
  • Symbolic variable values:
  • interval,
  • bar chart,
  • histogram, etc.

13
Ten examples of Symbolic variables
14
What kind of questions and how are they
structured?
15
How to build symbolic data from standard or
complex data?
  • How should the numerical, ordinal or nominal
    ground variables be categorized in order to obtain
    the symbolic histogram or bar chart variables for
    each class?
  • First, find the discretisation which
    discriminates these classes as well as possible.
  • Second, or simultaneously, maximize the
    correlation between the bins.

16
  • SOME ADVANTAGES of SYMBOLIC DATA
  • Work at the needed level of generality without
    losing variability.
  • Reduce simple or complex huge data.
  • Reduce the number of observations and the number of
    variables.
  • Reduce missing data.
  • Ability to extract simplified knowledge and
    decisions from complex data.
  • Solve confidentiality issues (classes are not
    confidential in the way individuals are).
  • Facilitate interpretation of results: decision
    trees, factorial analysis, new kinds of graphics.
  • Extend Data Mining and Statistics to new kinds of
    data with many industrial applications.

17
PART 2 SYMBOLIC DATA ANALYSIS
18
SYMBOLIC DATA ANALYSIS TOOLS HAVE BEEN DEVELOPED
  • Graphical visualisation of symbolic data
  • Correlation, mean, mean square, distribution of
    a symbolic variable.
  • Dissimilarities between symbolic descriptions
  • Clustering of symbolic descriptions
  • S-Kohonen Mappings
  • S-Decision Trees
  • S-Principal Component Analysis
  • S-Discriminant Factorial Analysis
  • S-Regression
  • Etc...


19
From standard observations to classes, the
correlation is not the same!
[Figure: scatter plot of Y2 against Y1; the class centers are marked x]
  • Observations are uniformly distributed in
    the circle.
  • No correlation between Y1 and Y2 for the initial
    observations.
  • A correlation appears between the two variables
    for the centers of a given partition into 4 classes.
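
This effect can be reproduced with a small simulation. The sketch below is an illustration under assumptions that are not on the slide: the sample size, the random seed, and above all the partition, which here cuts the disk into 4 bands along the diagonal score y1 + y2.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Draw points uniformly in the unit disk by rejection sampling.
points = []
while len(points) < 4000:
    y1, y2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if y1 * y1 + y2 * y2 <= 1:
        points.append((y1, y2))


def class_of(point):
    """Hypothetical partition: 4 bands of the diagonal score y1 + y2."""
    score = point[0] + point[1]
    if score < -0.5:
        return 0
    if score < 0.0:
        return 1
    if score < 0.5:
        return 2
    return 3


classes = {c: [] for c in range(4)}
for p in points:
    classes[class_of(p)].append(p)

# Center of each class: mean of Y1 and mean of Y2 over its members.
centers = [
    (sum(p[0] for p in members) / len(members),
     sum(p[1] for p in members) / len(members))
    for members in classes.values()
]

corr_individuals = pearson([p[0] for p in points], [p[1] for p in points])
corr_centers = pearson([c[0] for c in centers], [c[1] for c in centers])
print(f"individuals: {corr_individuals:.3f}, class centers: {corr_centers:.3f}")
```

The result depends on the partition: cutting the disk into the four quadrants of the axes instead would give symmetric centers and no correlation between them.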

20
WHY SYMBOLIC DATA CANNOT BE REDUCED TO A
CLASSICAL STANDARD DATA TABLE?
Symbolic Data Table
Players category   Weight     Size           Nationality
Very good          [80, 95]   [1.70, 1.95]   0.7 Eur, 0.3 Afr
Transformation into classical data
Players category   Weight Min   Weight Max   Size Min   Size Max   Eur   Afr
Very good          80           95           1.70       1.95       0.7   0.3
Concern: the initial variables are lost
and the variation is lost!
21
Divisive Clustering or Decision tree
[Figure: two trees. Symbolic analysis: Weight.
Classical analysis: Max Weight.]
22
PCA and NETWORK OF BAR CHART DATA of 30 clusters
of Fisher's Iris data
Any symbolic variable (set of bin variables) can
be projected. Here, the species variable.
SYROKKO Company afonso_at_syrokko.com
23
The contributions of the symbolic variables are inside
the smallest hypercube containing the
correlation sphere of the bins
24
Numerical versus symbolic space of
representation
(Y1(Ci), Y2(Ci)) = ([a1i, b1i], [a2i, b2i])
[Figure, two panels: "Numerical representation of interval
variables" (axes a1, b1 and a2, b2, class Ci marked x) and
"Bi-plot of interval variables".]
25
Bi-plot of histogram variables
  • The joint probability can be inferred by a copula
    model

Copula
26
PART 3 OPEN DIRECTIONS OF RESEARCH
  • Models of models
  • Laws of parameters of laws
  • Laws of vectors of laws.
  • Copulas needed.
  • Four general convergence theorems.
  • Optimisation in unsupervised learning
    (hierarchical and pyramidal clustering).

27
From the lower level of individual observations to
the higher level of classes:
higher-level models are needed
Table 1: individuals (ind1, ..., Messi, ..., indn) described by
X1, ..., Xj; a cell Xij contains a number (age of Messi).
Table 2: classes; a cell contains a symbolic data (age of
Messi's team).
  • In Table 1, Xj is a standard numerical random variable.
  • In Table 2, Xj is a random variable with histogram values.
  • Question: if the law of Xj in Table 1 is given, what is the
    law of Xj in Table 2? (Dirichlet models are useful.)

28
Why use copula models in Symbolic Data
Analysis?
  • f(i, j, j') is the joint probability of the
    variables j and j' for the individual i.
  • In case of independence, we have
  • f(i, j, j') = f(i, j) . f(i, j'),
  • If there is no independence,
  • f(i, j, j') = Copula(f(i, j), f(i, j'))
  • Aim of the copula model in SDA:
  • find the copula which minimises the difference
    with the joint.
  • This avoids restricting to independence
    hypotheses and reduces the cost of computing
    f(i, j, j').
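
As a rough illustration of "find the copula which minimises the difference with the joint", the sketch below compares three textbook copulas against an empirical joint distribution and keeps the closest one. Several things here are assumptions rather than content of the slides: the candidate list, the squared-difference criterion, the made-up data, and the use of cumulative distribution functions, which is the usual setting for copulas (Sklar's theorem).

```python
def empirical_cdf(sample):
    """Return the empirical distribution function of a 1-D sample."""
    n = len(sample)
    return lambda t: sum(1 for v in sample if v <= t) / n


def empirical_joint_cdf(xs, ys):
    """Return the empirical joint distribution function of paired samples."""
    n = len(xs)
    return lambda s, t: sum(1 for x, y in zip(xs, ys) if x <= s and y <= t) / n


# Candidate copulas C(u, v): three textbook examples, not a list from the slides.
CANDIDATES = {
    "independence": lambda u, v: u * v,
    "comonotone": lambda u, v: min(u, v),
    "countermonotone": lambda u, v: max(u + v - 1.0, 0.0),
}


def best_copula(xs, ys):
    """Pick the candidate whose C(F(x), G(y)) is closest to the joint CDF."""
    f, g, h = empirical_cdf(xs), empirical_cdf(ys), empirical_joint_cdf(xs, ys)
    scores = {}
    for name, copula in CANDIDATES.items():
        # Sum of squared differences, evaluated at the observed points.
        scores[name] = sum(
            (h(x, y) - copula(f(x), g(y))) ** 2 for x, y in zip(xs, ys)
        )
    return min(scores, key=scores.get), scores


# Hypothetical data: y increases with x, so the variables are strongly dependent.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x + 1.0 for x in xs]
name, scores = best_copula(xs, ys)
print(name, scores)
```

In practice one would search a parametric family of copulas rather than a fixed list of three; the fixed list only keeps the example short.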

29
FOUR THEOREMS TO BE PROVED FOR ANY METHOD EXTENDED
TO SYMBOLIC DATA.
M(n, k) is supposed to be an SDA method where k is
the number of classes obtained from n initial
individuals.
THEOREM 1: If the k classes are fixed and n tends
towards infinity, then M(n, k) converges towards a
stable position.
THEOREM 2: If k increases until there is a single
individual per class, then M(n, k) converges towards
a standard one.
THEOREM 3: If k and n increase simultaneously
towards infinity, then M(n, k) converges towards a
stable position.
THEOREM 4: If the k laws associated with the k
classes are considered as a sample of a law of laws,
then M(n, k) applied to this sample converges to
M(n, k) applied to this law.
Examples
Theorem 1: it was proved in Diday, Emilion (CRAS,
Choquet 1998) for Galois lattices: as the size of
the population increases, the classes (described by
vectors of distributions) organize themselves into a
Galois lattice which converges. Emilion (CRAS, 2002)
also gives a theorem for the case of mixtures of laws
of laws, using martingales and a Dirichlet model.
Theorem 2: for example, classical PCA, MO, is a
special case of the PCA denoted M(n, k) built on
vectors of intervals.
Theorem 3: this is the setting of data which arrive
sequentially ("data stream" type) and of one-pass
algorithms (see for example Diday, Murty (2005)).
Theorem 4: in the case of a hierarchical or
pyramidal clustering (2D, 3D, etc.), convergence
means that the main levels and their structure
stabilize. In the case of a PCA, convergence means
that the factorial axes stabilize.
30
Optimisation in clustering
d is the given dissimilarity.
Each class is described by symbolic data.
Hierarchies: ultrametric dissimilarity U, W = |d - U|
Pyramids: Robinsonian dissimilarity R, W = |d - R|
3D spatial pyramid: Yadidean dissimilarity Y, W = |d - Y|
[Figure labels: S1, S2, C3, C2, A1, B1, C1]
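
For the hierarchical case, the sketch below shows one concrete way to obtain an ultrametric U from a given dissimilarity d and to measure how far it is from d. It uses the subdominant ultrametric (the one underlying single-linkage clustering), which is a classical choice and not necessarily the U that minimises W; the 3 x 3 matrix and the sum-of-absolute-differences criterion are assumptions made for illustration.

```python
from itertools import combinations


def subdominant_ultrametric(d):
    """Largest ultrametric below d: minimax path distance between each pair.

    d is a symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(d)
    u = [row[:] for row in d]
    # Floyd-Warshall variant: a path costs its largest edge; keep the cheapest path.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def is_ultrametric(u):
    """Check u(i, j) <= max(u(i, k), u(k, j)) for all triples."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i in range(n) for j in range(n) for k in range(n)
    )


def fit_criterion(d, u):
    """Sum of absolute differences between d and u over all pairs."""
    return sum(abs(d[i][j] - u[i][j]) for i, j in combinations(range(len(d)), 2))


# Hypothetical dissimilarity on three objects.
d = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 2.0],
    [5.0, 2.0, 0.0],
]
u = subdominant_ultrametric(d)
print(u, is_ultrametric(d), is_ultrametric(u), fit_criterion(d, u))
```

The same pattern (fit a constrained dissimilarity, then measure its distance to d) applies to the Robinsonian and Yadidean cases, with a different constraint in place of the ultrametric inequality.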
31
PART 4 SDA SOFTWARE
  • SODAS
  • RSDA
  • SYR

32
Software
To build symbolic data from standard or
complex data and to analyze symbolic data, different
software packages exist today.
SODAS - free academic package, though registration is
required and a code is needed for installation,
http://www.info.fundp.ac.be/asso/sodaslink.htm
Many symbolic data bases can be found at
http://www.ceremade.dauphine.fr/SODAS/
RSDA - free academic packages are available on CRAN,
oldemar.rodriguez_at_gmail.com
SYR - professional package, see
afonso_at_syrokko.com
33
SODAS SOFTWARE
[Figure captions:]
Kohonen map of concepts
Factorial analysis: PCA of interval-valued
variables
Superposition of two stars associated with
two classes of the pyramid
Decision tree on histogram-valued or
interval-valued variables
Classifying pyramid

The objective of SCLUST is the clustering of
symbolic objects (SOs) by a dynamic algorithm based
on symbolic data tables. The aim is to build a
partition of the SOs into a predefined number of
classes. Each class has a prototype in the form
of an SO. The optimality criterion used is based
on the sum of proximities between the individuals
and the prototypes of the clusters.
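
The dynamic clustering idea described for SCLUST (alternate between assigning objects to the nearest prototype and recomputing the prototypes) can be sketched for interval data as follows. This is not SCLUST's code: the squared Euclidean distance on the interval bounds, the mean-of-bounds prototype, the naive initialisation and the sample intervals are all assumptions chosen to keep the example short.

```python
def distance(a, b):
    """Squared Euclidean distance between the bounds of two intervals."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def prototype(intervals):
    """Interval whose bounds are the mean lower and mean upper bounds."""
    n = len(intervals)
    return (sum(i[0] for i in intervals) / n, sum(i[1] for i in intervals) / n)


def dynamic_clustering(intervals, k, max_iter=100):
    """Alternate assignment and prototype update until the partition is stable."""
    prototypes = list(intervals[:k])  # naive initialisation: first k objects
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest prototype.
        new_assignment = [
            min(range(k), key=lambda c: distance(x, prototypes[c]))
            for x in intervals
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Representation step: recompute the prototype of each class.
        for c in range(k):
            members = [x for x, a in zip(intervals, assignment) if a == c]
            if members:  # keep the old prototype if a class becomes empty
                prototypes[c] = prototype(members)
    # Criterion: sum of distances between objects and their class prototype.
    criterion = sum(distance(x, prototypes[a]) for x, a in zip(intervals, assignment))
    return assignment, prototypes, criterion


# Hypothetical weight intervals (min, max) for five classes of players.
weights = [(60, 70), (80, 95), (62, 72), (82, 98), (58, 69)]
assignment, prototypes, criterion = dynamic_clustering(weights, k=2)
print(assignment, prototypes, criterion)
```

Each prototype is itself an interval, which matches the statement that the prototype of a class has the same form as the objects being clustered.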
34
FROM DATA BASE TO SYMBOLIC DATA IN SODAS
[Diagram: a QUERY on a relational data base returns the
individuals, the description of the individuals, and their
classes. The result is a symbolic data table: rows are classes
(class descriptions), columns are symbolic variables, and the
cells contain symbolic data.]
35
SYR SOFTWARE
  • Produce a Symbolic Data Table from complex data.
  • Manage symbolic data tables: sort rows and
    columns by discriminant power.
  • Analyse symbolic data tables: S-PCA, S-clustering.
  • Produce network, rules and decision trees.

36
SYR SYMBOLIC DATA TABLE MANAGEMENT
SYMBOLIC DATA TABLE
  • Rows can be sorted by the min or max of intervals, or
    by the frequencies of bar charts.
  • Variables can also be sorted by their power to
    discriminate between the concepts.

SYROKKO Company eliezer_at_syrokko.com
37
PART 5 INDUSTRIAL APPLICATIONS
38
Time series data table: anomaly detection on a
bridge. LCPC (Laboratoire Central des Ponts et
Chaussées) and SNCF data.
[Table: trains described by Sensor 1, Sensor 2,
Sensor 3, ..., Sensor N]
39
HIERARCHICAL DATA
  • Symbolic procedure
  • From a numerical description of pigs to a symbolic
    description of farms
  • Numerical variables and categorical variables
    are transformed into bar charts of the
    frequencies based on 30 animals,
  • or into interval-valued variables.

19 variables: description of pig respiratory diseases,
125 farms x 30 animals
Median score (continuous var.), animal frequencies
(categorical var.)
64 variables: description of pig respiratory diseases,
125 farms
C. Fablet, S. Bougeard (AFSSA)
40
Step 1 Symbolic Description of Farms
41
Nuclear Power Plant: Find Correlations Between 3
Standard Data Tables of Different Observation
Units and Different Variables
42
NUCLEAR POWER PLANT: Nuclear thermal power station
inspection
PROBLEM: FIND CORRELATIONS BETWEEN 3 CLASSICAL DATA
TABLES OF DIFFERENT UNITS AND VARIABLES
Table 1) Observations: cracks. Variables: crack
description.
Table 2) Observations: vertices of a grid.
Variables: gap deviation at different periods
compared to the initial model position.
Table 3) Observations: vertices of a grid.
Variables: gap depression from the ground.
These are transformed into ONE symbolic data table
where the classes are the towers. SDA can be applied
to this new table.
43
FROM COMPLEX DATA TO SYMBOLIC DATA
44
Towers on PCA first axes
  • PCA on chosen symbolic variables
  • Visualisation of three clusters
  • Interval and bar chart variables can be seen.
  • A network of the strongest links can be
    represented.

NETSYR results (SYR software)
45
Symbolic variables projection inside the
hypercube of the correlation sphere
46
Telephone calls: text mining in order to discover
themes without using semantics
INITIAL DATA 2 814 446 rows
Documents Words
Doc1 bonjour
Doc1 oui
Doc1 monsieur

Doc2 panne
  • Each calling session is called a document.
  • After lemmatisation, we start with a table of
    31454 documents x 2258 words.

Correspondence between documents and words.
47
First steps: building overlapping clusters of
documents and words (CLUSTSYR)
2 814 446 rows: correspondence (documents, words)
31454 documents x 2258 words
70 x 2258: 70 overlapping clusters of documents
described by the tf-idf of 2258 words.
2258 x 70: 2258 words described by their tf-idf on
the 70 clusters of documents.
80 x 70: 80 overlapping clusters of words described
by their tf-idf in the 70 clusters of documents.
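
The tf-idf weights used in these tables can be computed directly from the long (document, word) correspondence table. The sketch below uses one common variant (term frequency divided by document length, times the logarithm of the inverse document frequency); the slides do not say which variant CLUSTSYR uses, so this choice is an assumption, as is the second Doc2 row.

```python
import math
from collections import Counter, defaultdict

# (document, word) rows. The Doc1 rows and "panne" come from the slide;
# the second Doc2 row is made up so that one word occurs in both documents.
rows = [
    ("Doc1", "bonjour"),
    ("Doc1", "oui"),
    ("Doc1", "monsieur"),
    ("Doc2", "panne"),
    ("Doc2", "bonjour"),
]


def tf_idf(rows):
    """Return {document: {word: tf-idf weight}} from (document, word) pairs."""
    counts = defaultdict(Counter)
    for doc, word in rows:
        counts[doc][word] += 1

    n_docs = len(counts)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for words in counts.values() for word in words)

    weights = {}
    for doc, words in counts.items():
        total = sum(words.values())
        weights[doc] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in words.items()
        }
    return weights


weights = tf_idf(rows)
for doc, words in weights.items():
    print(doc, words)
```

A word that appears in every document ("bonjour" here) gets weight 0, so it does not help to separate themes. The same function can be applied with clusters of documents in place of documents.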
48
Next step (STATSYR): each cluster of documents
is described by the 80 clusters of words, called
themes
Themes
Classes of documents
WORDS in Each Theme
49
GRAPHICAL REPRESENTATION by NETSYR from SYR
software
Graphical representation of themes and document
classes by pie charts and their bar chart
description. Overlapping clusters. Social
network based on dissimilarities. Annotation of
themes and document classes. Moving, zooming.
We finally obtain a clear representation of the
main themes, their classes and their links:
failures, budget, addresses, vacation, etc.
50
A Survey on Security
  • A sample of people from three regions (Vex, Val,
    Plai) answered three questions:
  • Gender: M or W,
  • Security: priority to
  • Fight Against Unemployment (FAU),
  • Juvenile Delinquency (JD),
  • Drug addiction (D),
  • Death penalty (Yes or No).

Gender, Security and D. Penalty are "bar chart
valued variables"; M, W, FAU, JD, ... are "bins".
51
From barchart symbolic variables to Metabin
latent variables
Region   Gender        Insecurity          Death Penalty
         M     W       FAU   JD    D       Yes   No
Vex      0.8   0.2     0.4   0.5   0.1     0.5   0.5
Val      0.7   0.3     0.5   0.2   0.3     0.4   0.6
Plai     0.3   0.7     0.7   0.1   0.2     0.1   0.9
Table 1 Initial bar chart data table

Region   S1cor               S2cor               S3cor
         M     JD    Yes     W     FAU   No      NU    D     NU
Vex      0.8   0.5   0.5     0.2   0.4   0.5     NU    0.1   NU
Val      0.7   0.2   0.4     0.3   0.5   0.6     NU    0.3   NU
Plai     0.3   0.1   0.1     0.7   0.7   0.9     NU    0.2   NU
Table 2 Metabin latent variables
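
Table 2 is a rearrangement of Table 1: each metabin takes one bin from each bar chart variable. The sketch below rebuilds Table 2 from Table 1 with the bin-to-metabin assignment read off the slide. How that assignment is chosen (the "cor" in the metabin names suggests a correlation criterion) is not implemented here, and the "NU" cells are simply kept as None.

```python
# Table 1 from the slide: bar chart frequencies per region.
table1 = {
    "Vex":  {"M": 0.8, "W": 0.2, "FAU": 0.4, "JD": 0.5, "D": 0.1, "Yes": 0.5, "No": 0.5},
    "Val":  {"M": 0.7, "W": 0.3, "FAU": 0.5, "JD": 0.2, "D": 0.3, "Yes": 0.4, "No": 0.6},
    "Plai": {"M": 0.3, "W": 0.7, "FAU": 0.7, "JD": 0.1, "D": 0.2, "Yes": 0.1, "No": 0.9},
}

# Bin-to-metabin assignment copied from Table 2, in the order
# (Gender, Insecurity, Death Penalty). None stands for a "NU" cell.
metabins = {
    "S1cor": ["M", "JD", "Yes"],
    "S2cor": ["W", "FAU", "No"],
    "S3cor": [None, "D", None],
}


def build_metabin_table(table, assignment):
    """Regroup the bins of each region into metabins."""
    return {
        region: {
            name: [bars[b] if b is not None else None for b in bins]
            for name, bins in assignment.items()
        }
        for region, bars in table.items()
    }


table2 = build_metabin_table(table1, metabins)
for region, row in table2.items():
    print(region, row)
```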
52
CONCLUSION
  • If you have standard units described by numerical
    and/or categorical variables, these variables
    induce classes described by symbolic variables
    that take their internal variation into account. SDA
    can then be applied to these new units in order to get
    complementary and enhanced results by extending
    standard analysis to symbolic analysis.
  • Symbolic data have to be built from given
    standard or complex data.
  • Symbolic data cannot be reduced to standard data.
  • Complex data can be simplified into symbolic data.
  • Big data bases can be reduced to symbolic data.
  • Symbolic data are not only distributions: they
    are the numbers of the future.

53
References
  • Basic books and papers
  • Bock H.H., Diday E. (editors and co-authors)
    (2000) Analysis of Symbolic Data. Exploratory
    methods for extracting statistical information
    from complex data. Springer Verlag, Heidelberg,
    425 pages, ISBN 3-540-66619-2.
  • L. Billard, E. Diday (2003) "From the statistics
    of data to the statistics of knowledge: Symbolic
    Data Analysis". JASA, Journal of the American
    Statistical Association. June, Vol. 98, No. 462.
  • E. Diday, M. Noirhomme (eds and co-authors)
    (2008) Symbolic Data Analysis and the SODAS
    software. 457 pages. Wiley. ISBN
    978-0-470-01883-5.
  • Billard, L. and Diday, E. (2006). Symbolic Data
    Analysis: Conceptual Statistics and Data Mining.
    321 pages. Wiley series in computational
    statistics. Wiley, Chichester, ISBN
    0-470-09016-2.
  • Noirhomme-Fraiture, M. and Brito, P. (2012) Far
    beyond the classical data models: symbolic data
    analysis. Statistical Analysis and Data Mining 4
    (2), 157-170.
  • Lazare N. (2013) "Symbolic Data Analysis". CHANCE
    magazine. Editor's Letter, Vol. 26, No. 3.

54
Building Symbolic Data and representation
References
  • Stéphan V., Hébrail G., Lechevallier Y. (2000)
    "Generation of symbolic objects from relational
    data base". Chapter in the book Analysis of
    Symbolic Data: Exploratory Methods for Extracting
    Statistical Information from Complex Data (eds.
    H.-H. Bock and E. Diday). Springer-Verlag, Berlin,
    103-124.
  • Chiun-How, K., Chih-Wen, O., Yin-Jing, T.,
    Chuan-kai, Yang, Chun-houh, Chen (2012) A
    Symbolic Database for TIMSS. Arroyo J., Maté
    C., Brito P., Noirhomme M. eds, 3rd Workshop in
    Symbolic Data Analysis. Universidad Complutense
    de Madrid. http://www.sda-workshop.org/.
  • E. Diday, F. Afonso, R. Haddad (2013) The
    symbolic data analysis paradigm, discriminate
    discretization and financial application. In
    Advances in Theory and Applications of High
    Dimensional and Symbolic Data Analysis, HDSDA
    2013. Revue des Nouvelles Technologies de
    l'Information vol. RNTI-E-25, pp. 1-14.

55
SOME SYMBOLIC DATA ANALYSIS REFERENCES
  • In Principal Component Analysis
  • Cazes P., Chouakria A., Diday E., Schektman Y.
    (1997). Extension de l'analyse en composantes
    principales à des données de type intervalle,
    Rev. Statistique Appliquée, Vol. XLV Num. 3, pp.
    5-24, France. 29.
  • Cazes P. (2002) Analyse factorielle d'un tableau
    de lois de probabilité. Revue de statistique
    appliquée, tome 50, no. 3.
  • Diday E. (2013) "Principal Component Analysis for
    bar charts and Metabins tables". Statistical
    Analysis and Data Mining, 6, 5, 403-430. Wiley.
    Article first published online 20 May 2013.
    DOI 10.1002/sam.11188.
  • Ichino, M. (2011). The quantile method for
    symbolic principal component analysis.
    Statistical Analysis and Data Mining, Wiley.
    184-198.
  • Makosso-Kallyth S. and Diday E. (2012) Adaptation
    of interval PCA to symbolic histogram variables.
    Advances in Data Analysis and Classification
    (ADAC). July, Volume 6, Issue 2, pp. 147-159.
  • Rademacher, J., Billard, L. (2012) Principal
    component analysis for interval data. Wiley
    Interdisciplinary Reviews: Computational
    Statistics. Volume 4, Issue 6, pp. 535-540.
  • Shimizu N., Nakano J. (2012) Histograms Principal
    Component Analysis. Arroyo J., Maté C., Brito P.,
    Noirhomme M. eds, 3rd Workshop in Symbolic Data
    Analysis. Universidad Complutense de Madrid.
    http://www.sda-workshop.org/
  • Wang H., Guan R., Wu J. (2012a). CIPCA:
    Complete-Information-based Principal Component
    Analysis for interval-valued data,
    Neurocomputing, Volume 86, Pages 158-169.

56
Symbolic Data Analysis references
  • In Symbolic Forecasting
  • Arroyo, J. and Maté, C. (2009). Forecasting
    histogram time series with k-nearest neighbors'
    methods. International Journal of Forecasting 25,
    192-207.
  • García-Ascanio, C., Maté, C. (2010). Electric
    power demand forecasting using interval time
    series: A comparison between VAR and iMLP. Energy
    Policy 38, 715-725.
  • Han, A., Hong, Y., Lai, K.K., Wang, S. (2008).
    Interval time series analysis with an application
    to the sterling-dollar exchange rate. Journal of
    Systems Science and Complexity, 21 (4), 550-565.
  • He, L.T. and C. Hu (2009). Impacts of Interval
    Computing on Stock Market Variability
    Forecasting. Computational Economics 33, 263-276.
  • In Symbolic rule extraction
  • Afonso, F. and Diday, E. (2005). Extension de
    l'algorithme Apriori et des regles d'association
    aux cas des donnees symboliques diagrammes et
    intervalles. Revue RNTI, Extraction et Gestion
    des Connaissances (EGC 2005), Vol. 1, pp. 205-210,
    Cepadues, 2005.

57
Symbolic Data Analysis references
  • In Symbolic Decision Tree
  • Ciampi, A., Diday, E., Lebbe, J., Perinel, E. et
    Vignes, R. (2000). Growing a tree classifier with
    imprecise data. Pattern Recognition letters 21
    787-803.
  • Mballo C., Diday E. (2006) "The criterion of
    Smirnov-Kolmogorov for binary decision tree:
    application to interval valued variables".
    Intelligent Data Analysis. Volume 10, Number 4,
    pp. 325-341.
  • Winsberg S., Diday E., Limam M. (2006). A tree
    structured classifier for symbolic class
    description. Compstat 2006. Physica-Verlag.
  • Bravo, M. and Garcia-Santesmases, J. (2000).
    Symbolic Object Description of Strata by
    Segmentation Trees, Computational Statistics,
    15, 13-24, Physica-Verlag.

58
Symbolic Data Analysis references
  • In Clustering
  • De Carvalho F., Souza R., Chavent M., and
    Lechevallier Y. (2006) Adaptive Hausdorff
    distances and dynamic clustering of symbolic
    interval data. Pattern Recognition Letters Volume
    27, Issue 3, February 2006, Pages 167-179.
  • De Souza R.M.C.R, De Carvalho F.A.T. (2004).
    Clustering of interval data based on City-Block
    distances. Pattern Recognition Letters, 25,
    353-365.
  • Diday E. (2008) Spatial classification. DAM
    (Discrete Applied Mathematics) Volume 156, Issue
    8, Pages 1271-1294.
  • Diday, E., Murty, N. (2005) "Symbolic Data
    Clustering" in Encyclopedia of Data Warehousing
    and Mining . John Wong editor . Idea Group
    Reference Publisher.
  • Irpino, A. and Verde, R. (2008) Dynamic
    clustering of interval data using a
    Wasserstein-based distance. Pattern Recognition
    Letters 29, 1648-1658.
  • In Multidimensional Scaling
  • Terada, Y., Yadohisa, H. (2011) Multidimensional
    scaling with hyperbox model for percentile
    dissimilarities. In Watada, J., Phillips-Wren,
    G., Jain, L. C., and Howlett, R. J. (Eds.)
    Intelligent Decision Technologies, Springer
    Verlag, 779-788.
  • Groenen, P.J.F., Winsberg, S., Rodriguez, O.,
    Diday, E. (2006). I-Scal: Multidimensional
    scaling of interval dissimilarities.
    Computational Statistics and Data Analysis 51,
    360-378.

59
Some Symbolic Data Analysis references
  • In Self Organizing map
  • Hajjar C., Hamdan H. (2011). Self-organizing map
    based on L2 distance for interval-valued data. In
    SACI 2011, 6th IEEE International Symposium on
    Applied Computational Intelligence and
    Informatics (Timisoara, Romania), pp. 317-322.
  • In Dissimilarities between Symbolic Data
  • Kim, J. and Billard, L. (2013) Dissimilarity
    measures for histogram-valued observations,
    Communications in Statistics-Theory and Method,
    42, 283-303.
  • Verde, R., Irpino, A. (2010). Ordinary Least
    Squares for Histogram Data Based on Wasserstein
    Distance, in Proc. COMPSTAT 2010, Y.
    Lechevallier and G. Saporta (Eds), pp. 581-589.
    Physica Verlag Heidelberg.

60
Some Symbolic Data Analysis references
  • In Regression and Canonical analysis extended to
    Symbolic Data
  • Dias, S., Brito, P., (2011). A New Linear
    Regression Model for Histogram-Valued Variables.
    In Proceedings of the 58th ISI World Statistics
    Congress (Dublin, Ireland).
  • Lauro, C., Verde, R., Irpino, A. (2008).
    Generalized canonical analysis, in Symbolic Data
    Analysis and the SODAS Software, E. Diday and M.
    Noirhomme-Fraiture (Eds.), 313-330, Wiley,
    Chichester.
  • Tenenhaus A., Diday E., Emilion R., Afonso F.
    (2013) Regularized General Canonical Correlation
    Analysis Extended To Symbolic Data. ADAC
    (publication on the way).
  • Neto, E.A, De Carvalho F.A.T. (2010). Constrained
    linear regression models for symbolic
    interval-valued variables. Computational
    Statistics and Data Analysis 54, 333-347.
  • Wang H., Guan R., Wu J. (2012c). Linear
    regression of interval-valued data based on
    complete information in hypercubes, Journal of
    Systems Science and Systems Engineering, Volume
    21, Issue 4, Page 422-442.

61
Some Symbolic Data Models references
  • P. Bertrand, F. Goupil (2000) "Descriptive
    Statistics for symbolic data". In H.H. Bock, E.
    Diday (Eds) Analysis of Symbolic
    Data. Springer-Verlag, pp. 106-124.
  • Brito, P. and Duarte Silva, A.P. (2012).
    Modelling interval data with Normal and
    Skew-Normal distributions. Journal of Applied
    Statistics, 39 (1), 3-20.
  • E. Diday, M. Vrac (2005) "Mixture decomposition
    of distributions by Copulas in the symbolic data
    analysis framework". Discrete Applied Mathematics
    (DAM). Volume 147, Issue1, 1 April, pp. 27-41.
  • E. Diday (2011) Modélisation de données
    symboliques et application au cas des
    intervalles. Journées Nationales de la Société
    Francophone de Classification. Orléans
  • E. Diday (2002) From Schweizer to Dempster
    mixture decomposition of distributions by copulas
    in the symbolic data analysis framework IPMU
    2002, July, Annecy, France
  • Diday E., Emilion R. (1997) "Treillis de Galois
    Maximaux et Capacités de Choquet" . C.R. Acad.
    Sc. t.325, Série 1, p 261-266. Présenté par G.
    Choquet en Analyse Mathématiques
  • Diday E., R. Emilion (2003) Maximal and
    stochastic Galois lattices. Discrete Applied Math.
    Journal. Vol. 27 (2), pp. 271-284.
  • Emilion R., Classification et mélanges de
    processus. C.R. Acad. Sci. Paris, 335, série I,
    189-193 (2002).
  • Emilion R., Unsupervised Classification and
    Analysis of objects described by nonparametric
    probability distributions. Statistical Analysis
    and Data Mining (SAM), Vol 5, 5, 388-398 (2012).
  • J. Le-Rademacher, L. Billard (2011) Likelihood
    functions and some maximum likelihood estimators
    for symbolic data. Journal of Statistical
    Planning and Inference 141, 1593-1602. Elsevier.
  • T. Soubdhan, R. Emilion, R. Calif (2009)
    Classification of daily solar radiation
    distributions. Solar Energy 83 (2009)
    1056-1063. Elsevier.

62
Some SDA Industrial Applications
  • Afonso F., Diday E., Badez N., Genest Y. (2010)
    Symbolic Data Analysis of Complex Data:
    Application to nuclear power plant. COMPSTAT 2010,
    Paris.
  • Bezerra B., Carvalho F. (2011) Symbolic data
    analysis tools for recommendation systems. Knowl.
    Inf. Syst. 01/2011, 26: 385-418.
    DOI 10.1007/s10115-009-0282-3.
  • Bouteiller V., Toque C., A., Cherrier J-F.,
    Diday E., Cremona C. (2011) Non-destructive
    electrochemical characterizations of reinforced
    concrete corrosion basic and symbolic data
    analysis. Corros Rev . Walter de Gruyter Berlin
    Boston. DOI 10.1515/corrrev-2011-002.
  • Courtois, A., Genest, G., Afonso, F., Diday, E.,
    Orcesi, A. (2012) In service inspection of
    reinforced concrete cooling towers: EDF's
    feedback, IALCCE 2012, Vienna, Austria.
  • Cury, A., Crémona, C., Diday, E. (2010).
    Application of symbolic data analysis for
    structural modification assessment. Engineering
    Structures Journal. Vol 32, pp 762-775.
  • Christelle Fablet, Edwin Diday, Stephanie
    Bougeard, Carole Toque, Lynne Billard (2010).
    Classification of Hierarchical-Structured Data
    with Symbolic Analysis. Application to Veterinary
    Epidemiology. COMPSTAT2010 , Paris.
  • Haddad R., Afonso F., Diday E., (2011) Approche
    symbolique pour l'extraction de thématiques
    Application à un corpus issu d'appels
    téléphoniques. In actes des XVIIIèmes Rencontres
    de la Sociéte francophone de Classification.
    Université d'Orléans
  • Laaksonen, S. (2008). People's Life Values and
    Trust Components in Europe - Symbolic Data
    Analysis for 20-22 Countries. In Edwin Diday and
    Monique Noirhomme-Fraiture, "Symbolic Data
    Analysis and the SODAS Software", Chapter 22, pp.
    405-419. Wiley and Sons, Chichester, UK.
  • Quantin C., Billard L., Touati M., Andreu N.,
    Cottin Y., Zeller M., Afonso F., Battaglia G.,
    Seck D., Le Teuff G., and Diday E.. (2011)
    Classification and Regression Trees on Aggregate
    Data Modeling An Application in Acute Myocardial
    Infarction. Journal of Probability and Statistics
    Volume 2011 (2011), 19 pages.
  • Terraza V., Toque C. (2013) Mutual Fund Rating:
    A Symbolic Data Approach. In "Understanding
    Investment Funds: Insights from Performance and
    Risk Analysis". Edited by Virginie Terraza and
    Hery Razafitombo. Economics Finance Collection
    2013. Palgrave Macmillan, UK.
  • He, L.T. and C. Hu (2009). Impacts of Interval
    Computing on Stock Market Variability
    Forecasting. Computational Economics 33, 263-276.
  • E. Diday, F. Afonso, R. Haddad (2013) The
    symbolic data analysis paradigm, discriminate
    discretization and financial application, in
    Advances in Theory and Applications of High
    Dimensional and Symbolic Data Analysis, HDSDA
    2013. Revue des Nouvelles Technologies de
    l'Information vol. RNTI-E-25, pp. 1-14
  • Han, A., Hong, Y., Lai, K.K., Wang, S. (2008).
    Interval time series analysis with an application
    to the sterling-dollar exchange rate. Journal of
    Systems Science and Complexity, 21 (4), 550-565.