INTRODUCTION TO SYMBOLIC DATA ANALYSIS

- E. Diday
- CEREMADE, Paris-Dauphine University

TUTORIAL, 13 June 2014, Activity Center, Academia Sinica, Taipei, Taiwan

OUTLINE

- PART 1: BUILDING SYMBOLIC DATA FROM STANDARD OR COMPLEX DATA
- PART 2: SYMBOLIC DATA ANALYSIS
- Is Symbolic Data Analysis a new paradigm?
- PART 3: OPEN DIRECTIONS OF RESEARCH
- PART 4: SDA SOFTWARE: SODAS, SYR and R
- PART 5: INDUSTRIAL APPLICATIONS

PART 1: BUILDING SYMBOLIC DATA FROM STANDARD OR COMPLEX DATA

What is a standard data table?

- It is a set of individuals (i.e. observations) described by a set of
- numerical variables (such as age, weight, ...) or
- categorical variables (such as nationality, club name, ...).
- Example: the individuals are players.

Individuals (rows): Player 1, ..., Messi, Ronaldo, ..., Player n
Variables (columns): age, height, weight, Nationality, Club Team

- What are complex data?
- Any data which cannot be considered as a "standard observations x standard variables" data table.
- Example: the individuals are towers of nuclear power plants described by
- Table 1) Observations: cracks. Variables: crack description.
- Table 2) Observations: corrosions. Variables: corrosion description.
- Table 3) Observations: vertices of a grid. Variables: gap depression from the ground.

Why consider classes of individuals as new individuals?

- Example
- If we wish to know what makes a player win, we are interested in a standard data table where the individuals are the players (in rows) described (in columns) by their standard characteristic variables.
- If we now wish to know what makes a team win, we are interested in a data table where the teams (in rows) are described by characteristic variables of the teams that take into account the variability of the players inside each team.
- The teams can now be considered as new individuals of a higher level, described by symbolic variables that take into account the variability of the individuals inside each class.

From standard data tables to symbolic data tables

- Standard data table describing football players (individuals): rows ind1, ..., indi, ..., indn; columns X1, ..., Xj. Each cell Xij contains a number (age) or a category (Nationality).
- Symbolic data table describing teams (i.e. classes of individuals): each cell contains a symbolic datum, for example a bar chart of the ages of the players of Messi's team, a bar chart of nationalities, or a weight interval.
- Some columns are contingency tables.

SYMBOLIC DATA EXPRESS VARIABILITY INSIDE CLASSES OF INDIVIDUALS

Here the variation (of weight, nationality, ...) concerns the players of each team. Therefore each cell can contain a number, an interval, a sequence of categorical values, a sequence of weighted values such as a bar chart, a distribution, ...

THESE NEW KINDS OF VARIABLES ARE CALLED SYMBOLIC BECAUSE THEY ARE NOT PURELY NUMERICAL, IN ORDER TO EXPRESS THE INTERNAL VARIATION INSIDE EACH CLASS.

What is the failure of current practice that produced the SDA paradigm?

- The failure is that in current practice
- only individual observations are considered.
- Therefore these individual observations are described only by standard numerical and categorical variables.

The SDA paradigm shift

- It is the transition
- from individual observations described by standard variables with numerical or categorical values
- to classes of individuals (considered as higher-level observations)
- described by symbolic variables with symbolic values (intervals, probability distributions, sets of categories or numbers, random variables, ...)
- that take into account the variability inside the classes.
- Symbolic values cannot be treated as numbers.

Building symbolic data needs three steps

- First step: we have a standard data table, Table 1, where individuals are described by numerical or categorical random variables Yj.
- Second step: we have a Table 2 where classes of individuals are described by random variables Yj whose values are themselves random variables Yij.
- Third step: we have a symbolic data table, Table 3, where the random variables Yij are represented by
- probability distributions, histograms, bar charts, percentiles,
- intervals: [Min, Max], interquartile interval, etc.,
- sets of numbers or categories,
- functions such as time series.
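The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the player rows, the team labels and the function name `build_symbolic_table` are hypothetical, numerical variables are summarised by a [Min, Max] interval and categorical variables by a bar chart of relative frequencies.

```python
from collections import Counter, defaultdict

# Hypothetical standard data table (Table 1): one row per player.
players = [
    {"team": "Team A", "weight": 72.0, "nationality": "Eur"},
    {"team": "Team A", "weight": 80.0, "nationality": "Eur"},
    {"team": "Team A", "weight": 76.0, "nationality": "Afr"},
    {"team": "Team B", "weight": 85.0, "nationality": "Afr"},
    {"team": "Team B", "weight": 90.0, "nationality": "Eur"},
]


def build_symbolic_table(rows, class_key, numeric_vars, categorical_vars):
    """Aggregate individuals into classes described by symbolic values."""
    # Step 2: gather the individuals of each class.
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row)

    # Step 3: represent the within-class variation by symbolic values.
    table = {}
    for label, members in groups.items():
        description = {}
        for var in numeric_vars:
            values = [m[var] for m in members]
            description[var] = (min(values), max(values))  # interval [Min, Max]
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            description[var] = {c: n / len(members) for c, n in counts.items()}  # bar chart
        table[label] = description
    return table


symbolic_table = build_symbolic_table(players, "team", ["weight"], ["nationality"])
for team, description in symbolic_table.items():
    print(team, description)
```

Each row of `symbolic_table` is a class (a team), and each cell holds an interval or a bar chart instead of a single number or category.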

VARIABLES

- Standard variables take values that are
- numerical (income, profit, ...),
- categorical (countries, stock-exchange places, ...).
- Symbolic variables take values that are
- intervals,
- bar charts,
- histograms, etc.

Ten examples of Symbolic variables

What kind of questions and how are they

structured?

How to build symbolic data from standard or complex data?

- How should the numerical, ordinal or nominal ground variables be categorized in order to obtain the symbolic histogram or bar chart variables for each class?
- First, find the discretisation which discriminates these classes as well as possible.
- Second, or simultaneously, maximize the correlation between the bins.

SOME ADVANTAGES OF SYMBOLIC DATA

- Work at the needed level of generality without losing variability.
- Reduce simple or complex huge data.
- Reduce the number of observations and the number of variables.
- Reduce missing data.
- Ability to extract simplified knowledge and decisions from complex data.
- Solve confidentiality issues (classes are not confidential, as individuals are).
- Facilitate interpretation of results: decision trees, factorial analysis, new kinds of graphics.
- Extend data mining and statistics to new kinds of data with many industrial applications.

PART 2: SYMBOLIC DATA ANALYSIS

SYMBOLIC DATA ANALYSIS TOOLS HAVE BEEN DEVELOPED

- Graphical visualisation of symbolic data
- Correlation, mean, mean square, distribution of a symbolic variable
- Dissimilarities between symbolic descriptions
- Clustering of symbolic descriptions
- S-Kohonen Mappings
- S-Decision Trees
- S-Principal Component Analysis
- S-Discriminant Factorial Analysis
- S-Regression
- Etc...
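As an illustration of a dissimilarity between symbolic descriptions, here is a minimal sketch for interval-valued descriptions. It uses the Hausdorff distance between two closed intervals, max(|a1 - a2|, |b1 - b2|), and sums it over the variables; the summation rule and the second description `other` are assumptions made for the example, not a prescription from the tutorial.

```python
def hausdorff_interval(x, y):
    """Hausdorff distance between closed intervals x = (a1, b1) and y = (a2, b2)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def interval_dissimilarity(desc_x, desc_y):
    """Sum of per-variable Hausdorff distances (a city-block style combination)."""
    return sum(hausdorff_interval(desc_x[var], desc_y[var]) for var in desc_x)


# "Very good" players, as in the symbolic data table of the slides.
very_good = {"weight": (80, 95), "size": (1.70, 1.95)}
# Hypothetical second class, introduced only for the comparison.
other = {"weight": (70, 90), "size": (1.60, 1.85)}

print(hausdorff_interval(very_good["weight"], other["weight"]))
print(interval_dissimilarity(very_good, other))
```

In practice the variables would usually be rescaled first, since weight in kilograms dominates size in metres in an unweighted sum.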

From standard observations to classes, the correlation is not the same!

[Figure: observations in the (Y1, Y2) plane, spread inside a circle, with the centers of four classes marked by x.]

- The observations are uniformly distributed in the circle.
- There is no correlation between Y1 and Y2 for the initial observations.
- A correlation appears between the two variables for the centers of a given partition into 4 classes.
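The effect can be reproduced with a small simulation. The sketch below is an illustration under assumptions: the partition into 4 classes is taken to be four bands along the first diagonal (a hypothetical choice; the slide does not say which partition is used), and the sample size and seed are arbitrary.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def sample_disk(n, rng):
    """Draw n points uniformly in the unit disk by rejection sampling."""
    points = []
    while len(points) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            points.append((x, y))
    return points


def class_of(point):
    """Hypothetical partition: four bands along the direction Y1 + Y2."""
    u = (point[0] + point[1]) / sqrt(2)
    if u < -0.5:
        return 0
    if u < 0.0:
        return 1
    if u < 0.5:
        return 2
    return 3


rng = random.Random(0)
points = sample_disk(20000, rng)
r_individuals = pearson([p[0] for p in points], [p[1] for p in points])

centers = []
for k in range(4):
    members = [p for p in points if class_of(p) == k]
    centers.append((sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members)))
r_centers = pearson([c[0] for c in centers], [c[1] for c in centers])

print("correlation on individuals:", round(r_individuals, 3))
print("correlation on class centers:", round(r_centers, 3))
```

With this partition the class centers line up along the diagonal, so their correlation is close to 1 although the individuals are uncorrelated. A different partition (for instance the four quadrants) would give a different result, which is the point: the correlation depends on the level of observation.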

WHY CAN SYMBOLIC DATA NOT BE REDUCED TO A CLASSICAL STANDARD DATA TABLE?

Symbolic data table

Players category   Weight     Size           Nationality
Very good          [80, 95]   [1.70, 1.95]   (0.7 Eur, 0.3 Afr)

Transformation into classical data

Players category   Weight Min   Weight Max   Size Min   Size Max   Eur   Afr
Very good          80           95           1.70       1.95       0.7   0.3

Concern: the initial variables are lost and the variation is lost!

Divisive clustering or decision tree

[Figure: the symbolic analysis splits on the variable Weight; the classical analysis splits on the variable Max Weight.]

PCA and NETWORK OF BAR CHART DATA of 30 clusters of Fisher's Iris data

Any symbolic variable (set of bin variables) can be projected. Here: the species variable.

SYROKKO Company afonso_at_syrokko.com

The contributions of the symbolic variables are inside the smallest hypercube containing the correlation sphere of the bins.

Numerical versus symbolic space of representation

(Y1(Ci), Y2(Ci)) = ([a1i, b1i], [a2i, b2i])

[Figures: numerical representation of interval variables, where the class Ci is plotted as a point x on the axes (a1, b1) and (a2, b2); bi-plot of interval variables.]

Bi-plot of histogram variables

- The joint probability can be inferred by a copula model.

PART 3: OPEN DIRECTIONS OF RESEARCH

- Models of models
- Laws of parameters of laws
- Laws of vectors of laws
- Copulas needed
- Four general convergence theorems
- Optimisation in unsupervised learning (hierarchical and pyramidal clustering)

From the lower level of individual observations to the higher level of observations of classes: higher-level models are needed

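Table 1: individuals (rows ind1, ..., Messi, ..., indn) described by the variables X1, ..., Xj. A cell Xij contains a number (age of Messi).

Table 2: classes described by symbolic variables. A cell contains a symbolic datum (age of Messi's team).

- In Table 1, Xj is a standard random numerical variable.
- In Table 2, Xj is a random variable with histogram values.
- Question: if the law of Xj in Table 1 is given, what is the law of Xj in Table 2? (Dirichlet models are useful.)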

Why use copula models in Symbolic Data Analysis?

- f(i, j, j') is the joint probability of the variables j and j' for the individual i.
- In case of independence, we have
- f(i, j, j') = f(i, j) . f(i, j').
- If independence does not hold,
- f(i, j, j') = Copula(f(i, j), f(i, j')).
- Aim of the copula model in SDA:
- find the copula which minimises the difference with the joint probability,
- in order to avoid restricting ourselves to independence hypotheses and to reduce the cost of computing f(i, j, j').
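A minimal sketch of "find the copula which minimises the difference with the joint" is given below. It rests on several assumptions that are not in the slides: f is read as a cumulative distribution function, the candidate copulas are restricted to the one-parameter Farlie-Gumbel-Morgenstern (FGM) family C(u, v) = uv(1 + theta(1 - u)(1 - v)), the difference is a sum of squares on a small grid of quantiles, and the data are simulated.

```python
import random


def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, defined for -1 <= theta <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def ecdf(values, x):
    """Empirical marginal distribution function."""
    return sum(1 for t in values if t <= x) / len(values)


def joint_ecdf(pairs, x, y):
    """Empirical joint distribution function."""
    return sum(1 for a, b in pairs if a <= x and b <= y) / len(pairs)


def fit_theta(pairs, thetas, levels=(0.2, 0.4, 0.6, 0.8)):
    """Grid search for the theta whose copula is closest to the empirical joint."""
    xs = sorted(a for a, _ in pairs)
    ys = sorted(b for _, b in pairs)
    grid_x = [xs[int(q * len(xs))] for q in levels]
    grid_y = [ys[int(q * len(ys))] for q in levels]
    # For each grid point: the two marginals and the observed joint value.
    cells = [(ecdf(xs, x), ecdf(ys, y), joint_ecdf(pairs, x, y))
             for x in grid_x for y in grid_y]

    def error(theta):
        return sum((h - fgm_copula(u, v, theta)) ** 2 for u, v, h in cells)

    best = min(thetas, key=error)
    return best, error(best), error(0.0)  # theta = 0 is the independence case


# Simulated, positively dependent data (hypothetical).
rng = random.Random(1)
pairs = []
for _ in range(500):
    z = rng.gauss(0, 1)
    pairs.append((z + rng.gauss(0, 1), z + rng.gauss(0, 1)))

thetas = [k / 10 for k in range(-10, 11)]
theta_hat, err_copula, err_independence = fit_theta(pairs, thetas)
print(theta_hat, err_copula, err_independence)
```

The fitted copula reproduces the joint distribution from the two marginals only, with an error no larger than that of the independence hypothesis (theta = 0). The FGM family can only express weak dependence; a richer family would be needed for strongly dependent variables.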

FOUR THEOREMS TO BE PROVED FOR ANY METHOD EXTENDED TO SYMBOLIC DATA

M(n, k) is supposed to be an SDA method, where k is the number of classes obtained on n initial individuals.

- THEOREM 1: If the k classes are fixed and n tends towards infinity, then M(n, k) converges towards a stable position.
- THEOREM 2: If k increases until there is a single individual per class, then M(n, k) converges towards the standard method.
- THEOREM 3: If k and n increase simultaneously towards infinity, then M(n, k) converges towards a stable position.
- THEOREM 4: If the k laws associated with the k classes are considered as a sample of a law of laws, then M(n, k) applied to this sample converges to M(n, k) applied to this law.

Examples

- Theorem 1: it was proved in Diday, Emilion (CRAS, Choquet 1998) for Galois lattices: as the size of the population increases, the classes (described by vectors of distributions) organise themselves into a Galois lattice which converges. Emilion (CRAS, 2002) also gives a theorem for the case of mixtures of laws of laws, using martingales and a Dirichlet model.
- Theorem 2: for example, classical PCA, denoted MO, is a special case of the PCA denoted M(n, k) built on vectors of intervals.
- Theorem 3: this is the setting of data that arrive sequentially (data streams) and of one-pass algorithms (see for example Diday, Murty (2005)).
- Theorem 4: in the case of a hierarchical or pyramidal clustering (2D, 3D, etc.), convergence means that the large levels and their structure stabilise. In the case of a PCA, convergence means that the factorial axes stabilise.

Optimisation in clustering

d is the given dissimilarity. Each class is described by symbolic data.

- Hierarchies: ultrametric dissimilarity U, W = |d - U|
- Pyramids: Robinsonian dissimilarity R, W = |d - R|
- 3D spatial pyramid: Yadidean dissimilarity Y, W = |d - Y|
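For the hierarchical case, the criterion can be illustrated with a small sketch. It makes two assumptions that go beyond the slide: U is taken to be the subdominant ultrametric of d (the one induced by single-linkage clustering, computed here as a minimax path distance), and W is read as the sum of absolute differences between d and U. The dissimilarity matrix is hypothetical.

```python
def subdominant_ultrametric(d):
    """Largest ultrametric below d: U(i, j) is the smallest possible value of
    the largest step over all paths from i to j (Floyd-Warshall style)."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def fit_criterion(d, u):
    """W = sum over pairs i < j of |d(i, j) - U(i, j)| (assumed reading of W)."""
    n = len(d)
    return sum(abs(d[i][j] - u[i][j]) for i in range(n) for j in range(i + 1, n))


# Hypothetical dissimilarity matrix between four classes.
d = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

u = subdominant_ultrametric(d)
w = fit_criterion(d, u)
for row in u:
    print(row)
print("W =", w)
```

An optimisation method would search, among all ultrametrics (or Robinsonian or Yadidean dissimilarities for pyramids), for the one that makes W as small as possible; the subdominant ultrametric is only one easily computed candidate.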

PART 4: SDA SOFTWARE

- SODAS
- RSDA
- SYR

Software

To build symbolic data from standard or complex data and to analyze symbolic data, different software packages exist today.

- SODAS: free academic package, though registration is required and a code is needed for installation, http://www.info.fundp.ac.be/asso/sodaslink.htm. Many symbolic databases can be found at http://www.ceremade.dauphine.fr/SODAS/
- RSDA: free academic packages are available on CRAN, oldemar.rodriguez_at_gmail.com
- SYR: professional package, see afonso_at_syrokko.com

SODAS SOFTWARE

KOHONEN MAP OF CONCEPTS

FACTOR ANALYSIS: PCA of interval-valued variables

Superposition of two stars associated with two classes of the pyramid

Decision tree on histogram-valued or interval-valued variables

The objective of SCLUST is the clustering of symbolic objects (SOs) by a dynamic algorithm based on symbolic data tables. The aim is to build a partition of SOs into a predefined number of classes. Each class has a prototype in the form of an SO. The optimality criterion used is based on the sum of proximities between the individuals and the prototypes of the clusters.
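The principle of such a dynamic clustering algorithm can be sketched as follows for interval-valued descriptions. This is not the SCLUST implementation: the squared Euclidean distance on interval bounds, the mean-bounds prototype, the explicit initial prototypes and the data are all assumptions chosen to keep the example short.

```python
def distance(x, y):
    """Squared Euclidean distance between two vectors of intervals (on the bounds)."""
    return sum((a - c) ** 2 + (b - d) ** 2 for (a, b), (c, d) in zip(x, y))


def prototype(members):
    """Interval-valued prototype: mean lower and mean upper bound per variable."""
    n = len(members)
    n_vars = len(members[0])
    return [(sum(m[j][0] for m in members) / n, sum(m[j][1] for m in members) / n)
            for j in range(n_vars)]


def dynamic_clustering(objects, initial_prototypes, max_iter=100):
    """Alternate an allocation step and a representation step until stable."""
    prototypes = list(initial_prototypes)
    assignment = None
    for _ in range(max_iter):
        # Allocation step: each object joins the class of its closest prototype.
        new_assignment = [
            min(range(len(prototypes)), key=lambda k: distance(obj, prototypes[k]))
            for obj in objects
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Representation step: recompute the prototype of each non-empty class.
        for k in range(len(prototypes)):
            members = [o for o, a in zip(objects, assignment) if a == k]
            if members:
                prototypes[k] = prototype(members)
    criterion = sum(distance(o, prototypes[a]) for o, a in zip(objects, assignment))
    return assignment, prototypes, criterion


# Hypothetical symbolic objects described by (weight interval, size interval).
objects = [
    [(60, 70), (1.60, 1.70)],
    [(62, 72), (1.62, 1.72)],
    [(85, 95), (1.85, 1.95)],
    [(88, 98), (1.88, 1.98)],
]

assignment, prototypes, criterion = dynamic_clustering(objects, [objects[0], objects[2]])
print(assignment)
print(prototypes)
print(criterion)
```

Each class ends up represented by a prototype that is itself a symbolic description, and the criterion is the sum of the distances between the objects and the prototype of their class.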

Clustering pyramid

FROM DATABASE TO SYMBOLIC DATA IN SODAS

[Figure: a QUERY on a relational database returns the individuals, the description of the individuals, and their classes. The result is a symbolic data table: rows are classes, columns are symbolic variables (class description), and cells contain symbolic data.]

SYR SOFTWARE

- Produce a symbolic data table from complex data.
- Manage symbolic data tables: sort rows and columns by discriminant power.
- Analyse symbolic data tables: SPCA, Sclustering.
- Produce networks, rules and decision trees.

SYR SYMBOLIC DATA TABLE MANAGEMENT

SYMBOLIC DATA TABLE

- Sorting rows by the min or max of intervals or by the frequencies of bar charts is possible.
- Sorting variables by their power to discriminate between the concepts is also possible.
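Sorting the rows of a symbolic data table can be illustrated with plain Python. This is not the SYR interface: the table, the class names and the two helper functions are hypothetical and only show what "sorting by min, max of intervals or frequencies of bar chart" means.

```python
# Hypothetical symbolic data table: one row per class.
symbolic_table = {
    "Class A": {"weight": (70, 92), "gender": {"M": 0.8, "W": 0.2}},
    "Class B": {"weight": (62, 98), "gender": {"M": 0.3, "W": 0.7}},
    "Class C": {"weight": (75, 85), "gender": {"M": 0.7, "W": 0.3}},
}


def sort_rows_by_interval(table, variable, bound="min"):
    """Row labels ordered by the lower (min) or upper (max) bound of an interval."""
    index = 0 if bound == "min" else 1
    return sorted(table, key=lambda row: table[row][variable][index])


def sort_rows_by_frequency(table, variable, category):
    """Row labels ordered by decreasing frequency of one bin of a bar chart."""
    return sorted(table, key=lambda row: table[row][variable].get(category, 0.0),
                  reverse=True)


by_min_weight = sort_rows_by_interval(symbolic_table, "weight", "min")
by_max_weight = sort_rows_by_interval(symbolic_table, "weight", "max")
by_share_of_m = sort_rows_by_frequency(symbolic_table, "gender", "M")
print(by_min_weight, by_max_weight, by_share_of_m)
```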

SYROKKO Company eliezer_at_syrokko.com

PART 5: INDUSTRIAL APPLICATIONS

Time series data table: anomaly detection on a bridge. LCPC (Laboratoire Central des Ponts et Chaussées) and SNCF data.

[Table: rows are trains; columns are Sensor 1, Sensor 2, Sensor 3, ..., Sensor N.]

HIERARCHICAL DATA

- Symbolic procedure:
- from the numerical description of pigs to the symbolic description of farms.
- Numerical variables and categorical variables are transformed into bar charts of the frequencies based on 30 animals,
- or into interval-valued variables.

Step 1: Symbolic description of farms (C. Fablet, S. Bougeard, AFSSA)

- Initial table: description of pig respiratory diseases, 125 farms x 30 animals, 19 variables.
- Symbolic table: description of pig respiratory diseases, 125 farms, 64 variables: median score (continuous variables), animal frequencies (categorical variables).


NUCLEAR POWER PLANT: nuclear thermal power station inspection

PROBLEM: FIND CORRELATIONS BETWEEN 3 CLASSICAL DATA TABLES OF DIFFERENT OBSERVATION UNITS AND DIFFERENT VARIABLES

- Table 1) Observations: cracks. Variables: crack description.
- Table 2) Observations: vertices of a grid. Variables: gap deviation at different periods compared to the initial model position.
- Table 3) Observations: vertices of a grid. Variables: gap depression from the ground.

These three tables are transformed into ONE symbolic data table where the classes are the towers. SDA can be applied to this new table.

FROM COMPLEX DATA TO SYMBOLIC DATA

Towers on PCA first axes

- PCA on chosen symbolic variables
- Visualisation of three clusters
- Interval and bar chart variables can be seen.
- A network of the strongest links can be represented.

NETSYR results (SYR software)

Symbolic variables projection inside the

hypercube of the correlation sphere

Telephone calls: text mining in order to discover themes without using semantics

INITIAL DATA: 2 814 446 rows

Documents   Words
Doc1        bonjour
Doc1        oui
Doc1        monsieur
Doc2        panne

- Each calling session is called a document.
- After lemmatisation, we start with a table of
- 31454 documents
- 2258 words

Correspondence between documents and words.

First steps: building overlapping clusters of documents and words (CLUSTSYR)

- 2 814 446 rows: correspondence (documents, words), i.e. 31454 documents x 2258 words.
- 70 x 2258: 70 overlapping clusters of documents described by the tf-idf of 2258 words.
- 2258 x 70: 2258 words described by their tf-idf on the 70 clusters of documents.
- 80 x 70: 80 overlapping clusters of words described by their tf-idf in the 70 clusters of documents.
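The tf-idf description used in these steps can be computed directly from a (document, word) correspondence table. The sketch below is only illustrative: the slides do not state which tf-idf variant is used, so the common form tf x log(N / df) is assumed, and the rows beyond the four shown in the slides are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

rows = [
    # Rows shown in the slides.
    ("Doc1", "bonjour"), ("Doc1", "oui"), ("Doc1", "monsieur"), ("Doc2", "panne"),
    # Hypothetical extra rows, added so that some words occur in several documents.
    ("Doc2", "bonjour"), ("Doc3", "panne"), ("Doc3", "panne"), ("Doc3", "oui"),
]


def tf_idf(pairs):
    """Return {document: {word: tf-idf weight}} from (document, word) rows."""
    counts = defaultdict(Counter)
    for doc, word in pairs:
        counts[doc][word] += 1

    n_docs = len(counts)
    doc_freq = Counter()  # number of documents containing each word
    for words in counts.values():
        doc_freq.update(words.keys())

    weights = {}
    for doc, words in counts.items():
        total = sum(words.values())
        weights[doc] = {w: (c / total) * log(n_docs / doc_freq[w])
                        for w, c in words.items()}
    return weights


weights = tf_idf(rows)
for doc, description in weights.items():
    print(doc, description)
```

The same function applies unchanged when the "documents" are clusters of documents: it is enough to replace each document label by the label of its cluster in the rows.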

Next step (STATSYR): each cluster of documents is described by the 80 clusters of words, called themes.

[Figure: themes, classes of documents, and the words in each theme.]

GRAPHICAL REPRESENTATION by NETSYR (SYR software)

- Graphical representation of themes and document classes by pie charts and their bar chart descriptions
- Overlapping clusters
- Social network based on dissimilarities
- Annotation of themes and document classes
- Moving, zooming

We finally obtain a clear representation of the main themes, their classes and their links: failures, budget, addresses, vacation, etc.

A Survey on Security

- A sample of people from three regions (Vex, Val, Plai) answered three questions:
- Gender: M or W,
- Security: priority to
- Fight Against Unemployment (FAU),
- Juvenile Delinquency (JD),
- Drug addiction (D),
- Death penalty (Yes or No).

Gender, Security and Death Penalty are bar-chart-valued variables; M, W, FAU, JD, ... are bins.

From bar chart symbolic variables to Metabin latent variables

Region   Gender        Insecurity          Death Penalty
         M     W       FAU   JD    D       Yes   No
Vex      0.8   0.2     0.4   0.5   0.1     0.5   0.5
Val      0.7   0.3     0.5   0.2   0.3     0.4   0.6
Plai     0.3   0.7     0.7   0.1   0.2     0.1   0.9

Table 1: Initial bar chart data table

Region   S1cor               S2cor               S3cor
         M     JD    Yes     W     FAU   No      NU    D     NU
Vex      0.8   0.5   0.5     0.2   0.4   0.5     NU    0.1   NU
Val      0.7   0.2   0.4     0.3   0.5   0.6     NU    0.3   NU
Plai     0.3   0.1   0.1     0.7   0.7   0.9     NU    0.2   NU

Table 2: Metabin latent variables
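The passage from Table 1 to Table 2 can be mimicked with a simple greedy heuristic: each Metabin takes one bin per bar chart variable, chosen to be as correlated as possible with a seed bin. This heuristic is an assumption made for illustration, not necessarily the construction used in the tutorial; it happens to reproduce the grouping of Table 2 on these data.

```python
from math import sqrt

# Table 1 of the slides: bin values for the regions Vex, Val, Plai.
bar_charts = {
    "Gender": {"M": [0.8, 0.7, 0.3], "W": [0.2, 0.3, 0.7]},
    "Insecurity": {"FAU": [0.4, 0.5, 0.7], "JD": [0.5, 0.2, 0.1], "D": [0.1, 0.3, 0.2]},
    "Death Penalty": {"Yes": [0.5, 0.4, 0.1], "No": [0.5, 0.6, 0.9]},
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def greedy_metabins(charts):
    """Hypothetical heuristic: group one bin per variable around a seed bin."""
    remaining = {var: dict(bins) for var, bins in charts.items()}
    variables = list(charts)
    metabins = []
    while any(remaining.values()):
        # Seed: first unused bin of the first variable that still has bins.
        seed_var = next(v for v in variables if remaining[v])
        seed_name = next(iter(remaining[seed_var]))
        seed_values = remaining[seed_var][seed_name]
        row = []
        for var in variables:
            if not remaining[var]:
                row.append("NU")  # no bin left for this variable, as in Table 2
            elif var == seed_var:
                row.append(seed_name)
            else:
                row.append(max(remaining[var],
                               key=lambda b: pearson(seed_values, remaining[var][b])))
        for var, chosen in zip(variables, row):
            if chosen != "NU":
                del remaining[var][chosen]
        metabins.append(row)
    return metabins


metabins = greedy_metabins(bar_charts)
for number, row in enumerate(metabins, start=1):
    print("S%dcor" % number, row)
```

On these three regions the bins M, JD and Yes vary together, as do W, FAU and No, which is what the columns S1cor and S2cor of Table 2 express.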

CONCLUSION

- If you have standard units described by numerical and/or categorical variables, these variables induce classes described by symbolic variables that take into account their internal variation. SDA can then be applied to these new units in order to get complementary and enhanced results by extending standard analysis to symbolic analysis.
- Symbolic data have to be built from given standard or complex data.
- Symbolic data cannot be reduced to standard data.
- Complex data can be simplified into symbolic data.
- Big databases can be reduced to symbolic data.
- Symbolic data are not only distributions; they are the numbers of the future.

References

- Basic books and papers
- Bock H.H., Diday E. (editors and co-authors) (2000) Analysis of Symbolic Data. Exploratory methods for extracting statistical information from complex data. Springer Verlag, Heidelberg, 425 pages, ISBN 3-540-66619-2.
- L. Billard, E. Diday (2003) "From the statistics of data to the statistics of knowledge: Symbolic Data Analysis". JASA, Journal of the American Statistical Association. June, Vol. 98, No. 462.
- E. Diday, M. Noirhomme (eds and co-authors) (2008) Symbolic Data Analysis and the SODAS software. 457 pages. Wiley. ISBN 978-0-470-01883-5.
- Billard, L. and Diday, E. (2006). Symbolic Data Analysis: Conceptual Statistics and Data Mining. 321 pages. Wiley series in computational statistics. Wiley, Chichester, ISBN 0-470-09016-2.
- Noirhomme-Fraiture, M. and Brito, P. (2012) Far beyond the classical data models: symbolic data analysis. Statistical Analysis and Data Mining 4 (2), 157-170.
- Lazare N. (2013) "Symbolic Data Analysis". CHANCE magazine. Editor's Letter, Vol. 26, No. 3.

Building Symbolic Data and representation: references

- Stéphan V., Hébrail G., Lechevallier Y. (2000) Generation of symbolic objects from relational databases. Chapter in Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data (eds. H.-H. Bock and E. Diday). Springer-Verlag, Berlin, 103-124.
- Chiun-How, K., Chih-Wen, O., Yin-Jing, T., Chuan-kai, Yang, Chun-houh, Chen (2012) A Symbolic Database for TIMSS. In Arroyo J., Maté C., Brito P., Noirhomme M. (eds), 3rd Workshop in Symbolic Data Analysis. Universidad Complutense de Madrid. http://www.sda-workshop.org/.
- E. Diday, F. Afonso, R. Haddad (2013) The symbolic data analysis paradigm, discriminate discretization and financial application. In Advances in Theory and Applications of High Dimensional and Symbolic Data Analysis, HDSDA 2013. Revue des Nouvelles Technologies de l'Information vol. RNTI-E-25, pp. 1-14.

SOME SYMBOLIC DATA ANALYSIS REFERENCES

- In Principal Component Analysis
- Cazes P., Chouakria A., Diday E., Schektman Y. (1997). Extension de l'analyse en composantes principales à des données de type intervalle, Rev. Statistique Appliquée, Vol. XLV Num. 3, pp. 5-24, France.
- Cazes P. (2002) Analyse factorielle d'un tableau de lois de probabilité. Revue de statistique appliquée, tome 50, no 3.
- Diday E. (2013) "Principal Component Analysis for bar charts and Metabins tables". Statistical Analysis and Data Mining, 6, 5, 403-430. Wiley. Article first published online 20 May 2013. DOI 10.1002/sam.11188.
- Ichino, M. (2011). The quantile method for symbolic principal component analysis. Statistical Analysis and Data Mining, Wiley. 184-198.
- Makosso-Kallyth S. and Diday E. (2012) Adaptation of interval PCA to symbolic histogram variables. Advances in Data Analysis and Classification (ADAC). July, Volume 6, Issue 2, pp 147-159.
- Rademacher, J., Billard, L. (2012) Principal component analysis for interval data. Wiley Interdisciplinary Reviews: Computational Statistics. Volume 4, Issue 6, pp. 535-540.
- Shimizu N., Nakano J. (2012) Histograms Principal Component Analysis. In Arroyo J., Maté C., Brito P., Noirhomme M. (eds), 3rd Workshop in Symbolic Data Analysis. Universidad Complutense de Madrid. http://www.sda-workshop.org/
- Wang H., Guan R., Wu J. (2012a). CIPCA: Complete-Information-based Principal Component Analysis for interval-valued data, Neurocomputing, Volume 86, Pages 158-169.

Symbolic Data Analysis references

- In Symbolic Forecasting
- Arroyo, J. and Maté, C. (2009). Forecasting histogram time series with k-nearest neighbors' methods. International Journal of Forecasting 25, 192-207.
- García-Ascanio, C., Maté, C. (2010). Electric power demand forecasting using interval time series: A comparison between VAR and iMLP. Energy Policy 38, 715-725.
- Han, A., Hong, Y., Lai, K.K., Wang, S. (2008). Interval time series analysis with an application to the sterling-dollar exchange rate. Journal of Systems Science and Complexity, 21 (4), 550-565.
- He, L.T. and C. Hu (2009). Impacts of Interval Computing on Stock Market Variability Forecasting. Computational Economics 33, 263-276.
- In Symbolic rule extraction
- Afonso, F. et Diday, E. (2005). Extension de l'algorithme Apriori et des règles d'association aux cas des données symboliques diagrammes et intervalles. Revue RNTI, Extraction et Gestion des Connaissances (EGC 2005), Vol. 1, pp 205-210, Cepadues, 2005.

Symbolic Data Analysis references

- In Symbolic Decision Tree
- Ciampi, A., Diday, E., Lebbe, J., Perinel, E. and Vignes, R. (2000). Growing a tree classifier with imprecise data. Pattern Recognition Letters 21, 787-803.
- Mballo C., Diday E. (2006) The criterion of Smirnov-Kolmogorov for binary decision tree: application to interval valued variables. Intelligent Data Analysis. Volume 10, Number 4, pp 325-341.
- Winsberg S., Diday E., Limam M. (2006). A tree structured classifier for symbolic class description. Compstat 2006. Physica-Verlag.
- Bravo, M. and Garcia-Santesmases, J. (2000). Symbolic Object Description of Strata by Segmentation Trees, Computational Statistics, 15, 13-24, Physica-Verlag.

Symbolic Data Analysis references

- In Clustering
- De Carvalho F., Souza R., Chavent M., and Lechevallier Y. (2006) Adaptive Hausdorff distances and dynamic clustering of symbolic interval data. Pattern Recognition Letters, Volume 27, Issue 3, February 2006, Pages 167-179.
- De Souza R.M.C.R, De Carvalho F.A.T. (2004). Clustering of interval data based on City-Block distances. Pattern Recognition Letters, 25, 353-365.
- Diday E. (2008) Spatial classification. DAM (Discrete Applied Mathematics), Volume 156, Issue 8, Pages 1271-1294.
- Diday, E., Murty, N. (2005) "Symbolic Data Clustering" in Encyclopedia of Data Warehousing and Mining. John Wong editor. Idea Group Reference Publisher.
- Irpino, A. and Verde, R. (2008) Dynamic clustering of interval data using a Wasserstein-based distance. Pattern Recognition Letters 29, 1648-1658.
- In Multidimensional Scaling
- Terada, Y., Yadohisa, H. (2011) Multidimensional scaling with hyperbox model for percentile dissimilarities. In Watada, J., Phillips-Wren, G., Jain, L. C., and Howlett, R. J. (Eds.) Intelligent Decision Technologies, Springer Verlag, 779-788.
- Groenen, P.J.F., Winsberg, S., Rodriguez, O., Diday, E. (2006). I-Scal: Multidimensional scaling of interval dissimilarities. Computational Statistics and Data Analysis 51, 360-378.

Some Symbolic Data Analysis references

- In Self Organizing map
- Hajjar C., Hamdan H. (2011). Self-organizing map based on L2 distance for interval-valued data. In SACI 2011, 6th IEEE International Symposium on Applied Computational Intelligence and Informatics (Timisoara, Romania), pp. 317-322.
- In Dissimilarities between Symbolic Data
- Kim, J. and Billard, L. (2013) Dissimilarity measures for histogram-valued observations, Communications in Statistics - Theory and Methods, 42, 283-303.
- Verde, R., Irpino, A. (2010). Ordinary Least Squares for Histogram Data Based on Wasserstein Distance, in Proc. COMPSTAT 2010, Y. Lechevallier and G. Saporta (Eds), pp. 581-589. Physica Verlag Heidelberg.

Some Symbolic Data Analysis references

- In Regression and Canonical analysis extended to Symbolic Data
- Dias, S., Brito, P. (2011). A New Linear Regression Model for Histogram-Valued Variables. In Proceedings of the 58th ISI World Statistics Congress (Dublin, Ireland).
- Lauro, C., Verde, R., Irpino, A. (2008). Generalized canonical analysis, in Symbolic Data Analysis and the Sodas Software, E. Diday and M. Noirhomme-Fraiture (Eds.), 313-330, Wiley, Chichester.
- Tenenhaus A., Diday E., Emilion R., Afonso F. (2013) Regularized General Canonical Correlation Analysis Extended To Symbolic Data. ADAC (publication on the way).
- Neto, E.A, De Carvalho F.A.T. (2010). Constrained linear regression models for symbolic interval-valued variables. Computational Statistics and Data Analysis 54, 333-347.
- Wang H., Guan R., Wu J. (2012c). Linear regression of interval-valued data based on complete information in hypercubes, Journal of Systems Science and Systems Engineering, Volume 21, Issue 4, Pages 422-442.

Some Symbolic Data Models references

- P. Bertrand, F. Goupil (2000) Descriptive Statistics for symbolic data. In H.H. Bock, E. Diday (Eds) Analysis of Symbolic Data. Springer-Verlag, pp. 106-124.
- Brito, P. and Duarte Silva, A.P. (2012). Modelling interval data with Normal and Skew-Normal distributions. Journal of Applied Statistics, 39 (1), 3-20.
- E. Diday, M. Vrac (2005) "Mixture decomposition of distributions by Copulas in the symbolic data analysis framework". Discrete Applied Mathematics (DAM). Volume 147, Issue 1, 1 April, pp. 27-41.
- E. Diday (2011) Modélisation de données symboliques et application au cas des intervalles. Journées Nationales de la Société Francophone de Classification. Orléans.
- E. Diday (2002) From Schweizer to Dempster: mixture decomposition of distributions by copulas in the symbolic data analysis framework. IPMU 2002, July, Annecy, France.
- Diday E., Emilion R. (1997) "Treillis de Galois Maximaux et Capacités de Choquet". C.R. Acad. Sc. t.325, Série 1, p 261-266. Presented by G. Choquet in Mathematical Analysis.
- Diday E., R. Emilion (2003) Maximal and stochastic Galois lattices. Discrete Applied Math. Journal. Vol. 27 (2), pp. 271-284.
- Emilion R. (2002) Classification et mélanges de processus. C.R. Acad. Sci. Paris, 335, série I, 189-193.
- Emilion R. (2012) Unsupervised Classification and Analysis of objects described by nonparametric probability distributions. Statistical Analysis and Data Mining (SAM), Vol 5, 5, 388-398.
- J. Le-Rademacher, L. Billard (2011) Likelihood functions and some maximum likelihood estimators for symbolic data. Journal of Statistical Planning and Inference 141, 1593-1602. Elsevier.
- T. Soubdhan, R. Emilion, R. Calif (2009) Classification of daily solar radiation distributions. Solar Energy 83, 1056-1063. Elsevier.

Some SDA Industrial Applications

- Afonso F., Diday E., Badez N., Genest Y. (2010) Symbolic Data Analysis of Complex Data: Application to nuclear power plant. COMPSTAT 2010, Paris.
- Bezerra B., Carvalho F. (2011) Symbolic data analysis tools for recommendation systems. Knowl. Inf. Syst. 01/2011, 26, 385-418. DOI 10.1007/s10115-009-0282-3.
- Bouteiller V., Toque C., A., Cherrier J-F., Diday E., Cremona C. (2011) Non-destructive electrochemical characterizations of reinforced concrete corrosion: basic and symbolic data analysis. Corros Rev. Walter de Gruyter, Berlin, Boston. DOI 10.1515/corrrev-2011-002.
- Courtois, A., Genest, G., Afonso, F., Diday, E., Orcesi, A. (2012) In service inspection of reinforced concrete cooling towers: EDF's feedback, IALCCE 2012, Vienna, Austria.
- Cury, A., Crémona, C., Diday, E. (2010). Application of symbolic data analysis for structural modification assessment. Engineering Structures Journal. Vol 32, pp 762-775.
- Christelle Fablet, Edwin Diday, Stephanie Bougeard, Carole Toque, Lynne Billard (2010). Classification of Hierarchical-Structured Data with Symbolic Analysis. Application to Veterinary Epidemiology. COMPSTAT 2010, Paris.
- Haddad R., Afonso F., Diday E. (2011) Approche symbolique pour l'extraction de thématiques: Application à un corpus issu d'appels téléphoniques. In actes des XVIIIèmes Rencontres de la Société francophone de Classification. Université d'Orléans.
- Laaksonen, S. (2008). People's Life Values and Trust Components in Europe - Symbolic Data Analysis for 20-22 Countries. In Edwin Diday and Monique Noirhomme-Fraiture, "Symbolic Data Analysis and the SODAS Software", Chapter 22, pp. 405-419. Wiley and Sons, Chichester, UK.
- Quantin C., Billard L., Touati M., Andreu N., Cottin Y., Zeller M., Afonso F., Battaglia G., Seck D., Le Teuff G., and Diday E. (2011) Classification and Regression Trees on Aggregate Data Modeling: An Application in Acute Myocardial Infarction. Journal of Probability and Statistics, Volume 2011 (2011), 19 pages.
- Terraza V., Toque C. (2013) Mutual Fund Rating: A Symbolic Data Approach. In "Understanding Investment Funds: Insights from Performance and Risk Analysis". Edited by Virginie Terraza and Hery Razafitombo. Economics Finance Collection 2013. Palgrave Macmillan, UK.
- He, L.T. and C. Hu (2009). Impacts of Interval Computing on Stock Market Variability Forecasting. Computational Economics 33, 263-276.
- E. Diday, F. Afonso, R. Haddad (2013) The symbolic data analysis paradigm, discriminate discretization and financial application. In Advances in Theory and Applications of High Dimensional and Symbolic Data Analysis, HDSDA 2013. Revue des Nouvelles Technologies de l'Information vol. RNTI-E-25, pp. 1-14.
- Han, A., Hong, Y., Lai, K.K., Wang, S. (2008). Interval time series analysis with an application to the sterling-dollar exchange rate. Journal of Systems Science and Complexity, 21 (4), 550-565.