1
Computational and Statistical Learning Group
(CASTLE)
  • Professor Edgar Acuna
  • Mathematics Department
  • University of Puerto Rico at Mayaguez
  • Partially supported by ONR Grant N0014-03-0359

2
Our research focuses on the development of
computational and statistical methods for
knowledge discovery in databases, related to
Pattern Recognition, Machine Learning,
Bioinformatics, and Data Mining.
3
CASTLE Members 1
4
CASTLE Members 2
5
CASTLE Alumni
  • Alex Rojas (MS 2001, now at CMU)
  • Adriana Lopez (MS 2002, now at U. Pittsburgh)
  • Jose Vega (PhD 2004, now at the UPR Medical School)
  • Santiago Velasco (MS 2004, now at UPRM)

6
CASTLE Collaborators
  • Ana Patricia Ortiz, Puerto Rico Cancer Center.
  • Idhaliz Flores, Ponce Medical School, Puerto Rico.
  • Jose Vega, University of Puerto Rico School of Medicine.

7
Research Topics 1
  • Data preprocessing
  • Treatment of missing values in Data Mining
  • Normalization in Data Mining
  • Feature selection procedures for high-dimensional data: wrappers,
    filters, and hybrid methods, with application to Bioinformatics;
    feature selection based on rough sets (a small illustrative sketch
    of a filter follows this list).
  • Feature extraction procedures for high-dimensional data, such as
    Partial Least Squares and supervised Principal Components, with
    application to Bioinformatics and Chemometrics.
  • Instance selection procedures, such as progressive sampling, to
    make the application of statistical methods and machine learning
    techniques feasible.

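As a concrete illustration of the filter approach to feature selection
mentioned above, the minimal base-R sketch below ranks features by a
one-way ANOVA F-statistic against the class label. The function name
filter.rank and the use of the iris data are assumptions for
illustration only, not part of the group's software.

  # Minimal illustrative sketch of a univariate filter for feature selection
  # (not the group's implementation): rank each feature by the F-statistic of
  # a one-way ANOVA of that feature against the class label.
  filter.rank <- function(x, y) {
    # x: data frame of numeric features; y: factor with the class labels
    fstat <- sapply(x, function(feature) {
      summary(aov(feature ~ y))[[1]][["F value"]][1]
    })
    sort(fstat, decreasing = TRUE)   # larger F = more discriminative feature
  }

  # Example: rank the four iris measurements by how well they separate species
  data(iris)
  filter.rank(iris[, 1:4], iris$Species)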
8
Research Topics 2
  • Regularization methods: use of shrunken estimators for logistic
    regression, aiming at classifiers that use all the features without
    performing feature selection or feature extraction. Applications in
    Bioinformatics and Chemometrics.
  • Outlier detection procedures, such as distance-based outliers and
    density-based local outliers. We will apply these procedures to
    network intrusion detection (a small illustrative sketch follows
    this list).
  • Visualization procedures for Data Mining. We are enhancing graphics
    such as parallel coordinate plots, survey plots, and star
    coordinates that allow us to perform exploratory data analysis
    prior to applying a knowledge discovery technique.

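To make the distance-based notion of an outlier concrete, here is a
minimal base-R sketch that scores each observation by the distance to
its k-th nearest neighbor and reports the highest-scoring ones. The
function name knn.outliers and the iris example are illustrative
assumptions, not the group's code.

  # Minimal illustrative sketch of distance-based outlier detection: an
  # observation is scored by the distance to its k-th nearest neighbor,
  # and the n observations with the largest scores are reported.
  knn.outliers <- function(x, k = 5, n = 10) {
    d <- as.matrix(dist(scale(x)))     # pairwise Euclidean distances
    diag(d) <- Inf                     # ignore each point's distance to itself
    kdist <- apply(d, 1, function(row) sort(row)[k])   # k-th NN distance
    head(order(kdist, decreasing = TRUE), n)           # indices of top outliers
  }

  data(iris)
  knn.outliers(iris[, 1:4], k = 5, n = 5)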
9
Research Topics 3
  • Parallel Data Mining. The use of parallel computation will allow us
    to deal with very large datasets. In particular, we are developing
    parallel algorithms to perform data preprocessing tasks on very
    large datasets. Visualization and computation of meta-classifiers
    using parallelism are also being considered.
  • Unsupervised learning to find features that behave similarly under
    various conditions and to find subgroups of instances that are
    similar to each other. In particular, we are interested in the
    validation of clustering algorithms.
  • We are investigating extensions of data mining tasks to the
    multi-relational case (multiple tables).
  • Interface between the R statistical computing language and SQL in
    order to manipulate large datasets (a small illustrative sketch
    follows this list).
  • Building a visual, event-driven programming environment on R to
    perform mainly data preprocessing tasks.

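The slides do not describe the planned R-SQL interface itself; the
sketch below only illustrates the general idea using the DBI and
RSQLite packages (an assumed choice, not the group's interface): keep
the data in a database and let SQL do the filtering and aggregation
before the result reaches R.

  # Illustrative sketch (assumed packages DBI and RSQLite): store a table in
  # SQLite and let the database aggregate it, so R only receives the small
  # summarized result instead of the full dataset.
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), "castle.db")
  dbWriteTable(con, "measurements", iris, overwrite = TRUE)
  res <- dbGetQuery(con,
    "SELECT Species, AVG(\"Sepal.Length\") AS mean_sepal
       FROM measurements GROUP BY Species")
  print(res)
  dbDisconnect(con)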
10
Lab Equipment
  • A 155-square-foot room.
  • A cluster of five Dell workstations with dual Pentium Xeon
    processors, each running at 3.06 GHz and with 3 MB of RAM.
  • An HP Color LaserJet 4050 printer.

11
Recent Publications 1
  • [1] Lozano, E. and Acuña, E. (2005). A 3D extension of the star
    coordinates display.
  • [2] Lozano, E. and Acuña, E. (2005). Parallel algorithms for
    distance-based and density-based local outliers. To appear in the
    Proceedings of ICDM'05 (IEEE).
  • [3] Vega, J. and Acuña, E. (2005). Generalizations of PLS for
    dimensionality reduction in supervised classification. Proceedings
    of the Fourth International Conference on Statistics and Related
    Fields, Hawaii.
  • [4] Acuña, E. and Rodriguez, C. (2004). The effect of outliers on
    the misclassification error rate. Submitted to IEEE Transactions on
    Knowledge and Data Engineering.
  • [5] Acuña, E. and Rodriguez, C. (2004). The treatment of missing
    values and its effect on classifier accuracy. In D. Banks, L.
    House, F.R. McMorris, P. Arabie, W. Gaul (Eds.), Classification,
    Clustering and Data Mining Applications. Springer-Verlag,
    Berlin-Heidelberg, 639-648.

12
Recent Publications 2
  • [6] Acuña, E. and Coaquira, F. (2003). A comparison of feature
    selection procedures for classifiers based on kernel density
    estimation. Proceedings of the International Conference on
    Computer, Communication and Control Technologies, CCCT'03, Vol. I,
    pp. 462-467. Orlando, Florida.
  • [7] Daza, L. and Acuña, E. (2003). Combining classifiers based on
    Gaussian mixtures. Proceedings of the International Conference on
    Computer, Communication and Control Technologies, CCCT'03, Vol. I,
    pp. 473-478. Orlando, Florida.
  • [8] Lozano, E. and Acuña, E. (2002). Parallel computation of kernel
    density estimate classifiers and their ensembles. Proceedings of
    the International Conference on Computer, Communication and Control
    Technologies, CCCT'03, Vol. I, pp. 473-478. Orlando, Florida.

13
Recent Publications 3
  • [9] Acuña, E. (2003). A comparison of filters and wrappers for
    feature selection in supervised classification. Proceedings of
    Interface 2003: Computing Science and Statistics, Vol. 34.
  • [10] Acuña, E., Rojas, A., and Coaquira, F. (2002). The effect of
    feature selection on combining classifiers based on kernel density
    estimates. In K. Jajuga, A. Sokołowski, H.-H. Bock (Eds.),
    Classification, Clustering and Data Analysis. Springer, Heidelberg,
    161-168.
  • [11] Acuña, E. (2002). Combining classifiers based on kernel
    density classifiers and Gaussian mixtures. Proceedings of Interface
    2002: Computing Science and Statistics, Vol. 33.

14
Software
  • The Dprep package: a library of 68 R functions to perform mainly
    data preprocessing tasks, including range normalization,
    discretization, handling of missing values, outlier detection,
    feature selection, and visualization (a small illustrative sketch
    follows).

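The sketch below illustrates one such task, range (min-max)
normalization, in plain base R; it is not taken from Dprep, whose own
functions and interfaces are documented in the package itself.

  # Illustrative base-R sketch of range (min-max) normalization to [0, 1];
  # Dprep provides its own, documented functions for this task.
  range.norm <- function(x) {
    apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
  }
  data(iris)
  head(range.norm(iris[, 1:4]))   # each feature rescaled to the unit interval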
15
CASTLE Members
  • Elio Lozano: Parallel data mining
  • Caroline Rodriguez (PR): Data preprocessing and visualization
16
CASTLE Members
  • Frida Coaquira: Rough sets for KDD
  • Luis Daza: Instance selection
17
CASTLE Members
  • Trilce Encarnation: Databases
  • Marggie Gonzalez: Cluster validation
18
CASTLE Members
  • Jaime Porras: Supervised Principal Components for classification
  • Karen Prieto: Shrunken estimators for logistic regression
19
CASTLE Members
  • Carlos Lopez: Bayesian network classifiers
  • Carmen Saldana: Statistics
20
CASTLE Members
  • Roxana Aparicio: Databases
  • Sindy Diaz: Statistics