1
Computational and Statistical Learning Group
(CASTLE)
  • Professor Edgar Acuna
  • Mathematics Department
  • University of Puerto Rico at Mayaguez
  • Partially supported by ONR Grant N0014-03-0359

2
Our research focuses on the development of
computational and statistical methods for
knowledge discovery in databases, related to
Pattern Recognition, Machine Learning,
Bioinformatics, and Data Mining.
3
CASTLE Members 1
4
CASTLE Members 2
5
CASTLE Alumni
  • Alex Rojas (MS 2001, now at CMU)
  • Adriana Lopez (MS 2002, now at U. Pittsburgh)
  • Jose Vega (PhD 2004, now at the UPR Medical School)
  • Santiago Velasco (MS 2004, now at UPRM)

6
CASTLE Collaborators
  • Ana Patricia Ortiz, Puerto Rico Cancer Center.
  • Idhaliz Flores, Ponce Medical School, Puerto Rico.
  • Jose Vega, University of Puerto Rico School of Medicine.

7
Research Topics 1
  • Data preprocessing
  • Treatment of missing values in Data Mining
  • Normalization in Data Mining
  • Feature selection procedures for high-dimensional data: wrappers,
    filters, and hybrid methods, with application to Bioinformatics;
    feature selection based on rough sets (a small illustrative sketch
    of a filter follows this list).
  • Feature extraction procedures for high-dimensional data, such as
    Partial Least Squares and supervised Principal Components, with
    application to Bioinformatics and Chemometrics.
  • Instance selection procedures, such as progressive sampling, to
    make the application of statistical methods and machine learning
    techniques feasible.

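As a concrete illustration of the filter approach to feature selection
mentioned above, the minimal base-R sketch below ranks features by a
one-way ANOVA F-statistic against the class label. The function name
filter.rank and the use of the iris data are assumptions for
illustration only, not part of the group's software.

  # Minimal illustrative sketch of a univariate filter for feature selection
  # (not the group's implementation): rank each feature by the F-statistic of
  # a one-way ANOVA of that feature against the class label.
  filter.rank <- function(x, y) {
    # x: data frame of numeric features; y: factor with the class labels
    fstat <- sapply(x, function(feature) {
      summary(aov(feature ~ y))[[1]][["F value"]][1]
    })
    sort(fstat, decreasing = TRUE)   # larger F = more discriminative feature
  }

  # Example: rank the four iris measurements by how well they separate species
  data(iris)
  filter.rank(iris[, 1:4], iris$Species)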
8
Research Topics 2
  • Regularization methods: use of shrunken estimators for logistic
    regression, aiming at classifiers that use all the features without
    performing feature selection or feature extraction. Applications in
    Bioinformatics and Chemometrics.
  • Outlier detection procedures, such as distance-based outliers and
    density-based local outliers. We will apply these procedures to
    network intrusion detection (a small illustrative sketch follows
    this list).
  • Visualization procedures for Data Mining. We are enhancing graphics
    such as parallel coordinate plots, survey plots, and star
    coordinates that allow us to perform exploratory data analysis
    prior to applying a knowledge discovery technique.

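To make the distance-based notion of an outlier concrete, here is a
minimal base-R sketch that scores each observation by the distance to
its k-th nearest neighbor and reports the highest-scoring ones. The
function name knn.outliers and the iris example are illustrative
assumptions, not the group's code.

  # Minimal illustrative sketch of distance-based outlier detection: an
  # observation is scored by the distance to its k-th nearest neighbor,
  # and the n observations with the largest scores are reported.
  knn.outliers <- function(x, k = 5, n = 10) {
    d <- as.matrix(dist(scale(x)))     # pairwise Euclidean distances
    diag(d) <- Inf                     # ignore each point's distance to itself
    kdist <- apply(d, 1, function(row) sort(row)[k])   # k-th NN distance
    head(order(kdist, decreasing = TRUE), n)           # indices of top outliers
  }

  data(iris)
  knn.outliers(iris[, 1:4], k = 5, n = 5)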
9
Research Topics 3
  • Parallel Data Mining. The use of parallel computation will allow us
    to deal with very large datasets. In particular, we are developing
    parallel algorithms to perform data preprocessing tasks on very
    large datasets. Visualization and computation of meta-classifiers
    using parallelism are also being considered.
  • Unsupervised learning to find features that behave similarly under
    various conditions and to find subgroups of instances that are
    similar to each other. In particular, we are interested in the
    validation of clustering algorithms.
  • We are investigating extensions of data mining tasks to the
    multi-relational case (multiple tables).
  • Interface between the R statistical computing language and SQL in
    order to manipulate large datasets (a small illustrative sketch
    follows this list).
  • Building a visual, event-driven programming environment on R to
    perform mainly data preprocessing tasks.

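The slides do not describe the planned R-SQL interface itself; the
sketch below only illustrates the general idea using the DBI and
RSQLite packages (an assumed choice, not the group's interface): keep
the data in a database and let SQL do the filtering and aggregation
before the result reaches R.

  # Illustrative sketch (assumed packages DBI and RSQLite): store a table in
  # SQLite and let the database aggregate it, so R only receives the small
  # summarized result instead of the full dataset.
  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), "castle.db")
  dbWriteTable(con, "measurements", iris, overwrite = TRUE)
  res <- dbGetQuery(con,
    "SELECT Species, AVG(\"Sepal.Length\") AS mean_sepal
       FROM measurements GROUP BY Species")
  print(res)
  dbDisconnect(con)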
10
Lab Equipment
  • A 155-square-foot room.
  • A cluster of five Dell workstations with dual Pentium Xeon
    processors, each running at 3.06 GHz and with 3 MB of RAM.
  • An HP Color LaserJet 4050 printer.

11
Recent Publications 1
  • [1] Lozano, E. and Acuña, E. (2005). A 3D extension of the star
    coordinates display.
  • [2] Lozano, E. and Acuña, E. (2005). Parallel algorithms for
    distance-based and density-based local outliers. To appear in the
    Proceedings of ICDM'05 (IEEE).
  • [3] Vega, J. and Acuña, E. (2005). Generalizations of PLS for
    dimensionality reduction in supervised classification. Proceedings
    of the Fourth International Conference on Statistics and Related
    Fields, Hawaii.
  • [4] Acuña, E. and Rodriguez, C. (2004). The effect of outliers on
    the misclassification error rate. Submitted to IEEE Transactions on
    Knowledge and Data Engineering.
  • [5] Acuña, E. and Rodriguez, C. (2004). The treatment of missing
    values and its effect on classifier accuracy. In D. Banks, L.
    House, F.R. McMorris, P. Arabie, W. Gaul (Eds.), Classification,
    Clustering and Data Mining Applications. Springer-Verlag,
    Berlin-Heidelberg, 639-648.

12
Recent Publications 2
  • [6] Acuña, E. and Coaquira, F. (2003). A comparison of feature
    selection procedures for classifiers based on kernel density
    estimation. Proceedings of the International Conference on
    Computer, Communication and Control Technologies, CCCT'03, Vol. I,
    pp. 462-467. Orlando, Florida.
  • [7] Daza, L. and Acuña, E. (2003). Combining classifiers based on
    Gaussian mixtures. Proceedings of the International Conference on
    Computer, Communication and Control Technologies, CCCT'03, Vol. I,
    pp. 473-478. Orlando, Florida.
  • [8] Lozano, E. and Acuña, E. (2002). Parallel computation of kernel
    density estimate classifiers and their ensembles. Proceedings of
    the International Conference on Computer, Communication and Control
    Technologies, CCCT'03, Vol. I, pp. 473-478. Orlando, Florida.

13
Recent Publications 3
  • [9] Acuña, E. (2003). A comparison of filters and wrappers for
    feature selection in supervised classification. Proceedings of
    Interface 2003: Computing Science and Statistics, Vol. 34.
  • [10] Acuña, E., Rojas, A., and Coaquira, F. (2002). The effect of
    feature selection on combining classifiers based on kernel density
    estimates. In K. Jajuga, A. Sokołowski, H.-H. Bock (Eds.),
    Classification, Clustering and Data Analysis. Springer, Heidelberg,
    161-168.
  • [11] Acuña, E. (2002). Combining classifiers based on kernel
    density classifiers and Gaussian mixtures. Proceedings of Interface
    2002: Computing Science and Statistics, Vol. 33.

14
Software
  • The Dprep package: a library of 68 R functions to perform mainly
    data preprocessing tasks, including range normalization,
    discretization, handling of missing values, outlier detection,
    feature selection, and visualization (a small illustrative sketch
    follows).

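The sketch below illustrates one such task, range (min-max)
normalization, in plain base R; it is not taken from Dprep, whose own
functions and interfaces are documented in the package itself.

  # Illustrative base-R sketch of range (min-max) normalization to [0, 1];
  # Dprep provides its own, documented functions for this task.
  range.norm <- function(x) {
    apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
  }
  data(iris)
  head(range.norm(iris[, 1:4]))   # each feature rescaled to the unit interval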
15
CASTLE Members
  • Elio Lozano: Parallel data mining
  • Caroline Rodriguez (PR): Data preprocessing and visualization
16
CASTLE Members
  • Frida Coaquira: Rough sets for KDD
  • Luis Daza: Instance selection
17
CASTLE Members
  • Trilce Encarnation: Databases
  • Marggie Gonzalez: Cluster validation
18
CASTLE Members
  • Jaime Porras: Supervised Principal Components for classification
  • Karen Prieto: Shrunken estimators for logistic regression
19
CASTLE Members
  • Carlos Lopez: Bayesian network classifiers
  • Carmen Saldana: Statistics
20
CASTLE Members
  • Roxana Aparicio: Databases
  • Sindy Diaz: Statistics