A Report on the NSF ICORS 2002 Computational Workshop

1
A Report on the NSF ICORS 2002 Computational
Workshop

Arnold Stromberg
Department of Statistics
With support from the
National Science Foundation

ICORS 2003, Antwerp, Belgium, July 18, 2003

2
Background

The NSF-sponsored International Workshop on
Computational Methods for Robust Statistics was
held immediately following ICORS 2002 in
Vancouver, BC, Canada, from May 18 to May 20, 2002.

3
Invited Presentations

Doug Martin of Insightful, Inc., the makers of
Splus, discussed the need to emphasize the
computation of robust statistical techniques.
Colin Chen of SAS Institute, Inc. demonstrated
PROC ROBUSTREG, which he wrote for robust
regression. It will be available in SAS
Version 9 in Fall 2003.

4
Comments: Software Availability

SAS users, primarily applied statisticians, can
now do robust regression, thus applying what they
learned in graduate school.
Nonstatisticians, who will never use SAS, are
starting to use R, and by extension, Splus.
We must write code in R. Good methods will be
incorporated into Splus and, eventually, SAS.

5
Research Directions

Literally hundreds of statistical procedures are
in need of robustification. Providing
computational tools, especially in R, lets
statisticians compare classical and robust
methods.
The key is to make R code user friendly so that
nonstatisticians can use it.

6
Why make your code user friendly?

Disadvantages
It takes extra time.
It doesn't help my career.
It isn't fundable.
Writing code is no fun.
I can't publish it.
No one will use it.

7
Why make your code user friendly?

Advantages
It takes time, but the result is publishable in the
Journal of Statistical Software.
www.jstatsoft.org
Abstracts are published in the Journal of
Computational and Graphical Statistics (JCGS).

8
Journal of Statistical Software

Types of Papers JSS will publish
Manuals, user's guides, and other descriptions
of statistical software.
The code for new statistical software.
Data sets that are of use to statisticians.
Reviews and comparisons of statistical software.

9
Publishing in JSS

The typical JSS paper will have a section
explaining the statistical technique, a section
explaining the code, a section with the actual
code, and a section with examples. All sections
will be made browsable as well as downloadable.
The papers and code should be accessible to a
broad community of practitioners, teachers, and
researchers in the field of statistics.

10
Why make your code user friendly?

More advantages
It does help your career because:
It increases publications.
It is fundable! NSF wants innovative and useful
projects. "Useful" means others can and do use
it, and that means user-friendly code. NSF rarely
funds straight theory anymore.
Nonstatisticians who are evaluating you appreciate
it.
Many statisticians appreciate it.

11
Why make your code user friendly?

More Advantages
Writing code may not be fun, but seeing
researchers use your methods is lots of fun!
If your method is useful, researchers will use it
if they know about it and have the tools.
Nonstatisticians don't read statistics journals,
so we must publish in their journals!

12
Collaborators in other fields

Why you need them
They have real problems.
They at least double your funding options.
They at least double your publication options.
They will support you and your department.

13
The Case of Robust Regression

Everyone agrees it's useful.
Nearly 40 years after the introduction of
M-estimates, they are in SAS!
What happened to the 10-year rule?

14
Why Did it take 40 Years?

Only theory mattered.
Computationally difficult at first.
No two statisticians could agree on the best
robust method.
Computationally harder with high breakdown.
SAS mentality.

15
Funding Statistical Research

No one funds straight theory.
NSF wants useful and innovative.
Useful means collaborators.
NIH wants medical applications.
Not the next robust regression estimator,
although JASA might accept it.
Everyone funds conferences and workshops!

16
COMMENTS

17
Collaborations resulting from the workshop

Ma, Y., and Genton, M. G. (2002), "A semiparametric
class of generalized skew-elliptical
distributions," Institute of Statistics Mimeo
Series 2541, under review.
Nora Muler is working together with Victor Yohai
on robust estimators for GARCH models: "At ICORS
2002 I proposed a robust estimator for the general
GARCH(p,q) model, which has p+q+1 parameters to
estimate. The algorithm in our paper is
implemented only for the GARCH(1,1) case. I was
discussing how to generalize the algorithm to the
general GARCH(p,q) case."

18
Collaborations resulting from the workshop

Robust Methods for Microarray Data Analysis, by
Hanga Galfalvy, Steven Grambow, Johanna Hardin, and
Arnold Stromberg, was started, and extensive
progress has since been made.
Rocke, D. M., and Woodruff, D. L., "Multivariate
Outlier Detection and Cluster Identification,"
working paper.

19
Collaborations resulting from the workshop

Chen et al. extensively discussed smoothing
algorithms for quantile regression, which will
become part of SAS soon.
Discussions led to Salibian-Barrera's
"Estimating the p-values of robust tests for the
linear model," now under revision for JSPI.

20
Workshop Benefits

Salibian-Barrera (2003), "Estimating the p-values
of robust tests for the linear model," now under
revision for JSPI.
Attending the workshop assisted Matias
Salibian-Barrera and two colleagues with their
application for a large grant for a computer lab
here (over 980K)... The agencies were CFI
(www.innovation.ca) and OIT (www.oit.on.ca). They
were awarded the grant in October 2002.

21
Computational Issues

Discussed possibilities for new algorithmic
strategies for efficient computation of robust
estimators such as LTS
Discussed current research on theoretical
properties of algorithmic estimators... in
particular, the recent work of Hawkins and Olive.
Discussed the impact of new robust procedures in
SAS and Splus on data analysis as well as the
impact on the ability to develop and run
simulations for research in robust methods.
Discussed potential applications of traditional
robust estimators to the emerging field of
microarray/gene expression analysis.

22
More Computational Issues

Many robust techniques are computable only for
small data sets, yet the larger the data set, the
more likely it is that robust techniques should
be used.
Success stories
Fast LTS, Fast MCD.
Others?
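Fast LTS is one of the success stories above. Its central device is the concentration step (C-step): fit least squares to the h observations with the smallest squared residuals, recompute all residuals, and repeat. Each C-step can only lower the LTS objective, the sum of the h smallest squared residuals. The Python sketch below, which assumes NumPy, runs random elemental starts followed by C-steps. It is a simplified illustration of the idea, not the full FAST-LTS algorithm, which adds nested subsampling and other refinements. The function name `lts_fit` and its defaults are hypothetical.

```python
import numpy as np


def lts_fit(X, y, h=None, n_starts=50, max_csteps=100, seed=0):
    """Least trimmed squares via random elemental starts and C-steps.

    Simplified sketch of the concentration-step idea behind FAST-LTS; the
    nested subsampling and other refinements of the full algorithm are omitted.
    Returns (beta, objective), where objective is the sum of the h smallest
    squared residuals at beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # coverage commonly used for maximal breakdown
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        # Elemental start: exact fit through p randomly chosen observations.
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # singular subset; try another start
        obj = np.inf
        for _ in range(max_csteps):
            # C-step: least squares on the h observations with smallest residuals.
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            new_obj = np.sort((y - X @ new_beta) ** 2)[:h].sum()
            if new_obj >= obj:
                break  # objective stopped decreasing: converged
            beta, obj = new_beta, new_obj
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

On a simple line with 20% gross outliers in y, this sketch should return coefficients close to the clean-data line, whereas ordinary least squares is pulled toward the outliers. Random starts are used because each C-step sequence converges only to a local minimum of the LTS objective.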

23
More Computational Issues

The need for computation of robust singular value
decomposition for large matrices (with
application to microarray data).
The need for computation of robust variogram
estimators in spatial statistics
The need for investigation of similarities
between support vector machines and robust
regression
The need to detect outliers in asymmetric
distributions

24
More Computational Issues

Differences between the SAS and Splus
implementations of robust regression
The appropriateness of using the robust Wald or
robust F tests when using ANOVA to compare two
nested robust regression models

25
More Computational Issues

How to deal with categorical variables in high
breakdown methods
How to handle multiple-root problems for
redescending M-estimators
Issues in robust SVD. One proposal is to use the
L1 norm instead of the usual L2 norm; the
problem is that the orthogonality property is lost.
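The multiple-root problem for redescending M-estimators already appears in the simplest case, estimating location. Tukey's bisquare psi function is zero beyond a cutoff c, so the estimating equation sum psi((x_i - mu)/s) = 0 can have several solutions. Iteratively reweighted averaging converges to whichever root lies near its starting point. The sketch below assumes NumPy, a fixed scale s = 1, and the commonly used tuning constant c = 4.685. The function names are hypothetical.

```python
import numpy as np


def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight w(u) = psi(u) / u, which is zero for |u| >= c."""
    return np.where(np.abs(u) < c, (1.0 - (u / c) ** 2) ** 2, 0.0)


def m_location(x, start, scale=1.0, c=4.685, tol=1e-10, max_iter=500):
    """Bisquare location M-estimate by iteratively reweighted averaging.

    The scale is held fixed (an assumption of this sketch; in practice it is
    usually a robust estimate such as the MAD). Returns None if every weight
    is zero at some iterate, i.e. no observation lies within c * scale of it.
    """
    mu = float(start)
    for _ in range(max_iter):
        w = bisquare_weight((x - mu) / scale, c)
        if w.sum() == 0.0:
            return None
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


def psi_sum(x, mu, scale=1.0, c=4.685):
    """Left-hand side of the estimating equation sum psi((x_i - mu) / scale) = 0."""
    u = (x - mu) / scale
    return float(np.sum(u * bisquare_weight(u, c)))
```

Consider a sample with 60 points near 0 and 40 points near 10. Starting the iteration at 0 and starting it at 10 should give two different roots, and `psi_sum` is essentially zero at both. A start far from all the data gives no root at all. One common way to choose among the roots is to start from a high-breakdown estimate such as the median.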

26
More Computational Issues

In skew-symmetric distributions, the
coefficients of the skewing function are very
sensitive to even a small number of outliers. In
fact, a few outliers can force an increase in the
order of the polynomial in the skewing function,
yet the extra coefficients are very hard to
estimate.

27
More Computational Issues

Subsampling strategies
Empirical analysis of estimator performance
Robust metrics
Compute high breakdown value estimates with both
continuous and categorical variables
Compute multivariate robust estimates
Smoothing algorithms for regression quantiles
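The report does not describe the smoothing algorithm for regression quantiles that was discussed. The following is therefore only a generic illustration of why smoothing helps, not that algorithm. The check loss rho_tau(r) = r(tau - 1{r < 0}) is not differentiable at zero. Replacing it with the smooth surrogate tau*r + eps*log(1 + exp(-r/eps)), which tends to rho_tau as eps -> 0, makes plain gradient descent applicable. The choice of surrogate, the step-size rule, and the function names are assumptions of this NumPy sketch.

```python
import numpy as np


def smoothed_check_loss(r, tau, eps):
    """Smooth surrogate tau*r + eps*log(1 + exp(-r/eps)) for the check loss.

    As eps -> 0 this tends to rho_tau(r) = r * (tau - 1{r < 0}).
    """
    return tau * r + eps * np.logaddexp(0.0, -r / eps)


def smoothed_quantile_regression(X, y, tau=0.5, eps=0.05, n_iter=50000):
    """Regression quantile by gradient descent on the mean smoothed check loss."""
    n, p = X.shape
    # The surrogate's second derivative is at most 1/(4*eps), so the mean loss
    # has an L-Lipschitz gradient with L = lambda_max(X'X) / (4*eps*n).
    lipschitz = np.linalg.eigvalsh(X.T @ X).max() / (4.0 * eps * n)
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # Derivative of the surrogate in r: tau - sigmoid(-r/eps), written via
        # tanh to avoid overflow in exp for large |r|/eps.
        dloss = tau - 0.5 * (1.0 - np.tanh(r / (2.0 * eps)))
        grad = -(X.T @ dloss) / n
        beta -= step * grad
    return beta
```

Plain gradient descent with step 1/L is slow when eps is small or the design is poorly conditioned. It is used here only because it makes the effect of smoothing easy to see; a practical implementation would use a faster solver.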

28
More Computational Issues

Fast and robust bootstrap methods for robust
regression estimates
Fast and robust estimates for p-values for robust
regression
Fast computation of MM-regression estimates for
high-dimensional data... maybe related to the
Hoaglin-Mosteller-Tukey sweeping method?

29
Would you do it again?

I would definitely be interested in
participating in another workshop. Attendance at
the 2002 workshop proved to be invaluable. I
learned about new research going on in the robust
statistics field, initiated several new research
collaborations with other conference attendees,
and had the opportunity to meet several key
researchers in the robust field. All in all, it
was an excellent experience.

30
Would you do it again?

Yes, especially after SAS and Splus formally
release their robust procedures.

31
Workshop Participants

Chen, Colin (Lin), SAS Institute, Inc.
Galfalvy, Hanga C., New York State Psychiatric
Institute
Garcia Ben, Marta, Universidad de Buenos Aires
Genton, Marc, North Carolina State University
Grambow, Steve, Duke University Medical Center

32
Workshop Participants

Hardin, Johanna, Postdoc, Fred Hutchinson Cancer
Research Center, Seattle, Washington
He, Xuming, Professor, Department of Statistics,
University of Illinois
Kafadar, Karen, Professor, Department of
Mathematics, University of Colorado at Denver
Lin, Nan
Ma, Yuanyuan, North Carolina State University

33
Workshop Participants

Muler, Nora, Junior Faculty, Universidad Torcuato
di Tella, Argentina
Stromberg, Arnold, Professor, Department of
Statistics, University of Kentucky
Tyler, David, Professor, Department of
Statistics, Rutgers University
Vanden Branden, Karlien, Graduate Student,
Katholieke Universiteit Leuven, Belgium
Werner, Mark, Graduate Student, University of
Colorado at Denver

34
Workshop Participants

Woodruff, David, Professor, Department of
Mathematics, University of California at Davis
Zamar, Ruben, Professor, Department of
Statistics, University of British Columbia
Ekblom, Hakan, Professor, Lulea University of
Technology, Sweden
Sinha, Sanjoy, Assistant Professor, University of
Winnipeg, Canada
