A Report on the NSF ICORS 2002 Computational Workshop

1
A Report on the NSF ICORS 2002 Computational
Workshop
  • Arnold Stromberg
  • Department of Statistics
  • With support from the
  • National Science Foundation

ICORS 2003 Antwerp, Belgium July 18, 2003
2
Background
  • The NSF-sponsored International Workshop on
    Computational Methods for Robust Statistics was
    held immediately following ICORS 2002 in
    Vancouver, BC, Canada, from May 18 to May 20, 2002.

3
Invited Presentations
  • Doug Martin of Insightful, Inc., the makers of
    Splus, discussed the need for emphasis on
    computation of robust statistical techniques.
  • Colin Chen of SAS Institute, Inc. gave a demonstration of
    Proc Robustreg, which he wrote to do robust
    regression. It will be available with SAS
    Version 9 in Fall 2003.

4
Comments: Software Availability
  • SAS users, primarily applied statisticians, can
    now do robust regression, thus applying what they
    learned in graduate school.
  • Nonstatisticians, who will never use SAS, are
    starting to use R, and by extension, Splus.
  • We must write code in R. Good methods will be
    incorporated by Splus, and eventually SAS.

5
Research Directions
  • Literally hundreds of statistical procedures are
    in need of robustification. By providing
    computational tools, especially in R,
    statisticians can compare classical and robust
    methods.
  • The key is to make R code user friendly so
    nonstatisticians can use it.

6
Why make your code user friendly?
  • Disadvantages
  • It takes extra time.
  • It doesn't help my career.
  • It isn't fundable.
  • Writing code is no fun.
  • I can't publish it.
  • No one will use it.

7
Why make your code user friendly?
  • Advantages
  • It takes time, but is publishable in Journal of
    Statistical Software.
  • www.jstatsoft.org
  • Abstracts published in Journal of Computational
    and Graphical Statistics (JCGS)

8
Journal of Statistical Software
  • Types of Papers JSS will publish
  • Manuals, user's guides, and other forms of
    description of statistical software.
  • The code for new statistical software.
  • Data sets that are of use to statisticians.
  • Reviews and comparisons of statistical software.

9
Publishing in JSS
  • The typical JSS paper will have a section
    explaining the statistical technique, a section
    explaining the code, a section with the actual
    code, and a section with examples. All sections
    will be made browsable as well as downloadable.
    The papers and code should be accessible to a
    broad community of practitioners, teachers, and
    researchers in the field of statistics.

10
Why make your code user friendly?
  • More advantages
  • It does help your career because
  • It increases publications.
  • It is fundable! NSF wants innovative and useful
    projects. Useful means that others can and do use
    it. That means user friendly code. NSF rarely
    funds straight theory anymore.
  • Nonstatisticians who are evaluating you appreciate
    it.
  • Many statisticians appreciate it.

11
Why make your code user friendly?
  • More Advantages
  • Writing code may not be fun, but seeing
    researchers use your methods is lots of fun!
  • If your method is useful, researchers will use it
    if they know about it and have the tools.
    Nonstatisticians don't read statistics journals,
    so we must publish in their journals!

12
Collaborators in other fields
  • Why you need them
  • They have real problems.
  • They at least double your funding options.
  • They at least double your publication options.
  • They will support you and your department.

13
The Case of Robust Regression
  • Everyone agrees it's useful.
  • Nearly 40 years after M-estimates, they are in
    SAS!
  • What happened to the 10-year rule?

14
Why Did It Take 40 Years?
  • Only theory mattered.
  • Computationally difficult at first.
  • No two statisticians could agree on the best
    robust method.
  • Computationally harder with high breakdown.
  • SAS mentality.

15
Funding Statistical Research
  • No one funds straight theory.
  • NSF wants useful and innovative.
  • Useful Collaborators
  • NIH wants medical applications.
  • Not the next robust regression estimator,
    although JASA might accept it.
  • Everyone funds conferences and workshops!

16
COMMENTS
17
Collaborations resulting from the workshop
  • Ma, Y., and Genton, M. G. (2002) "A semiparametric
    class of generalized skew-elliptical
    distributions," Institute of Statistics Mimeo
    Series 2541, under review.
  • Nora Muler is working together with Victor Yohai
    on robust estimators for GARCH models. I proposed
    at ICORS 2002 a robust estimator for the general
    GARCH(p,q) model that has p + q + 1 parameters to
    estimate. The algorithm in our paper is
    implemented only for the GARCH(1,1) case. I was
    discussing how to generalize the algorithm to the
    general GARCH(p,q) case.
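
For reference, one common way to write the GARCH(p,q) model is shown below. The notation is an assumption made here for illustration and is not taken from the slides; authors differ on which of p and q indexes which sum, but the parameter count is the same either way.

```latex
x_t = \sigma_t z_t, \qquad
\sigma_t^2 = \omega
  + \sum_{i=1}^{p} \alpha_i \, x_{t-i}^2
  + \sum_{j=1}^{q} \beta_j \, \sigma_{t-j}^2
```

The parameters to estimate are the intercept \omega, the p coefficients \alpha_1, ..., \alpha_p, and the q coefficients \beta_1, ..., \beta_q, for a total of p + q + 1. In the GARCH(1,1) case that is three parameters.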

18
Collaborations resulting from the workshop
  • Robust Methods for Microarray Data Analysis by
    Hanga Galfalvy, Steven Grambow, Johanna Hardin, and
    Arnold Stromberg was started, and extensive
    progress has since been made.
  • Rocke, D.M., and D.L. Woodruff, "Multivariate
    Outlier Detection and Cluster Identification",
    Working Paper

19
Collaborations resulting from the workshop
  • Chen et al. extensively discussed smoothing
    algorithms for quantile regression, which will
    become part of SAS soon.
  • Discussions led to Salibian-Barrera's
    "Estimating the p-values of robust tests for the
    linear model," now under revision for JSPI.

20
Workshop Benefits
  • Salibian-Barrera (2003). Estimating the p-values
    of robust tests for the linear model. Now under
    revision for JSPI
  • Attending the workshop assisted the application by
    Matias Salibian-Barrera and two colleagues
    for a large grant for a computer lab here (over
    980K)... The agencies were CFI
    (www.innovation.ca) and OIT (www.oit.on.ca). They
    were awarded the grant in October 2002.

21
Computational Issues
  • Discussed possibilities for new algorithmic
    strategies for efficient computation of robust
    estimators such as LTS
  • Discussed current research on theoretical
    properties of algorithmic estimators... in
    particular, the recent work of Hawkins and Olive.
  • Discussed the impact of new robust procedures in
    SAS and Splus on data analysis as well as the
    impact on the ability to develop and run
    simulations for research in robust methods.
  • Discussed potential applications of traditional
    robust estimators to the emerging field of
    microarray/gene expression analysis.

22
More Computational Issues
  • Many robust techniques are only computable for
    small data sets, but the larger the data set, the
    more likely robust techniques should be used.
  • Success stories
  • Fast LTS, Fast MCD.
  • Others?????????

23
More Computational Issues
  • The need for computation of robust singular value
    decomposition for large matrices (with
    application to microarray data).
  • The need for computation of robust variogram
    estimator in spatial statistics
  • The need for investigation of similarities
    between support vector machines and robust
    regression
  • The need to detect outliers in asymmetric
    distributions

24
More Computational Issues
  • Differences between SAS's and Splus's
    implementations of robust regression
  • The appropriateness of using the robust Wald or
    robust F tests when using ANOVA to compare two
    nested robust regression models

25
More Computational Issues
  • How to deal with categorical variables for high
    breakdown methods
  • How to handle multiple root problems for
    redescending M-estimators
  • Issues in robust SVD. One proposal is to
    use the L1 norm instead of the usual L2 norm; the
    problem is that the orthogonality property is lost.
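
The multiple-root problem for redescending M-estimators can be seen in a small location example. The sketch below uses the Tukey biweight with iteratively reweighted averaging and shows that different starting values converge to different solutions of the estimating equation. The fixed scale of 1, the toy data, and the function names are assumptions made for illustration; in practice the scale would be estimated robustly.

```python
import statistics


def biweight_weight(u, c=4.685):
    """Tukey biweight weight: (1 - (u/c)^2)^2 inside [-c, c], 0 outside."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2


def m_location(data, start, scale=1.0, c=4.685, n_iter=100, tol=1e-10):
    """Biweight M-estimate of location by iterative reweighting."""
    mu = start
    for _ in range(n_iter):
        weights = [biweight_weight((x - mu) / scale, c) for x in data]
        total = sum(weights)
        if total == 0.0:
            return mu  # no observation is within reach of this start
        new_mu = sum(w * x for w, x in zip(weights, data)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Toy data (hypothetical): a majority cluster near 0, a minority near 20.
data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 19.8, 20.0, 20.3]

root_from_median = m_location(data, start=statistics.median(data))
root_from_outliers = m_location(data, start=20.0)

print("start at the median:", root_from_median)
print("start at 20.0:      ", root_from_outliers)
```

Both results satisfy the estimating equation because the biweight gives zero weight to the far cluster. The choice of starting value therefore decides which root is reported, which is why a robust start such as the median is commonly used.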

26
More Computational Issues
  • In skew-symmetric type distributions, the
    coefficients in the skewing function are very
    sensitive to even a small number of outliers. In
    fact, a small number of outliers will demand
    increasing the order of the polynomial in the
    skewing function, yet the extra coefficients are
    very hard to estimate.

27
More Computational Issues
  • Subsampling strategies
  • Empirical analysis of estimator performance
  • Robust metrics
  • Compute high breakdown value estimates with both
    continuous and categorical variables
  • Compute multivariate robust estimates
  • Smoothing algorithm for regression quantiles

28
More Computational Issues
  • Fast and robust bootstrap methods for robust
    regression estimates
  • Fast and robust estimates for p-values for robust
    regression
  • Fast computation of MM-regression estimates for
    high-dimensional data... Maybe related to
    Hoaglin-Mosteller-Tukey's sweeping method?

29
Would you do it again?
  • I would definitely be interested in
    participating in another workshop. Attendance at
    the 2002 workshop proved to be invaluable. I
    learned about new research going on in the robust
    statistics field, initiated several new research
    collaborations with other conference attendees,
    and had the opportunity to meet several key
    researchers in the robust field. All in all, it
    was an excellent experience.

30
Would you do it again?
  • Yes, especially after SAS and Splus release their
    robust procedures formally.

31
Workshop Participants
  • Chen, Colin (Lin), SAS Institute, Inc.
  • Galfalvy, Hanga C., New York State Psychiatric
    Institute
  • Garcia Ben, Marta, Universidad de Buenos Aires
  • Genton, Marc, North Carolina State University
  • Grambow, Steve, Duke University Medical Centre

32
Workshop Participants
  • Hardin, Johanna, PostDoc, Fred Hutchinson Cancer
    Research Center, Seattle, Washington
  • He, Xuming, Professor, Department of Statistics,
    University of Illinois
  • Kafadar, Karen, Professor, Department of
    Mathematics, University of Colorado at Denver
  • Lin, Nan
  • Ma, Yuanyuan, North Carolina State University

33
Workshop Participants
  • Muler, Nora, Junior Faculty, Universidad Torcuato
    di Tella, Argentina
  • Stromberg, Arnold, Professor, Department of
    Statistics, University of Kentucky
  • Tyler, David, Professor, Department of
    Statistics, Rutgers University
  • Vanden Branden, Karlien, Graduate Student,
    Katholieke Universiteit Leuven, Belgium
  • Werner, Mark, Graduate Student, University of
    Colorado at Denver

34
Workshop Participants
  • Woodruff, David, Professor, Department of
    Mathematics, University of California at Davis
  • Zamar, Ruben, Professor, Department of
    Statistics, University of British Columbia
  • Ekblom, Hakan, Professor, Lulea University of
    Technology, Sweden
  • Sinha, Sanjoy, Assistant Professor, University of
    Winnipeg, Canada