Codes for astrostatistics: StatCodes - PowerPoint PPT Presentation

Title: Codes for astrostatistics: StatCodes & VOStat. Author: Eric Feigelson, Penn State. Created: 2/27/2004. Format: PowerPoint presentation.


Transcript and Presenter's Notes

1
Codes for astrostatistics: StatCodes & VOStat
Eric Feigelson
Penn State
2
Vast range of statistical problems in modern
astronomy
  • Poisson processes: point processes, time series
    analysis
  • Image analysis: MLE deconvolution, adaptive
    smoothing, wavelet analyses
  • Multivariate analysis & classification (with
    measurement errors)
  • Survival analysis (censoring & truncation with
    measurement errors)
  • Parametric models: model selection, non-linear
    regression
  • Non-parametric methods
  • Confidence limits: bootstrap resampling
  • Prior knowledge: Bayesian inference
  • (see talk at PhysStat 2003 conference)
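
As an illustration of one item in the list above, confidence limits from bootstrap resampling, here is a minimal Python sketch of a percentile bootstrap. It is not taken from the talk: the function name `bootstrap_ci`, its defaults, and the sample values are all assumptions made up for illustration, and the percentile method is only one of several bootstrap interval constructions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile-bootstrap interval for stat(data) (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the empirical quantiles of the resampled statistic.
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical measurements (made-up numbers, not real data).
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 9.1, 10.9, 12.5, 11.0, 10.6]
low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.2f}, "
      f"95% interval = [{low:.2f}, {high:.2f}]")
```

The appeal for astronomical use is that the same resampling loop works for any statistic passed as `stat`, without assuming a parametric error model.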

3
The problem
  • Astronomers are insufficiently trained in
    modern applied statistics ...
  • ... but even if they knew what to do, they
    have inadequate access to computer codes.

4
  • Astronomers never use large commercial
    statistical packages like SAS, SPSS, Statistica
  • Some astronomers sometimes use UNIX-based
    command-line systems like MATLAB or S-Plus.
  • Astronomers like mini-codes in Numerical Recipes
    & often write their own codes. Many like IDL,
    which has simple statistics.
  • NASA/NSF observatories produce huge data analysis
    codes (IRAF, AIPS, CIAO, ...) which by policy avoid
    proprietary codes
  • A few specialized stand-alone astrostat codes
    written under NASA funding: ROSTAT, ASURV,
    SLOPES, StatPy
  • Altogether this is a very bad situation:
    vast statistical needs with very inadequate codes

5
The rise of the Virtual Observatory
  • Vast collections of calibrated data (images,
    spectra, time series), extracted catalogs
    (rows = sources, columns = properties), and source
    bibliographies emerged during the 1990s.
  • NASA Science Archive Centers (MAST, HEASARC,
    IRSA, LAMBDA), bibliographic databases (ADS,
    SIMBAD, NED), & more are being transformed into a
    federated (though still distributed &
    heterogeneous) system: XML metadata (VOTable),
    SOAP protocols, for data mining & extraction.
  • ... but originally no plan for visualization &
    statistical analysis of extracted datasets
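
To make the "rows = sources, columns = properties" layout of a VOTable catalog concrete, the following Python sketch reads a tiny hand-written VOTable fragment with the standard library. The embedded table, its field names, and the helper `read_votable` are hypothetical and for illustration only; the sketch assumes every column is numeric and uses the plain `TABLEDATA` serialization, whereas real VOTables carry richer metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-source catalog (made-up values) in TABLEDATA form.
VOTABLE_XML = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="hypothetical_sources">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="mag" datatype="float"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>3.4</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD><TD>4.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def local(tag):
    """Drop any '{namespace}' prefix so tags match with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def read_votable(xml_text):
    """Return (column names, list of row dicts); assumes numeric cells."""
    root = ET.fromstring(xml_text)
    # Each FIELD element describes one column (one source property).
    fields = [el.get("name") for el in root.iter() if local(el.tag) == "FIELD"]
    rows = []
    # Each TR element is one row (one source); its TD cells follow FIELD order.
    for tr in (el for el in root.iter() if local(el.tag) == "TR"):
        cells = [float(td.text) for td in tr if local(td.tag) == "TD"]
        rows.append(dict(zip(fields, cells)))
    return fields, rows


fields, rows = read_votable(VOTABLE_XML)
print(fields)
print(rows)
```

Once the catalog is in this row-per-source form, any column can be handed directly to a statistical routine.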

6
StatCodes: A partial solution
  • In the late 1990s, the Penn State group created a
    Web metasite with annotated links to 200 open
    source packages & codes of utility to astronomers.
  • Quite successful: 50-100 hits/day for 7 years.
  • Multivariate & time series methods most popular.
  • But the collection of on-line codes was
    very inhomogeneous and incomplete

7
R: Finally a broad public-domain statistical
software system emerges
  • Based on the successful commercial UNIX-based
    S/S-Plus, R has an interactive command-line feel
    (like IDL), flexible data I/O, acceptable
    graphics, integration to C/Fortran/Python/...,
    and quite a lot of sophisticated statistical
    methods.
  • Core R: 2000-page manual with 200
    functionalities, some very complex & advanced
  • CRAN: 300 add-on packages, dozens useful to
    astronomers. Some are themselves full systems.

8
VOStat: A Web service
  • Web form interface providing simple statistical R
    functions with VOTable inputs
  • Same R functions provided through a more
    sophisticated Java-based grid-computing mode.

[Diagram labels: User, Requests, Answers, VOStat server, Heavy statistical computation, Heavy data, Dispersed VO data bases]
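
The request/answer flow on this slide can be sketched in a few lines of Python. This is not VOStat's actual interface, which the slides do not describe: the registry `FUNCTIONS`, the request format, and `handle_request` are all hypothetical, and Python's `statistics` module stands in for the R functions the real service provides.

```python
import statistics

# Hypothetical registry of simple functions a statistics service might expose.
FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,
}


def handle_request(request, table):
    """Answer a request shaped like {'function': name, 'column': name}.

    `table` is a list of row dicts (one per source), e.g. parsed from a
    VOTable. Both the request shape and the answer shape are assumptions.
    """
    name = request.get("function")
    column = request.get("column")
    if name not in FUNCTIONS:
        return {"status": "error", "message": f"unknown function: {name}"}
    # Collect the requested property from every source that has it.
    values = [row[column] for row in table if column in row]
    if len(values) < 2:
        return {"status": "error",
                "message": f"too few values in column: {column}"}
    return {"status": "ok", "function": name, "column": column,
            "n": len(values), "value": FUNCTIONS[name](values)}


# Made-up magnitudes for three hypothetical sources.
table = [{"mag": 3.5}, {"mag": 4.0}, {"mag": 4.5}]
answer = handle_request({"function": "mean", "column": "mag"}, table)
print(answer)
```

A fixed registry like this keeps the server simple, at the cost of offering only the functions that were anticipated in advance.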
9
VOStat may be a big improvement but ...
  • Generic Web-based services are inherently
    inflexible & limited. VOStat may serve to entice
    the astronomer to download R & perform the real
    analysis at home.
  • Astronomers need training in advanced methods
    before using them with R. Penn State has just
    created a Center for Astrostatistics to develop
    curriculum, conduct tutorials, provide template R
    code, etc.
  • R/CRAN does not serve huge VO datasets or some
    special astrostat needs. New methodological/code
    development underway (CMU, Cornell, PSU, UCIrv, ...)