The AAVSO Data Validation Project: Kerriann H. Malatesta, Sara J. Beck, Gamze Menali, AAVSO Headquarters

1
The AAVSO Data Validation Project
Kerriann H. Malatesta
Sara J. Beck
Gamze Menali
AAVSO Headquarters
25 Birch Street
Cambridge, MA 02138
2
The AAVSO International Database
  • Home to nearly 12.5 million observations since
    1911
  • Contains over 5,000 known and suspected variable
    stars
  • Contributions from over 6,000 observers worldwide
    since 1911

Mira
3
Data Validation Project
A two-year, NASA-funded project to validate (error-check)
over 9.5 million observations in the AAVSO
International Database from 1911 through 2001
4
Important Notes
  • The goal of the project was not to produce a
    pretty light curve free of scatter
  • Data were never deleted from the permanent
    archives
  • No changes were made to the data without
    justification from the original observer report

5
Goals of Data Validation
  • To find, investigate, and resolve discrepant
    observations
  • To flag observations so far removed from other
    observers' measurements that they would
    negatively affect the analysis of data

Eta Car
6
Pre-Validation
  • Assemble a knowledgeable data validation team,
    with each member referred to as a validator
  • Establish a list of stars to validate
  • Develop a standardized set of rules and
    procedures
  • Create programs to assist with the validation
    process
  • Upgrade hardware

7
Validation Team
  • Janet A. Mattei, Director and Project Principal
    Investigator
  • Elizabeth O. Waagen, Interim Director and Interim
    Project Principal Investigator
  • Rebecca Pellock, Project Team Leader
  • Sara Beck, Katherine Davis, Kerriann Malatesta,
    Gamze Menali, and Sarah Sechelski, Validators

8
Validation Team
In addition, Aaron Price, Michael Saladyga, and
Matthew Templeton provided hardware, programming,
and processing help
9
Stars Included
  • The stars chosen for validation were of many
    types
  • Eclipsing binary, RR Lyrae, and comparison star
    data were not included in this project
  • Stars were divided by class and distributed for
    validation amongst the team members

Artistic impression of a Cataclysmic Variable
10
Step 1: Digitization Error Check
  • Designation
  • Star name
  • Comment field code
  • Observer initials

The majority of these problems could be resolved
by comparison with the original observer's report
11
Designation
  • Sources of error
      • Mismatches between star names and designations as written by the observer
      • Data entry software in the early days
      • A special HQ program that was run on all reports from a few prolific observers

12
Designation
  • Error detection
      • Viewing the light curve of a star and flagging discrepant observations
      • Records with designation problems are often wildly off the mean, so they are very noticeable
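Because a record filed under the wrong designation usually sits far from the rest of that star's light curve, a crude automated pre-screen could compare each point with the median of its neighbors in time. The sketch below is illustrative only: the `flag_off_curve` helper, the time window, and the magnitude threshold are assumptions, not the AAVSO procedure, which relied on validators viewing the light curve.

```python
from statistics import median
from typing import List, Tuple

def flag_off_curve(
    points: List[Tuple[float, float]],  # (Julian Date, magnitude) for one designation
    window_days: float = 5.0,           # hypothetical half-width of the comparison window
    threshold_mag: float = 3.0,         # hypothetical "wildly off the mean" limit
    min_neighbors: int = 3,             # skip points with too few neighbors to judge
) -> List[int]:
    """Return indices of points far from the local median of the other points."""
    flagged = []
    for i, (jd, mag) in enumerate(points):
        # Magnitudes of all *other* points within +/- window_days of this one
        neighbors = [m for j, (t, m) in enumerate(points)
                     if j != i and abs(t - jd) <= window_days]
        if len(neighbors) < min_neighbors:
            continue  # not enough local data to define a mean curve
        if abs(mag - median(neighbors)) > threshold_mag:
            flagged.append(i)
    return flagged

# Toy data: the fourth point is 5 magnitudes away from its neighbors
curve = [(100.0, 8.0), (101.0, 8.2), (102.0, 7.9), (102.5, 13.5), (103.0, 8.1)]
print(flag_off_curve(curve))  # [3]
```

The median is used instead of the mean so that a single misfiled point cannot drag the reference level toward itself.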

13
Designation
  • In the case of the special HQ program, a more
    systematic approach to errors could be taken

14
Star Name
  • Error source
      • Primarily caused by typographical errors by the observer and/or keypuncher
  • Error detection
      • A program that searches for name/designation discrepancies
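A name/designation check of this kind reduces to a table lookup. A minimal sketch, assuming a reference table (`catalog`, hypothetical here) that maps each Harvard-style designation to a canonical star name; the record layout and function names are illustrative, not the AAVSO's actual program.

```python
from typing import Dict, List, NamedTuple

class Record(NamedTuple):
    designation: str  # Harvard-style designation, e.g. "0214-03"
    name: str         # star name as keyed in from the report

def normalize(name: str) -> str:
    """Upper-case and collapse whitespace so 'omi  cet' matches 'OMI CET'."""
    return " ".join(name.upper().split())

def name_designation_mismatches(records: List[Record],
                                catalog: Dict[str, str]) -> List[Record]:
    """Return records whose name disagrees with the catalog name for their designation.

    Records whose designation is missing from the catalog are returned too,
    because they cannot be checked automatically.
    """
    bad = []
    for rec in records:
        expected = catalog.get(rec.designation)
        if expected is None or normalize(rec.name) != normalize(expected):
            bad.append(rec)
    return bad

catalog = {"0214-03": "OMI CET", "2138+43": "SS CYG"}  # Mira and SS Cyg
records = [
    Record("0214-03", "omi  cet"),  # same star, sloppy spacing: accepted
    Record("2138+43", "OMI CET"),   # name belongs to a different designation
    Record("0000+00", "XX YYY"),    # deliberately fake designation, not in catalog
]
print(name_designation_mismatches(records, catalog))
```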

15
Comment Fields
  • Error source
      • Standardization of the codes and a change in the placement of the fields
  • Error detection
      • Found by running programs on the database that looked for non-standard entries
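A scan for non-standard comment entries can be sketched as a set-membership test over each field. This assumes each comment field is a string of single-character codes, optionally separated by spaces; the `VALID_CODES` set below is a placeholder for illustration and is not the AAVSO's official code list.

```python
from typing import Iterable, List, Set, Tuple

# Placeholder code set for illustration only; NOT the official AAVSO list.
VALID_CODES: Set[str] = {"B", "U", "W"}

def nonstandard_comments(fields: Iterable[str],
                         valid: Set[str] = VALID_CODES) -> List[Tuple[int, str, List[str]]]:
    """Return (row index, field, offending codes) for fields containing unknown codes.

    An empty field means "no comment" and is accepted.
    """
    problems = []
    for i, field in enumerate(fields):
        bad = [c for c in field if not c.isspace() and c not in valid]
        if bad:
            problems.append((i, field, bad))
    return problems

rows = ["B", "B U", "", "BX", "u"]
print(nonstandard_comments(rows))  # [(3, 'BX', ['X']), (4, 'u', ['u'])]
```

The comparison is deliberately case-sensitive, so a lower-case code is reported as non-standard rather than silently accepted.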

16
Observer Initials
  • Sources of error
      • Card read errors
      • The data entry technician misread or mistyped the observer initials written on a report
      • The observer used something other than their official AAVSO initials

17
Observer Initials
  • Error detection
      • Problems were easily found by comparing the data archives with the Master Observer File
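Comparing the archives against the Master Observer File is essentially a set difference. The sketch below counts how often each unknown code appears, since a code that recurs many times suggests a systematic misread. The observer codes are hypothetical, and the case and whitespace normalization is an assumption about how the data are stored.

```python
from collections import Counter
from typing import Dict, Iterable, Set

def unknown_observer_codes(archive_codes: Iterable[str],
                           master_file: Set[str]) -> Dict[str, int]:
    """Count archived observer codes that do not appear in the master observer list."""
    # Normalize case and stray whitespace before comparing (an assumption about the data)
    counts = Counter(code.strip().upper() for code in archive_codes)
    return {code: n for code, n in counts.items() if code not in master_file}

master = {"ABC", "XYZ"}  # hypothetical official observer codes
archive = ["ABC", "abc ", "QQQ", "XYZ", "QQQ"]
print(unknown_observer_codes(archive, master))  # {'QQQ': 2}
```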

18
Digitization Error Corrections
  • It was always confirmed that problematic
    observations in the archives matched the original
    report
  • Once a problem had been investigated against the
    original report, the validator corrected the data
    archives to reflect the findings

Home of the archived paper observer reports
19
Step 2: Visual Scrutiny of the Light Curve
  • Intense quality-control visual inspection of the
    star's light curve
  • Check for problems, such as Julian Date (JD) and
    magnitude errors, and flag any remaining discrepant
    points

Sara Beck working on visual scrutiny of data
20
Julian Date and Magnitude Errors
  • There are two major causes of JD errors
      • Punch-card errors (1967-1981)
      • Observers using a JD calendar from the wrong year or looking at the wrong month
  • Magnitude errors
      • Commonly typographical mistakes
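Because the Julian Date is a continuous day count, reading the wrong year's or wrong month's calendar shifts an observation by a characteristic amount: about 365 days, or the length of a month. A minimal sketch using the standard Gregorian-date-to-Julian-Day-Number formula illustrates this; the function names are illustrative, and this is not the conversion code used by the AAVSO.

```python
def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number (the JD at 12:00 UT) of a Gregorian calendar date."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # year shifted to begin in March and stay positive
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

def julian_date(year: int, month: int, day: int, hour_ut: float = 12.0) -> float:
    """Julian Date including the day fraction; JD days begin at 12:00 UT."""
    return julian_day_number(year, month, day) + (hour_ut - 12.0) / 24.0

print(julian_date(2000, 1, 1))  # 2451545.0
# Using the 1986 calendar for a 1987 observation shifts the JD by a full year:
print(julian_day_number(1987, 3, 15) - julian_day_number(1986, 3, 15))  # 365
# Reading February instead of March 1987 shifts it by 28 days:
print(julian_day_number(1987, 3, 15) - julian_day_number(1987, 2, 15))  # 28
```

An offset that matches one of these calendar intervals is a useful hint about which kind of error to look for in the original report.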

21
Rules for Visual Scrutiny
Rules were established to avoid subjective
validation styles
  • Lookup rules: when to check observations against
    the original report
  • Editing rules: instructions intended to produce
    homogeneity of editing style amongst validators

25
Lookup Rules
Refer to the original paper reports if
  • There was a systematic shift in time from the
    mean curve for an obvious string of data points
  • Any points fell within the observing gap at a
    date not reached by other observers
  • There was an obvious misreport of a magnitude
  • There were reports of bright data points prior to
    the discovery of novae or supernovae
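Two of these lookup triggers lend themselves to automatic pre-screening before a validator pulls the paper report: bright estimates made before a nova or supernova was discovered, and points on dates no other observer reached. The sketch below assumes a known discovery JD, a hypothetical pre-discovery brightness limit, and a hypothetical time window for "no other observer"; the `Obs` record and function names are illustrative, not AAVSO software. It only selects candidates for lookup and changes nothing.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple

class Obs(NamedTuple):
    observer: str
    jd: float
    mag: float
    fainter_than: bool = False  # True for a "<" (fainter-than) estimate

def prediscovery_bright(obs: List[Obs], discovery_jd: float, limit_mag: float) -> List[Obs]:
    """Positive estimates made before discovery that are brighter than limit_mag."""
    # Smaller magnitudes are brighter, so "brighter than the limit" is mag < limit_mag
    return [o for o in obs
            if not o.fainter_than and o.jd < discovery_jd and o.mag < limit_mag]

def isolated_points(obs: List[Obs], window_days: float) -> List[Obs]:
    """Observations with no data from any *other* observer within +/- window_days."""
    ordered = sorted(obs, key=lambda o: o.jd)
    jds = [o.jd for o in ordered]
    isolated = []
    for o in ordered:
        lo = bisect_left(jds, o.jd - window_days)
        hi = bisect_right(jds, o.jd + window_days)
        # The slice always contains o itself; isolated if every point in it is by the same observer
        if all(p.observer == o.observer for p in ordered[lo:hi]):
            isolated.append(o)
    return isolated

# Toy data: observer codes and JDs are made up
data = [Obs("AAA", 100.0, 14.5, fainter_than=True), Obs("BBB", 101.0, 9.0),
        Obs("AAA", 150.0, 15.0), Obs("CCC", 200.0, 6.0)]
print(prediscovery_bright(data, discovery_jd=180.0, limit_mag=12.0))
print(isolated_points(data, window_days=5.0))
```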

29
Editing Rules
Any unresolved discordant observation could be
flagged only if it was
  • A significantly bright fainter-than observation
  • A fainter-than observation that fell below the
    mean curve
  • A positive observation that fell outside a
    2-magnitude spread centered on the mean (more than
    1 magnitude brighter or fainter than the mean)
  • An unfiltered CCD observation of a red star
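Because the editing rules are explicit, they can be written as a checklist function. A minimal sketch, assuming the local mean-curve magnitude is already known, that "below the mean curve" means fainter (a larger magnitude, since light curves are plotted with brighter values upward), and that "significantly bright" means more than a hypothetical `bright_margin` magnitudes brighter than the mean. The function only reports which rules apply; the decision to flag stays with the validator.

```python
from typing import List, NamedTuple

HALF_SPREAD = 1.0  # a 2-magnitude spread centered on the mean is +/- 1 magnitude

class Obs(NamedTuple):
    mag: float
    fainter_than: bool = False    # True for a "<" (fainter-than) estimate
    unfiltered_ccd: bool = False
    red_star: bool = False        # assumed known from the star's catalog entry

def editing_rule_hits(obs: Obs, mean_mag: float, bright_margin: float = 2.0) -> List[str]:
    """Return the editing rules that an unresolved discordant observation meets."""
    hits = []
    if obs.fainter_than:
        # Smaller magnitude = brighter; bright_margin is a hypothetical cutoff
        if obs.mag < mean_mag - bright_margin:
            hits.append("significantly bright fainter-than")
        # Larger magnitude = fainter, i.e. plotted below the mean curve
        if obs.mag > mean_mag:
            hits.append("fainter-than below mean curve")
    elif abs(obs.mag - mean_mag) > HALF_SPREAD:
        hits.append("outside 2-mag spread")
    if obs.unfiltered_ccd and obs.red_star:
        hits.append("unfiltered CCD, red star")
    return hits

print(editing_rule_hits(Obs(13.5), mean_mag=12.0))  # ['outside 2-mag spread']
print(editing_rule_hits(Obs(13.0, fainter_than=True), mean_mag=12.0))  # ['fainter-than below mean curve']
```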

30
Visual Scrutiny: The First Look
The data underwent two rounds of visual scrutiny.
The first round was to
  • Look for, investigate, and correct any problems
    detected in viewing the light curve
  • Flag any remaining truly discordant points in
    accordance with the editing rules

31
A before-and-after look at data validation
32
Visual Scrutiny: A Second Look
  • The data then underwent a second round of viewing
    to look for any obvious oversights
  • The second round was performed at least one day
    later to avoid eye fatigue

SN 1987A
33
Step 3: Validation Flag
  • Each point in the dataset was marked with a
    letter code indicating its validation status
  • The data were then ready for download from the
    AAVSO web site

34
Accessing the Validated Data
The data are available online through
  • The AAVSO Light Curve Generator
  • Downloadable data files

35
Accessing the Validated Data
  • Since the validated data first became available via
    the AAVSO web site in 2003, they have been
    downloaded over 4,000 times
  • The AAVSO also receives over 700 requests per day
    through the Light Curve Generator and the Quick
    Look File

T Pyx
36
Summary and Conclusions
  • At the completion of the project in September 2004,
    nearly 10 million observations contributed by
    over 6,000 observers worldwide were made
    available via the AAVSO web site

Long-term light curve of SS Cyg
37
70 of the discrepant observations were repaired
by comparison with the observer report
Prior to 1994, it was not specified what kind of
correction was made to an observation
No typographical error found
38
Summary and Conclusions
  • Completed within the two-year deadline, requiring
    9,324 staff hours
  • Has made access to the data nearly instantaneous
    via the web in most cases
  • Significantly cut down the number of data
    requests filled in-house

39
Validation Plans for the Future
  • Working with Caltech to make the validated data
    available through the NASA/IPAC Infrared Science
    Archive web site
  • Filling in the gap of unvalidated data from
    January 2002 to the most recent month of archived
    data
  • Validation of eclipsing binary, RR Lyrae, and
    suspicious comparison star data