Title: The AAVSO Data Validation Project Kerriann H. Malatesta Sara J. Beck Gamze Menali AAVSO Headquarters
1. The AAVSO Data Validation Project
Kerriann H. Malatesta, Sara J. Beck, Gamze Menali
AAVSO Headquarters, 25 Birch Street, Cambridge, MA 02138
2. The AAVSO International Database
- Home to nearly 12.5 million observations since 1911
- Contains over 5,000 known and suspected variable stars
- Contributions from over 6,000 observers worldwide since 1911
Mira
3. Data Validation Project
Two-year, NASA-funded project to validate, or error-check, over 9.5 million observations in the AAVSO International Database from 1911 through 2001
4. Important Notes
- The goal of the project was not to produce a pretty light curve free of scatter
- Data were never deleted from the permanent archives
- No changes were made to the data without justification from the original observer report
5. Goals of Data Validation
- To find, investigate, and resolve discrepant observations
- To flag observations so far removed from other observers' measurements that they would negatively affect the analysis of data
Eta Car
6. Pre-Validation
- Assemble a knowledgeable data validation team, with each member referred to as a validator
- Establish a list of stars to validate
- Develop a standardized set of rules and procedures
- Create programs to assist with the validation process
- Upgrade hardware
7. Validation Team
- Janet A. Mattei, Director and Project Principal Investigator
- Elizabeth O. Waagen, Interim Director and Interim Project Principal Investigator
- Rebecca Pellock, Project Team Leader
- Sara Beck, Katherine Davis, Kerriann Malatesta, Gamze Menali, and Sarah Sechelski, Validators
8. Validation Team
In addition, Aaron Price, Michael Saladyga, and
Matthew Templeton provided hardware, programming,
and processing help
9. Stars Included
- The stars chosen for validation were of many types
- Eclipsing binary, RR Lyrae, and comparison star data were not included in this project
- Stars were divided by class and distributed for validation amongst the team members
Artistic impression of a Cataclysmic Variable
10. Step 1: Digitization Error Check
- Designation
- Star name
- Comment field code
- Observer initials
The majority of these problems could be resolved by comparison with the original observer's report
11. Designation
- Sources of error
  - Mismatches between star names and designations as written by the observer
  - Data entry software in the early days
  - Special HQ program that was run on all reports from a few prolific observers
12. Designation
- Error detection
  - Viewing the light curve of a star and flagging discrepant observations
  - Records with designation problems are often wildly off the mean, so they are very noticeable
13. Designation
- In the case of the special HQ program, a more systematic approach to errors could be taken
14. Star Name
- Error source
  - Primarily caused by typographical errors by the observer and/or keypuncher
- Error detection
  - Program that searches for name/designation discrepancies
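A search for name/designation discrepancies could look roughly like the sketch below. It is not the program used in the project: the record layout, the `CATALOG` lookup table, and the function name are assumptions made for illustration, and the two catalog entries are sample values only.

```python
# Hypothetical lookup table: designation -> expected star name (sample entries only).
CATALOG = {
    "0214-03": "OMI CET",
    "2138+43": "SS CYG",
}


def find_name_designation_mismatches(records, catalog=CATALOG):
    """Return records whose star name disagrees with the catalog entry
    for their designation, or whose designation is not in the catalog."""
    mismatches = []
    for rec in records:
        expected = catalog.get(rec["designation"])
        reported = rec["name"].strip().upper()
        if expected is None or expected != reported:
            mismatches.append(rec)
    return mismatches


records = [
    {"designation": "0214-03", "name": "omi Cet"},
    {"designation": "2138+43", "name": "SS CYG"},
    {"designation": "2138+43", "name": "SS CGY"},  # typographical error
    {"designation": "9999+99", "name": "X XXX"},   # designation not in catalog
]
suspect = find_name_designation_mismatches(records)
```

Records returned in `suspect` would still need to be checked against the original observer report before any correction is made.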
15. Comment Fields
- Error source
  - Standardization of the codes and a change in the placement of the fields
- Error detection
  - Found by running programs on the database that looked for non-standard entries
16. Observer Initials
- Sources of error
  - Card read errors
  - Data entry technician misread or mistyped the observer initials written on a report
  - The observer used something other than their official AAVSO initials
17. Observer Initials
- Error detection
  - Problems were easily found by comparing the data archives with the Master Observer File
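The comparison against the Master Observer File amounts to a membership test. The following is a minimal sketch under assumed conditions: the record layout, the function name, and the sample initials are all hypothetical, and the actual file format is not described in these slides.

```python
def unknown_initials(archive_records, master_observers):
    """Map each set of observer initials that is absent from the master list
    to the indexes of the archive records that use it."""
    known = {code.strip().upper() for code in master_observers}
    problems = {}
    for index, rec in enumerate(archive_records):
        code = rec["observer"].strip().upper()
        if code not in known:
            problems.setdefault(code, []).append(index)
    return problems


# Hypothetical data for illustration only.
master = ["ABC", "XYZ"]
archive = [
    {"observer": "ABC"},
    {"observer": "XZY"},   # transposed letters, e.g. a mistyped entry
    {"observer": "abc "},  # same observer, different case and spacing
    {"observer": "Q"},     # not an official set of initials
]
problems = unknown_initials(archive, master)
```

Grouping by the unrecognized initials keeps all records with the same problem together, which suits errors that repeat across an entire report.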
18. Digitization Error Corrections
- Problematic observations in the archives were always checked against the original report
- Once the observation had been investigated against the original report, the validator corrected the data archives to reflect their findings
Home of the archived paper observer reports
19. Step 2: Visual Scrutiny of the Light Curve
- Intense quality-control visual inspection of the star's light curve
- Check for problems, such as JD and magnitude errors, and flag any remaining discrepant points
Sara Beck working on visual scrutiny of data
20. Julian Date and Magnitude Errors
- There are two major causes of JD errors
  - Punch-card errors (1967-1981)
  - Observers using a JD calendar from the wrong year or looking at the wrong month
- Magnitude errors
  - Commonly typographical mistakes
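A wrong-year calendar error shifts a reported JD by 365 or 366 days, which is what makes a string of such points stand out on the light curve. The sketch below illustrates the arithmetic only. The function names are hypothetical, and the Julian Date conversion uses the standard offset between Python's proleptic Gregorian ordinal and JD at 0h UT.

```python
from datetime import date


def julian_date(day):
    """Julian Date at 0h UT for a proleptic Gregorian calendar date."""
    return day.toordinal() + 1721424.5


def wrong_year_candidates(reported_jd):
    """JDs the observation would have if the calendar used was off by one year."""
    return sorted(reported_jd + offset for offset in (-366, -365, 365, 366))


intended = julian_date(date(1975, 3, 10))
reported = julian_date(date(1974, 3, 10))  # same day read from the previous year's calendar
shift = intended - reported                # 365.0 days
```

If one of the candidate dates brings a discrepant string of points back onto the mean curve, that is a reason to consult the original report, not a correction in itself.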
21. Rules for Visual Scrutiny
Rules were established to avoid subjective validation styles
- Lookup rules: when to check observations against the original report
- Editing rules: instructions intended to produce homogeneity of editing style amongst validators
25. Lookup Rules
Refer to the original paper reports if:
- There was a systematic shift in time from the mean curve for an obvious string of data points
- Any points fell within the observing gap at a date not reached by other observers
- There was an obvious misreport of a magnitude
- There were reports of bright data points prior to the discovery of novae or supernovae
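The observing-gap rule was applied by eye in the project, but it can be approximated in code. The sketch below is an illustration only: it treats a point as lying in the gap when no other observer reported within `gap_days` of it. The 30-day threshold, the record layout, and the function name are assumptions, not values from the project.

```python
def isolated_points(observations, gap_days=30.0):
    """Return observations with no report from a different observer
    within gap_days of their Julian Date."""
    isolated = []
    for obs in observations:
        other_jds = [o["jd"] for o in observations if o["observer"] != obs["observer"]]
        nearest = min((abs(jd - obs["jd"]) for jd in other_jds), default=float("inf"))
        if nearest > gap_days:
            isolated.append(obs)
    return isolated


# Hypothetical observations for illustration only.
observations = [
    {"jd": 2450000.0, "observer": "A"},
    {"jd": 2450005.0, "observer": "B"},
    {"jd": 2450100.0, "observer": "C"},  # alone in the middle of the gap
    {"jd": 2450200.0, "observer": "A"},
    {"jd": 2450203.0, "observer": "B"},
]
to_look_up = isolated_points(observations)
```

Points returned here are candidates for lookup in the paper reports, since a date no other observer reached may indicate a JD error.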
29. Editing Rules
Any unresolved discordant observation could be flagged only if it was:
- A significantly bright fainter-than observation
- A fainter-than observation that fell below the mean curve
- A positive observation that fell outside a 2-magnitude spread centered on the mean
- An unfiltered CCD observation of a red star
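Two of these rules are numeric and can be sketched as a simple test. The sketch assumes that a 2-magnitude spread centered on the mean means more than 1 magnitude from the mean curve, and that "below the mean curve" means numerically larger (fainter) than the mean. The function name and arguments are hypothetical. The "significantly bright" and red-star rules are left out because the slides give no threshold or color criterion for them.

```python
def may_flag(magnitude, mean_magnitude, fainter_than=False, spread=2.0):
    """Decide whether an unresolved discordant observation qualifies for flagging
    under the two numeric editing rules (assumed interpretation, see lead-in)."""
    if fainter_than:
        # Larger magnitude means fainter, so this limit plots below the mean curve.
        return magnitude > mean_magnitude
    # Positive observation: flag only if outside the spread centered on the mean.
    return abs(magnitude - mean_magnitude) > spread / 2.0


outside_spread = may_flag(9.2, 8.0)                       # 1.2 mag from the mean
inside_spread = may_flag(8.7, 8.0)                        # 0.7 mag from the mean
limit_below = may_flag(12.0, 10.0, fainter_than=True)     # "<12.0" while mean is 10.0
limit_above = may_flag(9.0, 10.0, fainter_than=True)      # "<9.0" while mean is 10.0
```

A test like this only identifies observations that are eligible for a flag; under the project's rules, they first had to remain unresolved after investigation.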
30. Visual Scrutiny: The First Look
The data underwent two rounds of visual scrutiny. The first round was to:
- Look for, investigate, and correct any problems detected in viewing the light curve
- Flag any remaining truly discordant points in accordance with the editing rules
31. A before-and-after look at data validation
32. Visual Scrutiny: A Second Look
- The data then underwent a second round of viewing to look for any obvious oversights
- The second phase was performed at least one day later to avoid eye fatigue
SN 1987A
33. Step 3: Validation Flag
- Each point in the dataset was marked with a letter code indicating the validated status
- The data were then ready for download from the AAVSO web site
34. Accessing the Validated Data
The data are available online through
- The AAVSO Light Curve Generator
35. Accessing the Validated Data
- Since the data began to be made available via the AAVSO web site in 2003, over 4,000 downloads of validated data have been made
- The AAVSO also receives over 700 requests per day through the Light Curve Generator and the Quick Look File
T Pyx
36. Summary and Conclusions
- At completion of the project in September 2004, nearly 10 million observations contributed by over 6,000 observers worldwide were made available via the AAVSO web site
Long-term light curve of SS Cyg
37. 70% of the discrepant observations were repaired by comparison with the observer report
Chart labels: Prior to 1994, it was not specified what kind of correction was made to an observation; No typographical error found
38. Summary and Conclusions
- Completed within the 2-year deadline and took 9,324 staff hours to accomplish
- Has made the accessibility of data, in most cases, nearly instantaneous via the web
- Significantly cut down the number of data requests filled in-house
39. Validation Plans for the Future
- Working with Caltech to make the validated data available through the NASA/IPAC Infrared Science Archive web site
- Filling in the gap of unvalidated data from January 2002 to the most recent month of archived data
- Validation of eclipsing binary, RR Lyrae, and suspicious comparison stars