GERMINATE A Plant Data Management System - PowerPoint PPT Presentation

1
GERMINATE: A Plant Data Management System
  • Jennifer Lee (UoD)
  • Guy Davenport (JIC)
  • Andy Flavell (UoD)
  • Dave Marshall (SCRI)
  • Robbie Waugh (SCRI)
  • Jo Dicks (JIC)
  • Noel Ellis (JIC)
  • Mike Ambrose (JIC)
  • Theo van Hintum (CGN)

2
GERMINATE Project
  • http://bioinf.scri.sari.ac.uk/germinate/
  • Developed in PostgreSQL
  • Expected to be released as open source under the
    GNU General Public License
  • Core of the system is based on the FAO/IPGRI
    Multi-Crop Passport Descriptors
  • Additional generic tables to accommodate other
    data types.

3
GERMINATE Project
4
The Database
  • Designed to potentially hold any type of data
    associated with plants.
  • The accession is the entry point to all data. This
    permits complex queries across vastly different
    datasets.
  • Comments related to any item in the database can
    be stored and accessed easily.
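
A minimal sketch of the "accession as entry point" idea, assuming a layout in which every data table carries an accession_id. This is not the GERMINATE schema: the passport and phenotype tables, the trait, and all values are made up for illustration, and SQLite (through Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accessions (accession_id INTEGER PRIMARY KEY, accenumb TEXT UNIQUE);
-- Hypothetical data tables; every row points back to an accession.
CREATE TABLE passport (accession_id INTEGER REFERENCES accessions, origcty TEXT);
CREATE TABLE phenotype (accession_id INTEGER REFERENCES accessions, trait TEXT, value REAL);
INSERT INTO accessions VALUES (1, 'ACC-A'), (2, 'ACC-B');
INSERT INTO passport VALUES (1, 'TUR'), (2, 'TUR');
INSERT INTO phenotype VALUES (1, 'fruit_length_cm', 12.5);
""")

# Two unrelated kinds of data are combined only through accession_id.
rows = conn.execute("""
SELECT a.accenumb, p.origcty, ph.trait, ph.value
FROM accessions a
JOIN passport p ON p.accession_id = a.accession_id
JOIN phenotype ph ON ph.accession_id = a.accession_id
WHERE p.origcty = 'TUR'
""").fetchall()
```

Because both data tables reference the accession, a passport filter and a phenotype lookup fit in one query without the two tables knowing about each other.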

5
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
6
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
7
Loading Data
  • Most data can be loaded directly from Excel
    spreadsheets or MS Access tables.
  • Data from MS Access tables is copied directly into
    the database. Excel spreadsheets in a standard
    format are first converted by a Perl script. The
    data is then moved into the appropriate tables.
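
The two-step load described above (copy in, then move to the appropriate tables) might look roughly like the sketch below. The staging, taxonomy and accessions tables, the column set of the "standard format", and the sample rows are assumptions for illustration. Python and SQLite stand in for the project's Perl script and PostgreSQL database.

```python
import csv
import io
import sqlite3

# Stand-in for a spreadsheet exported in an agreed standard format.
sheet = io.StringIO(
    "instcode,accenumb,genus,species\n"
    "INST01,ACC-A,Solanum,melongena\n"
    "INST01,ACC-B,Solanum,melongena\n"
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (instcode TEXT, accenumb TEXT, genus TEXT, species TEXT);
CREATE TABLE taxonomy (taxonomy_id INTEGER PRIMARY KEY, genus TEXT, species TEXT,
                       UNIQUE (genus, species));
CREATE TABLE accessions (accession_id INTEGER PRIMARY KEY, instcode TEXT,
                         accenumb TEXT, taxonomy_id INTEGER REFERENCES taxonomy);
""")

# Step 1: copy the rows unchanged into a staging table.
staged = [(r["instcode"], r["accenumb"], r["genus"], r["species"])
          for r in csv.DictReader(sheet)]
conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", staged)

# Step 2: move the staged data into the appropriate tables.
conn.execute("INSERT OR IGNORE INTO taxonomy (genus, species) "
             "SELECT DISTINCT genus, species FROM staging")
conn.execute("""
INSERT INTO accessions (instcode, accenumb, taxonomy_id)
SELECT s.instcode, s.accenumb, t.taxonomy_id
FROM staging s
JOIN taxonomy t ON t.genus = s.genus AND t.species = s.species
""")
conn.execute("DELETE FROM staging")

n_taxa = conn.execute("SELECT COUNT(*) FROM taxonomy").fetchone()[0]
n_accessions = conn.execute("SELECT COUNT(*) FROM accessions").fetchone()[0]
n_staged = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```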

8
(No Transcript)
9
Passport Data
10
Passport Data
11
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
12
Some Curation Necessary
Taxonomy example
13
Collecting Example
Example accession: CGN03335
Tables involved: AccessionCollecting, Collecting, CollectingSites
Collecting and CollectingSites are not updated.
AccessionCollecting is updated if accession_id and
collecting_id match.
14
Curation
  • Pre-processing interface to allow users to see
    what information is in the database before they
    submit their data.
  • Web-based curation tool.
  • Developed at Iowa State University
  • Submitted data must be validated before it is
    available to the public.

15
Data Interaction
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
16
Data Association Levels
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
17
Genetic Data
  • Data stored as 2D array
  • Accessions as dimension 0
  • Markers as dimension 1
  • Data is stored as integers and decoded via the
    EnumUnits and EnumUnitsArrays tables
  • We are using an Allele Index approach which will
    simplify queries significantly in cases where the
    marker type is not important, but only the
    relative allele values are required.
  • Experiment information includes author, date, an
    experiment name and a brief description.
  • Method information includes a name, a description
    and a link to the units. A method is generally
    broader than an experiment: multiple experiments
    are expected to use the same method.
  • Units indicate the marker type.
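
The storage idea above can be sketched in a few lines: a 2D integer array indexed by accession (dimension 0) and marker (dimension 1), decoded through a lookup. The accession names, marker names, codes and the enum_units mapping are hypothetical. In GERMINATE the decoding goes through the EnumUnits and EnumUnitsArrays tables instead of an in-memory dictionary.

```python
# Dimension 0: accessions; dimension 1: markers (hypothetical names).
accessions = ["ACC-A", "ACC-B", "ACC-C"]
markers = ["marker_1", "marker_2"]

# Hypothetical enum lookup for one unit (marker type): integer code -> value.
enum_units = {0: "absent", 1: "present"}

# The array itself holds only integers: data[i][j] is the code for
# accession i at marker j.
data = [
    [1, 0],
    [1, 1],
    [0, 1],
]

def decode(accession, marker):
    """Translate the stored integer for one cell back into its value."""
    i = accessions.index(accession)
    j = markers.index(marker)
    return enum_units[data[i][j]]

decoded = {(a, m): decode(a, m) for a in accessions for m in markers}
```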

18
Database
Genetic Data in GERMINATE
[Diagram: how genetic data is stored]
Diagram labels: metadataset, dataset, dimension0, dimension1, data (integer data)
Original data: accessions (reference data), markers (string data), integer_data (enum_index)
Tables shown: Accessions, Datasets, Metadatasets
Columns shown: accession_id, germinate_id, instcode, accenumb, experiment_id, metadataset_id, data_type_id, dataset_id, method_id, dimension, dataset_discription, size, index_id, string_data, reference_id, table_id
Note: table_id 5 -> Accessions table, reference_id = accession_id
19
Decoding the Allele Index
[Diagram: relative comparison of allele values]
Tables shown: enum_table_array_text, enum_table_array_int
Columns shown: unit_id, enum_index, enum_value, allele_index, allele_value
Types shown: text, int, array
Data shown: unit 7 data_value, unit 8 data_value, unit 9 data_value
  • Allows expansion of allele values to any ploidy
    level.
  • A match of enum_index between unit types does not
    indicate a match in allele phase.
  • Much more efficient than storing the allele
    values themselves, while still permitting
    searches for individual alleles at any ploidy
    level.
  • If the marker type is not important to a query,
    the methods, units and enum tables can be left
    out of the join to speed the query up.
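
A rough sketch of the allele index idea, under stated assumptions: each (unit_id, enum_index) pair maps to an array of allele values whose length is the ploidy. The enum_arrays contents, the accession names and the codes are invented for illustration and do not reproduce the enum_table_array_text or enum_table_array_int layout.

```python
# Hypothetical enum table: (unit_id, enum_index) -> allele values.
# The tuple length is the ploidy, so one layout covers any ploidy level.
enum_arrays = {
    (7, 0): ("A", "A"),
    (7, 1): ("A", "B"),
    (7, 2): ("B", "B"),
    (8, 0): ("A", "A", "B", "B"),  # a tetraploid unit
}

# Integer genotype codes for one marker scored with unit 7 (made-up data).
unit_id = 7
codes = {"ACC-A": 0, "ACC-B": 1, "ACC-C": 2, "ACC-D": 1}

def accessions_with_allele(allele):
    """Find carriers of an allele by scanning the small enum table first."""
    hits = {idx for (u, idx), alleles in enum_arrays.items()
            if u == unit_id and allele in alleles}
    return sorted(acc for acc, code in codes.items() if code in hits)

def same_genotype(acc1, acc2):
    """Relative comparison on the integer codes alone, with no decoding."""
    return codes[acc1] == codes[acc2]
```

The second function shows the shortcut from the last bullet: when only relative values matter, the integers are compared directly and the enum lookup is skipped. That comparison is only meaningful within one unit, since equal enum_index values in different units need not mean the same alleles.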

20
Genetic Map Data
  • 3 sets of data
  • Population: stored in the Pedigree table, with
    references to individuals in the reference table,
    which links the population to the dataset.
  • Data used to create the map: stored similarly to
    genetic data.
  • Map

21
Genetic Map Data
[Diagram: how genetic map data is stored]
Diagram labels: metadataset, dataset, dimension0, dimension1
Original data: loci (string data), position (real data), linkage groups (string data)
Tables shown: string_data, real_data
Columns shown: dataset_id, index_id, string_id
Additional information can be stored as added
dimensions to the dataset.
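
As an illustration only, a map could be reassembled from three parallel datasets (loci, positions, linkage groups), assuming that entries sharing an index_id belong together. That alignment, the locus names and the positions are assumptions. Here list position plays the role of index_id.

```python
# Three parallel datasets; list position stands in for index_id.
loci = ["locus_1", "locus_2", "locus_3"]   # string data
positions = [0.0, 12.5, 3.2]               # real data
linkage_groups = ["LG1", "LG1", "LG2"]     # string data

def build_map(loci, positions, linkage_groups):
    """Group loci by linkage group and order them by map position."""
    genetic_map = {}
    for locus, pos, group in zip(loci, positions, linkage_groups):
        genetic_map.setdefault(group, []).append((pos, locus))
    return {group: sorted(entries) for group, entries in genetic_map.items()}

genetic_map = build_map(loci, positions, linkage_groups)
```

An extra attribute per locus would be one more parallel list in this sketch, which mirrors storing additional information as an added dimension.
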
22
Image data
  • Store either images or links to images for access
    from an interface.
  • We should be able to map images
  • For example, in microarray images the spots can
    be mapped to the accession/sample and used to
    bring up information on that accession.

23
(No Transcript)
24
Interface
  • Currently we have a lightweight Perl-CGI
    interface
  • Working towards a more flexible interface that
    would allow complex query formation from users.
  • Returning results as objects will allow navigation
    through the data without searching again.

25
Acknowledgments
  • University of Dundee: Andy Flavell
  • Scottish Crop Research Institute (SCRI): David
    Marshall, Robbie Waugh
  • Centre for Genetic Resources, The Netherlands
    (CGN): Theo van Hintum
  • John Innes Centre (JIC): Guy Davenport, Jo Dicks,
    Mike Ambrose, Noel Ellis
  • Funding: BBSRC Grant 94/BEP17084, the
    Bioinformatics and E-science program