Data Integration in the Life Sciences
Transcript and Presenter's Notes
1
Data Integration in the Life Sciences
  • Kenneth Griffiths and Richard Resnick

2
Tutorial Agenda
  • 1:30 - 1:45 Introduction
  • 1:45 - 2:00 Tutorial Survey
  • 2:00 - 3:00 Approaches to Integration
  • 3:00 - 3:05 Bio Break
  • 3:05 - 4:00 Approaches to Integration (cont.)
  • 4:00 - 4:15 Question and Answer
  • 4:15 - 4:30 Break
  • 4:30 - 5:00 Metadata Session
  • 5:00 - 5:30 Domain-specific example (GxP)
  • 5:30 Wrap-up

3
Life Science Data
Recent focus on genetic data: genomics, the study of genes and their function.

"Recent advances in genomics are bringing about a revolution in our understanding of the molecular mechanisms of disease, including the complex interplay of genetic and environmental factors. Genomics is also stimulating the discovery of breakthrough healthcare products by revealing thousands of new biological targets for the development of drugs, and by giving scientists innovative ways to design new drugs, vaccines and DNA diagnostics. Genomics-based therapeutics include 'traditional' small chemical drugs, protein drugs, and potentially gene therapy."
- The Pharmaceutical Research and Manufacturers of America, http://www.phrma.org/genomics/lexicon/g.html

Study of genes and their function
Understanding molecular mechanisms of disease
Development of drugs, vaccines, and diagnostics
4
The Study of Genes...
  • Chromosomal location
  • Sequence
  • Sequence Variation
  • Splicing
  • Protein Sequence
  • Protein Structure

5
and Their Function
  • Homology
  • Motifs
  • Publications
  • Expression
  • HTS
  • In Vivo/Vitro Functional Characterization

6
Understanding Mechanisms of Disease
7
Development of Drugs, Vaccines, Diagnostics
  • Differing types of Drugs, Vaccines, and
    Diagnostics
  • Small molecules
  • Protein therapeutics
  • Gene therapy
  • In vitro, In vivo diagnostics
  • Development requires
  • Preclinical research
  • Clinical trials
  • Long-term clinical research
  • All of which often feeds back into ongoing
    Genomics research and discovery.

8
The Industry's Problem
  • Too much unintegrated data
  • from a variety of incompatible sources
  • no standard naming convention
  • each with a custom browsing and querying
    mechanism (no common interface)
  • and poor interaction with other data sources

9
What are the Data Sources?
  • Flat Files
  • URLs
  • Proprietary Databases
  • Public Databases
  • Data Marts
  • Spreadsheets
  • Emails

10
Sample Problem: Hyperprolactinemia
  • Overproduction of prolactin
  • prolactin stimulates mammary gland development
    and milk production
  • Hyperprolactinemia is characterized by
  • inappropriate milk production
  • disruption of menstrual cycle
  • can lead to conception difficulty

11
Understanding transcription factors for prolactin
production
Show me all genes in the public literature that
are putatively related to hyperprolactinemia,
have more than 3-fold expression differential
between hyperprolactinemic and normal pituitary
cells, and are homologous to known transcription
factors.
(Q1 ∩ Q2 ∩ Q3)
12
(No Transcript)
13
Approaches to Integration
  • In order to ask this type of question across multiple domains, data integration at some level is necessary. When discussing the different approaches to data integration, a number of key issues need to be addressed:
  • Accessing the original data sources
  • Handling redundant as well as missing data
  • Normalizing analytical data from different data
    sources
  • Conforming terminology to industry standards
  • Accessing the integrated data as a single logical
    repository
  • Metadata (used to traverse domains)

14
Approaches to Integration (cont.)
  • So if one agrees that the preceding issues are
    important, where are they addressed? In the
    client application, the middleware, or the
    database? Where they are addressed can make a
    huge difference in usability and performance.
    Currently there are a number of approaches to data integration:
  • Federated Databases
  • Data Warehousing
  • Indexed Data Sources
  • Memory-mapped Data Structures

15
Federated Database Approach
Show me all genes that are homologous to known
transcription factors
Show me all genes that have more than 3-fold
expression differential between
hyperprolactinemic and normal cells
Show me all genes in the public literature that
are putatively related to hyperprolactinemia
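To make this concrete, here is a minimal Python sketch (not from the original tutorial) of how a federated client combines the three sub-queries above at the application level; the wrapper functions, their APIs, and the gene names returned are invented stand-ins for real source wrappers.

    # Minimal sketch of federated, application-level integration (hypothetical wrappers).
    # Each wrapper hides one source; the client must combine the results itself.

    def genes_linked_to_disease(term: str) -> set[str]:
        """Q1: stub for a literature-source wrapper (e.g., PubMed)."""
        return {"PRL", "POU1F1", "DRD2"}

    def genes_with_expression_change(fold: float) -> set[str]:
        """Q2: stub for an expression-profiling database wrapper."""
        return {"POU1F1", "GH1", "PRL"}

    def genes_homologous_to_tf() -> set[str]:
        """Q3: stub for a homology/transcription-factor wrapper."""
        return {"POU1F1", "STAT5A"}

    # The integrated answer (Q1 ∩ Q2 ∩ Q3) is computed in the client application,
    # which is exactly where naming mismatches and dirty data become the client's problem.
    answer = (genes_linked_to_disease("hyperprolactinemia")
              & genes_with_expression_change(3.0)
              & genes_homologous_to_tf())
    print(sorted(answer))   # -> ['POU1F1']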
16
Advantages to Federated Database Approach
  • quick to configure
  • architecture is easy to understand - no knowledge
    of the domain is necessary
  • achieves a basic level of integration with
    minimal effort
  • can wrap and plug in new data sources as they
    come into existence

17
Problems with Federated Database Approach
  • Integration of queries and query results occurs
    at the integrated application level, requiring
    complex low-level logic to be embedded at the
    highest level
  • Naming conventions across systems must be adhered
    to or query results will be inaccurate - imposes
    constraints on original data sources
  • Data sources are not necessarily clean -
    integrating dirty data produces integrated dirty
    data.
  • No query optimization across multiple systems can
    be performed
  • If one source system goes down, the entire
    integrated application may fail
  • Not readily suitable for data mining or generic
    visualization tools
  • Relies on CORBA or other middleware technology,
    shown to have performance (and reliability?)
    problems

18
Solving Federated Database Problems
[Figure: the federated architecture with a Semantic Cleaning Layer added above the middleware (CORBA, DCOM, etc.), wrapping literature sources such as PubMed, Medline, and a proprietary application]
19
Data Warehousing for Integration
  • Data warehousing is a process as much as it is a
    repository. There are a few primary concepts
    behind data warehousing:
  • ETL (Extraction, Transformation, Load)
  • Component-based (datamarts)
  • Typically utilizes a dimensional model
  • Metadata-driven

20
Data Warehousing
E (Extraction) T (Transformation) L (Load)
21
Data-level Integration Through Data Warehousing
22
Data Staging
  • Storage area and set of processes that (see the sketch after this list):
  • extracts source data
  • transforms data
  • cleans incorrect data, resolves missing elements,
    enforces standards conformance
  • purges fields not needed
  • combines data sources
  • creates surrogate keys for data to avoid
    dependence on legacy keys
  • builds aggregates where needed
  • archives/logs
  • loads and indexes data
  • Does not provide query or presentation services
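A minimal Python sketch of the staging steps listed above, assuming invented field names and a hard-coded synonym table in place of real terminology-conformance rules:

    # Minimal data-staging sketch (hypothetical fields and synonym map).
    import csv, io

    SYNONYMS = {"Pit-1": "POU1F1", "prl": "PRL"}         # conform terminology
    raw = io.StringIO("gene,expr,junk\nPit-1,4.2,x\nprl,,y\nPRL,3.9,z\n")

    surrogate, next_key, rows = {}, 1, []
    for rec in csv.DictReader(raw):                      # extract
        name = SYNONYMS.get(rec["gene"], rec["gene"])    # clean / conform names
        if rec["expr"] == "":                            # resolve missing elements
            continue                                     # (here: simply drop the record)
        if name not in surrogate:                        # surrogate keys, independent of
            surrogate[name] = next_key                   # legacy source keys
            next_key += 1
        rows.append({"gene_key": surrogate[name],        # purge unneeded fields ("junk")
                     "expr": float(rec["expr"])})        # transform types
    # "rows" would now be bulk-loaded and indexed in the warehouse.
    print(rows)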

23
Data Staging (cont.)
  • Sixty to seventy percent of development is here
  • Engineering is generally done using database
    automation and scripting technology
  • Staging environment is often an RDBMS
  • Generally done in a centralized fashion and as
    often as desired, having no effect on source
    systems
  • Solves the integration problem once and for all,
    for most queries

24
Warehouse Development and Deployment
Two development paradigms:
  • Top-down warehouse design: conceptualize the entire warehouse, then build. Tends to take 2 years or more, and requirements change too quickly.
  • Bottom-up design and deployment: pivoted around completely functional subsections of the warehouse architecture, takes 2 months, enables modular development.
25
Warehouse Development and Deployment (cont.)
  • The Data Mart
  • A logical subset of the complete data warehouse
  • represents a completable project
  • by itself is a fully functional data warehouse
  • A Data Warehouse is the union of all constituent
    data marts.
  • Enables bottom-up development

26
Warehouse Development and Deployment (cont.)
  • Examples of data marts in Life Science
  • Sequence/Annotation - brings together sequence
    and annotation from public and proprietary dbs
  • Expression Profiling datamart - integrates
    multiple TxP approaches (cDNA, oligo)
  • High-throughput screening datamart - stores HTS
    information on proprietary high-throughput
    compound screens
  • Clinical trial datamart - integrates clinical
    trial information from multiple trials
  • All of these data marts are pieced together along
    conformed entities as they are developed, bottom
    up

27
Advantages of Data-level Integration Through
Data Warehousing
  • Integration of data occurs at the lowest level,
    eliminating the need for integration of queries
    and query results
  • Run-time semantic cleaning services are no longer
    required - this work is performed in the data
    staging environment
  • FAST!
  • Original source systems are left completely
    untouched, and if they go down, the Data
    Warehouse still functions
  • Query optimization across data from multiple
    systems can be performed
  • Readily suitable for data mining by generic
    visualization tools

28
Issues with Data-level Integration Through Data
Warehousing
  • ETL process can take considerable time and effort
  • Requires an understanding of the domain to
    represent relationships among objects correctly
  • More scalable when accompanied by a Metadata
    repository which provides a layer of abstraction
    over the warehouse to be used by the application.
    Building this repository requires additional
    effort.

29
Indexing Data Sources
  • Indexes and links a large number of data sources
    (e.g., files, URLs)
  • Data integration takes place by using the results
    of one query to link and jump to a keyed record
    in another location (a minimal sketch follows this list)
  • Users have the ability to develop custom
    applications by using a vendor-specific language
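A minimal Python sketch of the index-and-link idea; the file names, keys, and record bodies are invented for illustration.

    # Minimal sketch of index-and-link integration over flat-file sources
    # (source names and record formats are invented).
    records = {
        "swiss.dat":   {"P01236": "Prolactin precursor ...",
                        "P28069": "Pituitary-specific transcription factor Pit-1 ..."},
        "medline.txt": {"P28069": "PMID 10072423: POU1F1 mutations and pituitary disease ..."},
    }

    # Index step: one lookup table mapping key -> [(source, record), ...]
    index: dict[str, list[tuple[str, str]]] = {}
    for source, recs in records.items():
        for key, body in recs.items():
            index.setdefault(key, []).append((source, body))

    # Traversal step: a hit in one source links, by key, to records elsewhere.
    for source, body in index["P28069"]:
        print(source, "->", body[:40])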

30
Indexed Data Source Architecture
[Figure: an index traversal support mechanism linking keyed records across the indexed sources]
31
Indexed Data Sources Pros and Cons
  • Advantages
  • quick to set up
  • easy to understand
  • achieves a basic level of integration with
    minimal effort
  • Disadvantages
  • does not clean and normalize the data
  • does not have a way to directly integrate data
    from relational DBMSs
  • difficult to browse and mine
  • sometimes requires knowledge of a vendor-specific
    language

32
Memory-mapped Integration
  • The idea behind this approach is to integrate the
    actual analytical data in memory and not in a
    relational database system
  • Performance is fast since the application
    retrieves the data from memory rather than disk
  • True data integration is achieved for the
    analytical data but the descriptive or
    complementary data resides in separate databases

33
Memory Map Architecture
[Figure: memory-mapped architecture - a data integration layer holds the analytical data from Sequence DB 1 and Sequence DB 2 in memory, while descriptive and sample/source information is reached through separate databases via CORBA]
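A minimal Python sketch of the memory-mapped idea in the figure above; the matrix layout, gene names, and lookup functions are assumptions for illustration, not the tutorial's implementation.

    # Minimal sketch of memory-mapped analytical integration (assumed layout).
    import numpy as np

    genes   = ["PRL", "POU1F1", "GH1"]                 # row labels
    samples = ["normal", "hyperprolactinemic"]         # column labels
    expr    = np.array([[1.0, 4.1],                    # analytical data, fully in memory
                        [0.9, 3.2],
                        [1.1, 1.0]])

    descriptive = {"PRL": "prolactin", "POU1F1": "Pit-1 transcription factor"}  # separate store

    def fold_change(gene: str) -> float:
        i = genes.index(gene)                          # fast, purely in-memory access
        return expr[i, 1] / expr[i, 0]

    for g in genes:
        if fold_change(g) > 3.0:
            # anything outside the matrix (names, tissue types) needs a second lookup
            print(g, descriptive.get(g, "no description loaded"), fold_change(g))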
34
Memory Maps Pros and Cons
  • Disadvantages
  • typically does not put non-analytical data (gene
    names, tissue types, etc.) through the ETL
    process
  • not easily extensible when adding new databases
    with descriptive information
  • performance hit when accessing anything outside
    of memory (tough to optimize)
  • scalability restricted by memory limitations of
    machine
  • difficult to mine due to complicated architecture
  • Advantages
  • true analytical data integration
  • quick access
  • cleans analytical data
  • simple matrix representation

35
The Need for Metadata
  • For all of the previous approaches, one
    underlying concept plays a critical role in their
    success: Metadata.
  • Metadata is a concept that many people still do
    not fully understand. Some common questions
    include:
  • What is it?
  • Where does it come from?
  • Where do you keep it?
  • How is it used?

36
Metadata
The data about the data
  • Describes data types, relationships, joins,
    histories, etc.
  • A layer of abstraction, much like a middle layer,
    except...
  • Stored in the same repository as the data,
    accessed in a consistent database-like way

37
Metadata (cont.)
Back-end metadata - supports the developers
  • Source system metadata - versions, formats, access stats, verbose information
  • Business metadata - schedules, logs, procedures, definitions, maps, security
  • Database metadata - data models, indexes, physical and logical design, security
Front-end metadata - supports the scientist and application
  • Nomenclature metadata - valid terms, mapping of DB field names to understandable names
  • Query metadata - query templates, join specifications, views; can include back-end metadata
  • Reporting/visualization metadata - template definitions, association maps, transformations
  • Application security metadata - security profiles at the application level
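As one hedged illustration of how front-end metadata (nomenclature and query metadata) can drive a generic application, here is a minimal Python sketch; the metadata content, the query template, and the column names are all invented.

    # Minimal sketch of a metadata-driven query builder (invented metadata content).
    nomenclature = {   # nomenclature metadata: DB field name -> scientist-friendly name
        "seq_id": "Sequence", "expr_fold": "Fold change", "tissue_nm": "Tissue",
    }
    query_templates = {  # query metadata: named templates with join specifications
        "expression_by_tissue":
            "SELECT {cols} FROM expression_fact f "
            "JOIN tissue_dim t ON f.tissue_key = t.tissue_key "
            "WHERE t.tissue_nm = :tissue",
    }

    def build_query(template: str, display_cols: list[str]) -> str:
        """Translate display names back to DB columns and fill a stored template."""
        reverse = {v: k for k, v in nomenclature.items()}
        cols = ", ".join(reverse[c] for c in display_cols)
        return query_templates[template].format(cols=cols)

    print(build_query("expression_by_tissue", ["Sequence", "Fold change"]))

Because the application only consults the metadata tables, new fields or templates can be added without changing application code - the "generic applications that grow as the data grows" point on the next slide.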
38
Metadata Benefits
  • Enables the application designer to develop
    generic applications that grow as the data grows
  • Provides a repository for the scientist to become
    better informed on the nature of the information
    in the database
  • Is a high-performance alternative to developing
    an object-relational layer between the database
    and the application
  • Extends gracefully as the database extends

39
(No Transcript)
40
Integration Technologies
  • Technologies that support integration efforts
  • Data Interchange
  • Object Brokering
  • Modeling techniques

41
Data Interchange
  • Standards for inter-process and inter-domain
    communication
  • Two types of data:
  • Data - the actual information that is being
    interchanged
  • Metadata - the information on the structural and
    semantic aspects of the Data
  • Examples
  • EMBL format
  • ASN.1
  • XML

42
XML Emerges
  • Allows uniform description of data and metadata
  • Metadata described through DTDs
  • Data conforms to metadata description
  • Provides open source solution for data
    integration between components
  • Lots of support in the CompSci community
    (proportional to the cardinality of Perl modules
    developed)
  • XML::CGI - a module to convert CGI parameters to
    and from XML
  • XML::DOM - a Perl extension to XML::Parser. It
    adds a new 'Style' to XML::Parser, called 'Dom',
    that allows XML::Parser to build an Object
    Oriented data structure with a DOM Level 1
    compliant interface.
  • XML::Dumper - a simple package to experiment with
    converting Perl data structures to XML and
    converting XML to Perl data structures.
  • XML::Encoding - a subclass of XML::Parser that
    parses encoding map XML files.
  • XML::Generator - an extremely simple module to
    help in the generation of XML.
  • XML::Grove - provides simple objects for parsed
    XML documents. The objects may be modified, but no
    checking is performed.
  • XML::Parser - a Perl extension interface to James
    Clark's XML parser, expat.
  • XML::QL - an early implementation of a note
    published by the W3C called "XML-QL: A Query
    Language for XML".
  • XML::XQL - a Perl extension that allows you to
    perform XQL queries on XML object trees.

43
XML in Life Sciences
  • Lots of momentum in Bio community
  • GFF (Gene Finding Features)
  • GAME (Genomic Annotation Markup Elements)
  • BIOML (BioPolymer markup language)
  • EBI's XML format for gene expression data
  • Will be used to specify ontological descriptions
    of Biology data

44
XML DTDs
  • Interchange format defined through a DTD
    (Document Type Definition):

    <!ELEMENT bioxml-game:seq_relationship (bioxml-game:span, bioxml-game:alignment?)>
    <!ATTLIST bioxml-game:seq_relationship
        seq  IDREF #IMPLIED
        type (query | subject | peer | subseq) #IMPLIED>

  • And data conforms to the DTD:

    <seq_relationship seq="seq1" type="query">
      <span>
        <begin>10</begin>
        <end>15</end>
      </span>
    </seq_relationship>

    <seq_relationship seq="seq2" type="subject">
      <span>
        <begin>20</begin>
        <end>25</end>
      </span>
      <alignment>
        query   atgccg
        subject atgacg
      </alignment>
    </seq_relationship>
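A minimal Python sketch (not part of the original slides) showing data in this interchange format being read with the standard-library xml.etree.ElementTree; the wrapping <alignments> element and the dropped namespace prefix are simplifications to keep the example self-contained.

    # Minimal sketch: parsing the seq_relationship example with the standard library.
    import xml.etree.ElementTree as ET

    doc = """<alignments>
      <seq_relationship seq="seq1" type="query">
        <span><begin>10</begin><end>15</end></span>
      </seq_relationship>
      <seq_relationship seq="seq2" type="subject">
        <span><begin>20</begin><end>25</end></span>
      </seq_relationship>
    </alignments>"""

    root = ET.fromstring(doc)
    for rel in root.findall("seq_relationship"):
        span = rel.find("span")
        print(rel.get("seq"), rel.get("type"),
              span.findtext("begin"), span.findtext("end"))
    # -> seq1 query 10 15
    #    seq2 subject 20 25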
45
XML Summary
Benefits:
  • Metadata and data have the same format
  • HTML-like
  • Broad support in CompSci and Biology
  • Sufficiently flexible to represent any data model
  • XSL style sheets map from one DTD to another
Drawbacks:
  • Doesn't allow for abstraction or partial
    inheritance
  • Interchange can be slow in certain data migration
    tasks

46
Object Brokering
  • The details of data can often be encapsulated in
    objects
  • Only the interfaces need definition
  • Forget DTDs and data description
  • Mechanisms for moving objects around based solely
    on their interfaces would allow for seamless
    integration

47
Enter CORBA
  • Common Object Request Broker Architecture
  • Applications have access to method calls through
    IDL stubs
  • Makes a method call which is transferred through
    an ORB to the Object implementation
  • Implementation returns result back through ORB

48
CORBA IDL
  • IDL - Interface Definition Language
  • Like C/Java headers, but with slightly more
    type flexibility

49
CORBA Summary
Benefits:
  • Distributed
  • Component-based architecture
  • Promotes reuse
  • Doesn't require knowledge of implementation
  • Platform independent
Drawbacks:
  • Distributed
  • Level of abstraction is sometimes not useful
  • Can be slow to broker objects
  • Different ORBs do different things
  • Unreliable?
  • OMG website is brutal

50
Modeling Techniques
  • E-R Modeling
  • Optimized for transactional data
  • Eliminates redundant data
  • Preserves dependencies in UPDATEs
  • Doesn't allow for inconsistent data
  • Useful for transactional systems
  • Dimensional Modeling
  • Optimized for queryability and performance
  • Does not eliminate redundant data, where
    appropriate
  • Constraints unenforced
  • Models data as a hypercube
  • Useful for analytical systems

51
Illustrating Dimensional Data Space
Nomenclature: x, y, z, and t are dimensions; temperature is a fact; the data space is a hypercube of size 4.
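One way to picture this data space in code, as a minimal sketch with invented coordinate values: facts are keyed by points in the (x, y, z, t) dimension space.

    # Minimal sketch of the (x, y, z, t) -> temperature data space described above.
    facts = {
        (0, 0, 0, "t1"): 36.5,   # each key is a point in the 4-dimensional space
        (0, 1, 0, "t1"): 36.9,
        (0, 1, 0, "t2"): 37.4,
    }
    # A "slice" of the hypercube: fix one dimension (t = "t1") and keep the rest.
    slice_t1 = {k: v for k, v in facts.items() if k[3] == "t1"}
    print(slice_t1)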
52
Dimensional Modeling Primer
  • Represents the data domain as a collection of
    hypercubes that share dimensions
  • Allows for highly understandable data spaces
  • Direct optimizations for such configurations are
    provided through most DBMS frameworks
  • Supports data mining and statistical methods such
    as multi-dimensional scaling, clustering,
    self-organizing maps
  • Ties in directly with most generalized
    visualization tools
  • Only two types of entities - dimensions and facts

53
Dimensional Modeling Primer - Relational Representation
  • Contains a table for each dimension
  • Contains one central table for all facts, with a
    multi-part key
  • Each dimension table has a single-part primary
    key that corresponds to exactly one of the
    components of the multi-part key in the fact table.

The Star Schema: the basic component of Dimensional Modeling
54
Dimensional Modeling Primer - Relational Representation
  • Each dimension table most often contains
    descriptive textual information about a
    particular scientific object. Dimension tables
    are typically the entry points into a datamart.
    Examples Gene, Sample, Experiment
  • The fact table relates the dimensions that
    surround it, expressing a many-to-many
    relationship. The more useful fact tables also
    contain facts about the relationship --
    additional information not stored in any of the
    dimension tables.

[Figure: The Star Schema, the basic component of Dimensional Modeling - X, Y, Z, and Time dimension tables (each with its own PK) surrounding a central Temperature fact table whose FKs form a composite key]
55
Dimensional Modeling Primer - Relational Representation
  • Dimension tables are typically small, on the
    order of 100 to 100,000 records. Each record
    measures a physical or conceptual entity.
  • The fact table is typically very large, on the
    order of 1,000,000 or more records. Each record
    measures a fact around a grouping of physical or
    conceptual entities.

[Figure: the same star schema diagram as on the previous slide]
56
Dimensional Modeling Primer - Relational Representation
  • Neither dimension tables nor fact tables are
    necessarily normalized!
  • Normalization increases complexity of design,
    worsens performance with joins
  • Non-normalized tables can easily be understood
    with SELECT and GROUP BY
  • Database tablespace is therefore required to be
    larger to store the same data - the gain in
    overall performance and understandability
    outweighs the cost of extra disks!

[Figure: the same star schema diagram as on the previous slide]
57
Case in Point: Sequence Clustering
"Show me all sequences in the same cluster as sequence XA501 from my last run."
[Figure: normalized E-R schema for sequence clustering, including Run (run_id, who, when, purpose) and Sequence (seq_id, bases, length) tables]
  • PROBLEMS
  • not browsable (confusing)
  • poor query performance
  • little or no data mining support

58
Dimensionally Speaking: Sequence Clustering
"Show me all sequences in the same cluster as sequence XA501 from my last run."
CONCEPTUAL IDEA - The Star Schema: a historical, denormalized, subject-oriented view of scientific facts -- the data mart. A centralized fact table stores the single scientific fact of a sequence's membership in a cluster and a subcluster. Smaller dimension tables around the fact table represent key scientific objects (e.g., sequence). (A minimal query sketch follows the benefits list below.)
Membership Facts: seq_id, cluster_id, subcluster_id, run_id, paramset_id, run_date, run_initiator, seq_start, seq_end, seq_orientation, cluster_size, subcluster_size
  • Benefits
  • Highly browsable, understandable model for
    scientists
  • Vastly improved query performance
  • Immediate data mining support
  • Extensible database componentry model
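A minimal sketch of the membership star schema and the query it makes easy, using Python's sqlite3; the columns are a simplified subset of the fact table above, "last run" is approximated by the highest run_id, and all data values are invented.

    # Minimal star-schema sketch for the sequence-clustering datamart (simplified columns).
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE sequence_dim    (seq_id TEXT PRIMARY KEY, bases TEXT, length INTEGER);
    CREATE TABLE run_dim         (run_id INTEGER PRIMARY KEY, run_initiator TEXT, run_date TEXT);
    CREATE TABLE membership_fact (seq_id TEXT, run_id INTEGER, cluster_id INTEGER, subcluster_id INTEGER);
    INSERT INTO sequence_dim VALUES ('XA501','atg...',1200), ('XB777','ggc...',980), ('XC140','tta...',450);
    INSERT INTO run_dim VALUES (7,'griffiths','2000-08-01'), (8,'resnick','2000-08-20');
    INSERT INTO membership_fact VALUES ('XA501',8,3,1), ('XB777',8,3,2), ('XC140',8,9,1), ('XA501',7,5,1);
    """)

    # "Show me all sequences in the same cluster as sequence XA501 from my last run."
    rows = db.execute("""
        SELECT other.seq_id, s.length
        FROM membership_fact AS me
        JOIN membership_fact AS other
          ON other.run_id = me.run_id AND other.cluster_id = me.cluster_id
        JOIN sequence_dim AS s ON s.seq_id = other.seq_id
        WHERE me.seq_id = 'XA501'
          AND me.run_id = (SELECT MAX(run_id) FROM membership_fact WHERE seq_id = 'XA501')
    """).fetchall()
    print(rows)   # -> XA501 and XB777, the members of cluster 3 in run 8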

59
Dimensional Modeling - Strengths
  • Predictable, standard framework allows database
    systems and end user query tools to make strong
    assumptions about the data
  • Star schemas withstand unexpected changes in user
    behavior -- every dimension is equivalent, providing
    symmetrically equal entry points into the fact
    table.
  • Gracefully extensible to accommodate unexpected
    new data elements and design decisions
  • High performance, optimized for analytical queries

60
The Need for Standards
  • In order for any integration effort to be
    successful, there needs to be agreement on
    certain topics:
  • Ontologies - concepts, objects, and their
    relationships
  • Object models - how the ontologies are represented
    as objects
  • Data models - how the objects and data are stored
    persistently

61
Standard Bio-Ontologies
  • Currently, there are efforts being undertaken to
    help identify a practical set of technologies
    that will aid in the knowledge management and
    exchange of concepts and representations in the
    life sciences.
  • GO Consortium - http://genome-www.stanford.edu/GO/
  • The third annual Bio-Ontologies meeting is being
    held after ISMB 2000 on August 24th.

62
Standard Object Models
  • Currently, there is an effort being undertaken to
    develop object models for the different domains
    in the Life Sciences. This is primarily being
    done by the Life Science Research (LSR) working
    group within the OMG (Object Management Group).
    Please see their homepage for further details
  • http://www.omg.org/homepages/lsr/index.html

63
In Conclusion
  • Data integration is the problem to solve to
    support human and computer discovery in the Life
    Sciences.
  • There are a number of approaches one can take to
    achieve data integration.
  • Each approach has advantages and disadvantages
    associated with it. Particular problem spaces
    require particular solutions.
  • Regardless of the approach, Metadata is a
    critical component for any integrated repository.
  • Many technologies exist to support integration.
  • Technologies do nothing without syntactic and
    semantic standards.

64
(No Transcript)
65
Accessing Integrated Data
  • Once you have an integrated repository of
    information, access tools enable future
    experimental design and discovery. They can be
    categorized into four types
  • browsing tools
  • query tools
  • visualization tools
  • mining tools

66
Browsing
  • One of the most critical, and most often
    overlooked, requirements is the ability to browse
    the integrated repository, since users typically do
    not know what is in it and are not familiar with
    other investigators' projects. Requirements
    include:
  • ability to view summary data
  • ability to view high level descriptive
    information on a variety of objects (projects,
    genes, tissues, etc.)
  • ability to dynamically build queries while
    browsing (using a wizard or drag and drop
    mechanism)

67
Querying
  • Along with browsing, retrieving the data from the
    repository is one of the most underdeveloped
    areas in bioinformatics. All of the
    visualization tools that are currently available
    are great at visualizing data. But if users
    cannot get their data into these tools, how
    useful are they? Requirements include:
  • ability to intelligently help the user build
    ad-hoc queries (wizard paradigm, dynamic
    filtering of values)
  • provide a power user interface for analysts
    (query templates with the ability to edit the
    actual SQL)
  • should allow users to iterate over the queries so
    they do not have to build them from scratch each
    time
  • should be tightly integrated with the browser to
    allow for easier query construction

68
Visualizing
  • There are a number of visualization tools
    currently available to help investigators analyze
    their data. Some are easier to use than others
    and some are better suited for either smaller or
    larger data sets. Regardless, they should all:
  • be easy to use
  • save templates which can be used in future
    visualizations
  • view different slices of the data simultaneously
  • apply complex statistical rules and algorithms to
    the data to help elucidate associations and
    relationships

69
Data Mining
  • Life science has large volumes of data that, in
    its rawest form, is not easy to use to help drive
    new experimentation. Ideally, one would like to
    automate data mining tools to extract
    information by allowing them to take advantage
    of a predictable database architecture. This is
    more easily attainable using dimensional modeling
    (star schemas) than E-R modeling, since E-R
    schemas vary greatly from database to database and
    do not conform to any standard architecture.

70
(No Transcript)
71
Database Schemas for 3 Independent Genomics Systems
[Figure: E-R schemas for three independent systems - Homology Data, Gene Expression, and SNP Data - spanning tables such as ORGANISM, SEQUENCE_DATABASE, SEQUENCE, ALIGNMENT, ALGORITHM, SCORE, PARAMETER_SET, MAP_POSITION, GE_RESULTS, QUALIFIER, CHIP, ANALYSIS, RNA_SOURCE, GENOTYPE, TREATMENT, CELL_LINE, TISSUE, DISEASE, ALLELE, SNP_FREQUENCY, SNP_METHOD, SNP_POPULATION, STS_SOURCE, PCR_PROTOCOL, PCR_BUFFER, and Linkage, each with its own keys and naming conventions]
72
The Warehouse
Three star schemas of heterogeneous data (Gene Expression, SNP Data, Homology Data) joined through a conformed sequence dimension.
[Figure: the three star schemas sharing the conformed sequence dimension]
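A minimal Python/sqlite3 sketch (invented tables and values) of what joining through a conformed dimension buys: facts from two different marts can be combined because both key into the same sequence dimension.

    # Minimal drill-across sketch: two marts sharing a conformed sequence dimension.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE sequence_dim    (seq_key INTEGER PRIMARY KEY, accession TEXT, gene_name TEXT);
    CREATE TABLE expression_fact (seq_key INTEGER, fold_change REAL);
    CREATE TABLE homology_fact   (seq_key INTEGER, pct_identity REAL, hit_desc TEXT);
    INSERT INTO sequence_dim VALUES (1,'XA501','PRL'), (2,'XB777','POU1F1');
    INSERT INTO expression_fact VALUES (1, 1.2), (2, 3.8);
    INSERT INTO homology_fact   VALUES (2, 87.0, 'POU-domain transcription factor');
    """)

    # Cross-mart question: expression change AND homology hit for the same sequence.
    for row in db.execute("""
        SELECT d.gene_name, e.fold_change, h.pct_identity, h.hit_desc
        FROM sequence_dim d
        JOIN expression_fact e ON e.seq_key = d.seq_key
        JOIN homology_fact   h ON h.seq_key = d.seq_key
        WHERE e.fold_change > 3.0
    """):
        print(row)   # -> ('POU1F1', 3.8, 87.0, 'POU-domain transcription factor')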