Title: Bioinformatics: Definitions, Challenges and Impact on Health Care Systems
1Bioinformatics: Definitions, Challenges and Impact on Health Care Systems
- Joyce Mitchell, Ph.D.
- University of Utah
- Sept 29, 2005
- NLM's Woods Hole Informatics Course
2Outline for Talk
- What is Bioinformatics?
- Health Informatics compared to Bioinformatics
- Problems considered in Bioinformatics
- Genomics, proteomics, transcriptomics, etc
- Genomics data and patient care
- Impact of Bioinformatics on Health Information
Systems
3Central Dogma of Molecular Biology
Diagram: DNA -> RNA (Transcription); RNA -> Protein (Translation); Protein -> Phenotype; DNA -> DNA (Replication).
This happens in Cells.
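Two of the arrows in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics library: the function names are hypothetical, and the input is simply the first nine bases of the HFE excerpt shown later in the deck. Translation is left out because it needs a codon table.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def replicate(coding_strand: str) -> str:
    """Return the complementary strand, written 5'->3' (the reverse complement)."""
    return coding_strand.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return coding_strand.replace("T", "U")


dna = "ATGGGCCCG"  # first nine bases of the HFE excerpt
print(replicate(dna))
print(transcribe(dna))
```

Applying `replicate` twice returns the original strand, which is a quick sanity check on the complement table.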
41. What is Bioinformatics?
5NIH Working Definition
- Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
- http://www.bisti.nih.gov/CompuBioDef.pdf
6Another Definition
- An interdisciplinary area at the intersection of
biological, computer, and information sciences
necessary to manage, process, and understand
large amounts of data, for instance from the
sequencing of the human genome, or from large
databases containing information about plants and
animals for use in discovering and developing new
drugs.
- www.isye.gatech.edu/tg/publications/ecology/eolss/node2.html
7Another Definition: NCBI (National Center for Biotechnology Information)
- Bioinformatics is the field of science in which
biology, computer science, and information
technology merge into a single discipline. The
ultimate goal of the field is to enable the
discovery of new biological insights and to
create a global perspective from which unifying
principles in biology can be discerned. There are
sub-disciplines in bioinformatics.
- http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
82. Health Informatics Compared to Bioinformatics
- Same methods, different application domains
9Different Areas of Strengths
- Bioinformatics has much more data available on the Internet than Health Informatics
- Much more progress on database integration across multiple data sources
- Health Informatics has much more need for aggregation of national statistics
- Much more progress on terminologies for integration of data
10Bioinformatics Health Informatics
- Bioinformatics is the study of the flow of information in biological sciences.
- Health Informatics is the study of the flow of information in patient care.
- These two fields are on a collision course as genomics data becomes used in patient care.
- Russ Altman, MD, Ph.D., Stanford Univ.
113. Problems Considered in Bioinformatics
12Omes and Omics
- Genomics
- Primarily sequences (DNA and RNA)
- Databanks and search algorithms
- Proteomics
- Sequences (Protein)
- Mass spectrometry, X-ray crystallography
- Databanks, knowledge bases, terminologies
- Functional Genomics (transcriptomics)
- Microarray data
- Databanks, analysis tools, traversal techniques
- Systems Biology (metabolomics)
- Metabolites and interacting systems (interactomics)
- Graphs, visualization, modeling, networks of entities
13Central Dogma of Molecular Biology
Diagram: DNA -> RNA -> Protein -> Phenotype. Labels: Genomics; Proteomics; Transcriptomics / Functional Genetics.
14Genome and Genomics
- Genome: entire complement of DNA in a species
- Both nuclear and mitochondrial/chloroplast
- Variants among individuals
- Genomics: study of the sequence, structure and function of the genome. Study of whole sets of genes rather than single genes.
- Comparative genomics: study of the differences among species. Usually covers evolutionary studies of differences and conservation over time.
15A Genome Database (e.g., GenBank)
- Consists of long strings of DNA bases (ATCG...)
- Consists of annotations of this database to attach meaning to the sequence data.
- Example entry from GenBank: Hemochromatosis gene HFE
- http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb
16Human Genome Project
- Human Genome Project: international research effort
- Determine sequence of human genome and other model organisms
- Began 1990, completed 2003
- Next steps for 20,000 genes
- Function and regulation of all genes
- Significance of variations between people
- Cures, therapies, genomic healthcare
17The Human Genome Project has catalyzed striking
paradigm changes in biology - biology is an
information science.
- Leroy Hood, MD, PhD
- Institute for Systems Biology
- Seattle, Washington
18Genomes In Public Databases
- Published complete genomes
- Ongoing prokaryotic genomes
- Ongoing eukaryotic genomes
1560
http://www.genomesonline.org/
19Genomics activities
- Sequence the genes and chromosomes (done by breaking the DNA into parts)
- Map the location of various gene entities to establish their order
- Compare the sequences with other known sequences to determine similarity
- Across species, conserved sequence motifs
- Predict secondary structure of proteins
- Create large databases: GenBank, EMBL, DDBJ
- Develop algorithms and similarity measures
- BLAST and its many forms
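To make "similarity measure" concrete, here is a deliberately simple sketch: the fraction of identical positions between two sequences that are already aligned and of equal length. This is not how BLAST works (BLAST uses heuristic local alignment with scoring matrices); the function name and the candidate sequence are hypothetical, and the reference is the first 15 bases of the HFE excerpt shown later in the deck.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two pre-aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


reference = "ATGGGCCCGCGAGCC"  # first 15 bases of the HFE excerpt
candidate = "ATGGGACCGCGTGCC"  # hypothetical sequence with two substitutions
print(f"{fraction_identical(reference, candidate):.1%}")
```

Real search tools must also handle insertions, deletions, and sequences of different lengths, which is why alignment algorithms are needed before any such score is computed.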
20Central Dogma of Molecular Biology
Diagram: DNA -> RNA -> Protein -> Phenotype (Tissues, Organs, Organisms). Labels: Genomics; Proteomics; Transcriptomics / Functional Genetics.
21Proteome and Proteomics
- Proteome: the entire set of proteins (and other gene products) made by the genome.
- Proteomics: study of the interactions among proteins in the proteome, including networks of interacting proteins and metabolic considerations. Also includes differences in developmental stages, tissues and organs.
22Protein Functions
- Catalysis
- Transport
- Nutrition and storage
- Contraction and mobility
- Structural elements
- Cytoskeleton
- Basement membranes
- Defense mechanisms
- Regulation
- Genetic
- Hormonal
- Buffering capacity
23Protein Databases
- SwissProt
- PIR http://www.pir.uniprot.org/
- GENE http://www.ncbi.nlm.nih.gov/gene
- InterPro http://www.ebi.ac.uk/interpro/
- Correspond to (and are derived from) genome databases
- All connected by Reference Sequences (NCBI) and UniProt
24Gene/Protein Database entries
- HFE record in Entrez GENE (NCBI)
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
25Structure Function Determination
- X-ray crystallography
- Nuclear magnetic resonance spectroscopy and tandem MS/MS
- Computational modeling
- Sequence alignment from others
- Homology modeling
26Structure Databases
- Contain experimentally determined and predicted structures of biological molecules
- Most structures determined by X-ray crystallography, NMR
- Example: MMDB (molecular modeling db) http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
- HFE Entry
- http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816
27Protein Interaction Databases
- Record observations of protein-protein interactions in cells
- Attempts to detail interactions observed in thousands of small-scale experiments described in published articles
- Examples:
- BIND: Biomolecular Interaction Network Database
- DIP: Database of Interacting Proteins
- MIPS: Munich Information Center for Protein Sequences
- PRONET: Protein interaction on the Web
- Many others, both academic and commercial
28Central Dogma of Molecular Biology
Diagram: DNA -> RNA -> Protein -> Phenotype. Labels: Genomics; Proteomics; Transcriptomics / Functional Genetics.
29Proteome vs Transcriptome
- Functional genomics (transcriptomics) looks at the timing and regulation of the gene products (both RNA and proteins)
- This is different from looking at what gene products can be produced; it looks at the circumstances under which production occurs.
- Involves experimental conditions.
30Functional Genomics Microarrays
- Transcriptome and transcriptomics
- High throughput technique designed to measure the increase in RNA (or sometimes proteins, tissues, etc.) in a cell in response to an experiment.
- Also called gene expression analysis
- Microarrays are also called gene chips (although now there are protein and tissue chips)
31How Do Microarrays Work?
- Conceptual description
- Set of targets (cDNA, proteins, tissues, etc.) is immobilized in predetermined positions on a substrate
- Solution containing tagged molecules capable of binding to the targets is placed over the targets
- Binding occurs between targets and tagged molecules.
- Fluorescent tags allow you to visualize which targets have been bound (and tell you something about the molecules that were present in your solution).
32Animation of Microarrays
- http://www.bio.davidson.edu/courses/genomics/chip/chip.html
34How Spotted Arrays Work
- Result
- Spots where cDNA from the reference sample hybridized look green
- Spots where cDNA from the experimental sample hybridized look red
- Spots where cDNA from both samples hybridized look yellow (green + red = yellow)
- Spots with little/no cDNA hybridized look black
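The color rules above can be expressed as a small classifier over the two channel intensities of a spot. This is only a sketch: the function name, the background floor of 100, and the two-fold cutoff are illustrative assumptions, not values from the talk or from any array platform.

```python
import math


def classify_spot(red: float, green: float,
                  floor: float = 100.0, fold: float = 2.0) -> str:
    """Label a spot from background-corrected channel intensities.

    red   = experimental-sample channel, green = reference-sample channel.
    floor and fold are illustrative thresholds (assumptions).
    """
    if red < floor and green < floor:
        return "black"  # little or no cDNA hybridized
    # Clamp to 1.0 so the ratio and its logarithm are always defined.
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    if log_ratio >= math.log2(fold):
        return "red"    # mostly experimental sample
    if log_ratio <= -math.log2(fold):
        return "green"  # mostly reference sample
    return "yellow"     # both samples hybridized to a similar degree


for red, green in [(5000, 400), (300, 4000), (2000, 2100), (20, 35)]:
    print(red, green, classify_spot(red, green))
```

Working with the log ratio keeps "twice as much" and "half as much" symmetric around zero, which is why expression analyses usually report log2 ratios rather than raw ratios.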
37Uses of Expression Profiling
- Pharmaceutical research
- ID drug targets by comparing expression profile of drug-treated cells with those of cells containing mutations in genes encoding known drug targets
- Disease Dx and Tx
- Distinguish morphologically similar cancers
- DLBCL (Poulsen et al. (2005) Microarray-based classification of diffuse large B-cell lymphomas. European Journal of Haematology 74(6):453-65.)
- Therapy potential
- Rabson AB, Weissmann D. From microarray to bedside: targeting NF-kappaB for therapy of lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
38Future Applications
- Diagnostic tool to screen for infective agents
- Chip imprinted with set of pathogenic genomes used to identify bacterial, viral, or parasite genomic material in patients' body fluids
- Diagnostic chip to check for mutations involved in drug-gene interactions.
39Experimental Design (2)
- A fundamental challenge of microarray experiments: underdetermined systems
- Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. (The MIT Press: Cambridge, MA, 2003), p. 11.
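"Underdetermined" here means there are far more measured genes (unknowns) than samples (equations), so many different explanations fit the same data equally well. The toy numbers below are invented purely for illustration: two samples, three genes, and two different sets of gene weights that both reproduce the observed outcome exactly.

```python
# Rows are samples, columns are genes (toy values, not real measurements).
expression = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
outcome = [1.0, 0.0]  # one observed outcome value per sample


def predict(weights, rows):
    """Weighted sum of gene expression values for each sample."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


model_a = [1.0, 0.0, 0.0]   # "gene 1 alone explains the outcome"
model_b = [0.0, -1.0, 1.0]  # "genes 2 and 3 together explain it"

print(predict(model_a, expression) == outcome)
print(predict(model_b, expression) == outcome)
```

With thousands of genes and only tens of samples the same ambiguity appears on a much larger scale, which is why experimental design and replication matter so much for microarray studies.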
40MGED: Microarray Gene Expression Data
- http://www.mged.org/Workgroups/MAGE/mage.html
- http://www.mged.org/Workgroups/MIAME/miame.html
41Public Microarray Data Repositories
- Major public repositories
- GEO (NCBI)
- http://www.ncbi.nlm.nih.gov/geo/
- ArrayExpress (EBI)
- http://www.ebi.ac.uk/arrayexpress/
42Standards and Repositories
- Brazma, A, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics. 2001 Dec;29(4):373.
- http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html
- Ball, CA, et al. Submission of Microarray Data to Public Repositories. PLoS Biology. 2004 September; 2(9): e317
- http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489
43Controlled Vocabularies
- Genomics, proteomics, and especially microarray techniques have created a large need for controlled vocabularies to assist the analyses across multiple entities and species.
- Taxonomy: systematic classification of objects according to relationships.
- Ontologies
- An organizational framework for concepts
44Controlled Vocabularies in Bioinformatics
- The Gene Ontology http://www.geneontology.org/
- Knowledge capture (the ontology itself)
- Annotation of gene products (for comparisons)
- The MGED Ontology (arising from MIAME)
- http://mged.sourceforge.net/
- Annotation of microarray experiments for public repositories
- Clinical Bioinformatics Ontology
- Annotation of gene tests in electronic medical records
- http://www.cerner.com/cbo
- MIAPE from Proteomics Standards Initiative (PSI)
- http://psidev.sourceforge.net/
454. Genomics Data and Patient Care
- From genotype to phenotype
46Bioinformatics and Patient Care
- Understanding a person's genome ushers in the era of Personalized Medicine
- Obviously you should keep track of health-related genetic data in the EMR.
- The 9-11 disaster showed you need to know the genomic variant information as well.
- Cash et al. Forensic bioinformatics in the wake of the World Trade Center Disaster. PSB 2003:638-653.
47Human Disease Gene Specifics
- Genes linked to human diseases (9-2004)
- 425 in 2 yrs
- 1700/20,000, about 9% of loci
48Genetic Medicine is not new
- Karl Landsteiner started genetic medicine over 100 years ago (1903)
- Blood transfusions worked off the ABO blood group system.
- Landsteiner got the Nobel Prize in 1930 for his work.
- http://nobelprize.org/medicine/laureates/1930/landsteiner-bio.html
49Genomic Medicine is New
- What to do with all of this genetic information and every person being unique?
- And the information about genetic conditions is available on the Internet.
50Genomics Data and Patient Care
- Where do you find the data for genes causing human diseases?
- What do you do with genetic data in electronic medical records?
51Where do you find the data for genes causing
human diseases?
- Study on availability of genetic data on health implications of the HGP.
- Mitchell, McCray, Bodenreider. Methods Inf Med 2003; 42:557-63.
52Questions
- What genes cause the condition?
- What is the normal function of the gene?
- What mutations have been linked to diseases?
- How does the mutation alter gene function?
- What laboratories are performing DNA tests?
- Are there gene therapies or clinical trials?
- What names are used to refer to the genes and the diseases?
- What other conditions are linked to these same genes?
53You can find the answers online
- but it is not easy: answers are in many places
- Can't navigate by gene names; must use hot links and numeric identifiers
- The number and function of alternate forms of the protein are inconsistently reported
- Synonymy (many names, same meaning) and polysemy (same name, different meanings) cause confusion
- Upper and lower case are used for species distinctions
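The naming problems above can be illustrated with a toy name index. Everything here is a sketch: the record identifiers (C1, G1, G2) and the function name are hypothetical, and the three records are a hand-made sample rather than an extract from any database. The example relies on the convention that human gene symbols are written in upper case (HFE) while the mouse counterpart is written Hfe.

```python
from collections import defaultdict

# Toy records with hypothetical identifiers.
RECORDS = [
    {"id": "C1", "kind": "condition",
     "names": ["hemochromatosis", "bronze diabetes"]},
    {"id": "G1", "kind": "gene", "species": "human", "names": ["HFE"]},
    {"id": "G2", "kind": "gene", "species": "mouse", "names": ["Hfe"]},
]


def build_index(records, fold_case):
    """Map each name to the set of record ids that use it."""
    index = defaultdict(set)
    for rec in records:
        for name in rec["names"]:
            key = name.lower() if fold_case else name
            index[key].add(rec["id"])
    return index


exact = build_index(RECORDS, fold_case=False)
folded = build_index(RECORDS, fold_case=True)

# Synonymy: two names, one condition.
print(exact["hemochromatosis"] == exact["bronze diabetes"])
# Case carries meaning: exact lookup separates the species.
print(sorted(exact["HFE"]), sorted(exact["Hfe"]))
# A case-insensitive search makes the same name ambiguous.
print(sorted(folded["hfe"]))
```

This is one reason the databases fall back on numeric identifiers: an identifier names exactly one record, whatever the record happens to be called.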
54Major Challenges of Navigation
- Complexity of data
- Dynamic nature of the data
- Diverse foci and number of data/knowledge base systems
- Data and knowledge representation lack standards
- Can navigate if you know what you are looking for.
55Genetics Home Reference
- Consumer health resource to help the public navigate from phenotype to genotype.
- Focus on health implications of the Human Genome Project.
- http://ghr.nlm.nih.gov
- Mitchell, Fun, McCray, JAMIA, 2004 Nov; 11(6):439-447
56Hands-on with GHR
- Scavenger hunt with hemochromatosis and the genes that influence it.
- Explore the Genetics Home Reference by answering the following questions. Start at http://ghr.nlm.nih.gov .
57GHR Scavenger Hunt
- How common is hemochromatosis?
- How many genes have been proven to be involved in hemochromatosis when the genes are mutated?
- What are the symbols for these genes?
- Can you find the link to MedlinePlus with health information on hemochromatosis?
58GHR Scavenger Hunt
- What are the names of the patient support associations for hemochromatosis?
- One synonym for this condition is bronze diabetes. Can you find a reason for this?
- What kind of damage is done to the liver of people with hemochromatosis?
59GHR Scavenger Hunt
- For the genes involved in hemochromatosis, how many of them are available as a DNA test?
- Give one place where you would choose to send a tissue sample for DNA testing.
- What sites are listed under Research Resources for the TFR2 gene?
- How many alternately spliced proteins are there for TFR2?
- In what tissues is this gene expressed?
60GHR Scavenger Hunt
- How do people inherit hemochromatosis?
- Do the genes involved in hemochromatosis cause other health conditions when they are mutated?
- Can you find a protein sequence for one of the genes?
- What clinical trials are available for hemochromatosis patients close to where you live?
615. Impact of Bioinformatics on Health Information
Systems
- Electronic Medical Record
- Public Health Systems
62Genetics is Impacting Medicine Today!
- 1700 genes linked to health conditions
- > 1100 gene tests for diagnosis
- Relate to diagnosis, therapy, drug dosage, occupational hazards, reproductive plans, health risks, ...
63Well-known Examples
- Pharmacogenetics
- CYP450 alleles: exaggerated, diminished or ultra-rapid drug responses. E.g., Warfarin: 93% of patients are OK on standard doses; 7% of patients have severe hemorrhage. CYP2C9*2 and CYP2C9*3 are the most severe of 6 known mutations.
- Environmental susceptibility
- Sickle Cell trait carrier and malaria parasite
- Nutrition
- PKU and avoidance of phenylalanine
64Another Example: Iressa (gefitinib)
- Non-small cell lung CA: 140,000 pt/yr
- Iressa (Astra Zeneca) causes remission in 1 of 10 patients if taken daily for life.
- Iressa efficacy correlates with EGFR mutation in the tumor. Gene testing for EGFR is now available, so appropriate people can be targeted.
- http://www.sciencemag.org/cgi/content/full/305/5688/1222a
- BUT Astra Zeneca can't make money on only 14,000 patients per year.
- http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=131550
65Collie Dog Example
- Collies are more sensitive to the antiparasitic drug ivermectin, to loperamide (Imodium), and to other drugs
- 75% of collies in US have a mutation in the mdr1 gene causing multiple drug sensitivity (50 drugs). Can cause death or neurological damage.
- Now have testing available.
- http://www.wral.com/money/3565592/detail.html
66Implications for Health Care System
- More gene tests will be ordered. Reports of 300% increase in gene tests in 2003.
- Arch Pathol Lab Med 2004, 128(12):1330-1333
- The FDA will regulate panels of tests.
- http://www.fda.gov/bbs/topics/news/2004/new01149.html
- Non-discrimination laws for insurance and employment would open a floodgate.
- Preventive healthcare will play a larger part.
- Environmental risk factors dictate OSHA-type approach to worker empowerment and education about safe behavior
67Example: Hemochromatosis
- 2 copies of mutated HFE gene: too much iron absorbed from diet, which accumulates. Causes arthritis, liver disease, diabetes, skin discoloration.
- (1 million people in US)
- HFE gene regulates the storage, transport and absorption of iron
- Labs doing gene tests use different techniques: full sequence vs limited analysis
68A Portion of the HFE DNA Sequence
- ATGGGCCCGCGAGCCAGGCCGGCGCTTCTCCTCCTGATGCTTTTGCAGAC
CGCGGTCCTGCAGGGGCGCTTGCTGCGTTCACACTCTCTGCACTACCTCT
TCATGGGTGCCTCAGAGCAGGACCTTGGTCTTTCCTTGTTTGAAGCTTT
GGGCTACGTGGATGACCAGCTGTTCGTGTTCTATGATCATGAGAGTCGCC
GTGTGGAGCCCCGAACTCCATGGGTTTCCAGTAGAATTTCAAGCCAGATG
TGGCTGCAGCTGAGTCAGAGTCTGAAAGGGTGGGATCACATGTTCACTG
TTGACTTCTGGACTATTATGGAAAATCACAACCACAGCAAGGAGTCCCAC
ACCCTGCAGGTCATCCTGGGCTGTGAAATGCAAGAAGACAACAGTACCGA
GGGCTACTGGAAGTACGGGTATGATGGGCAGGACCACCTTGAATTCTGC
CCTGACACACTGGATTGGAGAGCAGCAGAACCCAGGGCCTGGCCCACCAA
GCTGGAGTGGGAAAGGCACAAGATTCGGGCCAGGCAGAACAGGGCCTACC
TGGAGAGGGACTG
70A Portion of the HFE DNA Sequence
- GCACAAGATTCGGG -> GGACAAGATTCGGG
- His: CAU and CAC
- Asp: GAU and GAC
- A mutation in position 225 changes C to G.
- Changes a part of the protein (histidine to aspartic acid at position 63).
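The effect of the single-base change can be checked by translating both fragments with the standard genetic code. This is a minimal sketch: the reading-frame offset of 1 is an assumption, chosen so that the changed base is the first base of a His codon (CAC), as the codon lists on the slide imply. The codon table is built from the standard ordering of bases T, C, A, G.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; * marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


reference = "GCACAAGATTCGGG"
mutant = "GGACAAGATTCGGG"
FRAME = 1  # assumed reading-frame offset within the fragment

changed = [i for i, (r, m) in enumerate(zip(reference, mutant)) if r != m]
print(changed)                      # index of the substituted base
print(translate(reference[FRAME:]))  # starts with H (histidine)
print(translate(mutant[FRAME:]))     # starts with D (aspartic acid)
```

Only the first amino acid differs between the two translations, which is the sense in which a one-letter DNA change "changes a part of the protein".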
71Amino Acid Sequence for HFE
- MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEAL
GYVDDQLFVFYD H D ESRRVEPRTPWVSSRISSQMWLQL
SQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWK
YGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERD
CPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNI
TMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQV
EHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQG
SRGAMGHYVLAERE
- His63Asp in ONE chromosome
- Cys282Tyr in ONE chromosome (not shown)
72Report Back from Full Sequence Lab
- Reference sequences for transcript variant 1 for the HFE gene: NM_000410, NP_000401
- Consensus CDS (CCDS): CCDS4578.1
- Mutant phenotype changes
- His63Asp, Cys282Tyr (2 mutations)
- Polymorphisms noted
- AA position 59 VAL53MET 157G>A (freq 5%)
73Special health concerns HFE
- For person with dx
- For family members
74Dilemmas
- The reference sequence ties you to external data sources that change
- The protein has eleven transcript variants
- Mutant phenotype is noted as an amino acid change
- Polymorphisms are noted as nucleotide change
- These results have implications for other family
members in addition to the patient
75What Should You Store in the EMR?
- Do you put the DNA sequence for the gene into the EMR? Where do you put it?
- Do you just store meta-data about the DNA sequence? "HFE test abn" or "(his63asp cys282tyr)"? What about the normal variants?
- If you don't store the sequence, what do you do when the reference sequence changes?
- How do you trigger alerts and reminders? And for what? People with hemochromatosis need special screening and check-ups.
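One possible answer to the "store meta-data" option is sketched below. It is an illustration only: the class names, field names, and the alert rule (flag a result with two or more disease-linked changes) are hypothetical and are not a clinical guideline or an EMR standard. The identifiers and variant names are the ones from the full-sequence lab report earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Variant:
    protein_change: str   # e.g. "His63Asp", as written on the lab report
    disease_linked: bool  # False for polymorphisms noted only for completeness


@dataclass
class GeneTestResult:
    gene: str
    transcript_ref: str   # reference sequence the lab compared against
    protein_ref: str
    ccds_id: str
    variants: List[Variant] = field(default_factory=list)

    def summary(self) -> str:
        """Short meta-data string of the kind the slide proposes storing."""
        linked = [v.protein_change for v in self.variants if v.disease_linked]
        if linked:
            return f"{self.gene} test abn ({' '.join(linked)})"
        return f"{self.gene} test normal"

    def needs_follow_up(self) -> bool:
        """Hypothetical alert rule: two or more disease-linked changes."""
        return sum(v.disease_linked for v in self.variants) >= 2


result = GeneTestResult(
    gene="HFE",
    transcript_ref="NM_000410",
    protein_ref="NP_000401",
    ccds_id="CCDS4578.1",
    variants=[
        Variant("His63Asp", True),
        Variant("Cys282Tyr", True),
        Variant("Val53Met", False),
    ],
)
print(result.summary())
print(result.needs_follow_up())
```

Keeping the reference identifiers next to the variants records what the result was measured against, so the entry can be re-interpreted if the reference sequence is later revised.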
76Genetic data in electronic medical records?
- Implications for component systems
- Laboratory
- Pharmacy
- Computerized order entry
- Documentation and notes
- Knowledge management
- Alerts and reminders
- Finding patients matching profiles
- Practice guidelines and clinical trials
77Genome Data and Other Information Systems
- Genomic information will be pervasive in all healthcare information systems.
- Also in public health systems
- Newborn screening
- Tissue and organ banks
- DOD requires DNA samples
- Bioterrorism and homeland security
- Identification of World Trade Center victims
- Privacy and security issues will remain with us
always but are manageable.
78Summary
- Informatics is the enabler of personalized, genomic medicine.
- Personalized medicine requires a combination of medical informatics and applied bioinformatics (and a lot more).
79Informatics will be a very dynamic discipline for
eons to come!
- Your week at Woods Hole is the first step to an
exciting future.
80 The End
- Joyce Mitchell, PhD
- University of Utah