Bioinformatics: Definitions, Challenges and Impact on Health Care Systems - PowerPoint PPT Presentation



1
Bioinformatics Definitions, Challenges and
Impact on Health Care Systems
  • Joyce Mitchell, Ph.D.
  • University of Utah
  • Sept 29, 2005
  • NLM's Woods Hole Informatics Course

2
Outline for Talk
  • What is Bioinformatics?
  • Health Informatics compared to Bioinformatics
  • Problems considered in Bioinformatics
  • Genomics, proteomics, transcriptomics, etc.
  • Genomics data and patient care
  • Impact of Bioinformatics on Health Information
    Systems

3
Central Dogma of Molecular Biology
(Figure: DNA → RNA → Protein → Phenotype, showing replication of DNA,
transcription of DNA into RNA, and translation of RNA into protein)
This happens in cells.
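The flow from DNA to protein can be shown in a short Python sketch. It assumes the input is the coding (sense) strand, so the mRNA has the same sequence with U in place of T. It uses the standard genetic code. The function names are illustrative and do not come from any particular library. The 60 bases are the first 60 of the HFE sequence shown later in this talk.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """mRNA matching a coding (sense) DNA strand: T is replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon until a stop codon ('*') or the end of the string."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 60 bases of the HFE coding sequence from the "HFE DNA Sequence" slide.
hfe_start = "ATGGGCCCGCGAGCCAGGCCGGCGCTTCTCCTCCTGATGCTTTTGCAGACCGCGGTCCTG"
print(translate(transcribe(hfe_start)))  # MGPRARPALLLLMLLQTAVL
```

The output matches the start of the HFE amino acid sequence later in the talk. Real tools also handle the reverse strand, alternative genetic codes (for example, the mitochondrial code), and reading-frame selection. This sketch leaves all of those out.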
4
1. What is Bioinformatics?
  • Definitions first

5
NIH Working Definition
  • Bioinformatics: Research, development, or
    application of computational tools and approaches
    for expanding the use of biological, medical,
    behavioral or health data, including those to
    acquire, store, organize, archive, analyze, or
    visualize such data.
  • http://www.bisti.nih.gov/CompuBioDef.pdf

6
Another Definition
  • An interdisciplinary area at the intersection of
    biological, computer, and information sciences
    necessary to manage, process, and understand
    large amounts of data, for instance from the
    sequencing of the human genome, or from large
    databases containing information about plants and
    animals for use in discovering and developing new
    drugs. www.isye.gatech.edu/tg/publications/ecology/eolss/node2.html

7
Another Definition: NCBI (National Center for
Biotechnology Information)
  • Bioinformatics is the field of science in which
    biology, computer science, and information
    technology merge into a single discipline. The
    ultimate goal of the field is to enable the
    discovery of new biological insights and to
    create a global perspective from which unifying
    principles in biology can be discerned. There are
    sub-disciplines in bioinformatics.
  • http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html

8
2. Health Informatics Compared to Bioinformatics
  • Same methods, different application domains

9
Different Areas of Strength
  • Bioinformatics has much more data available on
    the Internet than Health Informatics
    • Much more progress on database integration across
      multiple data sources
  • Health Informatics has much more need for
    aggregation of national statistics
    • Much more progress on terminologies for
      integration of data

10
Bioinformatics and Health Informatics
  • Bioinformatics is the study of the flow of
    information in the biological sciences.
  • Health Informatics is the study of the flow of
    information in patient care.
  • These two fields are on a collision course as
    genomic data comes into use in patient care.
  • Russ Altman, MD, Ph.D., Stanford Univ.

11
3. Problems Considered in Bioinformatics
  • OMES and OMICS

12
Omes and Omics
  • Genomics
    • Primarily sequences (DNA and RNA)
    • Databanks and search algorithms
  • Proteomics
    • Sequences (protein)
    • Mass spectrometry, X-ray crystallography
    • Databanks, knowledge bases, terminologies
  • Functional Genomics (transcriptomics)
    • Microarray data
    • Databanks, analysis tools, traversal techniques
  • Systems Biology (metabolomics)
    • Metabolites and interacting systems
      (interactomics)
    • Graphs, visualization, modeling, networks of
      entities

13
Central Dogma of Molecular Biology
(Figure: DNA → RNA → Protein → Phenotype, annotated with the fields
Genomics, Transcriptomics (Functional Genomics), and Proteomics)
14
Genome and Genomics
  • Genome: the entire complement of DNA in a species
    • Both nuclear and mitochondrial/chloroplast DNA
    • Variants among individuals
  • Genomics: study of the sequence, structure and
    function of the genome; study of whole sets of
    genes rather than single genes.
  • Comparative genomics: study of the differences
    among species. Usually covers evolutionary
    studies of differences and conservation over time.

15
A Genome Database (e.g., GenBank)
  • Consists of long strings of DNA bases (A, T, C, G)
  • Includes annotations of these sequences that
    attach meaning to the sequence data.
  • Example entry from GenBank: the hemochromatosis
    gene HFE
  • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb
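A database entry of this kind pairs a raw base string with annotations that point into it. The sketch below is a hypothetical, simplified model. It is not GenBank's actual record format or any library's API. Features use GenBank-style 1-based, inclusive coordinates. The accession, sequence and CDS coordinates are made up for illustration. Strand information and multi-part (joined) locations are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """An annotation on a sequence, using 1-based, inclusive coordinates."""
    kind: str   # e.g. "gene" or "CDS"
    start: int
    end: int
    note: str = ""


@dataclass
class SequenceRecord:
    """Hypothetical minimal record: a raw base string plus its annotations."""
    accession: str
    sequence: str
    features: list[Feature] = field(default_factory=list)

    def extract(self, feature: Feature) -> str:
        # Convert 1-based inclusive coordinates into a 0-based Python slice.
        return self.sequence[feature.start - 1 : feature.end]


record = SequenceRecord(
    accession="EXAMPLE_0001",  # hypothetical accession, not a real GenBank ID
    sequence="GGATCCATGGGCCCGCGAGCCTAAGG",
    features=[Feature("CDS", 7, 24, note="toy coding region")],
)
print(record.extract(record.features[0]))  # ATGGGCCCGCGAGCCTAA
```

The annotation does not copy any bases. It stores only coordinates, so the meaning it carries depends on the underlying sequence staying fixed.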

16
Human Genome Project
  • Human Genome Project - International research
    effort
  • Determine sequence of human genome and other
    model organisms
  • Began 1990, completed 2003
  • Next steps for 20,000 genes:
    • Function and regulation of all genes
    • Significance of variations between people
    • Cures, therapies, genomic healthcare

17
The Human Genome Project has catalyzed striking
paradigm changes in biology - biology is an
information science.
  • Leroy Hood, MD, PhD
  • Institute for Systems Biology
  • Seattle, Washington

18
Genomes In Public Databases
  • Published complete genomes
  • Ongoing prokaryotic genomes
  • Ongoing eukaryotic genomes

1560
http://www.genomesonline.org/
19
Genomics activities
  • Sequence the genes and chromosomes (done by
    breaking the DNA into fragments)
  • Map the location of various gene entities to
    establish their order
  • Compare the sequences with other known sequences
    to determine similarity
    • Across species: conserved sequence motifs
  • Predict secondary structure of proteins
  • Create large databases: GenBank, EMBL, DDBJ
  • Develop algorithms and similarity measures
    • BLAST and its many forms
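Similarity searching depends on scoring how well two sequences align. The sketch below computes the exact Smith-Waterman local alignment score by dynamic programming. BLAST itself is a faster heuristic that finds short seed words and then extends them. It approximates this kind of search, and it is not what the code implements. The match, mismatch and gap scores are arbitrary values chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Local alignment: a cell never drops below zero, so an
            # alignment may start fresh at any position.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman("ACGT", "ACGT"))          # 8 (four matches x 2)
print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 8 (shared ACGT motif)
print(smith_waterman("AAAA", "TTTT"))          # 0 (no similarity)
```

The second example shows why local alignment suits motif searches. The unrelated flanking bases cost nothing, because the best-scoring region is reported on its own. Realistic protein searches replace the flat scores used here with a substitution matrix such as BLOSUM62 and with affine gap penalties.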

20
Central Dogma of Molecular Biology
(Figure: DNA → RNA → Protein → Phenotype (tissues, organs, organisms),
annotated with the fields Genomics, Transcriptomics (Functional Genomics),
and Proteomics)
21
Proteome and Proteomics
  • Proteome: the entire set of proteins (and other
    gene products) made by the genome.
  • Proteomics: study of the interactions among
    proteins in the proteome, including networks of
    interacting proteins and metabolic
    considerations. Also includes differences across
    developmental stages, tissues and organs.

22
Protein Functions
  • Catalysis
  • Transport
  • Nutrition and storage
  • Contraction and mobility
  • Structural elements
    • Cytoskeleton
    • Basement membranes
  • Defense mechanisms
  • Regulation
    • Genetic
    • Hormonal
  • Buffering capacity

23
Protein Databases
  • UniProt
    • SwissProt
    • PIR http://www.pir.uniprot.org/
  • GENE http://www.ncbi.nlm.nih.gov/gene
  • InterPro http://www.ebi.ac.uk/interpro/
  • Correspond to (and derived from) genome
    databases
  • All connected by NCBI Reference Sequences (RefSeq)
24
Gene/Protein Database entries
  • HFE record in Entrez GENE (NCBI)
  • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077

25
Structure and Function Determination
  • X-ray crystallography
  • Nuclear magnetic resonance (NMR) spectroscopy and
    tandem mass spectrometry (MS/MS)
  • Computational modeling
    • Sequence alignment with other known proteins
    • Homology modeling

26
Structure Databases
  • Contain experimentally determined and predicted
    structures of biological molecules
  • Most structures determined by X-ray
    crystallography or NMR
  • Example: MMDB (Molecular Modeling Database)
    http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
  • HFE entry
  • http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816

27
Protein Interaction Databases
  • Record observations of protein-protein
    interactions in cells
  • Attempts to detail interactions observed in
    thousands of small-scale experiments described in
    published articles
  • Examples
    • BIND: Biomolecular Interaction Network Database
    • DIP: Database of Interacting Proteins
    • MIPS: Munich Information Center for Protein
      Sequences
    • PRONET: Protein interaction on the Web
    • Many others, both academic and commercial

28
Central Dogma of Molecular Biology
(Figure: DNA → RNA → Protein → Phenotype, annotated with the fields
Genomics, Transcriptomics (Functional Genomics), and Proteomics)
29
Proteome vs Transcriptome
  • Functional genomics (transcriptomics) looks at
    the timing and regulation of the gene products
    (both RNA and proteins)
  • This is different from looking at what gene
    products can be produced: it looks at the
    circumstances under which production occurs.
  • Involves experimental conditions.

30
Functional Genomics: Microarrays
  • Transcriptome and transcriptomics
  • High-throughput technique designed to measure
    changes in the amount of RNA (or sometimes proteins,
    tissues, etc.) in a cell in response to an experiment.
  • Also called gene expression analysis
  • Microarrays are also called gene chips (although
    now there are protein and tissue chips)

31
How Do Microarrays Work?
  • Conceptual description
  • A set of targets (cDNA, proteins, tissues, etc.) is
    immobilized in predetermined positions on a
    substrate
  • Solution containing tagged molecules capable of
    binding to the targets is placed over the targets
  • Binding occurs between targets and tagged
    molecules.
  • Fluorescent tags allow you to visualize which
    targets have been bound (and tell you something
    about the molecules that were present in your
    solution).

32
Animation of Microarrays
  • http://www.bio.davidson.edu/courses/genomics/chip/chip.html

34
How Spotted Arrays Work
  • Result
  • Spots where cDNA from the reference sample
    hybridized look green
  • Spots where cDNA from the experimental sample
    hybridized look red
  • Spots where cDNA from both samples hybridized
    look yellow (green + red = yellow)
  • Spots with little/no cDNA hybridized look black
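The color rules above come down to comparing the two channel intensities for each spot. The sketch below assumes red is the experimental channel and green the reference, as on the slide. The background cutoff of 50 and the two-fold change threshold are hypothetical choices. Real pipelines also subtract background and normalize the two channels before comparing them.

```python
import math


def classify_spot(red: float, green: float,
                  background: float = 50.0, fold: float = 2.0) -> str:
    """Label a two-color spot from red (experimental) and green (reference) intensities."""
    if red < background and green < background:
        return "black"   # little or no cDNA hybridized in either channel
    # Adding 1 to each channel avoids dividing by zero or taking log(0).
    log_ratio = math.log2((red + 1) / (green + 1))
    if log_ratio >= math.log2(fold):
        return "red"     # roughly `fold` times higher or more in the experimental sample
    if log_ratio <= -math.log2(fold):
        return "green"   # roughly `fold` times higher or more in the reference sample
    return "yellow"      # both samples hybridized to a similar extent


for red, green in [(1000, 100), (100, 1000), (500, 450), (10, 20)]:
    print(red, green, classify_spot(red, green))
```

The log ratio is used because it is symmetric. A two-fold increase and a two-fold decrease give +1 and -1, which is why expression changes are usually reported on a log2 scale.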

35
(No Transcript)
36
(No Transcript)
37
Uses of Expression Profiling
  • Pharmaceutical research
    • ID drug targets by comparing expression profiles
      of drug-treated cells with those of cells
      containing mutations in genes encoding known drug
      targets
  • Disease Dx and Tx
    • Distinguish morphologically similar cancers
    • DLBCL (Poulsen et al. (2005) Microarray-based
      classification of diffuse large B-cell lymphomas.
      European Journal of Haematology 74(6):453-65.)
  • Therapy potential
    • Rabson AB, Weissmann D. From microarray to
      bedside: targeting NF-kappaB for therapy of
      lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.

38
Future Applications
  • Diagnostic tool to screen for infective agents
    • Chip imprinted with a set of pathogenic genomes,
      used to identify bacterial, viral, or parasite
      genomic material in a patient's body fluids
  • Diagnostic chip to check for mutations involved
    in drug-gene interactions.

39
Experimental Design (2)
  • A fundamental challenge of microarray
    experiments: underdetermined systems

Kohane IS, Kho AT, Butte AJ. Microarrays for an
Integrative Genomics. (The MIT Press: Cambridge,
MA, 2003), p. 11.
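One way to see the underdetermination problem: with far more genes (unknowns) than arrays (equations), many different sets of gene weights explain the same measurements equally well. The toy linear model below uses two samples, three genes and invented numbers. It is an illustration built on those assumptions, not an analysis method from the cited book.

```python
def predict(samples: list[list[float]], weights: list[float]) -> list[float]:
    """Linear model: one predicted outcome per sample (row) from per-gene weights."""
    return [sum(x * w for x, w in zip(row, weights)) for row in samples]


# Two arrays (rows) x three genes (columns): fewer equations than unknowns.
expression = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
outcome = [1.0, 1.0]

weights_a = [1.0, 1.0, 0.0]  # "genes 1 and 2 drive the outcome"
weights_b = [0.0, 0.0, 1.0]  # "only gene 3 drives the outcome"
weights_c = [0.5, 0.5, 0.5]  # an even blend of a and b also fits

for name, w in [("a", weights_a), ("b", weights_b), ("c", weights_c)]:
    print(name, predict(expression, w) == outcome)  # True for all three
```

Every blend of `weights_a` and `weights_b` fits the data exactly, so the data alone cannot say which genes matter. Microarray studies therefore typically rely on replication, more samples, or prior knowledge to narrow the answer.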
40
MGED: Microarray Gene Expression Data
http://www.mged.org/Workgroups/MAGE/mage.html
http://www.mged.org/Workgroups/MIAME/miame.html
41
Public Microarray Data Repositories
  • Major public repositories
    • GEO (NCBI)
      http://www.ncbi.nlm.nih.gov/geo/
    • ArrayExpress (EBI)
      http://www.ebi.ac.uk/arrayexpress/

42
Standards and Repositories
  • Brazma, A, et al. Minimum information about a
    microarray experiment (MIAME): toward standards
    for microarray data. Nature Genetics. 2001
    Dec;29(4):365-371.
  • http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html
  • Ball, CA, et al. Submission of Microarray Data to
    Public Repositories. PLoS Biology. 2004
    Sep;2(9):e317.
  • http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489

43
Controlled Vocabularies
  • Genomics, proteomics, and especially microarray
    techniques have created a large need for
    controlled vocabularies to assist analyses
    across multiple entities and species.
  • Taxonomy: systematic classification of objects
    according to relationships.
  • Ontologies: an organizational framework for
    concepts

44
Controlled Vocabularies in Bioinformatics
  • The Gene Ontology http://www.geneontology.org/
    • Knowledge capture (the ontology itself)
    • Annotation of gene products (for comparisons)
  • The MGED Ontology (arising from MIAME)
    http://mged.sourceforge.net/
    • Annotation of microarray experiments for public
      repositories
  • Clinical Bioinformatics Ontology
    http://www.cerner.com/cbo
    • Annotation of gene tests in electronic medical
      records
  • MIAPE from Proteomics Standards Initiative (PSI)
    http://psidev.sourceforge.net/

45
4. Genomics Data and Patient Care
  • From genotype to phenotype

46
Bioinformatics and Patient Care
  • Understanding a person's genome ushers in the era
    of personalized medicine
  • Obviously you should keep track of
    health-related genetic data in the EMR.
  • The 9/11 disaster showed you need to know the
    genomic variant information as well.
  • Cash et al. Forensic bioinformatics in the wake
    of the World Trade Center Disaster. PSB
    2003:638-653.

47
Human Disease Gene Specifics
  • Genes linked to human diseases (as of Sept. 2004)
  • 425 in 2 yrs
  • 1700/20,000 ≈ 9% of loci

48
Genetic Medicine is not new
  • Karl Landsteiner started genetic medicine over
    100 years ago (1903)
  • Blood transfusions relied on the ABO blood group
    system.
  • Landsteiner received the Nobel Prize in 1930 for
    this work.
  • http://nobelprize.org/medicine/laureates/1930/landsteiner-bio.html

49
Genomic Medicine is New
  • What to do with all of this genetic information
    and every person being unique?
  • And the information about genetic conditions is
    available on the Internet.

50
Genomics Data and Patient Care
  • Where do you find the data for genes causing
    human diseases?
  • What do you do with genetic data in electronic
    medical records?

51
Where do you find the data for genes causing
human diseases?
  • Study on availability of genetic data on health
    implications of the HGP.
  • Mitchell, McCray, Bodenreider. Methods Inf Med
    2003;42:557-63.

52
Questions
  • What genes cause the condition?
  • What is the normal function of the gene?
  • What mutations have been linked to diseases?
  • How does the mutation alter gene function?
  • What laboratories are performing DNA tests?
  • Are there gene therapies or clinical trials?
  • What names are used to refer to the genes and the
    diseases?
  • What other conditions are linked to these same
    genes?

53
You can find the answers online
  • But it is not easy: answers are in many places
  • Can't navigate by gene names; must use hot
    links and numeric identifiers
  • The number and function of alternate forms of the
    protein are inconsistently reported
  • Synonymy (many names, same meaning) and polysemy
    (same name, different meanings) cause confusion
  • Upper and lower case are used for species
    distinctions
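Software can handle these naming problems by mapping every known name to official identifiers and refusing to guess when a name is ambiguous. The lookup table below is hypothetical. HLA-H is a real former name of HFE, but "ALIAS-X" and "GENE_A"/"GENE_B" are made up to show polysemy. The case check reflects a common convention: human gene symbols are all upper case (HFE), while mouse and rat symbols capitalize only the first letter (Hfe).

```python
# Hypothetical synonym table: lower-cased name -> official gene symbols.
SYNONYMS: dict[str, set[str]] = {
    "hfe": {"HFE"},
    "hla-h": {"HFE"},                 # synonymy: an older name for HFE
    "tfr2": {"TFR2"},
    "alias-x": {"GENE_A", "GENE_B"},  # polysemy: made-up alias shared by two genes
}


def resolve(name: str) -> tuple[str, list[str]]:
    """Return ('unique' | 'ambiguous' | 'unknown', matching official symbols)."""
    matches = sorted(SYNONYMS.get(name.strip().lower(), set()))
    if not matches:
        return "unknown", []
    if len(matches) > 1:
        return "ambiguous", matches   # ask the user instead of guessing
    return "unique", matches


def species_hint(symbol: str) -> str:
    """Guess species from capitalization alone (a heuristic, not a rule)."""
    if symbol.isupper():
        return "human"                # e.g. HFE
    if symbol[:1].isupper() and symbol[1:].islower():
        return "mouse or rat"         # e.g. Hfe
    return "unknown"


print(resolve("HLA-H"))     # ('unique', ['HFE'])
print(resolve("alias-x"))   # ('ambiguous', ['GENE_A', 'GENE_B'])
print(species_hint("Hfe"))  # mouse or rat
```

`resolve` lower-cases its input, which discards the case information. Any case-based species check therefore has to run before the lookup, not after it.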

54
Major Challenges of Navigation
  • Complexity of data
  • Dynamic nature of the data
  • Diverse foci and number of data/knowledge base
    systems
  • Data and knowledge representation lack standards
  • Can navigate if you know what you are looking for.

55
Genetics Home Reference
  • Consumer health resource to help the public
    navigate from phenotype to genotype.
  • Focus on health implications of the Human Genome
    Project.
  • http://ghr.nlm.nih.gov
  • Mitchell, Fun, McCray, JAMIA, 2004 Nov-Dec;
    11(6):439-447

56
Hands-on with GHR
  • Scavenger hunt with hemochromatosis and the genes
    that influence it.
  • Explore the Genetics Home Reference by answering
    the following questions. Start at
    http://ghr.nlm.nih.gov.

57
GHR Scavenger Hunt
  • How common is hemochromatosis?
  • How many genes have been proven to be involved in
    hemochromatosis when the genes are mutated?
  • What are the symbols for these genes?
  • Can you find the link to MedlinePlus with health
    information on hemochromatosis?

58
GHR Scavenger Hunt
  • What are the names of the patient support
    associations for hemochromatosis?
  • One synonym for this condition is "bronze
    diabetes." Can you find a reason for this?
  • What kind of damage is done to the liver of
    people with hemochromatosis?

59
GHR Scavenger Hunt
  • For the genes involved in hemochromatosis, how
    many of them are available as a DNA test?
  • Give one place where you would choose to send a
    tissue sample for DNA testing.
  • What sites are listed under Research Resources
    for the TFR2 gene?
  • How many alternatively spliced proteins are there for TFR2?
  • In what tissues is this gene expressed?

60
GHR Scavenger Hunt
  • How do people inherit hemochromatosis?
  • Do the genes involved in hemochromatosis cause
    other health conditions when they are mutated?
  • Can you find a protein sequence for one of the
    genes?
  • What clinical trials are available for
    hemochromatosis patients close to where you live?

61
5. Impact of Bioinformatics on Health Information
Systems
  • Electronic Medical Record
  • Public Health Systems

62
Genetics is Impacting Medicine Today!
  • 1,700 genes linked to health conditions
  • More than 1,100 gene tests for diagnosis
  • Relate to diagnosis, therapy, drug dosage,
    occupational hazards, reproductive plans, health
    risks, etc.

63
Well-known Examples
  • Pharmacogenetics
    • CYP450 alleles: exaggerated, diminished or
      ultra-rapid drug responses. E.g., warfarin: 93%
      of patients are OK on standard doses; 7% of
      patients have severe hemorrhage. CYP2C9*2 and
      CYP2C9*3 are the most severe of 6 known mutations.
  • Environmental susceptibility
    • Sickle cell trait carrier and malaria parasite
  • Nutrition
    • PKU and avoidance of phenylalanine
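A pharmacogenetic rule like the warfarin example can be encoded as a lookup on the patient's genotype. The sketch below is hypothetical and is not clinical guidance. It treats the CYP2C9*2 and *3 alleles named above as reduced-function alleles and returns an alert level for a warfarin order, without suggesting any dose. The function name and alert wording are illustrative.

```python
# Alleles the slide singles out as most severe; a real rule base would be curated.
REDUCED_FUNCTION = {"*2", "*3"}


def warfarin_alert(genotype: str) -> str:
    """Map a CYP2C9 diplotype such as '*1/*3' to a (hypothetical) alert level."""
    alleles = genotype.replace("CYP2C9", "").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    reduced = sum(allele.strip() in REDUCED_FUNCTION for allele in alleles)
    if reduced == 0:
        return "none"
    if reduced == 1:
        return "review warfarin dose"
    return "review warfarin dose: two reduced-function alleles"


print(warfarin_alert("*1/*3"))  # review warfarin dose
```

The rule fires only when a structured genotype is available. This is one reason the later slides ask what genetic data the EMR should store and in what form.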

64
Another Example: Iressa (gefitinib)
  • Non-small cell lung cancer: 140,000 patients/yr
  • Iressa (AstraZeneca) causes remission in 1 of 10
    patients if taken daily for life.
  • Iressa efficacy correlates with EGFR mutation in
    the tumor. Gene testing for EGFR now makes it
    possible to target the appropriate patients.
    http://www.sciencemag.org/cgi/content/full/305/5688/1222a
  • BUT AstraZeneca can't make money on only
    14,000 patients per year (1 in 10 of 140,000).
  • http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=131550

65
Collie Dog Example
  • Collies are more sensitive to the antiparasitic
    drug ivermectin, to loperamide (Imodium), and to
    other drugs
  • 75% of collies in the US have a mutation in the mdr1
    gene causing multiple drug sensitivity (50
    drugs). Can cause death or neurological damage.
  • Testing is now available.
  • http://www.wral.com/money/3565592/detail.html

66
Implications for Health Care System
  • More gene tests will be ordered: reports of a 300%
    increase in gene tests in 2003.
  • Arch Pathol Lab Med 2004;128(12):1330-1333
  • The FDA will regulate panels of tests.
  • http://www.fda.gov/bbs/topics/news/2004/new01149.html
  • Non-discrimination laws for insurance and
    employment would open a floodgate.
  • Preventive healthcare will play a larger part.
  • Environmental risk factors call for an OSHA-type
    approach to worker empowerment and education
    about safe behavior

67
Example Hemochromatosis
  • Two copies of a mutated HFE gene: too much iron
    is absorbed from the diet and accumulates. Causes
    arthritis, liver disease, diabetes, skin
    discoloration.
  • (1 million people in US)
  • HFE gene regulates the storage, transport and
    absorption of iron
  • Labs doing gene tests use different techniques:
    full sequence vs. limited analysis

68
A Portion of the HFE DNA Sequence
  • ATGGGCCCGCGAGCCAGGCCGGCGCTTCTCCTCCTGATGCTTTTGCAGAC
    CGCGGTCCTGCAGGGGCGCTTGCTGCGTTCACACTCTCTGCACTACCTCT
    TCATGGGTGCCTCAGAGCAGGACCTTGGTCTTTCCTTGTTTGAAGCTTT
    GGGCTACGTGGATGACCAGCTGTTCGTGTTCTATGATCATGAGAGTCGCC
    GTGTGGAGCCCCGAACTCCATGGGTTTCCAGTAGAATTTCAAGCCAGATG
    TGGCTGCAGCTGAGTCAGAGTCTGAAAGGGTGGGATCACATGTTCACTG
    TTGACTTCTGGACTATTATGGAAAATCACAACCACAGCAAGGAGTCCCAC
    ACCCTGCAGGTCATCCTGGGCTGTGAAATGCAAGAAGACAACAGTACCGA
    GGGCTACTGGAAGTACGGGTATGATGGGCAGGACCACCTTGAATTCTGC
    CCTGACACACTGGATTGGAGAGCAGCAGAACCCAGGGCCTGGCCCACCAA
    GCTGGAGTGGGAAAGGCACAAGATTCGGGCCAGGCAGAACAGGGCCTACC
    TGGAGAGGGACTG

69
(No Transcript)
70
A Portion of the HFE DNA Sequence
  • GCACAAGATTCGGG → GGACAAGATTCGGG
  • His: CAU and CAC
  • Asp: GAU and GAC
  • A mutation at position 225 changes C to G.
  • Changes a part of the protein (histidine to
    aspartic acid at position 63)
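The codon arithmetic behind His63Asp can be checked against the HFE sequence from the earlier slide. This sketch assumes coding-sequence numbering that counts from the A of the ATG start codon, so codon n begins at base 3n - 2. The slide's "position 225" appears to count differently (for example, from the start of the mRNA), so the two numbers are not expected to match. The helper names are illustrative.

```python
# Standard genetic code on DNA letters (T instead of U).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First four lines of the "HFE DNA Sequence" slide (199 bases, starting at ATG).
HFE_CDS_START = (
    "ATGGGCCCGCGAGCCAGGCCGGCGCTTCTCCTCCTGATGCTTTTGCAGAC"
    "CGCGGTCCTGCAGGGGCGCTTGCTGCGTTCACACTCTCTGCACTACCTCT"
    "TCATGGGTGCCTCAGAGCAGGACCTTGGTCTTTCCTTGTTTGAAGCTTT"
    "GGGCTACGTGGATGACCAGCTGTTCGTGTTCTATGATCATGAGAGTCGCC"
)


def codon_start(residue: int) -> int:
    """1-based coding-sequence position of the first base of codon `residue`."""
    return 3 * residue - 2


def residue_at(cds: str, residue: int) -> str:
    """Amino acid encoded by codon `residue` (1-based) of a coding sequence."""
    start = codon_start(residue) - 1
    return CODON_TABLE[cds[start:start + 3]]


def substitute(cds: str, position: int, new_base: str) -> str:
    """Return `cds` with the base at 1-based `position` replaced by `new_base`."""
    return cds[:position - 1] + new_base + cds[position:]


pos = codon_start(63)
mutant = substitute(HFE_CDS_START, pos, "G")
print(pos, HFE_CDS_START[pos - 1:pos + 2], residue_at(HFE_CDS_START, 63), residue_at(mutant, 63))
# 187 CAT H D
```

In this sequence, codon 63 reads CAT (CAU in the mRNA), one of the two His codons listed above. Changing its first base to G gives GAT (GAU), an Asp codon. The same arithmetic gives 3 × 53 - 2 = 157 for the Val53Met polymorphism (157G>A) in the lab report.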

71
Amino Acid Sequence for HFE
  • MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEAL
    GYVDDQLFVFYD H D ESRRVEPRTPWVSSRISSQMWLQL
    SQSLKGWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWK
    YGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLERD
    CPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNI
    TMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQV
    EHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQG
    SRGAMGHYVLAERE
  • His63Asp on ONE chromosome
  • Cys282Tyr in ONE chromosome (not shown)

72
Report Back from Full Sequence Lab
  • Reference sequences for transcript variant 1 for
    the HFE gene. NM_000410 NP_000401
  • Consensus CDS (CCDS) CCDS4578.1
  • Mutant phenotype changes
  • His63Asp, Cys282Tyr (2 mutations)
  • Polymorphisms noted
  • AA position 59, Val53Met, 157G>A (freq. 5%)

73
Special health concerns: HFE
  • For person with dx
  • For family members

74
Dilemmas
  • The reference sequence ties you to external data
    sources that change
  • The gene has eleven transcript variants
  • Mutant phenotype is noted as an amino acid change
  • Polymorphisms are noted as nucleotide change
  • These results have implications for other family
    members in addition to the patient

75
What Should You Store in the EMR?
  • Do you put the DNA sequence for the gene into the
    EMR? Where do you put it?
  • Do you just store metadata about the DNA
    sequence, e.g., "HFE test abnormal" or
    "His63Asp, Cys282Tyr"? What about the normal
    variants?
  • If you don't store the sequence, what do you do
    when the reference sequence changes?
  • How do you trigger alerts and reminders? And for
    what? People with hemochromatosis need special
    screening and check-ups.
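One possible answer to these questions is to store structured results that record which reference sequence version the lab used. Results can then be flagged when the reference changes, and reminders can be driven from the structured variants. Everything below is a hypothetical design sketch. The field names, the version numbers, and the "two variant copies" reminder rule are assumptions, not an EMR standard or a clinical guideline. The rule also assumes that separate heterozygous variants lie on different chromosomes, as in the lab report earlier in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    protein_change: str  # e.g. "His63Asp"
    zygosity: str        # "heterozygous" or "homozygous"


@dataclass(frozen=True)
class GeneTestResult:
    gene: str
    reference_accession: str  # e.g. "NM_000410"
    reference_version: int    # hypothetical: version the lab reported against
    variants: tuple[Variant, ...]


def needs_reinterpretation(result: GeneTestResult, current_version: int) -> bool:
    """Flag results reported against an older reference sequence version."""
    return result.reference_version < current_version


def hemochromatosis_reminder(result: GeneTestResult) -> bool:
    """Hypothetical rule: remind when both HFE copies carry a reported variant."""
    if result.gene != "HFE":
        return False
    variant_copies = sum(2 if v.zygosity == "homozygous" else 1 for v in result.variants)
    return variant_copies >= 2


# The compound heterozygous result from the lab report, with a made-up version number.
report = GeneTestResult(
    gene="HFE",
    reference_accession="NM_000410",
    reference_version=1,
    variants=(Variant("His63Asp", "heterozygous"), Variant("Cys282Tyr", "heterozygous")),
)
print(hemochromatosis_reminder(report))   # True
print(needs_reinterpretation(report, 2))  # True: the reference has since changed
```

Storing the accession and version with each result avoids copying the full DNA sequence into the EMR while still showing when a result was reported against an outdated reference.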

76
Genetic data in electronic medical records?
  • Implications for component systems
  • Laboratory
  • Pharmacy
  • Computerized order entry
  • Documentation and notes
  • Knowledge management
  • Alerts and reminders
  • Finding patients matching profiles
  • Practice guidelines and clinical trials

77
Genome Data and Other Information Systems
  • Genomic information will be pervasive in all
    healthcare information systems.
  • Also in public health systems
  • Newborn screening
  • Tissue and organ banks
  • DOD requires DNA samples
  • Bioterrorism and homeland security
  • Identification of World Trade Center victims
  • Privacy and security issues will remain with us
    always but are manageable.

78
Summary
  • Informatics is the enabler of personalized,
    genomic medicine.
  • Personalized medicine requires a combination of
    medical informatics and applied bioinformatics
    (and a lot more).

79
Informatics will be a very dynamic discipline for
eons to come!
  • Your week at Woods Hole is the first step to an
    exciting future.

80
The End
  • Joyce Mitchell, PhD
  • University of Utah