Bioinformatics and cancer


Slides: 25
Provided by: victorjo



Bioinformatics and cancer
  • How an experimental science can benefit from
    information technology

What is bioinformatics?
  • Application of computer technology to biological
    problems
  • Extraction of biological information from raw
    data
  • Presentation of complex results in an
    understandable form
  • Generation of testable hypotheses for the

How is bioinformatics relevant to cancer?
  • Record keeping, particularly in complex,
    multi-center studies
  • Immunoscope
  • Compiling and integrating information derived
    from multiple sources
  • Leveraging the information derived from genome
    and transcriptome sequencing projects

The SEREX database
  • Single repository for all SEREX data
  • Unified format for storing and accessing data
  • Unified set of tools for first-level analysis of
    new sequences
  • Identification of recurrent epitopes
  • Curation and annotation of data

Primary SEREX data
  • Identity of clone
      • Library, origin of serum, clone identifiers
  • Sequence of clone
      • Nucleotide, protein if unambiguous
  • Experimental data
      • Reactivity with autologous and heterologous sera
      • Tissue-specific expression

Derived SEREX data
  • Identity of gene from which cDNA was derived
  • Full sequence of the cDNA
  • Sequence of the corresponding protein
  • Similarities in the databases
  • Chromosomal localization

Nucleotide sequence analysis methods
  • Compare to SEREX database
      • Find if epitope already identified by others
  • Compare to human section of EMBL
      • Find if sequence is derived from known gene
  • Compare to Unigene database
      • Identify gene or EST cluster from which sequence
        is derived
      • Get information about chromosomal localization,
        specificity of expression
  • Compare to protein databases
      • Get information about candidate ORFs, taking
        frameshifts into account
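
As a rough illustration of the "candidate ORFs" step, the sketch below translates a nucleotide sequence in all six reading frames and reports the longest stop-free stretch in each. It is a simplified stand-in, not the tool used for the SEREX database: it does not correct frameshifts, and the function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_open_frames(seq):
    """Return (frame, peptide) pairs, longest stop-free stretch per frame.

    ESTs are usually partial, so no start codon is required here.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            best = max(protein.split("*"), key=len)
            results.append((f"{strand}{offset + 1}", best))
    return sorted(results, key=lambda r: len(r[1]), reverse=True)


if __name__ == "__main__":
    for frame, peptide in longest_open_frames("ATGGCCAAATAA"):
        print(frame, peptide)
```

The peptides from each frame could then be compared to a protein database; a frameshift-aware method would additionally allow a hit to continue in a different frame.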

Protein sequence analysis methods
  • Compare to protein databases
  • Try to identify potential homologs
  • Compare to profile databases
  • Identify sequence motifs indicative of structure
    and/or function
  • Search for HLA-binding peptides
  • Intrinsic methods
  • Look for coiled-coils, transmembrane segments,
    signal peptides, etc.
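
One common intrinsic method for spotting candidate transmembrane segments is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, which are conventional choices and not necessarily what the SEREX tools use; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """Return 0-based, half-open (start, end) regions covered by windows
    whose mean hydropathy is at least `threshold`."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    segments = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                # Merge with the previous overlapping window.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "K" * 20 + "L" * 20 + "K" * 20  # artificial test protein
    print(hydrophobic_segments(toy))
```

Coiled-coil and signal-peptide predictors follow the same pattern of scoring windows along the sequence, but with different scoring schemes.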

Current status of database
  • 1806 entries (Sep. 2000)
  • Reactivity: 359 testis, 206 stomach, 310 breast,
    157 kidney, 102 colon, 37 lung, 73 melanoma, 180
    cell lines
  • 1480 entries in the public part of the database
  • 593 sequences match at least one other sequence
  • Many genes are novel

Re-engineering the SEREX database
  • Convert to true relational database
  • Standardize the annotation
  • Link to external databases
      • SWISS-PROT, GeneCards, LocusLink, RefSeq
  • Integrate into Cancer Immunome Database
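
To make the idea of a "true relational database" concrete, here is a minimal sketch using SQLite. The table and column names, the clone identifier and the accessions are all invented placeholders for illustration; the actual SEREX schema is not described in these slides.

```python
import sqlite3

# Hypothetical schema: one row per clone, with reactivity data and
# external cross-references kept in separate, linked tables.
SCHEMA = """
CREATE TABLE clone (
    clone_id   TEXT PRIMARY KEY,
    library    TEXT NOT NULL,
    nucleotide TEXT NOT NULL
);
CREATE TABLE reactivity (
    clone_id TEXT NOT NULL REFERENCES clone(clone_id),
    serum    TEXT NOT NULL,
    reactive INTEGER NOT NULL CHECK (reactive IN (0, 1)),
    PRIMARY KEY (clone_id, serum)
);
CREATE TABLE xref (
    clone_id  TEXT NOT NULL REFERENCES clone(clone_id),
    db_name   TEXT NOT NULL,
    accession TEXT NOT NULL,
    PRIMARY KEY (clone_id, db_name, accession)
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up clone."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO clone VALUES (?, ?, ?)", ("demo-1", "testis", "ATGGCC")
    )
    conn.executemany(
        "INSERT INTO xref VALUES (?, ?, ?)",
        [("demo-1", "SWISS-PROT", "DEMO0001"), ("demo-1", "RefSeq", "DEMO0002")],
    )
    return conn


def xrefs_for(conn, clone_id):
    """List (db_name, accession) links for one clone, sorted by database."""
    rows = conn.execute(
        "SELECT db_name, accession FROM xref WHERE clone_id = ? ORDER BY db_name",
        (clone_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(xrefs_for(build_demo_db(), "demo-1"))
```

Keeping cross-references in their own table means a new external database can be linked without changing the clone records.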

Gene discovery tools
  • General goal: convert raw data to a form where it
    can be searched efficiently and successfully
  • Tools developed at the SIB
      • EST clustering and assembly software
      • Genome contig reconstitution
      • Program to find and correct coding regions in
        low-quality sequences
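
The general idea behind EST clustering can be sketched as single-linkage grouping of sequences that share an exact k-mer, implemented with a union-find structure. This is a toy illustration, not the SIB software: it ignores reverse-complement matches and sequencing errors, and the names and parameters are assumptions.

```python
def kmers(seq, k):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs, k=12):
    """Group sequences (name -> DNA string) that share at least one k-mer,
    directly or through a chain of other sequences."""
    parent = {name: name for name in seqs}

    def find(x):
        # Find the cluster representative, shortening paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> name of the first sequence containing it
    for name, seq in seqs.items():
        for km in kmers(seq.upper(), k):
            if km in first_seen:
                parent[find(name)] = find(first_seen[km])
            else:
                first_seen[km] = name

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy = {"a": "ACGTACGTTAGC", "b": "CGTTAGCAAT", "c": "GGGGGGGGGG"}
    print(cluster_by_shared_kmers(toy, k=6))
```

Each resulting cluster would then be assembled into a contig, which is the input that coding-region finders work on.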

ESTScan, TrEST and TrGEN
  • ESTScan is a tool that uses a special hidden
    Markov model of coding regions to find and
    correct coding sequences in ESTs
  • TrEST is a virtual protein database derived from
    EST contigs using ESTScan
  • TrGEN is a virtual protein database derived from
    raw genome sequences using GENSCAN
  • Both are very rich sources of novel genes

Hunting for homologs of NY-ESO-1
  • NY-ESO-1 is a prototype CT antigen, to which
    cancer patients mount both humoral and cellular
    immune responses
  • There is a second human gene, LAGE-1, also
    located on chr X and highly similar to NY-ESO-1
  • Are there other human genes in the same family?
  • Are there homologs in other species?

  • Identify similar regions in NY-ESO-1 and LAGE-1,
    and make profile
  • Use profile to search TrEST and TrGEN, plus
    traditional protein databases (SWISS-PROT,
  • Use profile to search EST contigs and predicted
    genomic exons in frameshift-tolerant mode
  • Use hits from profile search to refine profile
    and reiterate database searches
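
The search-and-refine loop above can be sketched as follows: build a position-specific scoring matrix from aligned seed segments, scan a database, add the hits to the seeds, and repeat until nothing new is found. This is a heavily simplified, ungapped version with a uniform background; real profile tools use gapped alignment and sequence weighting. All names, the toy peptides and the threshold are assumptions for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(segments, pseudocount=1.0):
    """Log-odds scores per position from equal-length aligned segments."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for pos in range(len(segments[0])):
        counts = {aa: pseudocount for aa in ALPHABET}
        for seg in segments:
            if seg[pos] in counts:
                counts[seg[pos]] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2(counts[aa] / total / background) for aa in ALPHABET}
        )
    return profile


def best_hit(profile, seq):
    """Best ungapped window as (score, start), or None if seq is too short."""
    width = len(profile)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def iterative_search(seeds, database, threshold, max_rounds=5):
    """Return {name: start} for database sequences scoring >= threshold,
    rebuilding the profile from seeds plus hits after each round."""
    segments = list(seeds)
    found = {}
    for _ in range(max_rounds):
        profile = build_profile(segments)
        new_hit = False
        for name, seq in database.items():
            if name in found:
                continue
            hit = best_hit(profile, seq)
            if hit is not None and hit[0] >= threshold:
                found[name] = hit[1]
                segments.append(seq[hit[1]:hit[1] + len(profile)])
                new_hit = True
        if not new_hit:
            break
    return found


if __name__ == "__main__":
    seeds = ["MWITQC", "MWLTQC"]  # made-up aligned segments
    db = {"hit": "AAAMWITQCAAA", "miss": "GGGGGGGGGGGG"}
    print(iterative_search(seeds, db, threshold=5.0))
```

Because each round folds new hits back into the profile, a threshold that is too permissive can let unrelated sequences drift in, so hits are worth checking by eye before reiterating.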

Human homologs
  • First search identifies a predicted CDS and a
    Unigene cluster mapping to Xq28
  • The corresponding mRNA and protein are in the
    databases (ITBA2), but translated in the wrong
    frame
  • Second search identifies a new Unigene cluster,
    whose closest current homolog is a pseudogene on
  • Therefore, there are at least four human family
    members

Rodent homologs
  • Still very little genomic sequence, so have to
    rely on ESTs alone
  • In mouse, pick up two clusters that are similar
    to each other and to ITBA2
  • In rat, pick up only one cluster similar to the
    mouse ones
  • One mouse mRNA checked for tissue specificity
    turned out not to be a CT antigen

More distant homologs
  • For genes centrally involved in development or
    metabolism, expect homologs in most eukaryotes
  • Found ESO-1 homologs in several fully sequenced
    genomes: Drosophila, C. elegans, S. pombe (but
    not S. cerevisiae)
  • In Drosophila, protein was misannotated as
    ribosomal, because it is located adjacent to L34
    protein from 60S subunit

Alignment of homologous regions

Lessons learned
  • EST and unannotated genome data are still a rich
    source of information for gene discovery
  • Much of the existing annotation is erroneous,
    even if it was not done automatically
  • Bioinformatics approaches can suggest new
    experimental avenues

The ORESTES project
  • Sponsors: Ludwig Institute for Cancer Research,
    FAPESP (São Paulo State funding agency)
  • Goal: to obtain EST sequences from the
    under-represented, often coding, central portions
    of mRNAs
  • Methodology: use low-stringency semi-random
    priming followed by PCR, producing low complexity
  • Results: over 500,000 ESTs produced, of which
    half produce novel information

The Human Transcriptome project
  • Sponsors: NCI, LICR, FAPESP
  • Goal: to provide a comprehensive and
    experimentally validated catalog of human
  • Methodology: create a stable index of transcripts
    by using poly(A) tags, extend this by a
    combination of experimental and bioinformatic

The EUCIP project
  • Sponsors: European Union, LICR
  • Goal: to examine in detail the cancer biology of
    genes identified using the SEREX method
  • Methodology: for each gene, examine
    cancer-related alterations, patterns of
    expression, tissue distribution and immune
    responses; document in integrated database

The DNA chip consortium
  • Partners: LICR, ICRF, Sanger Centre
  • Goal: to produce cDNA chips representing the
    entire human transcriptome
  • Current state: two 5000-feature chips produced
    routinely, expect non-redundant 10k chips at
    year's end

  • Philipp Bucher (ISREC)
  • Christian Iseli (LICR)
  • Brian Stevenson (LICR)
  • Dmitry Kuznetsov (LICR)
  • Andy Simpson and Sandro de Souza (LICR São Paulo)