Bioin401 Project: AffyPipe Final Presentation - PowerPoint PPT Presentation

1
Bioin401 Project: AffyPipe Final Presentation
  • Yifeng Liu
  • Chelsea Ju
  • Chunyan Meng
  • April 10, 2006

2
Outline
  • Pipeline software structure
  • CEL file upload and access control
  • Clustering
  • Gene function prediction

3
Pipeline - Implementation
  • An Apache web server interacts with Perl CGI scripts
    on the server side. The CGI scripts call backend R
    scripts to do the data processing and analysis.
  • The CGI scripts are modularized by function: DE.pm
    for differential expression analysis, Cluster.pm for
    gene chip clustering, and Annotation.pm for gene
    function annotation.
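
The hand-off from a CGI module to a backend R script can be sketched as follows. This is a minimal Python illustration, not the project's Perl code: the R script file names, the use of `Rscript`, and the `build_r_command` helper are assumptions made for the example. Only the module names DE.pm, Cluster.pm, and Annotation.pm come from the slides.

```python
# Hypothetical mapping from analysis step to (CGI module, backend R script).
# The module names come from the slides; the R script names are assumed.
STEPS = {
    "de": ("DE.pm", "de.R"),
    "cluster": ("Cluster.pm", "cluster.R"),
    "annotation": ("Annotation.pm", "annotation.R"),
}


def build_r_command(step, input_path, output_path):
    """Return the argument list a CGI handler could pass to the R backend."""
    if step not in STEPS:
        raise ValueError(f"unknown analysis step: {step}")
    _module, r_script = STEPS[step]
    # An argument list (rather than a shell string) avoids shell-injection
    # problems when the paths are derived from user uploads.
    return ["Rscript", r_script, input_path, output_path]
```

A handler would pass the returned list to a process launcher such as `subprocess.run`. Keeping one entry per step mirrors the one-module-per-function layout described above.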

4
Pipeline User Access Control
  • The user has to provide a valid email address.
    JavaScript checks the format of the email string.
  • The number of CEL files can easily be restricted to
    protect the server.
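
The two checks on this slide can be sketched as below. The slides describe a JavaScript check in the browser; this Python version is only an illustration of the same idea on the server side. The regular expression, the limit of 10 files, and the function names are all assumptions, since the slides give neither the pattern nor the actual limit.

```python
import re

# Deliberately simple format check: something@domain.tld with no spaces.
# It tests the shape of the string only, not that the mailbox exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

MAX_CEL_FILES = 10  # assumed limit; the slides do not state the real value


def is_valid_email(address):
    return EMAIL_PATTERN.match(address) is not None


def check_upload(address, cel_filenames, max_files=MAX_CEL_FILES):
    """Return a list of problems; an empty list means the request is accepted."""
    problems = []
    if not is_valid_email(address):
        problems.append("invalid email address")
    if len(cel_filenames) > max_files:
        problems.append(f"too many CEL files (limit {max_files})")
    return problems
```

Repeating the format check on the server is worthwhile because a browser-side check can be bypassed.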

5
CEL File Upload
  • The user can upload CEL files as control and
    experimental samples.
  • The user is asked to confirm before the upload
    starts.

6
Gene Chip Clustering
  • Hierarchical clustering can be applied to the gene
    chips, based on the expression of the genes selected
    in the previous steps.
  • A heat map of the selected genes' expression levels
    can be generated on request.

7
Agglomeration methods for distance calculation
  • The following methods are supported:
    ward -- Ward's minimum variance method
    single -- Also called connected, single linkage or
      nearest neighbour
    complete -- Furthest neighbour or compact
    average -- Also called "average linkage" and similar
      to an algorithm called "centroid"
    mcquitty -- McQuitty's method
    median -- Median (as opposed to average) similarity
    centroid -- Geometric centroid
  • A help page is available for detailed explanation

8
Distance measures for distance matrix computation
  • The following methods are supported:
    euclidean -- Usual distance between the two vectors
      (2 norm)
    maximum -- Maximum distance between two components of
      x and y (supremum norm)
    manhattan -- Absolute distance between the two vectors
      (1 norm)
    canberra -- sum(|x_i - y_i| / |x_i + y_i|). Terms with
      zero numerator and denominator are omitted from the
      sum and treated as if the values were missing.
    binary -- (aka asymmetric binary) The vectors are
      regarded as binary bits, so non-zero elements are
      'on' and zero elements are 'off'. The distance is
      the proportion of bits in which only one is on
      amongst those in which at least one is on.
    minkowski -- The p norm, the pth root of the sum of
      the pth powers of the differences of the components
  • A help page is available for detailed explanation
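
The six distance measures can be written out in a few lines each. The sketch below is a plain-Python illustration under stated assumptions, not the R code the pipeline calls. The Canberra denominator is taken as |x_i| + |y_i|, which equals |x_i + y_i| for non-negative expression values. Terms of the form 0/0 are simply skipped, without any rescaling for the omitted terms. Returning 0 for two all-zero vectors in `binary` is a choice made for this example.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def maximum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        if denominator == 0:
            continue  # 0/0 term: omitted from the sum
        total += abs(a - b) / denominator
    return total


def binary(x, y):
    # "on" means non-zero; count positions where at least one vector is on,
    # and positions where exactly one of them is on.
    either = sum(1 for a, b in zip(x, y) if a != 0 or b != 0)
    only_one = sum(1 for a, b in zip(x, y) if (a != 0) != (b != 0))
    return only_one / either if either else 0.0


def minkowski(x, y, p=2):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Lookup by the method names used on the slide.
DISTANCES = {
    "euclidean": euclidean,
    "maximum": maximum,
    "manhattan": manhattan,
    "canberra": canberra,
    "binary": binary,
    "minkowski": minkowski,
}
```

With p = 1 the Minkowski distance reduces to the Manhattan distance, and with p = 2 to the Euclidean distance.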

9
Gene Chip Cluster

10
Gene Expression Heat-map
  • Orange-Red sequential color range
  • Chip labels in the figure: d7b, d7a, d7a-hiv, d7b-hiv

11
Gene Function Prediction
  • Input: unknown genes, and annotated genes with GO ids
    and biology terms
  • Output: predicted gene functions (GO ids and biology
    terms), taken from the nearest annotated gene in the
    same cluster, where the clustering covers the selected
    genes of interest
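
The nearest-annotated-gene rule can be sketched as follows. This is a minimal illustration under assumptions: the slides do not say which distance defines "nearest", so Euclidean distance between expression profiles is assumed here. The `predict_functions` name and the dictionary layout are hypothetical, not the project's actual interface.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def predict_functions(expression, cluster_of, annotations):
    """Give each unannotated gene the GO terms of its nearest annotated
    neighbour within the same cluster.

    expression:  probe id -> expression profile (list of floats)
    cluster_of:  probe id -> cluster label
    annotations: probe id -> list of GO terms (annotated genes only)
    """
    predictions = {}
    for gene, profile in expression.items():
        if gene in annotations:
            continue  # already annotated, nothing to predict
        candidates = [
            other for other in annotations
            if other in expression and cluster_of[other] == cluster_of[gene]
        ]
        if not candidates:
            predictions[gene] = []  # no annotated gene in this cluster
            continue
        nearest = min(candidates,
                      key=lambda other: euclidean(profile, expression[other]))
        predictions[gene] = annotations[nearest]
    return predictions
```

A gene whose cluster contains no annotated gene gets an empty prediction in this sketch.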

12
Example Input
  • X56681_s_at
    GO:0006357 (regulation of transcription from RNA polymerase II promoter)
    GO:0006355 (regulation of transcription, DNA-dependent)
    GO:0045449 (regulation of transcription)
    GO:0000785 (chromatin)
    GO:0003702 (RNA polymerase II transcription factor activity)
    GO:0030528 (transcription regulator activity)
  • M15205_at
    GO:0006139 (nucleobase, nucleoside, nucleotide and nucleic acid metabolism)
    GO:0044237 (cellular metabolism)
    GO:0008152 (metabolism)
    GO:0050875 (cellular physiological process)
    GO:0044238 (primary metabolism)
    GO:0005737 (cytoplasm)
    GO:0004797 (thymidine kinase activity)
    GO:0019136 (deoxynucleoside kinase activity)
    GO:0019206 (nucleoside kinase activity)

HG2815-HT4023_s_at HG2815-HT2931_at
13
Example Output
  • HG2724-HT2820_at
    GO:0016021 (integral to membrane)
    GO:0031224 (intrinsic to membrane)
  • HG1869-HT1904_at
    GO:0008429 (phosphatidylethanolamine binding)
    GO:0005543 (phospholipid binding)
    GO:0008289 (lipid binding)

14
Possible future work for gene function prediction
  • Assign gene functions based on a gene's distance to
    the different clusters.
  • Clustering only the selected genes of interest may not
    give strong support for the gene function prediction;
    more factors should be considered.