BioConductor - PowerPoint PPT Presentation

1
BioConductor
  • Steffen Durinck
  • Robert Gentleman
  • Sandrine Dudoit
  • November 28, 2003
  • NETTAB Bologna

2
Outline
  • what is R
  • what is Bioconductor
  • packages
  • getting and using Bioconductor

3
R
  • R is a language and environment for statistical
    computing and graphics. It is a GNU project which
    is similar to the S language and environment
    which was developed at Bell Laboratories
    (formerly AT&T, now Lucent Technologies) by John
    Chambers and colleagues. R can be considered as a
    different implementation of S.

4
R
  • what sorts of things is R good at?
  • there are very many statistical algorithms
  • there are very many machine learning algorithms
  • visualization
  • it is possible to write scripts that can be
    reused
  • R is a real computer language

5
R
  • R supports many data technologies
  • XML, database integration, SOAP
  • R interacts with other languages
  • C, FORTRAN, Perl, Python, Java
  • R has good visualization capabilities
  • R has a very active development environment
  • R is largely platform independent
  • Unix, Windows, OS X

6
Overview of the Bioconductor Project
7
Bioconductor
  • Bioconductor is an open source and open
    development software project for the analysis of
    biomedical and genomic data.
  • The project was started in the Fall of 2001 and
    includes 23 core developers in the US, Europe,
    and Australia.
  • R and the R package system are used to design and
    distribute software.
  • Releases:
  • v 1.0: May 2nd, 2002, 15 packages.
  • v 1.1: November 18th, 2002, 20 packages.
  • v 1.2: May 28th, 2003, 30 packages.
  • v 1.3: October 28th, 2003, 54 packages.
  • ArrayAnalyzer: commercial port of Bioconductor
    packages in S-Plus.

8
Goals
  • Provide access to powerful statistical and
    graphical methods for the analysis of genomic
    data.
  • Facilitate the integration of biological metadata
    (GenBank, GO, LocusLink, PubMed) in the analysis
    of experimental data.
  • Allow the rapid development of extensible,
    interoperable, and scalable software.
  • Promote high-quality documentation and
    reproducible research.
  • Provide training in computational and statistical
    methods.

9
Bioconductor Packages
10
Bioconductor packages
  • Bioconductor software consists of R add-on
    packages.
  • An R package is a structured collection of code
    (R, C, or other), documentation, and/or data for
    performing specific types of analyses.
  • E.g. affy, cluster, graph, hexbin packages
    provide implementations of specialized
    statistical and graphical methods.

11
Bioconductor packages: Release 1.3, October 28th,
2003
  • AnnBuilder: Bioconductor annotation data package
    builder
  • Biobase: Base functions for Bioconductor
  • DynDoc: Dynamic document tools
  • MAGEML: handling MAGEML documents
  • MeasurementError.cor: Measurement Error model
    estimate for correlation coefficient
  • RBGL: Test interface to Boost C++ graph lib
  • ROC: utilities for ROC, with microarray focus
  • RdbiPgSQL: PostgreSQL access
  • Rdbi: Generic database methods
  • Rgraphviz: Provides plotting capabilities for R
    graph objects
  • Ruuid: Provides Universally Unique ID values
  • SAGElyzer: A package that deals with SAGE
    libraries
  • SNPtools: Rudimentary structures for SNP data
  • affyPLM: Probe Level Models
  • affy: Methods for Affymetrix Oligonucleotide
    Arrays
  • affycomp: Graphics Toolbox for Assessment of
    Affymetrix Expression Measures
  • affydata: Affymetrix Data for Demonstration
    Purpose
  • annaffy: Annotation tools for Affymetrix
    biological metadata
  • annotate: Annotation for microarrays

12

Bioconductor packages: Release 1.3, October 28th,
2003
  • ctc: Cluster and Tree Conversion
  • daMA: Efficient design and analysis of factorial
    two-colour microarray data
  • edd: expression density diagnostics
  • externalVector: Vector objects for R with external
    storage
  • factDesign: Factorial designed microarray
    experiment analysis
  • gcrma: Background Adjustment Using Sequence
    Information
  • genefilter: filter genes
  • geneplotter: plot microarray data
  • globaltest: Global Test
  • gpls: Classification using generalized partial
    least squares
  • graph: A package to handle graph data
    structures
  • hexbin: Hexagonal Binning Routines
  • limma: Linear Models for Microarray Data
  • makecdfenv: CDF Environment Maker
  • marrayClasses: Classes and methods for cDNA
    microarray data
  • marrayInput: Data input for cDNA microarrays
  • marrayNorm: Location and scale normalization for
    cDNA microarray data
  • marrayPlots: Diagnostic plots for cDNA microarray
    data
  • marrayTools: Miscellaneous functions for cDNA
    microarrays

13
Bioconductor packages: Release 1.3, October 28th,
2003
  • matchprobes: Tools for sequence matching of probes
    on arrays
  • multtest: Multiple Testing Procedures
  • ontoTools: graphs and sparse matrices for working
    with ontologies
  • pamr: Pam, prediction analysis for microarrays
  • reposTools: Repository tools for R
  • rhdf5: An HDF5 interface for R
  • siggenes: Significance and Empirical Bayes
    Analyses of Microarrays
  • splicegear
  • tkWidgets: R based tk widgets
  • vsn: Variance stabilization and calibration for
    microarray data
  • widgetTools: Creates interactive tcltk widgets

14
Microarray data analysis
  • Input: .gpr, .Spot, MAGEML (marray, limma, vsn);
    CEL, CDF (affy, vsn)
  • Pre-processing: produces an exprSet
  • Annotation: annotate, annaffy, metadata packages
  • Differential expression: edd, genefilter, limma,
    multtest, ROC, CRAN
  • Graphs & networks: graph, RBGL, Rgraphviz
  • Cluster analysis: CRAN (class, cluster, MASS, mva)
  • Prediction: CRAN (class, e1071, ipred, LogitBoost,
    MASS, nnet, randomForest, rpart)
  • Graphics: geneplotter, hexbin, CRAN
15
marray packages
  • Pre-processing of two-color spotted array data:
  • diagnostic plots,
  • robust adaptive normalization (lowess, loess).

Plots shown: maImage, maBoxplot, maPlot hexbin
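
As a rough illustration of the lowess normalization listed above, the sketch below applies base R's lowess to simulated two-color intensities. It does not use the marray packages' own functions; the simulated data and the names R, G, M, A, and M_norm are assumptions made for this example.

```r
set.seed(1)
n <- 500

# Simulated green-channel intensities (hypothetical data).
G <- 2^runif(n, 6, 14)
# Red channel with an intensity-dependent dye bias plus noise.
R <- G * 2^(0.1 * (log2(G) - 10) + rnorm(n, sd = 0.2))

M <- log2(R) - log2(G)        # log-ratio
A <- (log2(R) + log2(G)) / 2  # average log-intensity

# Fit a lowess curve of M on A, evaluate it at each spot, and subtract it.
fit <- lowess(A, M, f = 0.4)
trend <- approx(fit$x, fit$y, xout = A, ties = mean)$y
M_norm <- M - trend
```

After the subtraction, the normalized log-ratios M_norm should show much less dependence on A than the raw log-ratios M.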
16
affy package
  • Pre-processing of oligonucleotide chip data:
  • diagnostic plots,
  • background correction,
  • probe-level normalization,
  • computation of expression measures.

Plots shown: plotAffyRNADeg, barplot.ProbeSet, image,
plotDensity
17
annotate, annaffy, and AnnBuilder
Metadata package hgu95av2: mappings between
different gene identifiers for the hgu95av2 chip.
  • Assemble and process genomic annotation data from
    public repositories.
  • Build annotation data packages or XML data
    documents.
  • Associate experimental data in real time to
    biological metadata from web databases such as
    GenBank, GO, KEGG, LocusLink, and PubMed.
  • Process and store query results, e.g., search
    PubMed abstracts.
  • Generate HTML reports of analyses.

GENENAME: zinc finger protein 261
LOCUSID: 9203
ACCNUM: X95808
MAP: Xq13.1
AffyID: 41046_s_at
SYMBOL: ZNF261
PMID: 10486218, 9205841, 8817323
GO: GO:0003677, GO:0007275, GO:0016021
+ many other mappings
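
To make the idea of identifier mappings concrete, the sketch below stores the example values from this slide in a plain R environment and looks them up by probe identifier. This is only a stand-in: the names probe_annotation and lookup are hypothetical and are not part of the hgu95av2 metadata package or the annotate package.

```r
# Hypothetical stand-in for one probe's entry in a metadata package,
# filled with the values shown on the slide.
probe_annotation <- new.env()
assign("41046_s_at",
       list(SYMBOL   = "ZNF261",
            GENENAME = "zinc finger protein 261",
            LOCUSID  = "9203",
            ACCNUM   = "X95808",
            MAP      = "Xq13.1",
            PMID     = c("10486218", "9205841", "8817323"),
            GO       = c("GO:0003677", "GO:0007275", "GO:0016021")),
       envir = probe_annotation)

# Look up one field for a probe identifier; returns NA if the probe is unknown.
lookup <- function(probe_id, field, env = probe_annotation) {
  if (!exists(probe_id, envir = env, inherits = FALSE)) return(NA)
  get(probe_id, envir = env)[[field]]
}

symbol <- lookup("41046_s_at", "SYMBOL")
pubmed_ids <- lookup("41046_s_at", "PMID")
```

An environment keyed by probe identifier gives constant-time lookup, which is one plausible reason to store such mappings this way.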
18
MAGEML package
<!DOCTYPE MAGE-ML SYSTEM "D:/DATA/MAGE-ML/MAGE-ML.dtd">
<MAGE-ML identifier="MAGE-ML:E-SNGR-4">
<QuantitationTypeDimension_assnlist>
<QuantitationTypeDimension identifier="QTD:1">
<QuantitationTypes_assnreflist>
<MeasuredSignal_ref identifier="QT:F635 Median"/>
<MeasuredSignal_ref identifier="QT:F635 Mean"/>
...
marray packages (cDNA arrays)
19
SIGGENES PACKAGE - SAM
20
multtest package
  • Multiple hypothesis testing
  • Control type I error rate by using e.g.
    Bonferroni method
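
A minimal sketch of the Bonferroni adjustment mentioned above, using base R's p.adjust rather than the multtest package itself. The raw p-values are made up for illustration.

```r
# Hypothetical raw p-values from five tests.
p_raw <- c(0.001, 0.012, 0.03, 0.2, 0.6)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Hypotheses rejected while controlling the family-wise error rate at 0.05.
rejected <- p_bonf <= 0.05
```

Comparing the adjusted p-values with 0.05 is equivalent to comparing the raw p-values with 0.05 divided by the number of tests.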

21
mva package - clustering

Figure: heatmap
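
The following sketch shows hierarchical clustering of samples on a simulated expression matrix. In current R releases the functions used here (dist, hclust, cutree, heatmap) live in the stats package rather than mva; the data, the group structure, and the choice of average linkage are assumptions made for this example.

```r
set.seed(2)

# Simulated expression matrix: 20 genes (rows) by 6 samples (columns).
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))
# Shift the last three samples for the first ten genes to create two groups.
expr[1:10, 4:6] <- expr[1:10, 4:6] + 4

# Hierarchical clustering of samples on Euclidean distance.
sample_tree <- hclust(dist(t(expr)), method = "average")

# Cut the tree into two clusters of samples.
groups <- cutree(sample_tree, k = 2)
```

Calling heatmap(expr) on the same matrix would draw it with row and column dendrograms, similar in spirit to the figure on this slide.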
22
mva package - principal component analysis
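
A small sketch of principal component analysis on simulated expression data, using prcomp (found in the stats package in current R releases). The data and the two-group structure are assumptions made for this example.

```r
set.seed(3)

# Simulated expression matrix: 50 genes (rows) by 6 samples (columns).
expr <- matrix(rnorm(50 * 6), nrow = 50)
colnames(expr) <- paste0("sample", 1:6)
# Shift the last three samples for half of the genes to create two groups.
expr[1:25, 4:6] <- expr[1:25, 4:6] + 3

# Treat samples as observations: transpose so that rows are samples.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample scores on the first principal component.
pc1_scores <- pca$x[, 1]
```

With this simulated structure, the first component should separate the two sample groups; plot(pca$x[, 1:2]) would show them as two clouds.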
23
Getting started
24
Installation
  • Main R software: download from CRAN
    (cran.r-project.org); use the latest release, now
    1.8.0.
  • Bioconductor packages: download from Bioconductor
    (www.bioconductor.org); use the latest release, now
    1.3.
  • Available for Linux/Unix, Windows, and Mac OS.

25
Installation
  • After installing R, install Bioconductor packages
    using the getBioC install script.
  • From R:
  • > source("http://www.bioconductor.org/getBioC.R")
  • > getBioC()
  • In general, R packages can be installed using the
    function install.packages.
  • In Windows, can also use Packages pull-down
    menus.
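
Before installing anything, it can help to check which packages are already available. The sketch below does only that check and downloads nothing; the helper name missing_packages and the list of wanted packages are hypothetical.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

wanted <- c("stats", "affy", "limma")
to_install <- missing_packages(wanted)

# install.packages(to_install)  # uncomment to install from the configured repositories
```

The install call is left commented out because whether the packages are found depends on which repositories the R session is configured to use.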

26
User interaction
  • R Command-line
  • Widgets: small-scale graphical user interfaces
    (GUIs), providing point-and-click access for
    specific tasks.
  • E.g. File browsing and selection for data input,
    basic analyses.

27
Widgets
Reading in phenoData
tkSampleNames
tkphenoData
tkMIAME
28
Documentation and help
  • R manuals and tutorials: available from the R
    website or on-line in an R session.
  • R on-line help system: detailed on-line
    documentation, available in text, HTML, PDF, and
    LaTeX formats.
  • > help.start()
  • > help(lm)
  • > ?hclust
  • > apropos("mean")
  • > example(hclust)
  • > demo()
  • > demo(image)

29
Short courses
  • Bioconductor short courses
  • modular training segments on software and
    statistical methodology
  • lecture notes, computer labs, and course
    packages available on WWW for self-instruction.

30
Vignettes
  • Bioconductor has adopted a new documentation
    paradigm, the vignette.
  • A vignette is an executable document consisting
    of a collection of code chunks and documentation
    text chunks.
  • Vignettes provide dynamic, integrated, and
    reproducible statistical documents that can be
    automatically updated if either data or analyses
    are changed.
  • Each Bioconductor package contains at least one
    vignette, providing task-oriented descriptions of
    the package's functionality.
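
To show what "code chunks and documentation text chunks" look like in practice, the sketch below writes a tiny Sweave source file to a temporary directory and weaves it. Current R ships Sweave in the utils package. The file name demo.Rnw, the chunk label, and the chunk contents are made up for this example; no LaTeX installation is needed because only the .tex file is generated.

```r
# A tiny Sweave source: LaTeX text chunks plus one R code chunk
# (the code chunk starts at <<...>>= and ends at @).
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of four values is computed below.",
  "<<mean-chunk>>=",
  "x <- c(1, 2, 3, 4)",
  "mean(x)",
  "@",
  "\\end{document}"
)

workdir <- tempfile("vignette-demo")
dir.create(workdir)
writeLines(rnw, file.path(workdir, "demo.Rnw"))

# Sweave() runs the code chunk and writes demo.tex into the working directory,
# with the code and its printed output woven into the text.
old_dir <- setwd(workdir)
utils::Sweave("demo.Rnw", quiet = TRUE)
setwd(old_dir)

tex_path <- file.path(workdir, "demo.tex")
tex <- readLines(tex_path)
```

If the values in the code chunk change, re-running Sweave regenerates the output, which is the sense in which such documents update automatically.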

31
Vignettes
  • HowTos: task-oriented descriptions of package
    functionality.
  • Executable documents consisting of documentation
    text and code chunks.
  • Dynamic, integrated, and reproducible
    statistical documents.
  • Can be used interactively: vExplorer.
  • Generated using Sweave (tools package).

vExplorer
32
References
  • R: www.r-project.org, cran.r-project.org
  • software (CRAN)
  • documentation
  • newsletter: R News
  • mailing list.
  • Bioconductor: www.bioconductor.org
  • software, data, and documentation (vignettes)
  • training materials from short courses
  • mailing list.
  • Personal:
  • sdurinck_at_esat.kuleuven.ac.be

33
Acknowledgements
  • Robert Gentleman
  • Department of Biostatistical Science, Dana-Farber
    Cancer Institute, Boston
  • Sandrine Dudoit
  • Division of Biostatistics, University of California,
    Berkeley