Bioinformatics R for Bioinformatics PART II Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be) Universit - PowerPoint PPT Presentation

Description:

Title: PowerPoint Presentation Author: ymeng Last modified by: Kristel Van Steen Created Date: 2/28/2005 1:28:21 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 83
1
Bioinformatics: R for Bioinformatics, Part II
Kristel Van Steen, PhD, ScD (kristel.vansteen@ulg.ac.be)
Université de Liège - Institut Montefiore, 2008-2009
27
Simplified Epistasis Testing
  • We shall now use logistic regression in R to test
    for epistatic interactions between locus 3 and
    another unlinked locus (locus 5). An epistatic
    interaction means that the combined effect of
    loci 3 and 5 is greater than the product (on the
    odds scale) or the sum (on the log odds scale) of
    the individual locus 3 and locus 5 effects.
  • First remove the old data from memory and read in
    the new data. The new file is the same as the
    original pedfile, but with an additional column
    giving the genotype at (unlinked) locus 5:
        detach(casecon)
        newcasecon <- read.table("newcasecondata.txt", header=TRUE)
        attach(newcasecon)
  • You can look at the data by typing
        fix(newcasecon)
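The definition of "no epistasis" above can be checked with a little arithmetic. The sketch below uses made-up odds ratios (or3, or5 and or_joint_observed are hypothetical values, not estimates from the practical data) to show that a product on the odds scale is the same thing as a sum on the log odds scale, and how a departure from it might be measured.

```r
# Hypothetical odds ratios, chosen only for illustration
or3 <- 1.5   # effect of the risk genotype at locus 3 alone
or5 <- 2.0   # effect of the risk genotype at locus 5 alone

# Expected joint effect if there is no epistasis
or_joint_expected <- or3 * or5                    # product on the odds scale
log_or_joint_expected <- log(or3) + log(or5)      # sum on the log odds scale

# Both descriptions give the same expected joint effect
stopifnot(isTRUE(all.equal(log(or_joint_expected), log_or_joint_expected)))

# A hypothetical observed joint odds ratio; a ratio different from 1
# indicates a departure from the no-epistasis expectation
or_joint_observed <- 4.5
interaction_ratio <- or_joint_observed / or_joint_expected
print(interaction_ratio)
```

In a logistic regression this ratio corresponds to the exponentiated interaction coefficient, which is what the test on the following slides examines.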

28
Cordell practical (see statistical genetics class)
  • Next create appropriate genotype and case
    variables:
        case <- affected - 1
        g3 <- genotype(loc3_1, loc3_2)
        g5 <- genotype(loc5_1, loc5_2)
  • The individual effects at loci 3 and 5 are now
    coded by the variables g3 and g5. We can test for
    association at each locus separately:
        gcontrasts(g3) <- "genotype"
        logit(case ~ g3)
        anova(logit(case ~ g3))
        gcontrasts(g5) <- "genotype"
        logit(case ~ g5)
        anova(logit(case ~ g5))
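For readers without the genetics helper functions used in the practical, the single-locus test can be approximated in base R. This is only a sketch under stated assumptions: the data are simulated, the genotype is built by pasting the two allele columns together, and glm() with a binomial family stands in for the practical's logit() function.

```r
set.seed(1)                             # simulated data, for illustration only
n <- 400
affected <- rep(c(2, 1), each = n / 2)  # pedfile-style coding: 2 = case, 1 = control
case <- affected - 1                    # recode to 1 = case, 0 = control

# Two alleles per person at one locus, coded 1/2
a1 <- sample(1:2, n, replace = TRUE)
a2 <- sample(1:2, n, replace = TRUE)

# Unordered genotype label: "1/1", "1/2" or "2/2"
g <- factor(paste(pmin(a1, a2), pmax(a1, a2), sep = "/"))

# Logistic regression of case status on genotype (2 df for 3 genotypes)
fit <- glm(case ~ g, family = binomial)
res <- anova(fit, test = "Chisq")       # likelihood ratio test against the null model
print(res)
```

Because the simulated genotypes are generated independently of case status, the test here should usually be non-significant; with real data the same call gives the genotype-based association test.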

29
  • To investigate epistasis, it is more convenient to
    create new variables that code numerically for
    the number of copies of allele 2 in each
    genotype:
        count3 <- allele.count(g3, 2)
        count5 <- allele.count(g5, 2)
  • Check that you understand how the variables
    count3 and count5 relate to g3 and g5 by typing
        g3
        count3
        g5
        count5
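The relationship between a genotype and its allele count can be illustrated in base R. This is a small sketch on five made-up individuals; a1 and a2 are hypothetical allele columns, and the count is computed directly rather than with allele.count().

```r
# Alleles for five hypothetical individuals, coded 1/2
a1 <- c(1, 1, 2, 2, 1)
a2 <- c(1, 2, 1, 2, 2)

# Unordered genotype label for each individual
g <- paste(pmin(a1, a2), pmax(a1, a2), sep = "/")

# Number of copies of allele 2 carried by each individual (0, 1 or 2)
count <- (a1 == 2) + (a2 == 2)

print(data.frame(g, count))
```

Genotype 1/1 maps to 0, both heterozygote orderings map to 1, and 2/2 maps to 2.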

30
  • We then create a variable that codes for the
    combined effect of loci 3 and 5 as follows:
        combo <- 10*count3 + count5
  • Check that you understand how the 'combo'
    variable relates to g3 and g5 by typing
        g3
        g5
        combo
  • Now we need to code each of these variables as
    'factors', which means that the numeric codes
    simply act as labels for the different categories
    rather than having numeric meaning:
        fact3 <- factor(count3)
        fact5 <- factor(count5)
        factcombo <- factor(combo)
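To see why the combined code works, it helps to enumerate every possible pair of allele counts. The sketch below assumes each count takes the values 0, 1 and 2 and shows that 10*count3 + count5 gives a distinct label for each of the 9 two-locus genotype combinations.

```r
# All combinations of allele counts at the two loci
grid <- expand.grid(count3 = 0:2, count5 = 0:2)

# The tens digit carries the locus 3 count, the units digit the locus 5 count
grid$combo <- 10 * grid$count3 + grid$count5
print(grid)

# As a factor, the codes are only labels: 9 categories, no numeric ordering used
factcombo <- factor(grid$combo)
print(levels(factcombo))
```

Treating combo as a number instead would force a single linear trend across codes such as 2 and 10, which has no genetic meaning; the factor coding avoids that.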

31
  • Check that the analysis with the 'factors' gives
    the same results as you found previously with the
    genotype variables:
        anova(logit(case ~ fact3))
        anova(logit(case ~ fact5))
  • Now test whether there is significant epistasis
    by typing
        anova(logit(case ~ fact3 + fact5 + factcombo))
        1 - pchisq(9.59, 4)
  • This first fits the individual locus factors,
    then adds the extra terms needed for the model
    with epistasis (i.e. a model with 9 estimated
    parameters corresponding to the 9 genotype
    combinations), and tests the difference between
    the two models. You should get a chi-squared of
    9.59 on 4 df with p value 0.048, i.e. there is
    marginal evidence of epistasis.
  • The above test is valid for testing for epistasis
    between linked or unlinked loci, although it does
    not allow for haplotype (phase) effects between
    linked loci.
  • A more powerful test for epistasis between
    UNLINKED LOCI ONLY is a 'case-only' analysis,
    which tests whether the genotypes at one locus
    predict those at the other, in the cases alone.
    This is only valid at unlinked loci, because at
    linked loci we expect genotypes at one locus to
    predict those at the other (even in controls) due
    to linkage disequilibrium.
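The model comparison described above can be sketched in base R as a likelihood ratio test between a main-effects model and a model with one parameter per genotype combination. Everything here is an assumption made for illustration: the data are simulated, the disease model (eta) is invented, and glm() with fact3 * fact5 replaces the practical's logit() call with factcombo, which spans the same 9-parameter model.

```r
set.seed(42)                        # simulated data, for illustration only
n <- 1000

# Copies of allele 2 at two unlinked loci (allele frequency 0.5)
count3 <- rbinom(n, 2, 0.5)
count5 <- rbinom(n, 2, 0.5)

# Hypothetical disease model with an extra effect when both loci carry allele 2
eta <- -0.5 + 0.3 * count3 + 0.3 * count5 + 0.4 * (count3 > 0 & count5 > 0)
case <- rbinom(n, 1, plogis(eta))

fact3 <- factor(count3)
fact5 <- factor(count5)

fit_main <- glm(case ~ fact3 + fact5, family = binomial)  # 5 parameters
fit_full <- glm(case ~ fact3 * fact5, family = binomial)  # 9 parameters

# Likelihood ratio test for the 4 extra (epistasis) parameters
lrt <- anova(fit_main, fit_full, test = "Chisq")
print(lrt)

# The p value quoted in the practical, from its chi-squared statistic and df
p_slide <- pchisq(9.59, df = 4, lower.tail = FALSE)
print(round(p_slide, 3))
```

The 4 degrees of freedom come from 9 parameters in the full model minus 5 in the main-effects model (an intercept plus 2 per locus).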

32
  • To do this, we can use a chi-squared test to look
    for correlation (association) between the loci
    within the case and control groups separately.
  • First we need to set up 2 new vectors of
    genotypes for loci 3 and 5, using only the cases.
    We can take advantage of the fact that the data
    have been ordered so that the cases are the first
    384 observations. (Check this by typing case or
    fix(newcasecon).) So we can create genotype
    vectors just for the cases using the following
    commands:
        caseg3 <- g3[1:384]
        caseg5 <- g5[1:384]
  • Take a look at the vectors you have created by
    typing
        caseg3
        caseg5
  • Now do a chi-squared test on the genotype
    variables to see if they are correlated with each
    other:
        table(caseg3, caseg5)
        chisq.test(caseg3, caseg5)
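The case-only test can be sketched on simulated data as follows. The sample sizes (384 cases out of 1056 individuals) are taken from the practical; the genotypes are simulated allele counts, not the real data. Selecting cases with a logical condition on case is shown as an alternative to the positional index, since it does not depend on how the file happens to be sorted.

```r
set.seed(7)                         # simulated data, for illustration only
n_cases <- 384
n_total <- 1056                     # sample sizes taken from the practical
case <- rep(c(1, 0), c(n_cases, n_total - n_cases))  # cases listed first

g3 <- rbinom(n_total, 2, 0.5)       # copies of allele 2 at locus 3
g5 <- rbinom(n_total, 2, 0.5)       # copies of allele 2 at locus 5

# Select the cases by status rather than by row position
caseg3 <- g3[case == 1]
caseg5 <- g5[case == 1]

# With cases sorted first, this matches the positional index 1:384
stopifnot(identical(caseg3, g3[1:384]))

# 3 x 3 table of genotype combinations among cases, then the chi-squared test
tab <- table(caseg3, caseg5)
print(tab)
res <- chisq.test(tab)
print(res)
```

A 3 x 3 table gives (3 - 1) * (3 - 1) = 4 degrees of freedom, the same number as in the logistic regression test of epistasis.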

33
  • You should find much more significant evidence of
    epistasis (p value = 0.0018) than you did using
    logistic regression. This is not surprising, as
    the case-only test of interaction is a more
    powerful test.
  • However, the case-only test does rely on the
    assumption that the two genotype variables g3 and
    g5 are uncorrelated in the general population.
    Strictly speaking, we cannot test this assumption
    as we do not have a population-based control
    sample (our controls are all unaffected).
    However, if the disease is rare, our controls
    should be reasonably close to an unselected
    sample. So we can use them to see if the genotype
    variables g3 and g5 are uncorrelated in the
    control population:
        contg3 <- g3[385:1056]
        contg5 <- g5[385:1056]
        contg3
        contg5
        table(contg3, contg5)
        chisq.test(contg3, contg5)
  • You should find a non-significant p value
    (p = 0.99). This suggests that the case-only
    analysis we did is valid, so there is indeed some
    reasonable (p = 0.002) evidence for statistical
    interaction between these loci.
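Running the same test in cases and controls can be done in one step by splitting the data on case status. This is a sketch on simulated data in which the two loci are generated independently in everyone, so neither group is expected to show a correlation; the sample sizes are those of the practical and the variable names are reused only for readability.

```r
set.seed(11)                        # simulated data, for illustration only
n_cases <- 384
n_total <- 1056                     # sample sizes taken from the practical
case <- rep(c(1, 0), c(n_cases, n_total - n_cases))

g3 <- rbinom(n_total, 2, 0.5)       # copies of allele 2 at locus 3
g5 <- rbinom(n_total, 2, 0.5)       # simulated independently of locus 3

# One data frame per group: "0" = controls, "1" = cases
groups <- split(data.frame(g3, g5), case)

# Chi-squared test of association between the loci within each group
p_values <- sapply(groups, function(d) chisq.test(table(d$g3, d$g5))$p.value)
print(p_values)
```

In the practical's logic, a small p value in cases together with a large one in controls is what supports the case-only evidence for interaction.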

34
  • Running the command lines in R to test for
    epistasis

80
Resources for microarray analysis
  • http://www.nslij-genetics.org/microarray/

81
  • Review paper on gene expression analysis
    (Slonim et al. 2002)
