A Gentle Introduction to R for Stat 133 Class

Provided by: Eduard6

1
  • A Gentle Introduction to R for Stat 133 Class
  • by Doods

2
Introduction to R
  • 1. A few things to know before starting
  • access R by launching the corresponding
    executable (RGui.exe under Windows, R under
    Unix).
  • the prompt > indicates that R is waiting for
    your commands.

3
1.1 The operator <-
  • R is an object-oriented language: the variables,
    data, matrices, functions, results, etc. are
    stored in the active memory of the computer in
    the form of objects, each of which has a name.
  • To display the content of an object, just type
    its name.
  • For example, if an object n has value 10:
  • > n
  • [1] 10
  • The digit 1 within brackets indicates that the
    display starts at the first element of n.

4
  • The assign operator is used to give a value to an
    object. It is written with an angle bracket (< or
    >) together with a minus sign, so that they
    form a small arrow which can be directed from
    left to right, or the reverse:
  • > n <- 15
  • > n
  • [1] 15
  • > 5 -> n
  • > n
  • [1] 5
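The two directions of the arrow can be tried in a short session. This is only a sketch; the object name n follows the slide, and the last step (reusing n on the right-hand side) is an added illustration.

```r
n <- 15      # leftward arrow: store 15 in n
print(n)

5 -> n       # rightward arrow: overwrite n with 5
print(n)

n <- n + 1   # an existing object can be reused on the right-hand side
print(n)
```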

5
  • You can simply type an expression without assigning
    its value to an object; the result is then
    displayed on the screen but not stored in memory.
  • Example:
  • > (10 + 2) * 5
  • [1] 60
  • 1.2 Listing and deleting the objects in memory
  • The function ls() lists the objects in memory;
    only the names of the objects are displayed.
    Example:
  • > name <- "Laure"; n1 <- 10; n2 <- 100; m <- 0.5
  • > ls()
  • [1] "m" "n1" "n2" "name"
  • Note the use of the semicolon ";" to separate
    distinct commands on the same line.
  • To delete objects from memory, use the function
    rm(): rm(n1) deletes the object n1.
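Listing and deleting can be tried together. The sketch below reuses the object names from the slide; rm() and exists() are standard R functions that the slide itself does not show.

```r
# Several commands on one line, separated by semicolons
name <- "Laure"; n1 <- 10; n2 <- 100; m <- 0.5

# An unassigned expression is displayed but not stored
(10 + 2) * 5

# List the names of the objects currently in memory
print(ls())

# Delete one object, then check that it is gone
rm(n1)
print(exists("n1"))
```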

6
1.3 The on-line help
  • The on-line help of R gives very useful
    information on how to use the functions. The
    help in HTML format is called by typing
  • > help.start()
  • A keyword search is possible with this
    HTML help. Help on a specific function can also
    be requested directly. Example:
  • > help(anova)

7
2. Data with R
  • 2.1 Reading data from a file
  • R can read data stored in text (ASCII) files.
    Three functions can be used: read.table() (which
    has two variants, read.csv() and read.csv2()),
    scan() and read.fwf(). We will discuss only
    read.table() and read.fwf().
  • For example, if we have a file data.dat, one can just
    type
  • > mydata <- read.table("data.dat")
  • Note: you can specify the directory, for example
    read.table("a:\\data.dat") (backslashes in a path
    must be doubled, or replaced by forward slashes).
    By default, R looks for the file in its current
    working directory.
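Which directory a bare file name refers to can be checked from within R. The sketch below uses getwd(), file.path() and file.exists(), none of which appear on the slide; the working directory can be changed with setwd(). The file name data.dat is the slide's example and may not exist on your machine, so the read is guarded.

```r
# Directory that relative file names are resolved against
print(getwd())

# Build a full path in a portable way ("data.dat" is the slide's file name)
path <- file.path(getwd(), "data.dat")
print(path)

# Only try to read the file if it is really there
if (file.exists(path)) {
  mydata <- read.table(path)
  print(head(mydata))
} else {
  message("No data.dat in the working directory")
}
```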

8
  • data.dat is a text file in table format, with
    fields separated by space(s).
  • Example: mydata.dat has the format below
  • A 1.50 1.2
  • A 1.55 1.3
  • B 1.60 1.4
  • B 1.65 1.5
  • C 1.70 1.6
  • C 1.75 1.1
  • This can be read with
  • > read.table("mydata.dat", col.names = c("Dose", "Water", "Relief"))
  • to get
  • Dose Water Relief
  • A 1.50 1.2
  • A 1.55 1.3
  • B 1.60 1.4
  • B 1.65 1.5
  • C 1.70 1.6
  • C 1.75 1.1

9
  • If mydata.dat already has the format
  • Dose Water Relief
  • A 1.50 1.2
  • A 1.55 1.3
  • B 1.60 1.4
  • B 1.65 1.5
  • C 1.70 1.6
  • C 1.75 1.1
  • Then read this table with
  • > read.table("mydata.dat", header = TRUE)
  • header = TRUE reads the variable names from the
    first line:
  • Dose Water Relief
  • A 1.50 1.2
  • A 1.55 1.3
  • B 1.60 1.4
  • B 1.65 1.5
  • C 1.70 1.6
  • C 1.75 1.1
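The header case can be reproduced without having mydata.dat on disk. In this sketch the table from the slide is first written to a temporary file (tempfile() and writeLines() are stand-ins for a real data file), then read back with header = TRUE.

```r
# Write the example table to a temporary file (a stand-in for mydata.dat)
path <- tempfile(fileext = ".dat")
writeLines(c("Dose Water Relief",
             "A 1.50 1.2",
             "A 1.55 1.3",
             "B 1.60 1.4",
             "B 1.65 1.5",
             "C 1.70 1.6",
             "C 1.75 1.1"), path)

# header = TRUE takes the column names from the first line
mydata <- read.table(path, header = TRUE)
print(mydata)

# Columns can then be used by name; Water and Relief are numeric
print(mean(mydata$Water))

unlink(path)  # remove the temporary file
```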

10
  • For read.fwf(), the options are the same as for
    read.table(), except widths, which specifies the
    width of the fields.
  • Example: data.txt has the following data
  • A1.501.2
  • A1.551.3
  • B1.601.4
  • B1.651.5
  • C1.701.6
  • C1.751.7
  • One can read them with
  • > mydata <- read.fwf("data.txt", widths = c(1, 4, 3))
  • > mydata
  • V1 V2 V3
  • 1 A 1.50 1.2
  • 2 A 1.55 1.3
  • 3 B 1.60 1.4
  • 4 B 1.65 1.5
  • 5 C 1.70 1.6
  • 6 C 1.75 1.7
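The fixed-width example can be tried the same way. This sketch writes the six lines from the slide to a temporary file (a stand-in for data.txt) and, as an addition to the slide, names the columns with col.names.

```r
# Fixed-width records: 1 character, then 4, then 3, with no separators
path <- tempfile(fileext = ".txt")
writeLines(c("A1.501.2", "A1.551.3", "B1.601.4",
             "B1.651.5", "C1.701.6", "C1.751.7"), path)

# widths tells read.fwf() where to cut each line
mydata <- read.fwf(path, widths = c(1, 4, 3),
                   col.names = c("Dose", "Water", "Relief"))
print(mydata)

unlink(path)  # remove the temporary file
```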

11
  • The function write(x, file = "data.txt") writes an
    object x (a vector, a matrix, or an array) to the
    file data.txt.
  • 3. Graphics with R
  • R offers a remarkable variety of graphics. To get
    an idea, one can type demo(graphics).
  • 3.1 Opening several graphic windows
  • When a graphic function is typed, a graphic
    window is opened with the requested graph. It is
    possible to open another window by typing
  • > x11()
  • The window so opened becomes the active window, and
    subsequent graphs will be displayed in it.
    To list the graphic windows that are currently
    open:
  • > dev.list()
  • windows windows
  • 2 3

12
  • The figures displayed under windows are the
    numbers of the windows, which can be used to
    change the active window:
  • > dev.set(2)
  • windows
  • 2
  • 3.2 Partitioning a graphic window
  • The function split.screen() partitions the active
    graphic window. For instance, split.screen(c(1, 2))
    divides the window into two parts, which can be
    selected with screen(1) or screen(2).
  • erase.screen() erases the last drawn graph. We'll
    have more on this when we go to linear models.

13
4. Generating data
  • Regular sequences.
  • A regular sequence of integers, for example from
    1 to 30, can be generated with
  • > x <- 1:30
  • The resulting vector x has 30 elements. The
    operator : has priority over the arithmetic
    operators within an expression:
  • > 1:10-1
  • [1] 0 1 2 3 4 5 6 7 8 9
  • > 1:(10-1)
  • [1] 1 2 3 4 5 6 7 8 9
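The priority of the colon operator is easy to check by storing both results. The names a and b below are only illustrative.

```r
x <- 1:30
print(length(x))

a <- 1:10 - 1     # parsed as (1:10) - 1
b <- 1:(10 - 1)   # parentheses force the subtraction first

print(a)
print(b)
```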

14
  • The function seq() can generate sequences of real
    numbers as follows
  • > seq(1, 5, 0.5)
  • [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
  • where the first number indicates the start of the
    sequence, the second one the end, and the
    third one the increment to be used to generate
    the sequence. One can also use
  • > seq(length = 9, from = 1, to = 5)
  • [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
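The two calls can be compared directly. In this sketch the argument is spelled out as length.out, its full name; the slide's length = 9 works through partial argument matching.

```r
s1 <- seq(1, 5, 0.5)                         # start, end, increment
s2 <- seq(from = 1, to = 5, length.out = 9)  # start, end, number of values

print(s1)
print(all.equal(s1, s2))  # same sequence either way
```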

15
  • It is also possible to type the values directly
    using the function c()
  • > c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
  • [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
  • which gives exactly the same result, but is
    obviously longer.
  • The function rep() creates a vector whose
    elements are all identical
  • > rep(1, 30)
  • [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

16
  • The function sequence() creates a series of
    sequences of integers, each ending with one of the
    numbers given as arguments
  • > sequence(4:5)
  • [1] 1 2 3 4 1 2 3 4 5
  • > sequence(c(10,5))
  • [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5
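rep() and sequence() can be explored a little further. The times and each arguments of rep() are not covered on the slides; they are shown here as a small extension, with illustrative object names.

```r
r1 <- rep(1, 30)                    # thirty copies of 1
r2 <- rep(c("a", "b"), times = 3)   # repeat the whole vector three times
r3 <- rep(c("a", "b"), each = 3)    # repeat each element three times
q  <- sequence(c(10, 5))            # 1 to 10 followed by 1 to 5

print(length(r1))
print(r2)
print(r3)
print(q)
```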

17
5. Statistical analyses with R
  • There are lots of statistical packages, but we will
    probably concentrate on regression only. See the URL
    http://cran.r-project.org/src/contrib/PACKAGES.html
    for an extensive list.
  • Among the most remarkable ones, there are
  • gee (generalised estimating equations)
  • multiv (multivariate analyses, includes
    correspondence analysis)
  • nlme (linear and nonlinear models with
    mixed-effects)
  • survival5 (survival analyses)
  • tree (trees and classification)
  • tseries (time-series analyses)
  • lm (linear models; a function in the standard
    stats package rather than a separate package)

18
  • We will discuss only regression analysis.
  • If we have two vectors x and y, each with five
    observations, and we wish to perform a linear
    regression of y on x:
  • > x <- 1:5
  • > y <- rnorm(5) # generates 5 random numbers from
    a normal distribution
  • > lm(y ~ x)
  • Call:
  • lm(formula = y ~ x)
  • Coefficients:
  • (Intercept) x
  • 0.2252 0.1809

19
  • As for any function in R, the result of lm(y ~ x)
    can be stored in an object:
  • > mymodel <- lm(y ~ x)
  • If we type mymodel, the display will be the same
    as previously. Several functions allow the user
    to display details of a statistical model. Among
    the most useful ones:
  • summary() displays details on the results of a
    model fitting procedure
  • residuals() displays the regression residuals
  • predict() displays the values predicted by the
    model
  • coef() displays a vector with the parameter
    estimates

20
  • > summary(mymodel)
  • Call:
  • lm(formula = y ~ x)
  • Residuals:
  • 1 2 3 4 5
  • 1.0070 -1.0711 -0.2299 -0.3550 0.6490
  • Coefficients:
  • Estimate Std. Error t value Pr(>|t|)
  • (Intercept) 0.2252 1.0062 0.224 0.837
  • x 0.1809 0.3034 0.596 0.593
  • Residual standard error: 0.9594 on 3 degrees of
    freedom
  • Multiple R-Squared: 0.1059, Adjusted R-squared:
    -0.1921
  • F-statistic: 0.3555 on 1 and 3 degrees of
    freedom, p-value: 0.593

21
  • > residuals(mymodel)
  • 1 2 3 4 5
  • 1.0070047 -1.0710587 -0.2299374 -0.3549681
    0.6489594
  • > predict(mymodel)
  • 1 2 3 4 5
  • 0.4061329 0.5870257 0.7679186 0.9488115 1.1297044
  • > coef(mymodel)
  • (Intercept) x
  • 0.2252400 0.1808929
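How coef(), predict() and residuals() relate to each other can be checked on a small fit. This is a sketch: set.seed(133) is an arbitrary choice to make the run repeatable, so the numbers will differ from those on the slides, and fitted_by_hand is an illustrative name.

```r
set.seed(133)            # arbitrary seed, only to make the run repeatable
x <- 1:5
y <- rnorm(5)
mymodel <- lm(y ~ x)

# Predicted values are intercept + slope * x
b <- coef(mymodel)
fitted_by_hand <- b[1] + b[2] * x
print(all.equal(unname(fitted_by_hand), unname(predict(mymodel))))

# Residuals are observed minus predicted values
print(all.equal(unname(y - predict(mymodel)), unname(residuals(mymodel))))
```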

22
  • Other statistical functions in R
  • Random numbers are generated by functions of the
    form rfunc(n, p1, p2, ...), where func indicates
    the law of probability, n the number of data to
    generate, and p1, p2, ... are the values for the
    parameters of the law.
  • Here are a few:
  • Gaussian (normal): rnorm(n, mean = 0, sd = 1)
  • exponential: rexp(n, rate = 1)
  • gamma: rgamma(n, shape, scale = 1)
  • Poisson: rpois(n, lambda)
  • Weibull: rweibull(n, shape, scale = 1)
  • Cauchy: rcauchy(n, location = 0, scale = 1)
  • beta: rbeta(n, shape1, shape2)
  • Student (t): rt(n, df)
23
  • Fisher (F): rf(n, df1, df2)
  • Pearson (χ²): rchisq(n, df)
  • binomial: rbinom(n, size, prob)
  • geometric: rgeom(n, prob)
  • hypergeometric: rhyper(nn, m, n, k)
  • logistic: rlogis(n, location = 0, scale = 1)
  • lognormal: rlnorm(n, meanlog = 0, sdlog = 1)
  • negative binomial: rnbinom(n, size, prob)
  • uniform: runif(n, min = 0, max = 1)
  • Wilcoxon's statistics:
    rwilcox(nn, m, n), rsignrank(nn, n)
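A few of these generators in use. The seed and sample sizes below are arbitrary choices for illustration. The last lines go slightly beyond the list above: the same distribution names also come with the prefixes d (density), p (cumulative probability) and q (quantile).

```r
set.seed(1)  # arbitrary seed so the draws can be repeated

z <- rnorm(1000, mean = 0, sd = 1)        # 1000 standard normal values
u <- runif(1000, min = 0, max = 1)        # 1000 uniform values on [0, 1]
k <- rbinom(1000, size = 10, prob = 0.3)  # 1000 binomial counts out of 10

print(mean(z))
print(range(u))
print(table(k))

# Same distribution, other prefixes
print(dnorm(0))             # density of the standard normal at 0
print(pnorm(1.96))          # P(Z <= 1.96)
print(qnorm(pnorm(1.96)))   # quantile function inverts pnorm
```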

24
6. The Plot Function
  • Generic X-Y Plotting
  • Usage
  • plot(x, y, xlim = range(x), ylim = range(y),
    type = "p", main, xlab, ylab, ...)
  • Arguments
  • x: the coordinates of points in the plot.
    Alternatively, a single plotting structure,
    function or any R object with a 'plot' method
    can be provided.
  • y: the y coordinates of points in the plot,
    optional if 'x' is an appropriate structure.
  • xlim, ylim: the ranges to be encompassed by the x
    and y axes.

25
  • type: what type of plot should be drawn.
    Possible types are
  • "p" for points,
  • "l" for lines,
  • "b" for both,
  • "c" for the lines part alone of "b",
  • "o" for both "overplotted",
  • "h" for "histogram"-like (or "high-density")
    vertical lines,
  • "s" for stair steps,
  • "S" for other steps, see Details below,
  • "n" for no plotting.
  • All other 'type's give a warning or an error.

26
  • main: an overall title for the plot.
  • xlab: a title for the x axis.
  • ylab: a title for the y axis.
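The arguments above can be seen working together in a small sketch. The sine curve, axis limits and titles are illustrative choices, not taken from the slides.

```r
x <- seq(0, 2 * pi, length.out = 25)
y <- sin(x)

# Points joined by lines, with explicit axis ranges and titles
plot(x, y, type = "b",
     xlim = c(0, 7), ylim = c(-1.5, 1.5),
     main = "Sine curve", xlab = "x (radians)", ylab = "sin(x)")

# type = "n" sets up the axes without drawing the data,
# which can then be added with low-level functions
plot(x, y, type = "n", main = "Built up in steps")
lines(x, y)
points(x, y)
```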

27
6.1 Plotting Densities
  • Normal density curves
  • > plot(dnorm, xlim = c(-7, 7), ylim = c(0, 0.8))
  • > x <- seq(-7, 7, 0.5)
  • > normdensity1 <- dnorm(x, mean = 0, sd = 0.6)
  • > normdensity2 <- dnorm(x, mean = 0, sd = 2)
  • > normdensity3 <- dnorm(x, mean = 3, sd = 0.8)
  • > lines(x, normdensity1, col = "blue")
  • > lines(x, normdensity2, col = "red")
  • > lines(x, normdensity3, col = "green")
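The same three normal curves can be drawn with a loop over the parameters. This is a sketch rather than the slide's method: it uses a finer grid than the slide's step of 0.5 so the curves look smooth, and it adds a rough numerical check that each density has area close to 1 over the plotted range. The names means, sds, cols and areas are illustrative.

```r
# A finer grid than seq(-7, 7, 0.5) gives smoother curves
x <- seq(-7, 7, length.out = 400)

# Parameters of the three normal densities from the slide
means <- c(0, 0, 3)
sds   <- c(0.6, 2, 0.8)
cols  <- c("blue", "red", "green")

# Empty frame first, then one curve per parameter set
plot(x, dnorm(x, means[1], sds[1]), type = "n",
     ylim = c(0, 0.8), ylab = "Normal densities")
for (i in 1:3) {
  lines(x, dnorm(x, mean = means[i], sd = sds[i]), col = cols[i])
}

# Rough check: each density should have area close to 1 over the grid
dx <- x[2] - x[1]
areas <- sapply(1:3, function(i) sum(dnorm(x, means[i], sds[i])) * dx)
print(areas)
```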

28
Plot of Normal Density
29
  • Gamma density curves
  • > x <- seq(0, 7, 0.1)
  • > gammadensity1 <- dgamma(x, shape = 1, scale = 1)
  • > gammadensity2 <- dgamma(x, shape = 2, scale = 2)
  • > gammadensity3 <- dgamma(x, shape = 2, scale = 1/3)
  • > gammadensity4 <- dgamma(x, shape = 2, scale = 1)
  • > plot(x, gammadensity1, type = "l", ylab =
    "Gamma densities", ylim = c(0, 1.1))
  • > lines(x, gammadensity2, col = "blue")
  • > lines(x, gammadensity3, col = "red")
  • > lines(x, gammadensity4, col = "green")

30
  • Exponential density curves
  • > plot(dexp, xlim = c(0, 7), ylab = "exponential
    densities")
  • > x <- seq(0, 7, 0.1)
  • > expdensity1 <- dexp(x, rate = 0.5)
  • > expdensity2 <- dexp(x, rate = 2)
  • > lines(x, expdensity1, col = "blue")
  • > lines(x, expdensity2, col = "red")

31
  • Chi-squared densities
  • > x <- seq(0, 10, 0.1)
  • > chisq1 <- dchisq(x-1, df = 1)
  • > plot(x-1, chisq1, type = "l", ylab =
    "chi-squared densities")
  • > chisq2 <- dchisq(x, df = 2)
  • > chisq3 <- dchisq(x, df = 3)
  • > lines(x, chisq2, col = "blue")
  • > lines(x, chisq3, col = "red")

32
Going back to Regression, etc.
  • Create and embellish simple graphics in R.
  • The file education.dat contains data about school
    expenditures for each of the 50 states and the
    District of Columbia.
  • The variables are school expenditures in 1970
    (SE70), each state's citizens' average income in
    1968 (PI68), school age population per capita in
    1969 (Y69), urban population per capita in 1970
    (Urban70), and two variables for the state's
    location: Region (general) and Locale (specific).
  • We are interested in exploring the relationships
    between SE70 and the other variables.

33
  • education.dat
  • State SE70 PI68 Y69 Urban70 Region Locale
  • 1 ME 189 2824 350.7 508 NOREAST NEWENG
  • 2 NH 169 3259 345.9 564 NOREAST NEWENG
  • .. ...
  • Read this data set into a data frame and attach
    it. (The database is attached to the R search
    path. This means that the database is searched by
    R when evaluating a variable, so objects in the
    database can be accessed by simply giving their
    names.) Here's how:
  • > edu <- read.table("education.dat", header = TRUE)
  • > attach(edu)
  • > hist(SE70) # produces a histogram
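Columns of a data frame can also be reached without attaching it. The sketch below builds a two-row data frame from the two rows of education.dat shown on the slide (the rest of the file is not reproduced here) and uses the $ operator and with(), which the slides do not cover. Because the header line has one field fewer than the data lines, read.table() takes the first column as row names.

```r
# The two rows of education.dat shown on the slide, read from inline text
edu <- read.table(header = TRUE, text = "
State SE70 PI68 Y69 Urban70 Region Locale
1 ME 189 2824 350.7 508 NOREAST NEWENG
2 NH 169 3259 345.9 564 NOREAST NEWENG
")
print(edu)

# Access a column without attach(): by name with $ ...
print(edu$SE70)

# ... or evaluate an expression inside the data frame with with()
print(with(edu, mean(SE70)))
```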

34
  • We can expect that school expenditures will be
    related to the number of students, so let's plot
    Y69 (students) against SE70.
  • > plot(Y69, SE70)

35
  • Embellishing Plots
  • Once you have created a plot, you can embellish
    it in various ways. S-PLUS documentation refers
    to functions of this sort as low-level graphics
    functions.
  • Adding Titles
  • R allows you to add titles at several places in
    the plot: a main title (above the picture) and
    x-axis and y-axis titles.
  • > plot(PI68, SE70)
  • > title(main = "Income vs. School Expenditures")

36
  • Adding a Line
  • The function abline adds a line of the form
    y = a + bx to a plot. It also has an lty argument
    to allow different line types (dashed, dotted,
    etc.).
  • Let's add two lines to the plot of income and
    school expenditures, one including Alaska and the
    other not.
  • We can find these lines using regression. The
    first command below says "SE70 modelled by PI68"
    and the second says "SE70 modelled by PI68 using
    all points except the 50th".

37
  • > lm(SE70 ~ PI68)
  • Output
  • Call:
  • lm(formula = SE70 ~ PI68)
  • Coefficients:
  • (Intercept) PI68
  • 17.71003 0.05537594
  • Degrees of freedom: 51 total; 49 residual
  • Residual standard error: 34.9384

38
  • > lm(SE70 ~ PI68, subset = -50)
  • Call:
  • lm(formula = SE70 ~ PI68, subset = -50)
  • Coefficients:
  • (Intercept) PI68
  • 40.56264 0.04747211
  • Degrees of freedom: 50 total; 48 residual
  • Residual standard error: 29.93981

40
  • Now we can use the first equation to plot a solid
    line (line type 1, the default), and the second
    to plot a dashed line (line type 2).
  • > abline(17.71003, 0.05537594, lty = 1)
  • > abline(40.56264, 0.04747211, lty = 2)

41
(No Transcript)
42
  • Adding Text
  • R allows you to add text to a graph in a few
    different ways. The simplest is with the function
    text.
  • The format is text(x, y, "string"), and this plots
    the text string at the coordinates (x, y).
  • R also has a somewhat more elegant function,
    legend. This function takes x and y coordinates,
    a vector of labels, and a vector of colors or
    line types corresponding to those labels.
  • For instance, we could add a legend to the graph
    which labels the two line types, and text to
    write "Alaska" approximately underneath the Alaska
    data point.

43
  • > legend(2500, 350, c("With Alaska", "Without
    Alaska"), lty = c(1, 2))
  • > text(4100, 360, "Alaska")
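The whole sequence (two fits, two lines, a legend and a label) can be tried without education.dat. The sketch below substitutes R's built-in cars data set, which happens to have 50 rows, so this is an assumption-laden stand-in and not the school expenditure data. It also passes the fitted model straight to abline() instead of retyping the coefficients, and places the legend with the keyword "topleft" rather than with coordinates.

```r
# Stand-in data: the built-in 'cars' data set (50 rows: speed, dist)
fit_all  <- lm(dist ~ speed, data = cars)
fit_drop <- lm(dist ~ speed, data = cars, subset = -50)  # all rows except the 50th

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
title(main = "Regression with and without row 50")

# abline() accepts a fitted model and draws its regression line
abline(fit_all,  lty = 1)   # solid
abline(fit_drop, lty = 2)   # dashed

# Legend for the two line types, and a label under the 50th point
legend("topleft", c("All rows", "Without row 50"), lty = c(1, 2))
text(cars$speed[50], cars$dist[50], "row 50", pos = 1)

print(coef(fit_all))
print(coef(fit_drop))
```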

44
Plotting All Pairs of Variables
  • You can plot all possible pairs of variables in
    edu by issuing the command:
  • > pairs(edu)
  • Try it!

45
(No Transcript)
46
Multiple Plots per Page
  • To see multiple plots on the same page, first
    break the graphics window into multiple sections.
    Let's try a 2 by 2 grid of smaller windows.
  • > par(mfrow = c(2, 2))
  • This has no obvious immediate effect. However,
    the next plot will appear in the upper left hand
    frame of a 2 by 2 matrix of plotting frames.
    Let's fill these frames with plots of predictors
    and SE70, two of which we have already seen.
  • > plot(PI68, SE70)
  • > plot(Y69, SE70)
  • > plot(Urban70, SE70)

47
Multiple Plots
48
  • Here's another one you can try. If you create the
    object
  • > educ.lm <- lm(SE70 ~ PI68)
  • then you can plot all diagnostics with
  • > par(mfrow = c(2, 2))
  • > plot(educ.lm)
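A runnable version of the diagnostic display, again using the built-in cars data set as a stand-in for education.dat (an assumption, since the course file is not available here). Saving the value returned by par() and restoring it afterwards is an addition to the slide; it puts the device back to one plot per page.

```r
# Stand-in model on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)

# par() returns the previous settings, so they can be restored later
old <- par(mfrow = c(2, 2))

# plot() on an lm object draws the default diagnostic plots,
# which fill the 2 by 2 grid
plot(fit)

par(old)   # back to a single plot per page
print(par("mfrow"))
```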

49
Diagnostic Plots
50
End?!
  • That's probably enough; just read the R manual
    and the other materials I posted on the internet!