Computer lab in mathematical statistics

DATA

The assignment consists of analysing the data from 2003 and 2004, collected from the students taking the course. The data are also available in S-PLUS format.
The anonymised data contain the following information about each student:

S = student's sex (F/M)

C = student's original hair color, 1 for light, 2 for brown, 3 for black

H = student's height in cm

W = student's weight in kg

The assignment is to be done using S-PLUS. The help section below may be useful when doing the assignment.

Assignments

1. Summarize the data using summary statistics.

2. Is there a relationship between a person's sex (S) and hair color (C)? Put together a 2x3 contingency table showing the joint distribution of the two factors. State an appropriate null hypothesis and test it at the 5% significance level. What is the P-value of the test?

3. Make normal probability plots of the weights and of the heights. What conclusions do you draw from them?

4. Fit a straight line to the scatterplot of weights vs heights. What is your conclusion about the relationship between them? Give an appropriate measure of dependence between the weight and height of a person.

5. Estimate from the data the population means for the weight and the height. What are the standard errors of these estimates?

6. Present the results of your analysis in a nice readable form.

Help with S-PLUS

Data

Importing your ASCII file into S-PLUS:

    data <- importData("your_file.txt",colNameRow=1)
    # Here your data is saved in S-PLUS as data; the first row of your
    # ASCII file is used to name the columns
    # <- (and also _) assigns the value on the right to the name on the left

The filter argument to importData allows you to subset the data you import

    data.females <- importData("your_file.txt", colNameRow=1, filter="sex = 1")
    # Here only the data for females is imported

To look at the data, type the name of the object

    data.females

To select a single column or row

    data[,n]
    # Here the nth column is selected

    data[n,]
    # Here the nth row is selected

    height <- data[,4]
    # Here the 4th column is saved as a vector called height
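
Columns can also be selected by name, and rows by a condition. The sketch below uses a small made-up data frame called students (a hypothetical name with invented values) whose columns are named as in the course data. The column positions in your own file may differ.

```r
# Hypothetical data with the column names S, C, H, W (not the course data)
students <- data.frame(S = c("F", "M", "F", "M"),
                       C = c(1, 2, 2, 3),
                       H = c(165, 180, 170, 185),
                       W = c(55, 80, 62, 90))

height <- students[, 3]      # third column, selected by position
weight <- students[, "W"]    # a column selected by its name
second <- students[2, ]      # second row, all columns

# Rows selected by a logical condition: students taller than 175 cm
tall <- students[students[, "H"] > 175, ]
tall
```

Selecting by name is less error-prone than selecting by position, since it still works if the order of the columns changes.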

Tables and plots

A simple table

    table(x,y)
    # Here x and y are the variables you want to tabulate

A contingency table

    crosstabs(~x+y)
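
As an illustration, the following sketch builds a 2x3 table from two short made-up vectors, sex and hair (hypothetical names and values, not the course data), and computes the row and column totals.

```r
# Hypothetical answers from eight students (not the course data)
sex  <- c("F", "F", "M", "M", "F", "M", "F", "M")
hair <- c(1, 2, 2, 3, 1, 2, 3, 1)

counts <- table(sex, hair)   # 2x3 table: rows F/M, columns 1/2/3
counts

# Row and column totals of the table
apply(counts, 1, sum)   # number of F and of M
apply(counts, 2, sum)   # number with hair color 1, 2 and 3
```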

A simple scatter plot

    plot(x,y, xlab="The label of x-axis", ylab="The label of y-axis")
    title("A scatter plot of x and y")

    # It is possible to add lines and points to the plot (z and w are
    # the coordinates of the lines or the points)
    lines(z,w)
    points(z,w)

A histogram

    hist(x)

A normal quantile-quantile plot (normal probability plot)

    qqnorm(x)
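
If the sample comes from a normal distribution, the points in this plot lie close to a straight line. A reference line can make that easier to judge. The sketch below assumes that qqline is available in your installation and uses made-up heights.

```r
# Hypothetical heights in cm (not the course data)
x <- c(158, 162, 165, 167, 170, 172, 175, 178, 181, 190)

qqnorm(x)   # sample quantiles against standard normal quantiles
qqline(x)   # reference line through the first and third quartiles
```

Systematic curvature away from the line, rather than a few scattered points, is what suggests a departure from normality.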

Summary statistics and tests

Mean, variance, standard deviation, median, minimum, maximum

    mean(x)
    var(x)  # The sample variance
    stdev(x) # The sample standard deviation
    median(x)
    min(x)
    max(x)
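
The standard error of a sample mean is estimated by the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up weights (the names n, m and se are only illustrative):

```r
# Hypothetical weights in kg (not the course data)
x <- c(55, 62, 68, 70, 74, 81, 90)

n  <- length(x)           # sample size
m  <- mean(x)             # estimate of the population mean
se <- sqrt(var(x) / n)    # estimated standard error of the mean

m
se
```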

A sample correlation coefficient

    cor(x,y)

Pearson's chi-square test on a two-dimensional contingency table

    chisq.test(x,y) 
    # Here x and y are the variables whose relationship you are interested in
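
The test can also be applied to a matrix of counts. The sketch below uses a made-up 2x3 table, assumes that the returned object has the components statistic and p.value, and repeats the calculation by hand from the expected counts to show what the test does.

```r
# Hypothetical 2x3 table of counts (rows: F, M; columns: hair color 1, 2, 3)
counts <- matrix(c(20, 15, 30, 35, 10, 10), nrow = 2)

test <- chisq.test(counts)
test$statistic   # Pearson's chi-square statistic
test$p.value     # reject the null hypothesis at the 5% level if below 0.05

# The same statistic computed by hand:
# expected count = row total * column total / grand total
expected <- outer(apply(counts, 1, sum), apply(counts, 2, sum)) / sum(counts)
stat <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)   # degrees of freedom
1 - pchisq(stat, df)                            # P-value computed by hand
```

The null hypothesis here is that the row and column factors are independent. The chi-square approximation is usually considered reliable only when the expected counts are not too small.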

Fitting a linear regression model (least squares is the default fitting method)

    lm(y ~ x)
    # Here x is the independent variable and y is the dependent variable
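
A short sketch of how the fitted model might be inspected and drawn, using made-up heights and weights (the names h, w and fit are only illustrative):

```r
# Hypothetical heights (cm) and weights (kg), not the course data
h <- c(160, 165, 170, 175, 180, 185, 190)
w <- c(52, 60, 63, 70, 76, 80, 89)

fit <- lm(w ~ h)
coef(fit)       # intercept and slope of the fitted line
summary(fit)    # standard errors, t-values and R-squared

plot(h, w, xlab = "Height (cm)", ylab = "Weight (kg)")
lines(h, fitted(fit))   # add the fitted line to the scatter plot

cor(h, w)^2     # for a straight-line fit this equals R-squared
```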