## DATA

The assignment consists of analysing the data collected in 2003 and 2004 from the
students taking the course. The data are available in S-PLUS format.

The data present the following personal information anonymously:

- S = student's sex (F/M)
- C = student's original hair color: 1 for light, 2 for brown, 3 for black
- H = student's height in cm
- W = student's weight in kg

The assignment is to be done using S-PLUS. The S-PLUS help at the end of this page
may be useful when doing the assignment.

### Assignments

1. Summarize the data using summary statistics.

2. Is there a relationship between a person's sex (S) and hair color (C)? Put together a
2x3 contingency table reflecting the joint distribution of the two factors.
State an appropriate null hypothesis and test it at the 5% significance level.
What is the P-value of the test?

3. Make normal probability plots of the weights and of the heights.
What do you conclude from them?

4. Fit a straight line to the scatterplot of weights vs heights. What
is your conclusion about the relationship between them? Give an appropriate
measure of dependence between the weight and height of a person.

5. Estimate from the data the population means for the weight and the
height. What are the standard errors of these estimates?

6. Present the results of your analysis in a nice readable form.

### Help with S-PLUS

#### Data

Importing your ASCII file into S-PLUS:
`data <- importData("your_file.txt", colNameRow=1)
# Here your data is saved in S-PLUS as data; the first row of your
# ASCII file is used to name the columns
# <- (and also _) assigns the object on the right to the name on the left
`

The filter argument to importData lets you import only a subset of
the data
`data.females <- importData("your_file.txt", colNameRow=1, filter="sex = 1")
# Here only the rows with sex equal to 1 are imported (the females,
# assuming your file has a column named sex that is coded that way)
`

To look at the data, type the name of the object
`data.females
`

To select a single column or row
`
data[,n]
# Here the nth column is selected
data[n,]
# Here the nth row is selected
height <- data[,4]
# Here the 4th column is saved as a vector called height
# (this assumes that height is stored in the 4th column of your data)
`

#### Tables and plots

A simple table
`table(x,y)
# Here x and y are the variables you want to tabulate
`

A contingency table
`crosstabs(~x+y)
`

A simple scatter plot
`plot(x,y, xlab="The label of x-axis", ylab="The label of y-axis")
title("A scatter plot of x and y")
# It is possible to add lines and points to the plot (z and w are
# the coordinates of the lines or the points)
lines(z,w)
points(z,w)
`

A histogram
`hist(x)
`

A normal quantile-quantile plot (normal probability plot)
`qqnorm(x)
`
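
As a small illustration of reading a normal probability plot, the sketch below adds a reference line with qqline. The weights are made-up values for illustration only (not the course data), and the vector name w is an assumption. If the points lie close to the line, a normal distribution is a plausible model; systematic curvature suggests skewness or heavy tails.

```r
# Made-up weights in kg, for illustration only (not the course data)
w <- c(52, 55, 60, 62, 64, 68, 70, 75, 79, 85)

# Sample quantiles plotted against the quantiles of a standard normal
qqnorm(w)

# Reference line through the first and third quartiles
qqline(w)
```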

#### Summary statistics and tests

Mean, variance, standard deviation, median, minimum, maximum
`mean(x)
var(x) # The sample variance
stdev(x) # The sample standard deviation
median(x)
min(x)
max(x)
`
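
The standard error of a sample mean can be obtained from these functions as the sample standard deviation divided by the square root of the sample size. The following is a minimal sketch using made-up heights (not the course data); the names h, n, h.mean and h.se are chosen here for illustration.

```r
# Made-up heights in cm, for illustration only (not the course data)
h <- c(158, 162, 165, 168, 170, 172, 175, 178, 181, 185)

n <- length(h)            # sample size
h.mean <- mean(h)         # estimate of the population mean
h.se <- sqrt(var(h) / n)  # standard error of the mean

h.mean
h.se
```

In S-PLUS the last computation is equivalent to stdev(h)/sqrt(n); sqrt(var(h)/n) is used here only because it avoids the stdev function.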

A sample correlation coefficient
`cor(x,y)
`

A Pearson's chi-square test on a two-dimensional contingency table
`chisq.test(x,y)
# Here x and y are the variables whose relationship you are interested in
`
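
To show how the table and the test fit together, here is a sketch on made-up data (not the course data): 30 females and 30 males with hair color coded 1, 2, 3. The names sex, hair, tab and result are assumptions made for this example. The null hypothesis being tested is that the two factors are independent.

```r
# Made-up data for illustration only (not the course data)
sex  <- factor(c(rep("F", 30), rep("M", 30)))
hair <- factor(c(rep(1, 10), rep(2, 12), rep(3, 8),    # females
                 rep(1, 8),  rep(2, 13), rep(3, 9)))   # males

# 2x3 table of observed counts
tab <- table(sex, hair)
tab

# Pearson's chi-square test of independence between the two factors
result <- chisq.test(sex, hair)
result$p.value
```

If the P-value is below 0.05, independence is rejected at the 5% significance level; otherwise the data give no evidence of a relationship.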

Fitting a linear regression model (least squares is the default fitting method)
`lm(y ~ x)
# Here x is the independent variable and y is the dependent variable
`
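
A fitted model is usually saved as an object so that its coefficients can be inspected and the line drawn on a scatter plot. The sketch below uses made-up heights and weights (not the course data); the names h, w and fit are assumptions made for this example.

```r
# Made-up heights (cm) and weights (kg), for illustration only
h <- c(158, 162, 165, 168, 170, 172, 175, 178, 181, 185)
w <- c(52, 55, 60, 62, 64, 68, 70, 75, 79, 85)

# Least squares fit of weight on height
fit <- lm(w ~ h)
coef(fit)      # intercept and slope
summary(fit)   # standard errors, t-tests and R-squared

# Scatter plot with the fitted line added
plot(h, w, xlab="Height (cm)", ylab="Weight (kg)")
abline(coef(fit))

# Sample correlation as a measure of linear dependence
cor(h, w)
```

For a straight-line fit with one independent variable, the R-squared reported by summary is the square of the sample correlation coefficient.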