# TMS150/MSG400, Stochastic data processing and simulation, 2017/18

This course focuses on learning how to solve statistical problems with a selection of statistical software. It consists of lectures and six mandatory projects (labs). The projects are accompanied by extensive background/introduction texts.

## Latest news

2018-01-23 The next deadline for late reports is Mon 3 Sep, 2018.

2017-10-18 All reports for labs 2-4 that were handed in before the recommended deadlines have now been corrected and sent out. Note that reports handed in after these deadlines will most likely not be corrected before 30 October. Therefore, if you want to hand in your report once more, you may not want to wait too long before starting to work on it.

2017-10-05 The easiest way to compile and execute the C programs on the Windows computers is to use an online compiler such as JDoodle. You should be able to solve Lab 5 using this, or at the very least get started. It is, however, recommended to use Linux or Mac for this lab.

2017-10-02 Jonatan will be away for a scientific conference in study week 7. Therefore, Andreas will give the lecture on 11 Oct and Sebastian Jobjörnsson will replace Jonatan in the computer rooms on 9 Oct and 12 Oct. The office hours on 13 Oct are moved to 16 Oct 14:15-15:00.

2017-09-29 Note that in lab 4 you can often speed up boot by setting the arguments parallel="multicore" and ncpus=x where x is the number of CPUs on the machine you are using. Note also that non-parametric bootstrapping is usually faster than parametric bootstrapping. UPDATE: The speed-up seems to perform differently for different operating systems and versions of R. It might not work on Windows. Also, if you are implementing this for studentized intervals, only include the parallel and ncpus arguments in the outer bootstrap. Furthermore, it is recommended but not obligatory that you solve Assignment 4 of Lab 4 without using the built-in function "boot".
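As a rough sketch of the hint above, the following shows where the parallel and ncpus arguments go in a call to boot. The sample x, the statistic mean_stat and the number of replicates are made up for illustration and are not taken from the lab text. The sketch assumes that the boot package is installed.

```r
library(boot)  # assumed to be installed; it ships with standard R installations

set.seed(1)
x <- rexp(50)  # made-up sample, for illustration only

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(data, idx) mean(data[idx])

# Use all detected cores; fall back to one core if detection fails
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1

# Non-parametric bootstrap with the two arguments mentioned in the hint
b <- boot(x, mean_stat, R = 2000, parallel = "multicore", ncpus = n_cores)

# Percentile confidence interval for the mean
ci <- boot.ci(b, type = "perc")
print(ci)
```

Wrapping the boot call in system.time(), once with and once without the two arguments, is a simple way to see whether the speed-up materialises on your machine.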

2017-09-28 Here is a hint for lab 3 regarding the numerical calculation of double integrals in R. To calculate, for example, the integral of x+t over a triangle in x-t-space you can do:

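    # Inner integral over x from 0 to t, for a fixed t
    g <- function(t) {
      f <- function(x) x + t
      integrate(f, 0, t)$value
    }
    # integrate() passes a vector of t values, so g must be vectorized
    g <- Vectorize(g)
    # Outer integral over t from 0 to 1
    integrate(g, 0, 1)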
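
As a sanity check on the numerical result, this particular integral can also be computed by hand. Taking the triangle to be $0 \le x \le t \le 1$, which is what the integration limits in the code describe:

$$
\int_0^1 \int_0^t (x + t)\,dx\,dt
= \int_0^1 \left( \frac{t^2}{2} + t^2 \right) dt
= \int_0^1 \frac{3t^2}{2}\,dt
= \frac{1}{2}
$$

The value reported by the outer call to integrate should therefore be close to 0.5.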


2017-09-22 Note that the vector of weights in Lab 2 is a column vector. We have clarified this.

2017-09-15 The student representatives for course evaluation are baazm@student.chalmers.se MARCUS BAAZ, theobo@student.chalmers.se THÉO BOCQUELET, evelinne@student.chalmers.se EVELINNE DIMOVSKI and tangp@student.chalmers.se PENGFEI TANG.

2017-09-13 Due to popular demand, the C demo session has been moved to Mon 2 Oct 9:00.

2017-09-11 There is a misprint in the introduction for lab 2. The stock data covers the range 2002-06-03 to 2006-06-01 (days when banks and stock exchanges are closed are excluded). This information is not relevant for solving the assignments, but some students have wondered why there were more observations than days.

2017-09-04 The examples that were shown during the R demo session are available in demo2017.R.

2017-09-01 For lab 2, note that the definition of the sample autocorrelation function in the lab text differs slightly from its implementation in MATLAB (i.e. the function autocorr). This makes little difference in practice, however, and you may use whichever implementation you like.
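The lab definition is not reproduced on this page. Purely as an assumption for illustration, suppose the two versions differ in how the autocovariance is normalised (1/n versus 1/(n - k) at lag k). The sketch below, written in R rather than MATLAB, compares the two on a made-up series; the function sample_acf and its argument unbiased are hypothetical names introduced here.

```r
set.seed(1)
x <- 0.1 * cumsum(rnorm(500)) + rnorm(500)  # made-up series, illustration only
n <- length(x)
m <- mean(x)

# Sample autocorrelation at lag k; `unbiased` switches the normalisation
# of the lag-k autocovariance from 1/n to 1/(n - k)
sample_acf <- function(k, unbiased = FALSE) {
  ck <- sum((x[1:(n - k)] - m) * (x[(k + 1):n] - m))
  c0 <- sum((x - m)^2)
  if (unbiased) (ck / (n - k)) / (c0 / n) else ck / c0
}

lags <- 1:10
acf_n  <- sapply(lags, sample_acf)                   # 1/n normalisation
acf_nk <- sapply(lags, sample_acf, unbiased = TRUE)  # 1/(n - k) normalisation

# Largest difference between the two versions over the chosen lags
print(max(abs(acf_n - acf_nk)))
```

With the 1/n normalisation, sample_acf agrees with R's built-in acf. The two versions differ by the factor n/(n - k), which is close to 1 as long as the lag is small compared to the length of the series.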

2017-08-28 Introductions for labs 1, 2, 3 and 4 are available. Remaining lab introductions will be available soon.

2017-08-25 Introductions for labs 1 and 2 are available. Remaining lab introductions will be available soon.

Welcome to the course! The schedule for the course can be found in TimeEdit.

## Teachers

Course coordinator, lectures and computer labs: Jonatan Kallus (kallus@chalmers.se)

Submit reports to: statdata.chalmers@analys.urkund.se

Examiner: Erik Kristiansson

## Course literature

Lecture notes and lab introductions will be available on this page, under the Program section. No additional course literature.

## Program

Note that none of the lectures or computer lab sessions are mandatory to attend; the only requirement is to carry out and hand in the six projects (only answers for labs 1 and 5, and complete reports for labs 2-4 and 6).

#### Lectures

Lectures are in room Euler in the physics building at campus Johanneberg.

| Day | Contents | Lecture slides |
| --- | --- | --- |
| Wed 30 Aug | Introduction to the course. Lab 1: Robustness and distribution assumptions. | slides |
| Wed 6 Sep | Recap lecture on basic statistics: point estimation, confidence intervals, hypothesis testing, p-values, maximum likelihood estimators. | No slides |
| Wed 13 Sep | Lab 2: Decision theory. You are going to need this data set. Example of report structure, lab 2: (pdf, tex source code). | slides |
| Wed 20 Sep | Lab 3: Reliability and survival. | slides |
| Wed 27 Sep | Lab 4: Bootstrap. Example of report structure (pdf, tex) | slides |
| Wed 4 Oct | Lab 5: Monte Carlo integration. | No slides |
| Wed 11 Oct | Lab 6: Simulation of stochastic processes. | No slides |
| Wed 18 Oct | Extra office hours. Jonatan will be in room Euler. You can line up and ask individual questions. There will be no lecture this day. | |

#### Computer labs

Computer labs are in rooms MVF22, MVF24 and MVF25 in the physics building at campus Johanneberg. To use the computers, you will need a Chalmers computer account. If you don't have one already, visit the IT helpdesk (link).

The teachers will be available for answering questions. There will not be any formal class (except for the two demo sessions listed below).

| Day | Main topic | Demo sessions |
| --- | --- | --- |
| Thu 31 Aug | Lab 1 | |
| Mon 4 Sep | Lab 1 | At 9:00 in room MVF26 Jonatan will give an intro/demo of R |
| Thu 7 Sep | Lab 1 | |
| Mon 11 Sep | Lab 1 | |
| Thu 14 Sep | Lab 2 | |
| Mon 18 Sep | Lab 2 | |
| Thu 21 Sep | Lab 3 | |
| Mon 25 Sep | Lab 3 | |
| Thu 28 Sep | Lab 4 | |
| Mon 2 Oct | Lab 4 | At 9:00 in room MVF26 Jonatan will give an intro/demo of C. Slides |
| Thu 5 Oct | Lab 5 | |
| Mon 9 Oct | Lab 5 | |
| Thu 12 Oct | Lab 6 | |
| Mon 16 Oct | Lab 6 | |
| Thu 19 Oct | Lab 6 | |

#### Office hours

The teachers will only be available for supervision during the scheduled lectures, computer labs and office hours. Please respect that the teachers have limited time; do not approach them with course related matters outside the scheduled times.

Primarily ask for advice during the computer labs and lectures. As a second option, mail is OK for questions that only need a short answer. If you have questions that you strongly prefer to ask outside of computer labs and lectures, you are welcome to Jonatan's office (L2120, math building) on Fridays between 13:15 and 14:00.

## Computer labs

Lab sessions: Andreas will be available in the computer rooms. Jonatan will also be available there during the latter half of each session.

Programming: Advice on programming is given here.

Writing: A big part of your work in this course will be spent on writing reports. Being able to express knowledge and results clearly and concisely is an important skill for all scientists and engineers. This skill is one of the learning goals of this course. Advice on Latex and report writing is given here. Examples of report outlines can be found under Program. These will help you in writing your reports, but it is not necessary to use the example structure.

To open RStudio on the Linux computers, open a terminal and type rstudio.

Working on your own laptop: You can, of course, work on your own computer. You should be able to download Matlab via Chalmers IT when you have a Chalmers computer account. R is free and can be downloaded from this site, and you can download RStudio from here. You can use Sharelatex or Overleaf for preparing Latex reports. Register with your Chalmers email at Sharelatex to get a premium account. There are also free Latex compilers out there, for example this one.

Lab 3

• A reference for reading about the death intensity function is chapter 7.1.1 in the book "Probability and Risk Analysis" written by Igor Rychlik and Jesper Rydén. It is available as an e-book at the Chalmers library.

Lab 4

• A link to an introductory text about bootstrap: Bootstrap methods and permutation tests. For example, the introduction on page 2 is really good if you want a motivation for why we should learn about bootstrap.

Lab 5

• Here are two example programs (from the document programming_tips.pdf): startup1.c and startup2.c. Compile with the Linux command gcc -o app1 startup1.c -lm. Here gcc is the command that invokes the C compiler, and -o is a flag that indicates that we want to have the compiled program in the file app1. Further, -lm is a flag that has to be included to link the math library when we use functions from math.h. The program is then executed with the UNIX command ./app1.

## Course requirements

The requirements and learning goals of the course can be found in the course plan.

## Examination

The course is examined by completing the mandatory assignments. There is no written exam.

Deadlines for the first version of each report and instructions for handing in
The recommended deadlines for handing in reports are the following:

| Lab | Software | Hand-in | Recommended deadline | Final deadline |
| --- | --- | --- | --- | --- |
| Lab 1 - Robustness and distribution assumptions | Matlab and R | Only answers | Mon 11 Sep, 11.45 | Mon 30 Oct, 17.00 |
| Lab 2 - Decision theory | Matlab | Complete report | Mon 25 Sep, 8.15 | Mon 30 Oct, 17.00 |
| Lab 3 - Reliability and survival | R | Complete report | Mon 2 Oct, 8.15 | Mon 30 Oct, 17.00 |
| Lab 4 - Bootstrap | R | Complete report | Mon 9 Oct, 8.15 | Mon 30 Oct, 17.00 |
| Lab 5 - Monte Carlo integration | C and Matlab | Only answers | Thu 12 Oct, 11.45 | Mon 30 Oct, 17.00 |
| Lab 6 - Simulation of stochastic processes | R | Complete report | Mon 23 Oct, 8.15 | Mon 30 Oct, 17.00 |

Lab 1 and 5:
Only answers to the questions are needed. You can pass the exercise either by showing your answers to the teacher at the exercise session (in the computer room), or by sending the answers by email to Jonatan (kallus@chalmers.se).

Lab 2, 3, 4 and 6:
Complete reports are needed. Each student needs to write individual reports. The report should be written in Latex and not exceed 10 pages, including figures, but excluding appendix. Send the reports by email in pdf format to statdata.chalmers@analys.urkund.se. The reports will be checked for plagiarism. If you hand in the report before the recommended deadline stated above, you will get the corrections back before the final deadline. In this way you will be able to hand in a return with corrections if you want. You can hand in each report at most twice (original + return).
Instructions for hand-in of returns: In the mail message (the same mail that the return is attached to), state which sections of the report have been improved, or highlight the differences inside the report. Send the return from the same mail address as you used for the original, so that we can easily find the original and our comments.

#### Rules for examination

• Examination is handled by means of 6 mandatory projects, which are preferably carried out in pairs. However, it is required that each student writes his/her own report.
• Each project consists of a number of assignments, the completion of which will give you points. The exceptions are the first and fifth labs, where you will only get the grade "pass" or "not pass", and only answers are needed. The overall grading for the course will be based on the sum of points over all the projects. However, observe that in order to pass you have to complete at least some part of each of the projects (see below).
• Written reports are to be handed in on lab 2, 3, 4 and 6. We strongly recommend you to hand in each report before its recommended deadline (see above), but there is only one "real" deadline: Oct 30. Note that we will give priority in the correction procedure to the reports that are handed in before the recommended deadlines.
• You can hand in each project twice during the course. You can hand in a project partly (e.g. first two questions but not the third) and complete it later (in your return). When we correct the reports, we will give points for things you have done correctly and give comments on what you need to improve/do for getting additional points.
• The reports are required to be written in Latex. See information in the Program section and the Computer labs section above for advice on how to write the report, what to include and how to hand in.
• The report for each lab should not be longer than 10 pages, including figures, but excluding appendix. Figures should be big enough to be readable when printed.
• It is also possible to hand in reports after the final deadline, but then they will not be corrected until we open up the course for "reexamination period" in January. Deadline: Mon 15 Jan, 2018. Second deadline: Mon 3 Sep, 2018. If these hand-ins result in a higher grade, your grade will be changed accordingly in Ladok.

#### Cheating

You are encouraged to solve the assignments in pairs, but reports should be written individually. All hand-ins will be checked for plagiarism. It is, for example, not accepted to write the report together and then change words or sentences to make them different. Cheating will be reported to the disciplinary committee of Chalmers/GU.

#### Number of points given for lab reports

| Project | Maximum | Pass |
| --- | --- | --- |
| Lab 2 | 13 | 7 |
| Lab 3 | 11 | 5 |
| Lab 4 | 14 | 6 |
| Lab 6 | 10 | 4 |
| Total | 48 | 22 |

Note that for a given project report, 0.5 points will be deducted if the report is not clearly structured or is otherwise hard to understand. Likewise, 0.5 points will be deducted if the code attached to the report is not properly structured and commented.

Note that for all grades below you need to have a pass on each of the labs (see table above), i.e. it is not enough to only reach the total number of points according to the limits below.

Chalmers: