MSG500, Linear Statistical Models, Autumn 17

Latest news

Extra office hours next week: Tue 8-12. During the break I will try to respond to email, but please make sure your questions are itemized, clear and concise, and remember that there are 50 of you and one of me. So first try on your own, and only check in with me if you are truly stuck, if something looks odd, or if you are running out of ideas.

Old exams: look through these before Thursday next week, when we will go through them together: Final Jan 2017, Final Aug 2016, Final Jan 2016.

Solutions: Final Jan 2017, Final Aug 2016, Final Jan 2016

Mini5: Prepare 6-10 slides of the preliminary analysis for your project. PRINT OUT THESE SLIDES!!!
We will have a "poster session" on Dec 15 where you will tape these slides around the room. I will visit your posters and you will visit each other's posters, so you can get hints, inspiration and help from other students and from me to finish your projects. Your poster should: a) describe your data and the challenges with it, b) state your goals, c) present some preliminary findings, d) show something "weird", surprising or unexpected that you want feedback on.

Important information about your projects!!!! It is great that so many data sets are available online, and it is helpful to see what other people have done in terms of nice visualizations of the data, ideas for how to model it, etc. That being said:

  • 1. You have to submit all your code with your project as a separate file. Both your report and your code will be checked against sources online.
  • 2. Yes, you can use code available online, but you have to reference where you took it from.
  • 3. For some data sets, kernels are more or less "complete" in terms of running a basic linear model or even some model selection. That is fine: you can build on that, but again you need to reference the source.
  • 4. In addition, your project of course has to do more than that kernel. Most kernels are limited when it comes to interpretation: What does this result mean? Which variables are selected and why? Is the model selection stable? Does that match up with the p-values? What is the effect of sample size on training? I expect something more thorough from you than most Kaggle kernels provide. Also, be critical when you look at these contributions; there is no guarantee that the person writing the kernel did things correctly. If your data set has a very detailed kernel that goes with it, fine, use it and build on it. Create a stability analysis, e.g. by using less data, adding noise, adding outliers, adding missing values, adding extra noise variables, ... or try a different type of model and compare (CART instead of regression, regularized regression or PCR).
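As a rough illustration of the kind of stability analysis suggested above, the sketch below refits an ordinary least-squares model on random half-samples of the data ("using less data") and after adding Gaussian noise to the response ("adding noise"), then reports how much each coefficient moves. It is written in Python/NumPy purely for illustration (the course itself uses R), and the function names, the simulated data and the settings (fraction=0.5, noise_sd, n_reps) are hypothetical choices, not course code.

```python
import numpy as np


def ols_coefficients(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def subsample_stability(X, y, fraction=0.5, n_reps=200, seed=0):
    """Refit OLS on random subsamples ("using less data"); returns an (n_reps, p) array."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(fraction * n)
    coefs = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):
        idx = rng.choice(n, size=m, replace=False)  # draw rows without replacement
        coefs[r] = ols_coefficients(X[idx], y[idx])
    return coefs


def noise_stability(X, y, noise_sd=1.0, n_reps=200, seed=0):
    """Refit OLS after adding Gaussian noise to the response ("adding noise")."""
    rng = np.random.default_rng(seed)
    coefs = np.empty((n_reps, X.shape[1]))
    for r in range(n_reps):
        y_noisy = y + rng.normal(0.0, noise_sd, size=y.shape)
        coefs[r] = ols_coefficients(X, y_noisy)
    return coefs


def simulate_data(n=500, seed=1):
    """Hypothetical simulated data set: x1 affects y, x2 has no real effect."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)
    return X, y


if __name__ == "__main__":
    X, y = simulate_data()
    full = ols_coefficients(X, y)
    for name, coefs in [("half-samples", subsample_stability(X, y)),
                        ("noisy response", noise_stability(X, y))]:
        print(name)
        for j, label in enumerate(["intercept", "x1", "x2"]):
            # Compare the full-data estimate with the spread over refits
            print(f"  {label}: full fit {full[j]:.3f}, "
                  f"mean {coefs[:, j].mean():.3f}, sd {coefs[:, j].std(ddof=1):.3f}")
```

A coefficient whose sign or size changes a lot across refits (a large sd relative to its mean) is a warning sign. The same loop can be adapted to add outliers, missing values or pure-noise predictors, or to repeat a model selection step instead of a single fit.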

    Teachers

    Course coordinator: Rebecka Jornsten

    Office hours: Tue 10-12 in MVH3029

    Course literature



    J.O. Rawlings, S.G. Pantula, D.A. Dickey. Applied Regression Analysis.
    Lecture notes
    You will use the statistical package R for the lab and project work.

    Program

    Syllabus (pdf file)
    Preliminary Course Outline
    Week  Chapter                     Topic
    44    1:1-7, 9, 11, 12:1-4        Basics, Simple linear model, Diagnostics, Matrix formulation
    45    2, 3, 4, (6), 11, 12, 13.1  Multiple regression, Diagnostics and testing
    46    9 + notes                   ANOVA, ANCOVA, Model selection
    47    7, 13 + notes               Model selection
    48    notes, 7                    Bootstrap, Cross-validation
    49    13 + notes                  Regularized regression
    50    8, 10, 12, 15               WLS, NLM, GLM
                                      In-class presentations, Old Exams


    Lecture notes, Demos and MiniAnalyses
    Week 44
        Lecture 1, Lecture 1 R code
        Demo 1, Demo 1b
        sleeptab.dat (Animal sleep data), choc.dat (Chocolate data set), TV.dat (TV data set)
        Lecture 2, Lecture 2 R code
        Demo 2
        MiniAnalysis1, due Nov 10: kc_house_data.csv, the King County house price data (use read.csv to get it into R)

    Week 45
        Lecture 3, Lecture 3 R code
        Lecture 4, Lecture 4 R code
        Lecture 5, Lecture 5 R code
        Housing demo
        MiniAnalysis2, due Nov 17

    Week 46
        Lecture 6, Lecture 6 R code, SA.dat (Heart disease data)
        Demo 6, pollution RData file
        Lecture 7, Lecture 7 R code, cars.dat (car data), demo 7
        Lecture 8, Lecture 8 R code, demo 8, SelAndPred.R, CVcode.R, ModelSelection.R
        MiniAnalysis3, due Nov 24

    Week 47
        Lecture 9, Lecture 9 R code, demo 9
        Lecture 10, Lecture 10 R code, cola.dat, HouseData.txt (the Boston housing data: price, sqft and tax, as well as indicators for grade (features), location (NE, northeast Boston) and Corner (whether the house is on a corner plot))
        Lecture 11, Lecture 11 R code, anorexia.dat

    Week 48
        Lecture 12, Lecture 12 R code
        Lecture 13, Lecture 13 R code, demo 13
        Lecture 14, Lecture 14 R code, wine data, CART slides
        Mini4, due Dec 8

    Week 49
        Lecture 15, Lecture 15 R code
        Lecture 16, Lecture 16 R code
        Lecture 17, Lecture 17 R code
        Mini5, due Dec 15

    Week 50
        About the prostate data, demo R code, prostate.RData, model averaging code (KA is the number of models to average; for this, set nbr, the number of models of each size, larger than 1).



    Course requirements

    The learning goals of the course can be found in the course plan.

    Assignments

    MiniAnalysis 1-6 are labs where you will have one week to perform an analysis task and prepare slides to present in class on FRIDAYS. It is mandatory to present at least 3 of the 6 minis. You can work in teams of 1-3 students. Teams will be selected at random at the Friday Mini sessions. We discuss your findings in class.

    Examination

    The Minis contribute 10% to your final grade, AND the minis will also be part of your final project exam. The project counts for 40% of your grade and the written final exam for 50%.

    Examination procedures

    In the Chalmers Student Portal you can read about when exams are given and what rules apply to exams at Chalmers. In addition, there is a schedule of when exams are given for courses at the University of Gothenburg.

    Before the exam, it is important that you sign up for the examination. If you study at Chalmers, you do this via the Chalmers Student Portal; if you study at the University of Gothenburg, you sign up via GU's Student Portal, where you can also read about what rules apply to examination at the University of Gothenburg.

    At the exam, you should be able to show valid identification.

    After the exam has been graded, you can see your results in Ladok by logging on to your Student portal.

    At the annual (regular) examination:
    When practical, a separate review is arranged. The date of the review will be announced here on the course homepage. Anyone who cannot participate in the review may afterwards retrieve and review their exam at the Mathematical Sciences Student office. Check that you have the right grade and score. Any complaints about the marking must be submitted in writing at the office, where there is a form to fill out.

    At re-examination:
    Exams are reviewed and retrieved at the Mathematical Sciences Student office. Check that you have the right grade and score. Any complaints about the marking must be submitted in writing at the office, where there is a form to fill out.

    Old exams