Latest news
  • Welcome to Statistical Learning for Big Data!
  • New deadline for Project 4: June 10th, 2pm. This is a HARD DEADLINE!
  • I will be available Mondays and Wednesdays 13-15 for the next two weeks to answer any questions you might have about the final project.
  • Final
    Cats and Dogs Rdata file, Contaminated spam data, TCGAdata.RData
  • I am looking for master's students for the fall and also have an opening for a PhD student. Contact me after June 10th if you're interested.
  • Teachers
    Course coordinator: Rebecka Jörnsten
    Email: jornsten@chalmers.se
    Office: MVH 3029
    Office hours: Mon and Wed 15-16 in MVH 3029
    Course literature

    The Elements of Statistical Learning, Hastie, T., Tibshirani, R., and Friedman, J.

    Weblink to the book.

    We will also use journal papers and other materials. These will be posted under "Programme".

    Recommended texts include:

  • "Statistics for High-Dimensional Data: Methods, Theory and Applications", Springer 2011, P. Bühlmann and S. van de Geer.
  • "Handbook of Big Data", Chapman and Hall/CRC, 2016. P. Bühlmann, P. Drineas, M. Kane, M. van der Laan, editors.

  • Programme
    Syllabus


    Lectures
    Week Contents
    Material

    w12

    Introduction. Big-p, Big-n, Big-p-and-n
    CART and Random Forests
    Project 1

    Chapters 2.1-2.3. Lecture 1, puppy image as text, R code
    Chapters 8.7, 9.2, 15
    Lecture 2, R code
    Project 1: instructions are at the end of the second set of lecture notes. Remember to sign up for a project using the Doodle poll!

    w15

    Model building: regression and classification


    2.1-2.7, 3.1-3.8, 4.1-4.4, 7.1-7.10, 13.3
    Lecture 3, R code, winedata
    LDA R code, More LDA R code
    RECAP linear models: Regression, Model selection, Logistic regression
    R code

    w16

    Big p: sparse modeling.
    Project 2

    3.8, 18.2-18.4, 18.6
    Lecture 6. Try the R packages lars and elasticnet.
    Lecture 7, High-dimensional inference paper
    Simulation study, SAheart data
    Lecture 8, caret paper
    caret slides, sparse LDA paper

    w17

    Data representations: PCA, SVD, NMF, MDS and SOM

    14.4-14.9, Journal papers
    Lecture 9 - RECAP, R code, cars data
    Lecture 10, R code
    Sparse SVD, Zou et al., The Extraordinary SVD

    w18

    Clustering
    Project 3

    14.1-14.3 + journal papers
    Lecture 11, R code
    Fun paper about NMF and several high-dimensional data sets, Using sparse SVD for high-dimensional data

    w19

    Big n: Divide and conquer, Bag of Little Bootstraps

    Lecture 12, Lecture 13
    Subspace clustering, Spectral clustering, Consensus clustering
    Demo simple clustering, Demo model-based clustering
    Demo spectral clustering, Demo spectral clustering, TCGAdata.RData: TCGA data and class labels (load with load("TCGAdata.RData")), Demo code

    w20

    Bayesian vs. frequentist statistics. Review
    Project 4

    Lecture 14, Statistical methods and computing for big data
    Bag of Little Bootstraps, Leveraging
    Bootstrap demo, BLB demo
    Lecture 15, Divide and Conquer paper
    Maximum mean likelihood, Split and Conquer: penalized regression
    Lecture 16
    Sullivan and Feinn: P-values and Effect Size, A. Gelman: Induction and deduction, A. Gelman: P-values and Statistical Practice
    B. Efron: A 250-year argument
    Raftery et al., Bayesian Model Averaging, Park and Casella: Bayesian Lasso
    Review lecture, Review demo


    Assignments

    There will be four projects. You may work in pairs on the first three projects, but you must have a different partner for each project. If you prefer to work on your own, that is fine too.
    You will write project reports and present your results in class.
    Examination

    Your final grade will be based on the 3 class projects and one final project that you will work on individually. Each project counts for 25 percent of your final grade. All projects are compulsory.
    Examination procedures
    In the Chalmers Student Portal you can read about when exams are given and which rules apply to exams at Chalmers.
    Under the link Schedule you can find when exams are given for courses at the University of Gothenburg.
    At the exam, you should be able to show valid identification.
    Before the exam, you must register for it. If you study at Chalmers, register via the Chalmers Student Portal; if you study at the University of Gothenburg, register via GU's Student Portal.

    You can see your results in Ladok by logging in to the Student Portal.

    At the ordinary examination:
    When practical, a separate exam review is arranged. The date of the review will be announced here on the course website. Anyone who cannot attend the review may afterwards collect and review their exam at the Mathematical Sciences student office, Monday through Friday, from 9:00 to 13:00. Check that your grade and score are correct. Any complaints about the marking must be submitted in writing at the office, where a form is available.

    At re-examinations:
    Exams are reviewed and collected at the Mathematical Sciences student office, Monday through Friday, from 9:00 to 13:00. Any complaints about the marking must be submitted in writing at the office, where a form is available.
    Old exams
    ...
    ...