High-dimensional data analysis, fall 2013

Lectures: Mondays 13:15 – 15:00 in MVH 11

Thursdays 13:15 – 15:00 in MVL 15

Wednesdays 13:15 – 15:00, room to be announced (reserve time)

Course book:

“Statistics for High-Dimensional Data: Methods, Theory and Applications”, P. Bühlmann and S. van de Geer, Springer, 2011.

Complementary book:

“The Elements of Statistical Learning”, T. Hastie, R. Tibshirani and J. Friedman, Springer, 2009.

Course content:

Lasso

- linear models

- generalized linear models

- group Lasso

- smooth functions (additive models)

P-values

Boosting (probably not covered)

Graphical models

Asymptotics

Computation

Exercises:

2.1, 2.2, 2.3, 2.5, 2.8, 3.2, 3.3, 3.4, 3.5, 4.1, 5.1, 5.4, 5.5, HTF 5.1, H:1, 6.1, 10.1

Examination:

Oral exam: one randomly drawn question on the book/slides, one randomly drawn question on an exercise, plus follow-up and other questions. You get 30 minutes to prepare, with all material allowed; the exam itself takes about 20 minutes.

Project: analyze a high-dimensional data set of your own (if you don’t have one, Volvo may be able to provide one), alone or in a group. The project is examined by a presentation on Monday, Nov. 25 or Thursday, Nov. 28, together with a hand-in of the slides for your presentation.

A computation lab (probably cancelled)

Schedule (B&vdG = Bühlmann and van de Geer; HTF = Hastie, Tibshirani and Friedman):

Date | Room | Content | Literature | Slides
Thu 12/9 | MVL 15 | Introduction (lecture by José), the Lasso | B&vdG 1–2.3 | José’s slides, Hdd1
Mon 16/9 | MVH 11 | Prediction, selection, asymptotics | B&vdG 2.4–2.7, HTF 7.10 | Hdd1
Thu 19/9 | MVL 15 | Adaptive Lasso, thresholding the Lasso, BIC, elastic net | B&vdG 2.8–2.13 | Hdd2
Mon 23/9 | MVH 11 | Generalized linear models, group Lasso | B&vdG 3, 4.1–4.6 | Hdd3, Hdd4
Thu 26/9 | MVL 15 | Group Lasso, additive models | B&vdG 4.1–4.6, 5.1–5.3.2, 5.4.0, 5.4.2–5.9, HTF 5.1, 5.2 | Hdd4, Hdd5
Mon 30/9 | - | No lecture | - | -
Wed 2/10 | - | Problem solving by participants | Solved problems: 2.1, 2.2, 2.3, 2.8, 3.2, 3.3, 3.4 | Solutions1
Thu 3/10 | MVL 15 | Additive models | B&vdG 5.1–5.3.2, 5.4.0, 5.4.2–5.9, HTF 5.1, 5.2 | Hdd5
Mon 7/10 | MVH 11 | Proofs | B&vdG 6.2 | Hdd6
Wed 9/10 | MVH 11 | Problem solving by participants | (Almost) solved problems: 4.1, 5.4, 5.5, HTF 5.1, H:1 | Solutions2
Thu 10/10 | MVL 15 | Proofs | B&vdG 6.2 | Hdd6
Mon 14/10 | MVH 11 | No lecture | - | -
Thu 17/10 | - | No lecture | - | -
Mon 21/10 | - | No lecture | - | -
Thu 24/10 | MVL 15 | Stable solutions, discussion of projects | B&vdG 10 | Hdd10
Mon 28/10 | MVH 11 | P-values | B&vdG 11 | Hdd11
Thu 31/10 | MVL 15 | P-values | B&vdG 11 | Hdd11
Mon 4/11 | - | No lecture | - | -
Thu 7/11 | - | No lecture | - | -
Mon 11/11 | MVH 11 | Graphical modelling | B&vdG 13 | Hdd13
Thu 14/11 | MVL 15 | Graphical modelling | B&vdG 13 | Hdd13
Mon 18/11 | MVH 11 | Problem solving by participants | - | Solutions3
Thu 21/11 | - | No lecture | - | -
Mon 25/11 | - | Problem solving by participants | - | -
Thu 28/11 | MVL 15 | Project presentations | - | -

Links to R programs (tutorials for some of the programs can be found online):

http://www-stat.stanford.edu/~tibs/statlearningsoft.html

http://stat.ethz.ch/~buhlmann/software/

Slides:

High-dimensional data 1

High-dimensional data 2

High-dimensional data 3

High-dimensional data 6

High-dimensional data 10

High-dimensional data 11

High-dimensional data 13

Conditional distributions for the multivariate normal distribution (extracted from “Stationary Stochastic Processes for Scientists and Engineers” by Lindgren, Rootzén and Sandsten, http://www.crcpress.com/product/isbn/9781466586185)

Solutions2

Solutions3

Projects

Tobias Abenius "Fused elastic net EPoC"

José Sánchez "Gene Networks Estimation: Extensions of the lasso"

Artur Grzebowski & Henrike Häbel "PQS dissatisfaction survey: comparison OPLS/Lasso"