High-dimensional data analysis, fall 2013

Lectures: Mondays 13:15 – 15:00 in MVH 11

                Thursdays 13:15 – 15:00 in MVL 15

                Wednesdays 13:15 – 15:00 in ?  (reserve time)

Course book:

“Statistics for High-Dimensional Data: Methods, Theory and Applications”, P. Bühlmann and S. van de Geer, Springer 2011.

Complementary book:

“The Elements of Statistical Learning”, T. Hastie, R. Tibshirani, J. Friedman, Springer 2009.

Course content:

        Lasso

            - linear models

            - generalized linear models

            - group

            - smooth functions

        P-values

        Boosting (probably not)

        Graphical models

        Asymptotics

        Computation

Exercises:

2.1, 2.2, 2.3, 2.5, 2.8, 3.2, 3.3, 3.4, 3.5, 4.1, 5.1, 5.4, 5.5, HTF 5.1, H:1, 6.1, 10.1

Examination:

        Oral exam (one random question on the book/slides + one random question on an exercise + follow-up/other questions; 30 min to prepare, using all material, and about 20 min for the exam)

        Project: analyze a high-dimensional data set of your own (if you don’t have one, Volvo might be able to provide one), alone or in groups. Examination by presentation of your project on Monday, Nov. 25 or Thursday, Nov. 28, plus hand-in of the slides for your presentation.

        A computation lab (probably cancelled)

 

Date | Content | Literature
Thursd. 12/9, MVL 15 | Introduction, lecture by José, the Lasso | José’s slides; B&vdG 1 - 2.3; Slides: Hdd1
Mond. 16/9, MVH 11 | Prediction, selection, asymptotics | B&vdG 2.4 – 2.7, HTF 7.10; Slides: Hdd1
Thursd. 19/9, MVL 15 | Adaptive Lasso, thresholding the Lasso, BIC, elastic net | B&vdG 2.8 – 2.13; Slides: Hdd2
Mond. 23/9, MVH 11 | generalized linear models, group Lasso | B&vdG 3, 4.1-4.6; Slides: Hdd3, Hdd4
Thursd. 26/9, MVL 15 | group Lasso, additive models | B&vdG 4.1-4.6, 5.1-5.3.2, 5.4.0, 5.4.2-5.9, HTF 5.1, 5.2; Slides: Hdd4, Hdd5
Mond. 30/9 | No Lecture |
Wednesd. 2/10 | problem solving by participants | Solved problems: 2.1, 2.2, 2.3, 2.8, 3.2, 3.3, 3.4; Slides: Solutions1
Thursd. 3/10, MVL 15 | additive models | B&vdG 5.1-5.3.2, 5.4.0, 5.4.2-5.9, HTF 5.1, 5.2; Slides: Hdd5
Mond. 7/10, MVH 11 | proofs | B&vdG 6.2; Slides: Hdd6
Wednesd. 9/10, MVH 11 | problem solving by participants | (Almost) solved problems: 4.1, 5.4, 5.5, HTF 5.1, H:1; Slides: Solutions2
Thursd. 10/10, MVL 15 | proofs | B&vdG 6.2; Slides: Hdd6
Mond. 14/10, MVH 11 | No Lecture |
Thursd. 17/10 | No Lecture |
Mond. 21/10 | No Lecture |
Thursd. 24/10, MVL 15 | stable solutions, discussion of projects | B&vdG 10; Slides: Hdd10
Mond. 28/10, MVH 11 | p-values | B&vdG 11; Slides: Hdd11
Thursd. 31/10, MVL 15 | p-values | B&vdG 11; Slides: Hdd11
Mond. 4/11 | No Lecture |
Thursd. 7/11 | No Lecture |
Mond. 11/11, MVH 11 | graphical modelling | B&vdG 13; Slides: Hdd13
Thursd. 14/11, MVL 15 | graphical modelling | B&vdG 13; Slides: Hdd13
Mond. 18/11, MVH 11 | problem solving by participants | Slides: Solutions 3
Thursd. 21/11 | No Lecture |
Mond. 25/11 | problem solving by participants |
Thursd. 28/11, MVL 15 | project presentations |

Links to R programs (tutorials for some of the programs can be found by googling):

http://www-stat.stanford.edu/~tibs/statlearningsoft.html

http://stat.ethz.ch/~buhlmann/software/


Slides:

Gene Networks Estimation

High-dimensional data 1

High-dimensional data 2

High-dimensional data 3

High-dimensional data 4

High-dimensional data 5

High-dimensional data 6

High-dimensional data 10

High-dimensional data 11

High-dimensional data 13

Conditional distributions for the multivariate normal distribution (extracted from "Stationary Stochastic Processes for Scientists and Engineers" by Lindgren, Rootzén, and Sandsten, http://www.crcpress.com/product/isbn/9781466586185)

Inverse of block matrices

Solutions1

Solutions2

Solutions3

Projects

Tobias Abenius "Fused elastic net EPoC"

José Sánchez "Gene Networks Estimation: Extensions of the lasso"

Artur Grzebowski & Henrike Häbel "PQS dissatisfaction survey: comparison OPLS/Lasso"