Lectures
Topics 
Chapter

Lecture materials

Introduction + Clustering

Chapters 2.1-2.3. Skim Chapters 8.7, 9.2, 15
Chapters 14.1-14.3 + journal papers

The Parable of Google Flu: Traps in Big Data Analysis. Science, Vol. 343, 14 March 2014. David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani
Statistical Inference, Learning and Models in Big Data, Franke et al., 2016
Lecture 1, Lecture 1 R code, puppy.txt
Lecture 2, Lecture 2 R code
Data clustering: 50 years beyond K-means, Anil K. Jain, Pattern Recognition Letters 31 (2010) 651–666
Mini 1: due Thursday April 12th. Sign up on the Doodle; check the Lecture 2 notes for the link.

Classification

2.1-2.7, 3.1-3.8, 4.1-4.4, 7.1-7.10, 13.3

Lecture 3, Lecture 3 R code
Here's the paper that explains the various indices in the NbClust package.
Lecture 4, Lecture 4 R code, wine data, Caret paper, Caret slides
Lecture 5, Lecture 5 R code

High-dimensional modeling

3.8, 18.2-18.4, 18.6

Lecture 6, Lecture 6 R code, more R code, sparse LDA paper
Lecture 7-8, R code
Review paper on high-dimensional DA, Review paper on feature selection

Data representations: PCA, Factor Analysis, NMF...

14.4-14.9, journal papers

Lecture 9, R code, HDI paper
Lecture 10, Lecture 11, R code, more R code
sparse SVD paper, NMF paper generalized to structured sparsity
CatsDogs.R, CATSnDOGS.RData
Lecture 12, R code
Nonlinear dimension reduction review paper, LLE paper
Great DimRed review, DimRed tutorial

Clustering revisited

14.1-14.3 + journal papers

Lecture 13, Mclust R code, Codes for consensus clustering etc.: More R code, Even more R code, Spectral clustering R code, Graphical lasso R code
Journal papers: Model-based clustering, Variable selection,
The HDclassif package, The High-dim class paper, High-dim clustering,
Subspace clustering, Spectral clustering, Consensus clustering
TCGAdata.RData: TCGA data and class labels (load("TCGAdata.RData"))

Big n.

Lecture notes and Journal papers

Lecture 14, Bootstrap R code, BLB and leverage code, KC house price data (csv file)
Journal papers: Statistical methods and computing for big data,
Bag of Little Bootstraps, Leveraging
Journal papers for online learning/Mini 6:
DataStream classification paper, DataStream clustering paper
Big data versions of RF, Variants of decision trees, Bagging methods for concept drift, Online bagging paper.
R package that includes these online or chunk-based classification methods: RMOA (with poor documentation!). Here is the documentation for the Java version:
MOA options. Scroll down to see which tuning parameters each method uses.
Links to MOA information:
Code examples
List of methods
Dstream clustering method, Clustering stream data R package
Lecture 15, RMOA, Stream

Review


Lecture 16, Sullivan and Feinn: P-values and Effect Size,
A. Gelman: induction and deduction, A. Gelman: P-values and Statistical Practice
B. Efron: A 250-year argument
Raftery et al., Bayesian Model Averaging, Park and Casella: Bayesian Lasso
Review




There will be 6 Mini-Analysis projects. You can work in pairs on these, but you must change partners between projects. If you prefer to work on your own, this is fine too.
You have to hand in slides and be prepared to present results in class. Mini-Analyses are compulsory. You have to present at least 2 projects, and I will randomly choose presenters each time. Mondays are Mini-Analysis day.
Your final grade will be based on a take-home final. Question 1 of the final will be an individual write-up of the 6 Mini-Analyses, where you can revise and improve the work you did during the course. The Minis count for 50% of your final grade and are compulsory. The other questions on the final will be a set of data analysis tasks, one of which is a "mini-project" on a data set of your own choice.
Course requirements
The learning goals of the course can be found in the course plan.
Assignments
Examination
Examination procedures
In the Chalmers Student Portal you can read about when exams are given and what rules apply to exams at Chalmers. In addition, there is a schedule of when exams are given for courses at the University of Gothenburg.
Before the exam, it is important that you sign up for the examination. If you study at Chalmers, you do this via the Chalmers Student Portal, and if you study at the University of Gothenburg, you sign up via GU's Student Portal, where you can also read about what rules apply to examination at the University of Gothenburg.
At the exam, you should be able to show valid identification.
After the exam has been graded, you can see your results in Ladok by logging on to your Student Portal.
At the annual (regular) examination:
When it is practical, a separate review is arranged. The date of the review will be announced here on the course homepage. Anyone who cannot participate in the review may thereafter retrieve and review their exam at the Mathematical Sciences Student office. Check that you have the right grades and score. Any complaints about the marking must be submitted in writing at the office, where there is a form to fill out.
At re-examination:
Exams are reviewed and retrieved at the Mathematical Sciences Student office. Check that you have the right grades and score. Any complaints about the marking must be submitted in writing at the office, where there is a form to fill out.
Old exams