Diary

This page will contain a diary of the course, i.e. a short description of the lectures. The table will be filled in as we go along.


How to register (a PDF-file)

Lecture notes

Introduction to C, tcsh and bash

springer.pdf (read if you like)

Some old handouts; I no longer lecture on these topics (read if you like):
Process control under Unix, interprocess communication, nonblocking communication

Day Activity Comments
Tue Introduction. Course administration. Registration. First part of programming languages for HPC. OH-pages 1-16. No lab today.
Thu Fortran, OH-pages 17-26.  
Tue LDA in Fortran77, dangerous things in C and Fortran. Make. Computer architecture. OH-pages 27-44.
Thu Computer architecture. OH-pages 45-62. Read the free chapter in Triebel's book. Fetch the "Architecture Optimization Reference Manual" from this page. Skim through chapter 2, "INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES". My lecture does not cover all the terminology used in the above manuals, so it will be hard to understand every small detail.

"Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel and AMD CPU's" you can find on this page.

Typo page 53, penultimate line: Division is still slow.
Tue Virtual memory, code optimization. OH-pages 63-78. Typo on page 77, in the prototype for add: change int n to double f, int n.
Thu Note in the handouts: Specifications of the student machines (see under Comments to the right). Hints to one lab. More about aliasing.

Code optimization contd. OH-pages 79-86.
The model is Intel Core i5-650 (4M cache, 3.20 GHz); see http://ark.intel.com/Product.aspx?id=43546 for some technical details.

I fetched a new and better cpuid code from http://linux.softpedia.com (search for cpuid) and ran it on a student machine. Here are some of the results.

The student machines have a maximum clock frequency of 3.2 GHz. They have two cores with hyper-threading, two threads each (/proc/cpuinfo and top list four cores). Here is some information about the caches:

      L1 instruction cache: 32K, 4-way, 64-byte lines
      L1 data cache: 32K, 8-way, 64-byte lines
      L2 cache: 256K, 8-way, 64-byte lines
      L3 cache: 4M, 16-way, 64-byte lines
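
As a rough illustration of what these figures imply: the number of sets in a set-associative cache is capacity / (associativity * line size). The C sketch below computes it. The helper name cache_sets is hypothetical (it is not from the handouts), and K and M are assumed to mean 1024 and 1024*1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets in a set-associative cache:
 *   sets = capacity / (associativity * line size)
 * cache_sets is a hypothetical helper for illustration only. */
size_t cache_sets(size_t capacity_bytes, size_t ways, size_t line_bytes)
{
    /* Guard against division by zero. */
    assert(ways > 0 && line_bytes > 0);
    return capacity_bytes / (ways * line_bytes);
}
```

For example, cache_sets(32 * 1024, 8, 64) gives 64 sets for the L1 data cache. Under the simple model where the set index is taken from the address bits just above the line offset, addresses 64 * 64 = 4096 bytes apart then map to the same L1 data set, which is one reason why strides that are large powers of two can be slow.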
Tue More on code optimization. Handed out a quick reference guide (PostScript) for the BLAS and talked about it for some time. OH-pages 87-101.
Thu Profiling. Valgrind. PAPI, gprof, gcov. Calling a Lapack-routine from Fortran. Calling a Lapack-routine from C. OH-pages 102-125. I have added some information for the Lapack- and MEX-labs (how to get things working on the student machines).
I have updated OH-page 59 (details about the student machines) and page 133 (Mex-files on the student machines). Changed the last sentence on page 128 as well.
Update 2011-04-29: I have updated the threads-example in the handouts so it works on the 64-bit student system (pages 170-173).
Tue Mex-files and libraries. OH-pages 126-133, 137-145. Returned lab1. If you were not there, you can find the lab in the plastic magazine holder outside my office. G (for Godkänt) means a passing grade (the labs are graded only pass/no pass).
OH-page 130 is updated, to take care of the case when bandsolve is called with only one output argument.
Thu Answers to two questions (a text file).
Tar-files. Intro. to parallel computing. OH-pages 146-167.
May 5 at 18:50. Updated the text for the inlining lab.
Added a hint to the Lapack-lab (about how to link with the RedHat Lapack-library).
Tue Deadlock and other communication issues. POSIX-threads, intro. to PVM and MPI. OH-pages 166-183.
Thu The rest of MPI and the beginning of OpenMP. OH-pages 184-207. Bring your calendar next time. We are going to decide the exam schedule.
Tue More OpenMP, OH-pages 208-227. Changed last due date to May 20.
Handed back lab 2 today.
Decided exam dates. I have put up a booking schedule, outside my office, where you can sign up for a time.

On OH-page 227 we need to add a barrier, as I said during the lecture. So add the following two lines (above the comment // Add the partial...):

    // Must wait for all partial sums to be ready
    #pragma omp barrier
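
To show why the barrier is needed, here is a minimal, self-contained sketch of the same pattern. It is not the code from OH-page 227: the function name sum_with_partials, the constant NT and the cyclic split of the work are assumptions made for the illustration. Each thread stores its partial sum in a shared array, and the barrier guarantees that all entries have been written before one thread adds them up.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NT 4  /* assumed maximum number of threads (hypothetical) */

/* Sum a[0..n-1] using one partial sum per thread. */
double sum_with_partials(const double *a, int n)
{
    double partial[NT] = {0.0};  /* shared: one slot per thread */
    double total = 0.0;          /* shared: written by one thread only */

    assert(n >= 0);

    #pragma omp parallel num_threads(NT)
    {
        int id = 0, nthreads = 1;  /* serial defaults when OpenMP is off */
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        /* Cyclic distribution: thread id handles i = id, id + nthreads, ... */
        double s = 0.0;
        for (int i = id; i < n; i += nthreads)
            s += a[i];
        partial[id] = s;

        // Must wait for all partial sums to be ready
        #pragma omp barrier

        // Add the partial sums; a single thread does this
        #pragma omp single
        {
            for (int t = 0; t < nthreads; t++)
                total += partial[t];
        }
    }
    return total;
}
```

Compile with OpenMP enabled (for example gcc -fopenmp). Without the barrier, the thread that executes the single block could read entries of partial that other threads have not yet written, since single has an implied barrier only at its end. The #ifdef _OPENMP guards let the same file compile and run serially as well.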


Answers to two questions:

Reduction variables and subtraction (a text-file).

Are there other HPC-courses? Here is an incomplete list:
  • A link to the PDC Summer School at KTH.
  • Courses at HPC2N (Umeå).
  • A course in Denmark about HPC on GPUs (graphics processors). Some of the HPC-students took this course in 2010. This is the textbook used in 2010: David B. Kirk, Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010.
Thu The rest of OpenMP. Did not have time for the second case study. OH-pages 228-243. Course evaluation.