Write a program that approximates the minimal time to perform

    +, -, *, /, sin etc.

provided you do many of the same kind. So, we are not interested in measuring the time it takes to do one single addition, say. We would like to know how many additions the FPU can perform per second if we use its pipelining capabilities. It can, of course, be interesting to know the number of clock cycles it takes to perform one single addition, as well (i.e. the number of stages in the add-pipeline). From a computational point of view, the throughput is more interesting, however.

What times did you get? Gflops? Are the values reasonable? Comments?

2015-04-02: Two students have commented on the maximum clock frequency of the student computers. As one student pointed out, using Intel Turbo Boost Technology the 3.4 GHz I mentioned during the lecture can (in the best of cases) be increased to 3.9 GHz, according to:
http://www.intel.com/support/processors/corei7/sb/CS-032279.htm

If you are studying the assembly output, it may be good to know that on the math-computers:

fmuld   is a double precision multiplication using the x87-unit. It can produce a product every clock cycle.
mulsd   is a scalar operation in the vector (SSE) unit, it can form one product (of two numbers) every clock cycle.
mulpd   is a vector operation with two pairs, it can form two products every clock cycle.
vmulpd  is a vector operation with four pairs, it can form four products every clock cycle.

The same is true for the corresponding add-instructions. If you are using the GNU compilers, read about the -mtune flag in the manual page for gcc.

So, it makes a difference whether you use vectorization or not.

For more information about the performance of arithmetic operations, see http://www.agner.org/optimize/instruction_tables.pdf. Look in the handouts to find out the properties of the math-computers.

