Project comments and updates.
Some of the projects are being changed this year, at least in part. As a
consequence, there are bound to be inconsistencies and unclear
formulations, for which I apologize. I will, of course, try to avoid
them and, if any slip past me, clarify them as soon as I am made aware
of them. There are also questions that arise naturally while doing the
assignments and that I believe many of you will ask yourselves. I will
try to answer those here as well.
Lab 1 (the newest version is here)
- In the second question, there is a sentence that reads "...with a
sample mean of size 100". I have rewritten that part of the question,
and it is now, hopefully, more comprehensible. See the link above for
the updated version of the project.
- There are probability plots other than pp/qq-plots. A popular one is
the so-called "normal probability plot", which the normplot function in
Matlab produces. It is similar to pp/qq-plots and serves the same
purpose. You may use it, but in that case you will have to explain in
your report what, exactly, is plotted and why the "straight line =
normal distribution" argument is valid there as well.
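To see what such a plot computes, here is a small sketch (in Python, purely for illustration; the labs themselves use Matlab, and the 200 N(5, 2^2) data points are made up): sorted data are plotted against standard-normal quantiles, and if the data are normal the points fall on a straight line whose slope and intercept estimate sigma and mu.

```python
# Sketch of what a qq-plot against the normal distribution computes.
# The data here (200 draws from N(5, 2^2)) are placeholders.
import random
import statistics

random.seed(1)
data = sorted(random.gauss(5.0, 2.0) for _ in range(200))
n = len(data)

# Theoretical standard-normal quantiles at plotting positions (i + 0.5)/n;
# Matlab's normplot uses a similar convention.
theo = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# If data ~ N(mu, sigma^2), then data[i] is approximately
# mu + sigma * theo[i], so the points (theo[i], data[i]) lie on a
# straight line with slope sigma and intercept mu: that is exactly the
# "straight line = normal distribution" argument.
mean_t = sum(theo) / n
mean_d = sum(data) / n
slope = (sum((t - mean_t) * (d - mean_d) for t, d in zip(theo, data))
         / sum((t - mean_t) ** 2 for t in theo))
intercept = mean_d - slope * mean_t
print(f"slope ~ sigma: {slope:.2f}, intercept ~ mu: {intercept:.2f}")
```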
- If you wonder whether you can use C++ rather than C: yes, you can,
but then you are on your own.
- In all the plot questions you can do either a pp- or a qq-plot,
whichever you prefer.
- In the next-to-last question in Task 1, where you simulate your data
from a Gamma distribution 1000 times, you have to construct the CI
assuming that the data are Normal.
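To make the mechanics concrete, here is a sketch of one such iteration (in Python for illustration only; the Gamma parameters, sample size and confidence level below are placeholders, so use the values from the lab instructions):

```python
# One iteration of the exercise: simulate Gamma data, then build a
# confidence interval for the mean *assuming the data were Normal*.
# shape=2, scale=1, n=25 are placeholder values.
import math
import random
import statistics

random.seed(2)
k, theta = 2.0, 1.0        # placeholder Gamma shape/scale
n = 25
z = 1.96                   # 97.5% standard-normal quantile

data = [random.gammavariate(k, theta) for _ in range(n)]
xbar = statistics.mean(data)
s = statistics.stdev(data)
# Normal-theory CI for the mean: xbar +- z * s / sqrt(n)
ci = (xbar - z * s / math.sqrt(n), xbar + z * s / math.sqrt(n))
print(ci)
```

Repeating this 1000 times and counting how often the true mean (here k * theta) lands inside the interval gives the empirical coverage.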
- You can write your reports in Swedish if you want.
- To use the GSL library in C you have to tell the compiler about it.
That is, plain gcc filename.c -o filename is not enough. Use gcc
filename.c -o filename -lgsl -lgslcblas -lm (the library flags should
come after the source file). Run the program with ./filename
- If you can't get the Pinv functions in the GSL package to work, try adding #include <gsl/gsl_cdf.h> to your includes.
Lab 2
- If you have problems computing the expected-utility integral
numerically with quad (namely, it comes out as zero every time), set
smaller integration limits (like -0.1 to 0.1, or even narrower). More
precisely, those limits should be where the function inside the
integral begins to level off, so plotting that function first with
fplot helps a lot.
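The effect itself is easy to reproduce in any language. Here is a sketch (in Python for illustration; the lab uses Matlab's quad, where the absolute tolerance causes the same behaviour) showing that a fixed-node rule simply misses a sharply peaked integrand when the limits are far wider than the peak:

```python
# A quadrature rule with a fixed number of nodes misses a narrow peak
# when the integration limits are much wider than the peak itself.
import math

def midpoint(f, a, b, n=1001):
    """Composite midpoint rule with n equally spaced nodes."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# A density-like integrand concentrated near x = 0.013 with width ~0.001;
# its true integral is 1.
sigma, mu = 0.001, 0.013
f = lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

wide = midpoint(f, -100.0, 100.0)   # nodes ~0.2 apart: the peak is missed
narrow = midpoint(f, -0.1, 0.1)     # nodes ~0.0002 apart: the peak is resolved
print(wide, narrow)
```

Plotting the integrand first (fplot in Matlab) tells you where it levels off, which is exactly where the narrow limits should be placed.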
- When you do the optimization with fmincon you will probably need to
either change the tolerance or blow up your function values (by
multiplying them by, say, 1000).
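Why scaling helps can be seen with a toy optimizer (a Python sketch, not fmincon itself; fmincon's TolFun behaves analogously): a stopping rule based on an absolute change in the objective fires immediately when the objective values are themselves tiny.

```python
# Toy illustration: a search that stops when successive function values
# differ by less than an absolute tolerance stalls on a tiny objective,
# but works after the objective is multiplied by a large constant.
def minimize_1d(f, x, step=1.0, tol=1e-8, max_iter=10_000):
    """Crude 1-D pattern search: accept a move only if it improves f by
    more than tol; otherwise shrink the step."""
    fx = f(x)
    for _ in range(max_iter):
        moved = False
        for cand in (x + step, x - step):
            fc = f(cand)
            if fc < fx - tol:       # absolute improvement test
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step /= 2.0
            if step < 1e-12:
                break
    return x

f = lambda x: 1e-9 * (x - 2.0) ** 2          # minimum at x = 2, tiny values
bad = minimize_1d(f, 0.0)                    # stalls: improvements < tol
good = minimize_1d(lambda x: 1e6 * f(x), 0.0)  # rescaled: converges
print(bad, good)
```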
Lab 3
- For the second task, you have to repeat all the points of Task 1
(except the very first one), with the difference that you do a
parametric bootstrap instead. That is, you have to both write your own
bootstrap and use the boot package.
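The mechanics of the parametric bootstrap, as opposed to the ordinary one, are: fit the model to the data, then resample from the fitted model rather than from the data. A sketch (in Python for illustration; the lab uses R, and the exponential model, statistic and sample below are made up):

```python
# Parametric bootstrap percentile CI for the mean of exponential data.
# The distribution and statistic are placeholders for whatever the lab
# actually asks for.
import random
import statistics

random.seed(5)
data = [random.expovariate(0.5) for _ in range(30)]   # "observed" sample

rate_hat = 1.0 / statistics.mean(data)   # ML estimate of the exp. rate
B = 1000
boot_means = []
for _ in range(B):
    # Resample from the *fitted* distribution, not from the data
    resample = [random.expovariate(rate_hat) for _ in range(len(data))]
    boot_means.append(statistics.mean(resample))

boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])  # percentile CI
print(ci)
```

In R's boot package the same thing is done by passing sim = "parametric" together with a ran.gen function that generates data from the fitted model.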
- When doing the studentized bootstrap, the one with the double
bootstrap, do not do 1000 resamples at both bootstrap levels; around 50
is enough for the second level. Also, you can base the estimated
coverage of the CIs on, say, 100-200 iterations instead of 1000. With
n = c(10,20,...,100), 100 CIs, outerBoot = 1000 and innerBoot = 50, it
should take about an hour to run.
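For the structure of the two levels, here is a sketch of one studentized CI with the suggested budget (in Python for illustration; the lab uses R, and the exponential sample and the mean as statistic are placeholders): the inner loop of ~50 resamples only estimates the standard error of each outer bootstrap statistic.

```python
# Studentized (double) bootstrap CI for the mean: 1000 outer resamples,
# 50 inner resamples each to estimate the standard error.
import random
import statistics

random.seed(6)
data = [random.expovariate(1.0) for _ in range(20)]   # placeholder sample
n = len(data)

def se_of_mean(sample, inner=50):
    """Inner-level bootstrap standard error of the mean of `sample`."""
    means = []
    for _ in range(inner):
        res = [random.choice(sample) for _ in range(len(sample))]
        means.append(statistics.mean(res))
    return statistics.stdev(means)

theta_hat = statistics.mean(data)
se_hat = se_of_mean(data)

outer = 1000
t_stats = []
for _ in range(outer):
    res = [random.choice(data) for _ in range(n)]
    # Studentized statistic: each resample is standardized by its own SE
    t_stats.append((statistics.mean(res) - theta_hat) / se_of_mean(res))

t_stats.sort()
lo_q, hi_q = t_stats[int(0.025 * outer)], t_stats[int(0.975 * outer)]
# Note that the quantiles swap sides in the studentized interval
ci = (theta_hat - hi_q * se_hat, theta_hat - lo_q * se_hat)
print(ci)
```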
- For the studentized question, I want you to briefly explain the
probable reason behind the poor CI coverage when the non-studentized
intervals are used.
Lab 6
- For those of you who wonder why the array-changing function in the
mex example is double and not void: the answer is that yes, it should
be void. My mistake. It works even as double (the function will then
return the last element of the array by default), but making it void is
better practice.
- About the Brownian motion task. Since you are required to calculate
the integral of a realization of Brownian motion, which is random, you
will get a different result for that integral every time, so there is
no way to say that this or that value is plausible. To check whether
your code is correct, I advise you to plot a sample path (for example
in Matlab) and compare it to the graph of the Wiener process that you
had in lab 4 (the two are the same thing). If they don't look similar,
your code is probably wrong.
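If you want a reference path to compare against, standard Brownian motion on [0,1] can be simulated directly from independent Gaussian increments. A sketch (in Python for illustration; the grid size is arbitrary):

```python
# Reference sample path of standard Brownian motion on [0,1], built from
# independent increments B_{t+dt} - B_t ~ N(0, dt).  Plot `path` and
# compare its shape with the Wiener-process plots from lab 4.
import math
import random

random.seed(7)
n = 1000
dt = 1.0 / n
path = [0.0]                      # B_0 = 0
for _ in range(n):
    path.append(path[-1] + random.gauss(0.0, math.sqrt(dt)))

# Left-endpoint approximation of the integral of the path over [0,1];
# this is a random quantity, so its value changes from run to run.
integral = sum(path[:-1]) * dt
print(path[-1], integral)
```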
- The expected shortfall. I belatedly realized that the formulation of
the question is unclear. "Giving bounds on the error" does not mean
that you should report error ± something. It simply means that you have
to estimate the MC error as you did in the previous tasks.
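That is, report the Monte Carlo standard error of your estimator, std(samples)/sqrt(N). A sketch with made-up numbers (in Python for illustration; the loss distribution and the 5% level are placeholders, and this simple tail-based error estimate ignores the error in the quantile itself, so treat it as a rough figure):

```python
# Toy expected-shortfall estimate with its MC standard error.
# Losses ~ N(0,1) and alpha = 0.05 are placeholders.
import math
import random
import statistics

random.seed(8)
N = 10_000
losses = [random.gauss(0.0, 1.0) for _ in range(N)]
losses.sort()
alpha = 0.05
tail = losses[int((1 - alpha) * N):]     # worst 5% of outcomes
es_hat = statistics.mean(tail)           # expected-shortfall estimate
# Rough MC error: standard error of the mean of the tail sample
mc_err = statistics.stdev(tail) / math.sqrt(len(tail))
print(f"ES ~ {es_hat:.3f} (MC standard error {mc_err:.3f})")
```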
- The Brownian motion again. Below is the
Matlab code for two seemingly logical ways of simulating it:
B1 = @(t,nk)sum(sqrt(8)./pi.*sin(0.5*(2*repmat((0:1:length(nk)-1),...
    length(t),1)+1).*pi.*repmat(t,length(nk),1)')./...
    (2*(repmat(0:1:length(nk)-1,length(t),1)+1)).*repmat(nk,length(t),1),2);
B2 = @(t,lnk)sum(sqrt(8)./pi.*sin(0.5*(2*repmat((0:1:lnk-1),...
    length(t),1)+1).*pi.*repmat(t,lnk,1)')./...
    (2*(repmat(0:1:lnk-1,length(t),1)+1)).*normrnd(...
    0,1,length(t),lnk),2);
nk = normrnd(0,1,1,1000);
t = linspace(0,1,1000);
lnk = length(nk);
clear b1 b2
b1 = B1(t,nk);
b2 = B2(t,lnk);
figure
subplot(1,2,1), plot(b1);
subplot(1,2,2), plot(b2);
The only difference is that B1 takes in
nk while B2 simulates nk. Guess which way is the correct one and
why.
- For the different variance-reduction methods, I want you to say
something about why they actually reduce the variance (or, as may be
the case for some of them, do not).
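As one concrete case to reason from, here is antithetic variates on a toy problem (a Python sketch; the integrand E[e^U] with U ~ Uniform(0,1) is made up and not one of the lab's integrals): because e^x is monotone, e^U and e^(1-U) are negatively correlated, so averaging them cancels part of the noise.

```python
# Antithetic variates for estimating E[e^U], U ~ Uniform(0,1).
# Both estimators below use 2n evaluations of e^x; compare their
# per-sample variances.
import math
import random
import statistics

random.seed(9)
n = 20_000

plain = [math.exp(random.random()) for _ in range(2 * n)]
anti = []
for _ in range(n):
    u = random.random()
    # Antithetic pair: u and 1-u are negatively correlated through e^x
    anti.append(0.5 * (math.exp(u) + math.exp(1.0 - u)))

print(statistics.variance(plain), statistics.variance(anti))
```

Working out Cov(e^U, e^(1-U)) by hand shows why the reduction is so large here, and also why the trick can fail for non-monotone integrands, where the correlation need not be negative.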