Project comments and updates.
Some of the projects are being changed this year, at least in part. As a
consequence, there are bound to be inconsistencies and unclear
formulations, for which I apologize. I will, of course, try to avoid
them and, if any slip past me, clarify them as soon as I am made aware
of them. There are also questions that arise naturally while doing the
assignments and that I believe many of you will ask yourselves. I will
try to answer those here as well.
Lab 1 (the newest version is here)
- In the second question, there is a sentence that reads "...with a
sample mean of size 100". I have rewritten that part of the question,
and it is now, hopefully, more comprehensible. See the link above for
the updated version of the project.
- There are probability plots other than pp/qq-plots. A popular one is
the so-called "normal probability plot", which the normplot function in
Matlab produces. It is similar to pp/qq-plots and serves the same
purpose. You may use it, but in that case you will have to explain in
your report what, exactly, is plotted and why the "straight line =
normal distribution" argument is valid there as well.
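To see what such a plot computes, here is a small sketch (in Python, purely for illustration; the labs themselves use Matlab, and the 200 N(5, 2^2) data points are made up): sorted data are plotted against standard-normal quantiles, and if the data are normal the points fall on a straight line whose slope and intercept estimate sigma and mu.

```python
# Sketch of what a qq-plot against the normal distribution computes.
# The data here (200 draws from N(5, 2^2)) are placeholders.
import random
import statistics

random.seed(1)
data = sorted(random.gauss(5.0, 2.0) for _ in range(200))
n = len(data)

# Theoretical standard-normal quantiles at plotting positions (i + 0.5)/n;
# Matlab's normplot uses a similar convention.
theo = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# If data ~ N(mu, sigma^2), then data[i] is approximately
# mu + sigma * theo[i], so the points (theo[i], data[i]) lie on a
# straight line with slope sigma and intercept mu: that is exactly the
# "straight line = normal distribution" argument.
mean_t = sum(theo) / n
mean_d = sum(data) / n
slope = (sum((t - mean_t) * (d - mean_d) for t, d in zip(theo, data))
         / sum((t - mean_t) ** 2 for t in theo))
intercept = mean_d - slope * mean_t
print(f"slope ~ sigma: {slope:.2f}, intercept ~ mu: {intercept:.2f}")
```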
- If you wonder whether you can use C++ rather than C: yes, you can,
but then you are on your own.
- In all the plot questions you can do either a pp- or a qq-plot,
whichever you prefer.
- In the next-to-last question in Task 1, where you simulate your data
from a Gamma distribution 1000 times, you have to construct the CI
assuming that the data are Normal.
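To make the mechanics concrete, here is a sketch of one such iteration (in Python for illustration only; the Gamma parameters, sample size and confidence level below are placeholders, so use the values from the lab instructions):

```python
# One iteration of the exercise: simulate Gamma data, then build a
# confidence interval for the mean *assuming the data were Normal*.
# shape=2, scale=1, n=25 are placeholder values.
import math
import random
import statistics

random.seed(2)
k, theta = 2.0, 1.0        # placeholder Gamma shape/scale
n = 25
z = 1.96                   # 97.5% standard-normal quantile

data = [random.gammavariate(k, theta) for _ in range(n)]
xbar = statistics.mean(data)
s = statistics.stdev(data)
# Normal-theory CI for the mean: xbar +- z * s / sqrt(n)
ci = (xbar - z * s / math.sqrt(n), xbar + z * s / math.sqrt(n))
print(ci)
```

Repeating this 1000 times and counting how often the true mean (here k * theta) lands inside the interval gives the empirical coverage.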
- You can write your reports in Swedish if you want.
- To use the GSL library in C you have to tell the compiler about it.
That is, plain gcc filename.c -o filename is not enough. Use gcc
filename.c -o filename -lgsl -lgslcblas -lm (the library flags should
come after the source file). Run the program with ./filename
- If you can't get the Pinv functions in the GSL package to work, try adding #include <gsl/gsl_cdf.h> to your includes.
Lab 2
- If you have problems computing the expected-utility integral
numerically with quad (namely, it comes out as zero every time), set
smaller integration limits (like -0.1 to 0.1, or even narrower). More
precisely, those limits should be where the function inside the
integral begins to level off, so plotting that function first with
fplot helps a lot.
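The effect itself is easy to reproduce in any language. Here is a sketch (in Python for illustration; the lab uses Matlab's quad, where the absolute tolerance causes the same behaviour) showing that a fixed-node rule simply misses a sharply peaked integrand when the limits are far wider than the peak:

```python
# A quadrature rule with a fixed number of nodes misses a narrow peak
# when the integration limits are much wider than the peak itself.
import math

def midpoint(f, a, b, n=1001):
    """Composite midpoint rule with n equally spaced nodes."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# A density-like integrand concentrated near x = 0.013 with width ~0.001;
# its true integral is 1.
sigma, mu = 0.001, 0.013
f = lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

wide = midpoint(f, -100.0, 100.0)   # nodes ~0.2 apart: the peak is missed
narrow = midpoint(f, -0.1, 0.1)     # nodes ~0.0002 apart: the peak is resolved
print(wide, narrow)
```

Plotting the integrand first (fplot in Matlab) tells you where it levels off, which is exactly where the narrow limits should be placed.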
- When you do the optimization with fmincon you will probably need to
either change the tolerance or blow up your function values (by
multiplying them by, say, 1000).
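Why scaling helps can be seen with a toy optimizer (a Python sketch, not fmincon itself; fmincon's TolFun behaves analogously): a stopping rule based on an absolute change in the objective fires immediately when the objective values are themselves tiny.

```python
# Toy illustration: a search that stops when successive function values
# differ by less than an absolute tolerance stalls on a tiny objective,
# but works after the objective is multiplied by a large constant.
def minimize_1d(f, x, step=1.0, tol=1e-8, max_iter=10_000):
    """Crude 1-D pattern search: accept a move only if it improves f by
    more than tol; otherwise shrink the step."""
    fx = f(x)
    for _ in range(max_iter):
        moved = False
        for cand in (x + step, x - step):
            fc = f(cand)
            if fc < fx - tol:       # absolute improvement test
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step /= 2.0
            if step < 1e-12:
                break
    return x

f = lambda x: 1e-9 * (x - 2.0) ** 2          # minimum at x = 2, tiny values
bad = minimize_1d(f, 0.0)                    # stalls: improvements < tol
good = minimize_1d(lambda x: 1e6 * f(x), 0.0)  # rescaled: converges
print(bad, good)
```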
Lab 3
- For the second task, you have to repeat all the points of Task 1
(except the very first one), with the difference that you do a
parametric bootstrap instead. That is, you have to both write your own
bootstrap and use the boot package.
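The mechanics of the parametric bootstrap, as opposed to the ordinary one, are: fit the model to the data, then resample from the fitted model rather than from the data. A sketch (in Python for illustration; the lab uses R, and the exponential model, statistic and sample below are made up):

```python
# Parametric bootstrap percentile CI for the mean of exponential data.
# The distribution and statistic are placeholders for whatever the lab
# actually asks for.
import random
import statistics

random.seed(5)
data = [random.expovariate(0.5) for _ in range(30)]   # "observed" sample

rate_hat = 1.0 / statistics.mean(data)   # ML estimate of the exp. rate
B = 1000
boot_means = []
for _ in range(B):
    # Resample from the *fitted* distribution, not from the data
    resample = [random.expovariate(rate_hat) for _ in range(len(data))]
    boot_means.append(statistics.mean(resample))

boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])  # percentile CI
print(ci)
```

In R's boot package the same thing is done by passing sim = "parametric" together with a ran.gen function that generates data from the fitted model.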
- When doing the studentized bootstrap, the one with the double
bootstrap, do not do 1000 resamples at both bootstrap levels; around 50
is enough for the second level. Also, you can base the estimated
coverage of the CIs on, say, 100-200 iterations instead of 1000. With
n = c(10,20,...,100), 100 CIs, outerBoot = 1000 and innerBoot = 50, it
should take about an hour to run.
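For the structure of the two levels, here is a sketch of one studentized CI with the suggested budget (in Python for illustration; the lab uses R, and the exponential sample and the mean as statistic are placeholders): the inner loop of ~50 resamples only estimates the standard error of each outer bootstrap statistic.

```python
# Studentized (double) bootstrap CI for the mean: 1000 outer resamples,
# 50 inner resamples each to estimate the standard error.
import random
import statistics

random.seed(6)
data = [random.expovariate(1.0) for _ in range(20)]   # placeholder sample
n = len(data)

def se_of_mean(sample, inner=50):
    """Inner-level bootstrap standard error of the mean of `sample`."""
    means = []
    for _ in range(inner):
        res = [random.choice(sample) for _ in range(len(sample))]
        means.append(statistics.mean(res))
    return statistics.stdev(means)

theta_hat = statistics.mean(data)
se_hat = se_of_mean(data)

outer = 1000
t_stats = []
for _ in range(outer):
    res = [random.choice(data) for _ in range(n)]
    # Studentized statistic: each resample is standardized by its own SE
    t_stats.append((statistics.mean(res) - theta_hat) / se_of_mean(res))

t_stats.sort()
lo_q, hi_q = t_stats[int(0.025 * outer)], t_stats[int(0.975 * outer)]
# Note that the quantiles swap sides in the studentized interval
ci = (theta_hat - hi_q * se_hat, theta_hat - lo_q * se_hat)
print(ci)
```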
- For the studentized question, I want you to briefly explain the
probable reason behind the poor CI coverage when the non-studentized
intervals are used.
Lab 6
- For those of you who wonder why the array-changing function in the
mex example is double and not void: the answer is that yes, it should
be void. My mistake. It works even as double (the function will then
return the last element of the array by default), but making it void is
better practice.
- About the Brownian motion task. Since you are required to calculate
the integral of a realization of Brownian motion, which is random, you
will get a different result for that integral every time, so there is
no way to say that this or that value is plausible. To check whether
your code is correct, I advise you to plot a sample path (for example
in Matlab) and compare it to the graph of the Wiener process that you
had in lab 4 (the two are the same thing). If they don't look similar,
your code is probably wrong.
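If you want a reference path to compare against, standard Brownian motion on [0,1] can be simulated directly from independent Gaussian increments. A sketch (in Python for illustration; the grid size is arbitrary):

```python
# Reference sample path of standard Brownian motion on [0,1], built from
# independent increments B_{t+dt} - B_t ~ N(0, dt).  Plot `path` and
# compare its shape with the Wiener-process plots from lab 4.
import math
import random

random.seed(7)
n = 1000
dt = 1.0 / n
path = [0.0]                      # B_0 = 0
for _ in range(n):
    path.append(path[-1] + random.gauss(0.0, math.sqrt(dt)))

# Left-endpoint approximation of the integral of the path over [0,1];
# this is a random quantity, so its value changes from run to run.
integral = sum(path[:-1]) * dt
print(path[-1], integral)
```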
- The expected shortfall. I belatedly realized that the formulation of
the question is unclear. "Giving bounds on the error" does not mean
that you should report error ± something. It simply means that you have
to estimate the MC error as you did in the previous tasks.
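That is, report the Monte Carlo standard error of your estimator, std(samples)/sqrt(N). A sketch with made-up numbers (in Python for illustration; the loss distribution and the 5% level are placeholders, and this simple tail-based error estimate ignores the error in the quantile itself, so treat it as a rough figure):

```python
# Toy expected-shortfall estimate with its MC standard error.
# Losses ~ N(0,1) and alpha = 0.05 are placeholders.
import math
import random
import statistics

random.seed(8)
N = 10_000
losses = [random.gauss(0.0, 1.0) for _ in range(N)]
losses.sort()
alpha = 0.05
tail = losses[int((1 - alpha) * N):]     # worst 5% of outcomes
es_hat = statistics.mean(tail)           # expected-shortfall estimate
# Rough MC error: standard error of the mean of the tail sample
mc_err = statistics.stdev(tail) / math.sqrt(len(tail))
print(f"ES ~ {es_hat:.3f} (MC standard error {mc_err:.3f})")
```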
- The Brownian motion again. Below is the
Matlab code for two seemingly logical ways of simulating it:
B1 = @(t,nk)sum(sqrt(8)./pi.*sin(0.5*(2*repmat((0:1:length(nk)-1),...
    length(t),1)+1).*pi.*repmat(t,length(nk),1)')./...
    (2*(repmat(0:1:length(nk)-1,length(t),1)+1)).*repmat(nk,length(t),1),2);
B2 = @(t,lnk)sum(sqrt(8)./pi.*sin(0.5*(2*repmat((0:1:lnk-1),...
    length(t),1)+1).*pi.*repmat(t,lnk,1)')./...
    (2*(repmat(0:1:lnk-1,length(t),1)+1)).*normrnd(...
    0,1,length(t),lnk),2);
nk = normrnd(0,1,1,1000);
t = linspace(0,1,1000);
lnk = length(nk);
clear b1 b2
b1 = B1(t,nk);
b2 = B2(t,lnk);
figure
subplot(1,2,1), plot(b1);
subplot(1,2,2), plot(b2);
The only difference is that B1 takes in
nk while B2 simulates nk. Guess which way is the correct one and
why.
- For the different variance-reduction methods, I want you to say
something about why they actually reduce the variance (or, as may be
the case for some of them, do not).
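As one concrete case to reason from, here is antithetic variates on a toy problem (a Python sketch; the integrand E[e^U] with U ~ Uniform(0,1) is made up and not one of the lab's integrals): because e^x is monotone, e^U and e^(1-U) are negatively correlated, so averaging them cancels part of the noise.

```python
# Antithetic variates for estimating E[e^U], U ~ Uniform(0,1).
# Both estimators below use 2n evaluations of e^x; compare their
# per-sample variances.
import math
import random
import statistics

random.seed(9)
n = 20_000

plain = [math.exp(random.random()) for _ in range(2 * n)]
anti = []
for _ in range(n):
    u = random.random()
    # Antithetic pair: u and 1-u are negatively correlated through e^x
    anti.append(0.5 * (math.exp(u) + math.exp(1.0 - u)))

print(statistics.variance(plain), statistics.variance(anti))
```

Working out Cov(e^U, e^(1-U)) by hand shows why the reduction is so large here, and also why the trick can fail for non-monotone integrands, where the correlation need not be negative.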