Randomiserade algoritmer - projektförslag

Proposals for PROJECTS in the course Randomized Algorithms

As part of the course, one project should be carried out, individually or in groups of 2-3 students.

Some projects invove the construction and implementation of some randomized algorithms. Other, more theoretical, projects need not involve any such parts. In the following list of project proposals, the two types of projects are signified with -P- (programming) and -L- (liteature study), respectively. Of course, there is no strict borderline between the two types of projects.

Note that suggestions for other projects, not on this list, are very welcome. Inspiration for other projects may, for instance, be found in the recommended side-reading ``Randomized Algorithms'' by Motwani and Raghavan.

Each student is required to have selected a project (after consulting me) no later Wednesday, April 10. The projects are to be reported both in writing and in a short oral presentation. We devote a whole day to oral presentations on Tuesday, May 21, at time 10:00-11.45 (in S4, Matmatiskt Centrum) and 13:15-17:00 (in MD9, Matematiskt Centrum), and deadline for handing in a written report is on the same day. However, an earlier hand-in of the written report is advisable, since you can then get some input that may be useful in the oral presentation. In that case, send the report to Olle Häggström, Institut Mittag-Leffler, Auravägen 17, 182 60 Djursholm.

In cases where projects are done in a group of students, all members of the group are required to take part in the oral presentation. The length of the presentation may vary somewhat depending on the type of project; a suitable length for projects of type -P- may be 10-15 minutes, whereas more theoretical projects of type -L- may need 15-25 minutes. For projects done in groups of 2-3 students, about 10 minutes per student is a reasonable length of the presentation.

Project presentations may be done in the language of your choice (as long as it is Swedish or English!).

Many of the projects have potential for being expanded into a Master's thesis (exjobb). Some of them may even provide the seed for a licentiate or Ph.D. thesis. A few of the projects are closely related to reserch that is under way in this building. For instance, hard-core and Ising models are two of the main objects of my own (O.H.'s) research, and these are studied in Projects 4, 5, 6, 7 and (to some extent) 12. Projects 7, 8 and 9 deal with perfect simulation, which is another active research area here. Project 16 is, of course, closely related to the work of Devdatt Dubhashi at the CS department.

1. -P- Explain how randomized algorithms come into play in connection with factorization and prime testing (which are important in cryptography). Useful references are Chapter 14 in Motwani och Raghavan: ``Randomized Algorithms'', and Johan Håstad's ``Notes for the course advanced algorithms''.

2. -P- Compare, theoretically or experimentally, different methods for placing points randomly in the Monte Carlo method for volume estimation. Two main approaches is to throw them out completely at random (independently of each other), or according to a randomly shifted lattice.

3. -L- Study how a refined analysis of the running time of RandomQuicksort can be carried out, leading to a central limit theorem for the running time. See ``A limit theorem for Qicksort'' and other papers by Uwe Rösler. See also McDiarmid, C: Large deviations for Quicksort, Journal of Algorithms 21 (1996), 476-507.

4. -P- In the generalized hard-core model (see Problem 7.4 in the lecture notes) on large parts of the square lattice, qualitatively different behavior is obtained depending on the parameter value - a so-called phase transition. If the parameter is large, the system settles down to a chessboard pattern, with (just about) every second vertex occupied, whereas no such order appears for small parameters. Determine (approximately) the critical value for this phase transition, using MCMC methods.

5. -P- Do a similar study as in Project 4, except on the so-called Ising model for ferromagnetism, instead of on the hard-core model. For the Ising model, a particularly efficient MCMC algorithm, known as the Swendsen-Wang algorithm, is available, see, e.g., the paper ``The random geometry of equilibrium phases'' which can be found here.

6. -P- For the statistically inclined student: Construct a pseudolikelihood-based EM-algorithm for estimation in a partially observed Ising model.

7. -P- Read how Propp and Wilson, in their original paper ``Exact sampling with coupled Markov chains and applications to statistical mechanics'', which can be found here, were able to apply their algorithm to the Ising model. Then implement it.

8. -L- An interesting alternative to the Propp-Wilson algorithm for perfect simulation is ``Fill's algorithm'' (Fill, J: An interruptible algorithm for perfect sampling via Markov chains, Annals of Applied Probability 8 (1998), 131-162). Explain this algorithm, its properties, and the theory behind it.

9. -L- This is similar to Project 8, except that it concerns the ``randomness recycler'', which was recently introduced by Jim Fill and Mark Huber, and which is yet another interesing alternative to the Propp-Wilson algorithm.

10. -P- Implement a simulated annealing algoritm for the traveling salesman problem, and try it out on a real case, such as (say) the 50 biggest towns in Sweden, with distances taken from some suitable road atlas.

11. -P- Build a simulated annealing algorithm for the following problem: At department X at the company Y, there are great problems with personal differences, and there are therefore plans to split X into two separate departments X1 and X2. We are given a matrix of zeroes and ones, which for each pair of coworkers tells whether they can stand each other or not. How should X be split in order to minimize the total amonunt of disagreement within each of the two subdepartments?

12. -P- Another problem for which simulated annealing is suitable, is packing of as many balls (of fixed radius) as possible in some rectangular space. Note that in this case, we no longer have a finite state space to optimize over (since each ball has infinitely many possible positions). The theory in Chapter 12 of the lecture notes therefore need to be extended somewhat, via an interesting limiting procedure.

13. -L- An interesting way of obtaining algorithms that are good at playing some game, such as chess or bridge, is via evolutionary methods or genetic programming. Let a large number of programs play against each other. The next generation of programs consists of (mutated) children of the previous generation, where winners get more children than losers. This project may consist of doing this for some simple game, such as noughts and crosses (luffarschack).

14. -L- Discuss the modern theory of pseudo-random number generators, using, e.g., the references given in my lecture notes.

15. -L- Discuss some examples of the use of randomization in parallel programming. A suitable starting point could be Chapter 12 in Motwani och Raghavan: ``Randomized Algorithms''; one could then perhaps go on with some of the research articles referred to there.

16. -L- Devdatt Dubhashi at the CS department is, together with Alessandro Panconesi, working on the book project ``Concentration of Measure for Computer Scientists'', which deals with, among other things, the distribution of the running time of randomized algorithms. A preliminary version can be found via their home pages. Discuss suitable parts of the contents of this book.

17. -L- An area which is closely related to randomized algorithms, is probabilistic (average case) analysis of deterministic algorithms with random data. A possible ``testing ground'' for such problems is the theory of random graphs. Discuss (parts of) this area. A ueful reference is Frieze, A. och McDairmid, C: Algorithmic theory of random graphs, Random Structures and Algorithms 10 (1997), 5-42.

18. -L- 3-SAT is a problem of fundamental importance in computer sciences, an particular the theory of NP-completeness. 3-SAT with random input has recently been a popular research area. Summarize some of what has been done! The literature in this area is (to my knowledge) rather spread out, but some references can, e.g., be found via the homepages of Svante Janson and Dimitris Achlioptas.

19. -L- Probabilistic methods for web search - a very hot area! Some information, and useful links, can be found here. Or see the paper ``Random walks with back buttons'' by eight (!) coauthors, available at the homepage of one of them, Madhu Sudan.

Last modified: Thu Apr 18 10:28:53 MET DST 2002

	1. -P- Explain how randomized algorithms come into play in connection with factorization and prime testing (which are important in cryptography). Useful references are Chapter 14 in Motwani och Raghavan: ``Randomized Algorithms'', and Johan Håstad's ``Notes for the course advanced algorithms''.
	2. -P- Compare, theoretically or experimentally, different methods for placing points randomly in the Monte Carlo method for volume estimation. Two main approaches is to throw them out completely at random (independently of each other), or according to a randomly shifted lattice.
	3. -L- Study how a refined analysis of the running time of RandomQuicksort can be carried out, leading to a central limit theorem for the running time. See ``A limit theorem for Qicksort'' and other papers by Uwe Rösler. See also McDiarmid, C: Large deviations for Quicksort, Journal of Algorithms 21 (1996), 476-507.
	4. -P- In the generalized hard-core model (see Problem 7.4 in the lecture notes) on large parts of the square lattice, qualitatively different behavior is obtained depending on the parameter value - a so-called phase transition. If the parameter is large, the system settles down to a chessboard pattern, with (just about) every second vertex occupied, whereas no such order appears for small parameters. Determine (approximately) the critical value for this phase transition, using MCMC methods.
	5. -P- Do a similar study as in Project 4, except on the so-called Ising model for ferromagnetism, instead of on the hard-core model. For the Ising model, a particularly efficient MCMC algorithm, known as the Swendsen-Wang algorithm, is available, see, e.g., the paper ``The random geometry of equilibrium phases'' which can be found here.
	6. -P- For the statistically inclined student: Construct a pseudolikelihood-based EM-algorithm for estimation in a partially observed Ising model.
	7. -P- Read how Propp and Wilson, in their original paper ``Exact sampling with coupled Markov chains and applications to statistical mechanics'', which can be found here, were able to apply their algorithm to the Ising model. Then implement it.
	8. -L- An interesting alternative to the Propp-Wilson algorithm for perfect simulation is ``Fill's algorithm'' (Fill, J: An interruptible algorithm for perfect sampling via Markov chains, Annals of Applied Probability 8 (1998), 131-162). Explain this algorithm, its properties, and the theory behind it.
	9. -L- This is similar to Project 8, except that it concerns the ``randomness recycler'', which was recently introduced by Jim Fill and Mark Huber, and which is yet another interesing alternative to the Propp-Wilson algorithm.
	10. -P- Implement a simulated annealing algoritm for the traveling salesman problem, and try it out on a real case, such as (say) the 50 biggest towns in Sweden, with distances taken from some suitable road atlas.
	11. -P- Build a simulated annealing algorithm for the following problem: At department X at the company Y, there are great problems with personal differences, and there are therefore plans to split X into two separate departments X1 and X2. We are given a matrix of zeroes and ones, which for each pair of coworkers tells whether they can stand each other or not. How should X be split in order to minimize the total amonunt of disagreement within each of the two subdepartments?
	12. -P- Another problem for which simulated annealing is suitable, is packing of as many balls (of fixed radius) as possible in some rectangular space. Note that in this case, we no longer have a finite state space to optimize over (since each ball has infinitely many possible positions). The theory in Chapter 12 of the lecture notes therefore need to be extended somewhat, via an interesting limiting procedure.
	13. -L- An interesting way of obtaining algorithms that are good at playing some game, such as chess or bridge, is via evolutionary methods or genetic programming. Let a large number of programs play against each other. The next generation of programs consists of (mutated) children of the previous generation, where winners get more children than losers. This project may consist of doing this for some simple game, such as noughts and crosses (luffarschack).
	14. -L- Discuss the modern theory of pseudo-random number generators, using, e.g., the references given in my lecture notes.
	15. -L- Discuss some examples of the use of randomization in parallel programming. A suitable starting point could be Chapter 12 in Motwani och Raghavan: ``Randomized Algorithms''; one could then perhaps go on with some of the research articles referred to there.
	16. -L- Devdatt Dubhashi at the CS department is, together with Alessandro Panconesi, working on the book project ``Concentration of Measure for Computer Scientists'', which deals with, among other things, the distribution of the running time of randomized algorithms. A preliminary version can be found via their home pages. Discuss suitable parts of the contents of this book.
	17. -L- An area which is closely related to randomized algorithms, is probabilistic (average case) analysis of deterministic algorithms with random data. A possible ``testing ground'' for such problems is the theory of random graphs. Discuss (parts of) this area. A ueful reference is Frieze, A. och McDairmid, C: Algorithmic theory of random graphs, Random Structures and Algorithms 10 (1997), 5-42.
	18. -L- 3-SAT is a problem of fundamental importance in computer sciences, an particular the theory of NP-completeness. 3-SAT with random input has recently been a popular research area. Summarize some of what has been done! The literature in this area is (to my knowledge) rather spread out, but some references can, e.g., be found via the homepages of Svante Janson and Dimitris Achlioptas.
	19. -L- Probabilistic methods for web search - a very hot area! Some information, and useful links, can be found here. Or see the paper ``Random walks with back buttons'' by eight (!) coauthors, available at the homepage of one of them, Madhu Sudan.