# Projects (preliminary version)

Projects are to be carried out in groups, typically with 2-4 students in each group, or individually. A good option is a project involving images from an application on which you have some background knowledge. But projects can also be chosen with image data from current or previous projects at the Mathematical statistics department or with data from the Internet.

Below follows a (preliminary) list of some possible projects for the course spring 2007. The list is an update of lists from previous years, which explains the irregular numbering of the projects, and for some of the projects it may be difficult to get data.

Projects 54 and onwards are new for this year.

1. Seven projects on analysis of spotted microarray images, see 1a-1g below. Data for projects 1a-1f: Image data together with text files from a commercial analysis program are available for the Arabidopsis experiment described on pages 14-15 in the notes for the course and in more detail in Ekstrom CT, Bak S, Kristensen C & Rudemo M (2004) Spot shape modelling and data transformation for microarrays. Bioinformatics 20, 2270-2278. Possibly other microarray data sets will also be available.

1a. Localization of spot centres in spotted microarrays. In the analysis of spotted microarrays, the foreground values are typically estimated by adding pixel intensities in a circle around the spot centre. Precise localization of spot centres may improve the foreground estimate, cf. Ekstrom et al. (2004). If time permits it would be interesting to analyse the deviations from the rectangular grid that is presumably aimed at during spot printing.
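
As a simple starting point for spot centre localization, the sketch below computes an intensity-weighted centroid of a small window around a spot. It is only a baseline, assuming that the window contains a single spot on a roughly constant background; it is not the method of Ekstrom et al. (2004). The function name `spot_centroid`, the background handling and the synthetic spot are illustrative assumptions.

```python
import numpy as np

def spot_centroid(window, background=0.0):
    """Intensity-weighted centroid (row, col) of a window containing one spot.

    The assumed constant `background` is subtracted first and negative values
    are clipped to zero, so that background pixels do not pull the estimate
    towards the middle of the window.
    """
    weights = np.clip(np.asarray(window, dtype=float) - background, 0.0, None)
    total = weights.sum()
    if total == 0:
        raise ValueError("window contains no signal above background")
    rows, cols = np.indices(weights.shape)
    return (rows * weights).sum() / total, (cols * weights).sum() / total

# Synthetic check (invented data): a Gaussian-shaped spot centred at
# row 5.3, column 7.6, on a constant background of 10.
rr, cc = np.indices((15, 15))
spot = 100.0 * np.exp(-((rr - 5.3) ** 2 + (cc - 7.6) ** 2) / (2 * 1.5 ** 2))
centre = spot_centroid(spot + 10.0, background=10.0)
```

Comparing such centroid estimates with the nominal printing grid would give the grid deviations mentioned above.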

1b. Estimation of spot diameters in spotted microarrays. The polynomial-hyperbolic model in Ekstrom et al. (2004) may be used to estimate diameters of spots. To estimate foreground intensities one may sum all pixel values within a certain fraction of the corresponding circular disc. The fraction may be optimised by minimizing the mean square deviation between replicated arrays.

1c. Nonparametric estimation of a radially symmetric spot shape function for spotted microarrays. As an alternative to the polynomial-hyperbolic spot shape model in Ekstrom et al. (2004) one can assume that the spot shape is rotationally symmetric and specified by a function of the distance to the spot centre. This function may be estimated by least squares or maximum likelihood.
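
A minimal sketch of one nonparametric estimate, assuming the spot centre is known and the shape function is approximated by a step function that is constant on concentric rings: the least squares estimate is then simply the mean pixel value in each ring. The function name `radial_profile`, the ring width and the synthetic image are assumptions made for illustration.

```python
import numpy as np

def radial_profile(image, centre, bin_width=1.0):
    """Least squares estimate of a piecewise-constant radial shape function.

    Pixels are grouped into rings by their distance to `centre`; for a function
    that is constant on each ring, the least squares estimate is the ring mean.
    Rings that contain no pixels are returned as NaN.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    r = np.hypot(rows - centre[0], cols - centre[1])
    bins = (r / bin_width).astype(int).ravel()
    sums = np.bincount(bins, weights=image.ravel())
    counts = np.bincount(bins)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

# Synthetic check (invented data): a rotationally symmetric Gaussian spot.
rr, cc = np.indices((21, 21))
image = np.exp(-((rr - 10) ** 2 + (cc - 10) ** 2) / (2 * 3.0 ** 2))
profile = radial_profile(image, centre=(10, 10))
```

A smoother estimate, or a maximum likelihood estimate under a specific noise model, would replace the ring means by another fitting step.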

1d. Optimal correction for background in spotted microarrays. Sometimes it is recommended to subtract background and sometimes it is recommended to disregard background subtraction. A compromise could be to subtract a fraction of the background, and this fraction could be optimised by minimizing the mean square deviation between replicated arrays.
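
The optimisation can be sketched as a grid search over the fraction. The sketch assumes that foreground and background estimates are available per spot for two replicated arrays, that deviations are measured on a log scale, and that the foreground exceeds the subtracted background everywhere; the function names and the toy numbers are invented for illustration.

```python
import math

def mean_square_deviation(alpha, fg1, bg1, fg2, bg2):
    """Mean squared difference of log intensities between two replicated
    arrays after subtracting the fraction `alpha` of the background."""
    diffs = [
        math.log(f1 - alpha * b1) - math.log(f2 - alpha * b2)
        for f1, b1, f2, b2 in zip(fg1, bg1, fg2, bg2)
    ]
    return sum(d * d for d in diffs) / len(diffs)

def best_background_fraction(fg1, bg1, fg2, bg2, steps=100):
    """Grid search over alpha in [0, 1] for the smallest deviation."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: mean_square_deviation(a, fg1, bg1, fg2, bg2))

# Toy data (invented): the same signal on both arrays, different backgrounds.
signal = [100.0, 200.0, 400.0, 800.0]
bg1 = [50.0, 60.0, 40.0, 70.0]
bg2 = [30.0, 80.0, 55.0, 45.0]
fg1 = [s + b for s, b in zip(signal, bg1)]
fg2 = [s + b for s, b in zip(signal, bg2)]
alpha = best_background_fraction(fg1, bg1, fg2, bg2)
```

In this toy example the background is purely additive, so full subtraction is optimal; with real data, noisy background estimates may favour a smaller fraction.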

1e. Pixel synchronisation of the two channels in spotted microarrays. To get a pixel-wise synchronisation of the two channels one may resample one of the channels. This problem is related to accurate spot centre determination.

1f. Estimation of spot shape by use of principal components. Apply the methods described in Glasbey, C.A., Forster, T. and Ghazal, P., Estimation of expression levels in spotted microarrays with saturated pixels (see http://www.bioss.sari.ac.uk/~chris/) to the Arabidopsis data set.

1g. Quality control of spotted microarrays. Sometimes spots in microarrays are irregular, and such spots could be discarded or down-weighted in the subsequent statistical analysis. The objective of this project could be to discover irregular spots and perhaps to quantify the degree of irregularity.

2. Cluster analysis of microarray data. See Dudoit S and Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3(7), research0036.1-21, where links to datasets may also be found, and Eisen MB, Spellman PT, Brown PO and Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.

3. Discrimination methods with microarray data. See Dudoit S, Fridlyand J and Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97 (457), pp. 77-87 (also Tech report #576, Dep. Statistics, Univ. Calif. Berkeley), where links to datasets may also be found.

4. Diffusion coefficient estimation from fluorescence recovery after photobleaching (FRAP). Data and problem formulation in projects 4a and 4b come from Niklas Lorén at SIK (The Swedish Institute for Food and Biotechnology). In the FRAP technique fluorescent dye molecules are photobleached in a limited region, for instance within a circular disc. Images showing diffusion of unbleached dye molecules back into this region can be used to measure the diffusion coefficient. Literature: Van Keuren E and Schrof W (2003) Fluorescence recovery after two-photon bleaching for the study of dye diffusion in polymer systems. Macromolecules 36, 5002-5007.

4a. Localization of photobleaching spot centres and radial profile shapes. This is similar to projects 1a and 1c.

4b. Estimation of the diffusion coefficient from a series of photobleaching images.

5-7. 2D electrophoresis. See John Gustafsson's list and Gustafsson, J.S., Blomberg, A. and Rudemo, M. (2002) Warping two-dimensional electrophoresis gel images to correct for geometric distortions of the spot pattern. Electrophoresis, 23, 1731-1744. Tech report 2001:73, November 2001.

5. Saturated pixel values in 2D gel images. Very similar to the microarray case, compare Ekstrom et al. (2004).

6. Spot models for 2D gel images. Can we find a better model for the spots than the standard bivariate Gaussian densities?

7. Area scale compensation in warping of quantitative images. Can we compensate for the change in spot quantities when we warp?

8. Tracking particles. See Mats Kvarnström's licentiate thesis: Estimating Diffusion Coefficients in Colloidal Particle Systems. A sequence of images may be found here. Related problems have been discussed by Edward Fäldt in the second part of his master's thesis: Image Analysis Methods for Drosophila Research.

9. Identifying shape of yeast cells. Identify contour and tagged proteins in images of yeast cells.

10. Motion in image sequences. See Jan Wegger (2001) Estimation of motion in image sequences. Master's thesis, November 2001.

11. Description of fingerprints as collections of curve segments. See Setterberg (2001) Vectorisation of fingerprint images using local direction estimates. Master's thesis, September 2001.

12. Image modeling with Markov chain Monte Carlo.

15. Knot patterns in wood.

16. Real time computation of optical flow.

17. Aerial photographs of a forest.

21. Statistical image analysis projects at AstraZeneca R&D. See the list from Anders Klint.

21a. Identification and measurement of blood vessel cross-sections in ultrasonic images.

21b. Robust general mosaic reconstruction of microscopic images.

22. Identification of digits from car number plates. Noisy images from car number plates are acquired from a road checkpoint. Find a pattern recognition method to identify the digits and perhaps also letters, and evaluate its performance.

24. Identification of letters extracted from images.

25. Segmentation of plaque images.

26. Warping of images. Warp images similar to the fish images in Glasbey, C.A. and Mardia, K.V. (2001) A penalized likelihood approach to image warping. Journal of the Royal Statistical Society, Series B, 63, 465-514.

27. Classification of cells in yeast images.

28. Identification of digits. Noisy images of digits and possibly also letters from car number plates are acquired by a digital camera. Find a pattern recognition method to identify the digits and evaluate its performance for a suitable data set.

29. Analysis of skin images.

30. Counting individuals in images.

31. When are humans better than computers in identifying letters and digits?

32. Orientation of pronuclei (from Torbjörn Lundh).

33. Comparison of eye images from wild type and mutated animals.

34. Identification of digits in noisy car number plates. Images of digits and possibly also letters from car number plates are acquired by a digital camera. For each plate a series of images with more and more noise (dirt) is acquired. Find a pattern recognition method to identify the digits and evaluate its performance, particularly with regard to how much noise can be tolerated.

35. Chili fruit discrimination. Use of size, colour and shape to discriminate between different varieties of chili fruit.

36. Transformation of grey scale rock carving photos to binary images. For some rock carvings also 3D-scanned images are available.

37. Transformation of rock carving frottage images to grey scale or binary images. A frottage image is obtained by placing a sheet of paper or other material on the rock carving surface and an image is made by rubbing over the surface with a pencil or crayon to identify lines and shapes.

38. Rock carving similarity. Try to find out if it is possible to introduce a similarity measure for rock carvings based on suitable image features. Such a measure might possibly be used to construct a tree diagram for evolution of rock carvings.

39. Characterization of fat cell populations. Characterize a population of fat cells by the distribution of cell size. First identify the different cells, then measure their size and estimate the distribution of cell size. How many cells are needed to get a reasonable distribution estimate? Possibly also characterize the deviation from circular form.
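
Once cells have been segmented, size and shape summaries can be computed from the area and perimeter of each cell. The sketch below assumes these two measurements are already available per cell; the circularity 4πA/P², which equals 1 for a perfect disc and is smaller for other shapes, is one common choice of shape measure and not one prescribed by the project. All function names are illustrative.

```python
import math

def equivalent_diameter(area):
    """Diameter of the disc that has the same area as the cell."""
    return 2.0 * math.sqrt(area / math.pi)

def circularity(area, perimeter):
    """Shape factor 4*pi*A / P**2: equals 1 for a disc, less otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2

def empirical_cdf(sizes):
    """Empirical distribution function as a list of (size, proportion) pairs."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

# Example: a disc of radius 3 and a 2 x 2 square.
disc = circularity(math.pi * 9.0, 2.0 * math.pi * 3.0)
square = circularity(4.0, 8.0)
```

Repeating the distribution estimate on subsamples of different sizes is one way to examine how many cells are needed.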

40. Microscopic image analysis of bacteria. Differentiate between Gram-positive and Gram-negative bacteria by use of image analysis methods. (Usually this is done by visual perception.)

41-43. Yeast cell image sequence from Mats Kvarnström. There are two types of images (brightfield and fluorescent) at 10 time points (20 minutes between consecutive time points) and at 10 depths (z-scans). The fluorescence comes from a GFP molecule coupled to an enzyme, Gpd1, which is localized in organelles called peroxisomes. Peroxisomes are found in eukaryotic cells and take care of poisonous substances in the cell. The problems to be studied by use of the images are the distribution and migration of peroxisomes during the cell cycle, in particular during cell division (for instance, the relation between the numbers of peroxisomes in mother and bud cells). The brightfield images are used essentially only to locate the cells and to find cells that do not show any fluorescence.

41. Find and track the peroxisomes in some cells.

42. Describe motion of cells and cell clusters. The motion consists mainly of translation and rotation, while positions within clusters essentially remain constant.

43. Find and track the cell contours and cell centres in the brightfield images.

44. Identification of traffic signs for velocity. Use a digital camera to obtain images both with and without traffic signs for velocity. Test if there is a traffic sign for velocity in the image. If there is one, identify the speed limit.

45. Finding comets in astronomy images.

46. Tracking of Littorina.

47a. Face detection. Try to find faces in images. See Samuel Englund's master's thesis for references and suggestions. Use also Google and search for "face detection" and "face databases".

47b. Face recognition. Try to recognize faces in a suitable face database available on the Internet (or in a database that you construct by use of a digital camera). Use Google to search for "face recognition" and "face databases".

48. Identification by use of ear images. See http://bias.csr.unibo.it/maltoni/handbook/chapter_1.pdf, particularly pages 8-12, for identification by use of different methods, including use of ear images.

49. Clustering algorithms for colour image segmentation.

50. Identification/discrimination of cancer cells.

51. Analyse fingerprint images from FVC (Fingerprint Verification Competition). Download fingerprint images from FVC2000: The First International Fingerprint Verification Competition: http://bias.csr.unibo.it/fvc2000/. For example data set DB1_B: DB1_B.zip (5.4MB) which consists of 8 images (of size 300x300) of each of 10 fingers.

51a. Discrimination of fingerprints. Discrimination of fingerprints with given classes (10 classes in dataset DB1_B). Try for instance to discriminate by use of suitable features.

51b. Warping of fingerprints. Try to warp fingerprint images into each other by use of suitable warping transformations, for instance: (i) rotation and translation, (ii) Procrustes transformation, (iii) general linear transformation or possibly (iv) joined bilinear transformations. Compare Ross et al. (2006) Fingerprint warping using ridge curve correspondences. IEEE Trans PAMI 28, 19-30.
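
For alternative (i), the best rotation and translation in the least squares sense can be computed from a singular value decomposition, provided that corresponding landmark points have already been identified in the two images. That correspondence is an assumption of this sketch, not something the project supplies; the function name and the landmark coordinates are invented for illustration.

```python
import numpy as np

def fit_rotation_translation(source, target):
    """Least squares rotation R and translation t with target ~ R @ source + t.

    `source` and `target` are (n, 2) arrays of corresponding landmark points.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_mean, tgt_mean = source.mean(axis=0), target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (source - src_mean).T @ (target - tgt_mean)
    u, _, vt = np.linalg.svd(h)
    # Sign correction so that the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, d]) @ u.T
    translation = tgt_mean - rotation @ src_mean
    return rotation, translation

# Synthetic check (invented landmarks): rotate by 0.5 radians and shift.
theta = 0.5
true_rotation = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
true_translation = np.array([3.0, -2.0])
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [1.0, 5.0]])
moved = landmarks @ true_rotation.T + true_translation
rotation, translation = fit_rotation_translation(landmarks, moved)
```

Adding a common scale factor to this fit gives a Procrustes-type transformation, and dropping the orthogonality constraint gives the general linear case.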

52. Classification of images of skin cancer.

53. Identification of handwritten digits. Use for instance the MNIST handwritten digit database of Yann LeCun and Corinna Cortes, Courant Institute, NYU, available from yann.lecun.com/exdb/mnist/

54. Identification of fingerspelling images.

55. Optical scanning of documents: find text and store as ASCII code.

56. Segmentation of MR images.

57. Identification of 3D structure from TM images.

58. Estimation of diffusion coefficient from FRAP images.

59. Profile and spot centre identification from FRAP images.

60. Discrimination between scratches and veins in hide images.

61. Spatial distribution of errors in hide images.

62. Identification of traffic signs with names of places. This project is similar to project 44. Use a digital camera to obtain images both with and without traffic signs with names of places. Test if there is a traffic sign with a name of a place in the image. If there is one, identify the name of the place.

63. Tracking the football in a series of images from a football (soccer) match. Try to follow the football through the series of images. Start perhaps by manually identifying the football in the first image. After that, use for instance the last image (or the last two or even the last three images) to find the football in the following image. This project is similar to project 8, where one should track particles performing Brownian motion. However, in project 8 it is enough to use the previous image (and not two or more images) to find a particle in the next image, due to the Markov character of Brownian motion.
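
The difference between using one and two previous images can be sketched as follows. The sketch assumes that candidate ball positions have already been detected in each image; with two previous positions it extrapolates with constant velocity, and with only one it falls back on the last position, which is the situation for Brownian particles. The function names and coordinates are invented for illustration.

```python
import math

def predict_position(history):
    """Predict the next position from the last one or two tracked positions."""
    if len(history) >= 2:
        (x1, y1), (x2, y2) = history[-2], history[-1]
        # Constant-velocity extrapolation from the last two images.
        return (2 * x2 - x1, 2 * y2 - y1)
    # Only one earlier image: the best guess is the last known position.
    return history[-1]

def track_step(history, candidates):
    """Pick the candidate detection closest to the predicted position."""
    predicted = predict_position(history)
    return min(candidates, key=lambda c: math.dist(c, predicted))

# Toy example (invented coordinates): the ball moves (2, 1) pixels per image.
candidates = [(2, 1), (4, 2), (0, 0)]
with_two_images = track_step([(0, 0), (2, 1)], candidates)
with_one_image = track_step([(2, 1)], candidates)
```

Using three images would allow an acceleration term in the prediction, which may matter for a ball in flight.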

64. Scanning of scatter plots. From an image of a scatter plot, try to reconstruct the data pairs shown in the plot.

65. Discriminate between mouse strains using microarray data. Using data from 34 microarrays from two strains of mice, try to find genes and combinations of genes that give a good discrimination between the two strains.

66. Clustering subjects using microarray data. By use of data from 86 microarrays, two from each of 43 subjects, try to cluster the arrays, and see how well you can find the clusters corresponding to the 43 subjects. Perhaps you will also find some other clusters. Try clustering algorithms such as hierarchical clustering and K-means clustering.
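
A minimal sketch of K-means clustering together with one possible way to score the result: the fraction of subjects whose two arrays end up in the same cluster. It assumes the arrays are rows of a numeric matrix; the deterministic start from the first k rows, the function names and the tiny data set are illustrative assumptions, and in practice several random starts would normally be compared.

```python
import numpy as np

def kmeans(data, k, n_iter=100):
    """Plain K-means (Lloyd's algorithm), started from the first k rows."""
    data = np.asarray(data, dtype=float)
    centres = data[:k].copy()
    for _ in range(n_iter):
        # Distance from every row to every centre, then nearest-centre labels.
        distances = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        new_centres = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

def pair_agreement(labels, subject_ids):
    """Fraction of subjects whose arrays all fall in the same cluster."""
    labels = np.asarray(labels)
    subject_ids = np.asarray(subject_ids)
    same = [len(set(labels[subject_ids == s].tolist())) == 1
            for s in np.unique(subject_ids)]
    return float(np.mean(same))

# Tiny invented example: 3 subjects, 2 arrays each, 2 "genes" per array.
data = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0],
                 [0.5, 0.2], [10.3, -0.1], [0.1, 9.7]])
subjects = [0, 1, 2, 0, 1, 2]
labels, centres = kmeans(data, k=3)
agreement = pair_agreement(labels, subjects)
```

The same agreement score can be applied to the clusters obtained by cutting a hierarchical clustering tree, which makes the two methods directly comparable.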

67. Classification of pixels in AIRSAR images. Try to classify pixels in ocean ice AIRSAR (Airborne Synthetic Aperture Radar) images. Search Google with "airsar" for information on the technique. Try to classify pixels into classes: (i) open water, (ii) first year ice, (iii) multiyear ice, (iv) brash ice et cetera. Extensive references on classification techniques for this kind of data can for instance be found in IEEE Transactions on Geoscience and Remote Sensing.