On DNA Statistics in Growing Populations

Andrzej Polanski

Dept. of Automatic Control, Silesian Technical University, Gliwice, Poland

Abstract

Mathematical models for evolution with genetic drift and mutation include the Fisher-Wright model for genetic drift, with coalescence events modeled by a pure death branching process, and a homogeneous Poisson process for mutation. Most mathematical models of the interaction between genetic drift and mutation have been studied under the hypothesis that the size (or effective size) of the analyzed population has remained constant during the course of evolution. Under this hypothesis, an equilibrium is attained after a sufficiently long time, and the statistical properties of the DNA polymorphism at the analyzed locus can be found from the model parameters.
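As an illustration of the constant-size setting described above, the following is a minimal sketch that simulates the number of segregating sites in a sample, assuming the standard coalescent approximation of the Fisher-Wright model and the infinite-sites mutation model. The time scaling (each pair of lineages coalesces at rate 1), the parameter name `theta` (twice the mutation rate per lineage per unit of scaled time), and all function names are assumptions made for this example and are not taken from the paper.

```python
import random


def poisson_draw(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """One draw of the number of segregating sites for a sample of size n.

    Assumes a constant population size: while k lineages remain, the time
    to the next coalescence is exponential with rate k*(k-1)/2 (pure death
    process), and mutations fall on branches as a homogeneous Poisson
    process with rate theta/2 per unit of branch length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        total_length += k * t_k  # k branches, each extended by t_k
    return poisson_draw(theta / 2.0 * total_length, rng)


def expected_segregating_sites(n, theta):
    """Equilibrium mean under the same assumptions: theta * sum_{i<n} 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_segregating_sites(10, 5.0, rng) for _ in range(20000)]
    print("simulated mean:", sum(draws) / len(draws))
    print("theoretical mean:", expected_segregating_sites(10, 5.0))
```

Under these assumptions the simulated mean should be close to the theoretical equilibrium value, which illustrates how the polymorphism statistics follow from the model parameters when the population size is constant.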

However, most populations change in size over the course of their evolution. This has led researchers to study DNA polymorphism under population size changing in time. These studies revealed difficulties in distinguishing between different types of growth, and between the effect of growth and other factors such as geographic structure or the past fixation of the fittest allele. All these factors lead to star-like genealogies in populations and therefore to similar distributions related to DNA polymorphism. Nevertheless, efforts to estimate histories of population sizes from DNA statistics continue, and it seems that empirical data on the distributions of segregating sites in samples drawn from populations carry signatures of past population growth.

We present a computational approach to the problem of deriving statistical properties of DNA samples drawn randomly from a population whose size has been changing. We describe the coalescence process in terms of an extinction intensity function, and we interpret the probability generating function (p.g.f.) of the number of segregating sites as the Laplace transform of the extinction intensity function. We study how the history of past population size change is encoded in the distributions of segregating sites or allelic types. We apply our model to the available data on mitochondrial DNA in the human population. We compare this approach to the method based on the branching process model for a growing human population.
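The link between the p.g.f. and a Laplace transform can be sketched as follows. This is a generic argument under the infinite-sites model, not the paper's own derivation: the symbols $S$ (number of segregating sites), $L$ (total branch length of the sample genealogy), $f_L$ (density of $L$) and $\mu$ (mutation rate per unit branch length) are notation assumed here, and $f_L$ only stands in for the extinction intensity function, whose exact definition is not given in this abstract.

```latex
% Conditional on the genealogy, mutations form a homogeneous Poisson process,
% so S given L = \ell is Poisson with mean \mu \ell.
\begin{align*}
  \mathrm{E}\!\left[z^{S} \mid L = \ell\right]
    &= \sum_{k=0}^{\infty} z^{k}\, e^{-\mu \ell}\, \frac{(\mu \ell)^{k}}{k!}
     = e^{-\mu (1 - z)\, \ell}, \\
  P_{S}(z) = \mathrm{E}\!\left[z^{S}\right]
    &= \int_{0}^{\infty} e^{-\mu (1 - z)\, \ell}\, f_{L}(\ell)\, d\ell .
\end{align*}
% The last integral is the Laplace transform of f_L evaluated at s = \mu (1 - z).
```

In this sketch the history of population size enters only through the distribution of $L$, which is why changes in that history are reflected in the distribution of the number of segregating sites.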