Short course on Pathogen evolution, selection and immunity
This is all assuming a Wright-Fisher population of discrete non-overlapping generations.
There are a fixed number of N individuals.
Mutations enter the population at a rate of μ mutations per individual per site per generation.
In a haploid population (usually the case for pathogens), we summarize the population with the parameter θ, which equals 2Nμ. Without temporally resolved data, it’s not possible to separately estimate N and μ; in most population genetic circumstances, we can only estimate θ. With temporally resolved data we can separate θ into N and μ.
Genetic diversity is most commonly summarized with the statistic π, which is equal to the average number of mutations per site between two random individuals in the population. π is most commonly measured in terms of substitutions per site. The expectation of π follows
\[E[\pi] = \theta = 2N\mu\]π for Drosophila and π for flu is approximately 0.01, while π for humans is approximately 0.001. This means that for an average length gene of 1000 basepairs, two random fruit flies or two random flues will probably differ at ~10 sites, while two random humans will differ at ~1 site.
The number of unique haplotypes in a sample of n sequences can be estimated from Ewen’s sampling formula. Ewen’s sampling formula gives the probability of observing a1 copies of haplotype 1, a2 copies of haplotype 2, etc… in a sample of n sequences. The sole parameter of the sampling formula is θ. Thus θ is sufficient to predict the entire distribution of haplotype frequencies. The expectation of k unique haplotypes follows:
\[E[k] = \sum^{n}_{i=1} \frac{\theta}{\theta + i - 1}\]With θ = 0.2, there is usually only a single dominant haplotype in the population.
With θ = 1.0, there are generally a small handful of haplotypes.
With θ = 5.0, there are many segregating haplotypes.
A new mutant appears in the population at an initial frequency p of \(1/N\). It has a \(1/N\) of eventually fixing in the population and a \((N-1)/N\) chance of being lost from the population.
Similarly, if a mutant is at population frequency p, then it has a p chance of fixing. At any point in time, looking forwards, the chance of fixation of a neutral mutation is just its frequency.
Conditioned on a neutral mutant fixing, the expected time to fixation is 2N generations. Thus, the rate of population turnover scales inversely with population size. This time to fixation is also a measure of the strength of random genetic drift.
With θ = 1 and N = 20, haplotypes emerge and disappear rapidly.
With θ = 1 and N = 100, population turnover takes approximately 200 generations.
With θ = 1 and N = 500, population turnover occurs slowly.
Each generation we expect Nμ mutations across the entire population. Each mutation has a \(1/N\) chance of fixing. Thus the rate of fixed mutations is \(N \mu \times 1/N = \mu\).
This result, that the rate of neutral divergence across a species is equal to the rate of mutation in a single individual, is a classic finding from Kimura.
Mutation rate μ determines the rate of interspecies divergence, population size N determines the rate of population turnover and their product Nμ determines the level of standing genetic variation.