The introduction of the parameter $\mu$ in Simon's model is necessary in order to account for the death of individuals in the population, which is the only mechanism leading to the eventual disappearance of surnames. In addition, mortality has immediate consequences for other quantities describing population dynamics. First, the average growth of the population is exponential in time for $\mu < 1$,

$$N(t) = N_0 \exp[\nu(1-\mu)t],$$

with $\nu$ standing for the birth rate per individual and unit time, and the product $\mu\nu$ yielding the corresponding death rate.† The quantity $N_0$ is the size of the initial population. In principle, the $N_0$ initial individuals can be distributed among a number of families of different sizes. The initial condition becomes fully specified once the number of surnames initially borne by exactly $n$ individuals, $P(n, 0)$, is known. Polyphyletism corresponds to a situation where $P(n, 0) \geq 1$ for at least one value of $n > 1$. The opposite case, where $P(n, 0) = 0$ for all $n > 1$, is to be associated with monophyletism. Note that $N_0 = \sum_n n P(n, 0)$.

† The relation between the step variable $s$, which gives the total number of individuals added to the population, and the real time $t$ comes from noticing that the birth frequency is proportional to the total population, such that the elementary increment in time $\delta t$ is inversely proportional to $N(t)$, $\delta t(s) = [\nu N(s)]^{-1}$. The frequency $\nu$ fixes time units.
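The link between the step variable and real time can be checked numerically. The following is a minimal sketch, assuming that each step adds one individual and removes one with probability $\mu$, and that a step lasts $1/(\nu N)$ in real time; the function name `grow` and all parameter values are hypothetical, chosen only for illustration. It compares the simulated population with the exponential law $N_0 \exp[\nu(1-\mu)t]$.

```python
import math
import random


def grow(n0, steps, nu, mu, seed=0):
    """Advance the population size step by step and accumulate real time.

    Assumed rules (illustrative): each step adds one individual and,
    with probability mu, removes one. The real-time increment of a step
    is 1 / (nu * N), with N the population size at the start of the step.
    Returns (final population size, elapsed real time).
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    for _ in range(steps):
        t += 1.0 / (nu * n)      # elementary time increment of this step
        n += 1                   # one birth per step
        if rng.random() < mu:    # one death with probability mu
            n -= 1
    return n, t


# Hypothetical parameter values.
n0, nu, mu = 10_000, 1.0, 0.5
n_final, t = grow(n0, steps=100_000, nu=nu, mu=mu)
predicted = n0 * math.exp(nu * (1.0 - mu) * t)
rel_error = abs(n_final - predicted) / predicted
print(n_final, round(predicted), rel_error)
```

The agreement holds on average only; a single realization fluctuates around the exponential curve, and the fluctuations are smaller for larger initial populations.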

The second consequence of mortality is that individuals have a life expectancy $1/(\nu\mu)$. During their lifetime, the probability of having $m$ children who inherit their parent's surname turns out to be an exponential distribution of the form

$$p(m) = (1-\alpha)\mu\,[1+(1-\alpha)\mu]^{-m-1}.$$

Recall that $\alpha$ is the probability that a new individual introduces a new surname. Though it is usually assumed that the distribution of offspring is Poisson-like, data collected over short periods of time yield distributions of offspring close to exponential,³⁸ thus supporting the use of this model at least in appropriate social contexts.

The third consequence of mortality is that the total number of different surnames in a population might decrease. This happens, for instance, when the diversity is high and $\mu$ changes from small values to values close to one. This represents a situation where exponential growth stops and the size of the population remains approximately constant. This is frequent in developed societies, as in present-day Europe, where the fast growth experienced in the last two centuries has come to a halt.

For $\mu = 0$ the dynamical equations describing the process are (6.5) and (6.6), which are completed with an initial condition specifying, in this case, the number and size of the founding families. When mortality is turned on, the update of the population has to be modified in order to include death events. To this end, it is useful to split the dynamics into two sub-steps, as follows. At the first sub-step, Equations (6.5) and (6.6) are used to yield intermediate values $P'(1, s+1)$ and $P'(n, s+1)$, and the total population becomes $N'(s+1) = N(s) + 1$. The effect of mortality can be accounted for immediately after growth and mutation are applied, such that the final value for the total population once the update is completed reads

$$N(s+1) = N'(s+1) - w(s),$$

with $w(s)$ representing a stochastic dichotomic process that takes the value 1 with probability $\mu$ and 0 with probability $1-\mu$. The corresponding evolution equation for the abundance of families of size $n$ is

$$P(n, s+1) = P'(n, s+1) + \frac{\overline{w}}{N'(s+1)}\left[(n+1)P'(n+1, s+1) - nP'(n, s+1)\right],$$

where the bar indicates an average over different realizations of the stochastic process. This dynamical equation cannot be solved exactly, though some reasonable assumptions make it possible to obtain approximate solutions. Assuming that the solution varies slowly with $n$ and $s$, a continuous approximation becomes feasible, where the family size $n$ and the step index $s$ are replaced by continuous variables $y$ and $z$, respectively.³⁷
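Although the evolution equation cannot be solved exactly, the two-sub-step update it describes is straightforward to simulate. The following is a minimal sketch under stated assumptions, not a reference implementation: the first sub-step adds one individual who introduces a new surname with probability $\alpha$ and otherwise inherits the surname of a uniformly chosen living individual; the second sub-step removes one uniformly chosen individual (the newborn included) with probability $\mu$. The function names, the monophyletic initial condition and the parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_surnames(n0, steps, alpha, mu, seed=0):
    """Simulate surname dynamics with birth, mutation and death.

    Assumed rules (illustrative sketch):
      sub-step 1: one birth; with probability alpha the newborn carries
                  a new surname, otherwise it copies the surname of a
                  uniformly chosen living individual.
      sub-step 2: with probability mu one uniformly chosen individual
                  (newborn included) dies.
    Starts from n0 founders with distinct surnames.
    Returns (list of surname labels of the living, number of deaths).
    """
    rng = random.Random(seed)
    people = list(range(n0))   # people[i] = surname label of individual i
    next_label = n0
    deaths = 0
    for _ in range(steps):
        # Sub-step 1: growth and mutation.
        if rng.random() < alpha:
            people.append(next_label)
            next_label += 1
        else:
            people.append(rng.choice(people))
        # Sub-step 2: mortality (swap with the last entry, then pop).
        if rng.random() < mu:
            i = rng.randrange(len(people))
            people[i] = people[-1]
            people.pop()
            deaths += 1
    return people, deaths


def family_size_counts(people):
    """Return P(n): number of surnames borne by exactly n individuals."""
    sizes = Counter(people)          # surname -> family size
    return Counter(sizes.values())   # family size -> number of surnames


# Hypothetical parameter values.
people, deaths = simulate_surnames(n0=100, steps=20_000, alpha=0.05, mu=0.3)
P = family_size_counts(people)
print(len(people), len(set(people)), sorted(P.items())[:5])
```

Storing one label per individual keeps both sub-steps uniform over individuals, which matches the weights $nP'(n, s+1)/N'(s+1)$ in the death term; averaging `P` over many seeds gives an estimate that can be compared with the approximate solutions.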

A relevant problem when analyzing real data for surname abundance is the typical time required to develop the asymptotic form of the solution in a reasonable range of family sizes, starting from arbitrary initial conditions.³⁹ Considering that the use of surnames is relatively recent in history, it is important to estimate whether present-day societies are close enough to the asymptotic regime, and thus whether the model can be applied to real situations. A quantitative answer to this question can be obtained by solving the model for surname dynamics using a first-order expansion in the continuous variables $y$ and $z$. In this approximation, the solution consists of two parts. For $y < y_0(z)$,

The family size $y_0(z)$ that separates the two parts of the solution grows as time elapses, and is directly related to the genealogical depth of the population. As a function of real time, $y_0(t) = \exp[\nu(1-\alpha-\mu)t]$. This means that the transient time needed to observe the asymptotic regime (dominated by a power law with exponent $\zeta$) in the family size distribution is logarithmic in the family size, $t_0 \propto \ln y_0$. This explains why many real distributions of surname abundance are well described by the asymptotic solution in a broad range of values, even if the genealogical depth of most systems seems relatively small.
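The logarithmic dependence of the transient time on family size can be made concrete by inverting the relation between the crossover family size and real time, $\exp[\nu(1-\alpha-\mu)t]$, with $\nu$ the birth rate, $\alpha$ the probability of a new surname and $\mu$ the death probability. The sketch below assumes that relation as quoted; the function names and parameter values are hypothetical.

```python
import math


def crossover_size(t, nu, alpha, mu):
    """Crossover family size exp[nu * (1 - alpha - mu) * t] at real time t."""
    return math.exp(nu * (1.0 - alpha - mu) * t)


def transient_time(y0, nu, alpha, mu):
    """Real time needed for the crossover size to reach y0.

    Obtained by inverting crossover_size; requires alpha + mu < 1.
    """
    rate = nu * (1.0 - alpha - mu)
    if rate <= 0:
        raise ValueError("crossover size does not grow when alpha + mu >= 1")
    return math.log(y0) / rate


# Hypothetical parameter values; time is measured in units of 1 / nu.
nu, alpha, mu = 1.0, 0.01, 0.5
t_100 = transient_time(100, nu, alpha, mu)
t_10000 = transient_time(10_000, nu, alpha, mu)
print(t_100, t_10000, t_10000 / t_100)
```

Raising the target family size from 100 to 10,000 only doubles the required time, which is the sense in which a modest genealogical depth already covers a broad range of family sizes.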

A more accurate solution to the problem with mortality is obtained by using a second-order expansion of Eq. (6.13). It reads

where $U(a, b, x)$ is the logarithmic Kummer function.⁴⁰ For large family sizes, $y \to \infty$, this solution again predicts a power-law behavior of the form $n(y, z) \propto y^{-\zeta}$. The exponent $\zeta$, defined in Eq. (6.15), presents two relevant limits. First, for $\mu = 0$ the known solution for Simon's model, Eq. (6.7), is recovered. Second, the limit $\alpha \to 0$ always converges to $\zeta = 2$, irrespective of the value of $\mu$. For small family sizes, Eq. (6.18) yields a probability lower than in the case $\mu = 0$. This downward bending of the distribution of surname abundance at small sizes is in agreement with field data. Figure 6.7 represents several sets of data and the corresponding fits obtained from Eq. (6.18).

A similar continuous approximation, developed to calculate frequency distributions in processes with birth, death, and mutation, yields a solution for this problem equivalent to Eq. (6.18).⁴¹ When that solution was used to fit the distribution of surnames in several European countries and in the USA, good agreement between data and theoretical prediction was obtained. This reinforces the idea that the genealogical depth of those relatively young systems suffices to be close enough to the asymptotic, power-law regime.

Fig. 6.7. Frequency of appearance of families with a given size. Data for Argentina correspond to almost 350,000 surnames in the whole 1996 Argentinian telephone book; for Berlin, 6400 surnames beginning with A in the 1996 telephone book have been used; data for Japan are adapted from Miyazima et al. (2000).

The case $\mu = 1$ deserves some separate comments, since in this limit the qualitative properties of the system change. This situation corresponds to populations that are stationary in size, $N(s) = N_0$, where the number of births equals the number of deaths. This model was used in the context of genetic inheritance to study the probability of fixation of alleles:⁴² Moran's model is analogous to Simon's model in populations of constant size. Eventually, the diversity supported by a population of constant size will reach a constant value, though the transient until this regime sets in depends, as it does for $\mu < 1$, on the initial condition. Further, it turns out that, for constant populations, the functional form of the surname abundance distribution changes with the actual values of the parameters: the solution to the dynamical equations depends on how the product $\alpha N_0$ compares with unity. If mutation is frequent enough such that $\alpha N_0 > 1$, the asymptotic distribution of family sizes is exponential,

and the stationary number $S$ of different surnames is
