Estimating genetic drift and effective population size from temporal shifts in dominant gene marker frequencies

P. E. JORDE, S. PALM and N. RYMAN

Division of Population Genetics, Stockholm University, S-10691 Stockholm, Sweden

Molecular Ecology (1999) 8, 1171–1178. © 1999 Blackwell Science Ltd

Correspondence: P. E. Jorde. (On leave from the Department of Biology, University of Oslo, PO Box 1050 Blindern, N-0316 Oslo, Norway.) E-mail: p.e.jorde@bio.uio.no

Abstract

Measurement of temporal change in allele frequencies represents an indirect method for estimating the genetically effective size of populations. When allele frequencies are estimated for gene markers that display dominant gene expression, such as, e.g. random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) markers, the estimates can be seriously biased. We quantify bias for previous allele frequency estimators and present a new expression that is generally less biased and provides a more precise assessment of temporal allele frequency change. We further develop an estimator for effective population size that is appropriate when dealing with dominant gene markers. Comparison with estimates based on codominantly expressed genes, such as allozymes or microsatellites, indicates that about twice as many loci or sampled individuals are required when using dominant markers to achieve the same precision.

Keywords: allele frequencies, effective population size, genetic drift, RAPD, AFLP, temporal method

Received 7 October 1998; revision received 23 February 1999; accepted 23 February 1999

Introduction

Recent developments in molecular genetic techniques have provided several types of genetic markers that are suitable for routine screening of genetic variability in populations of virtually any organism. Most of these techniques utilize PCR-based amplification of DNA fragments and can be applied to even minute amounts of material, including DNA extracted from old collections. With direct comparisons of the genetic characteristics of old samples with recent ones comes the possibility of detailed analyses of temporal genetic change within populations (see, e.g. Nielsen et al. 1997). One application of measurements of temporal change of allele frequencies is to estimate the genetically effective population size of natural or captive populations, a quantity of considerable interest within the fields of conservation biology and population management (e.g. Allendorf & Ryman 1987; Lande & Barrowclough 1987). This so-called ‘temporal method’ has been applied to estimate effective population sizes in a number of species, using allozyme markers (Krimbas & Tsakas 1971; Begon et al. 1980; Hedgecock & Sly 1990; Waples 1990; Hedgecock et al. 1992; Jorde & Ryman 1996), microsatellites (Miller & Kapuscinski 1997), minisatellites (Scribner et al. 1997), and mitochondrial DNA haplotypes (Laikre et al. 1998).

To date, estimation of genetic drift and effective size has been based on genes with codominant expression, for which individual allelic variants are observed directly. Many recent techniques, however, as well as some older ones based on blood groups or certain allozymes, do not allow for direct observations of genotypes. For instance, random amplified polymorphic DNA (RAPD: Williams et al. 1990) and amplified fragment length polymorphism fingerprinting (AFLP: Vos et al. 1995) both amplify DNA fragments that include certain sequence(s) that are recognized by the selected primer. As a result, only individuals that carry the specific sequence will display the fragment, whereas others will not, giving the phenotypes ‘presence’ or ‘absence’ of the fragment, respectively.
It is typically difficult or impossible to tell whether the fragment is present in one or two copies, i.e. whether it occurs in heterozygous or in homozygous condition, and the amplified fragment is therefore regarded as dominant to the recessive condition of absence of the fragment (Lynch & Milligan 1994). Because dominant markers have different sampling properties from codominantly expressed ones (e.g. Jorde & Ryman 1990; Lynch & Milligan 1994) it is presently unclear how to apply them to estimate genetic drift and effective size, and how dominant gene markers compare to codominant ones with respect to bias and precision of the resulting estimates. Here, we address these questions and present new estimators for allele frequencies and for temporal allele frequency shifts that are appropriate when dealing with dominant gene markers of any kind.

Methods

Allele frequency estimates

A problem specific to the use of gene markers with dominant gene expression concerns bias in estimates of allele frequencies and derived quantities. The standard method for estimating allele frequencies for dominant markers utilizes the expected relationship between gene and genotype frequencies in a randomly mating population (i.e. Hardy–Weinberg proportions) and estimates the frequency of the recessive allele ($q$) from the observed number of recessive homozygote individuals ($X$) in a sample of $n$ individuals:

$$\hat{q} = \sqrt{X/n} = \sqrt{x} \tag{1}$$

where $x = X/n$ is the observed proportion of recessive homozygotes, i.e. the proportion of individuals that do not display the dominant marker phenotype. In eqn 1 and elsewhere, we assume that the genotypes occur in Hardy–Weinberg proportions so that $x$ represents an unbiased estimate of the population frequency of recessive homozygotes ($q^2$).

The assumption of Hardy–Weinberg genotype proportions is necessary in order to estimate allele frequencies at dominant loci. In this respect dominant markers have a disadvantage relative to codominant ones because supplementary information is needed to check whether this assumption is justified. In the context of estimating effective size the assumption of Hardy–Weinberg proportions should be reasonable, however, because the application of the temporal method typically rests on the assumption of a single, randomly mating population with no evolutionary forces other than genetic drift acting on the marker genes. Under these conditions, large deviation from Hardy–Weinberg proportions is unlikely to occur and allele frequencies at dominant markers can be estimated from the phenotype proportions.

However, even when genotypes do occur in Hardy–Weinberg proportions, the estimator (eqn 1) often provides biased estimates of allele frequency (Jorde & Ryman 1990). This bias arises because, generally, the expectation of a squared quantity is larger than the squared expectation of that quantity (Elandt-Johnson 1971; p. 104). In the present context of $\hat{q}$ and $\hat{q}^2$:

$$E(\hat{q}^2) = [E(\hat{q})]^2 + V(\hat{q}) = q^2 + V(\hat{q}) \tag{2}$$

where $E$ denotes the expected value operator, so that $E(\hat{q}) = q$, and $V(\hat{q})$ is the sample variance of $\hat{q}$. For large samples the variance is (e.g. Crow & Kimura 1970; p. 512):

$$V(\hat{q}) = \frac{1 - q^2}{4n} \tag{3}$$

Substituting $x$ for $q^2$ in eqns 2 and 3 leads to the following estimator for the frequency of the recessive (‘absence’) allele:

$$\hat{q} = \sqrt{x + \frac{1 - x}{4n}} \tag{4}$$

as an alternative to the standard formula (1). Lynch & Milligan (1994), also noting the bias in (1), suggested another estimator

$$\hat{q} = \frac{\sqrt{x}}{1 - \dfrac{x(1 - x)}{8nx^2}} \tag{5}$$

This last expression (5) cannot be applied when no recessive homozygotes are found in the sample, i.e. when $x = 0$, and $\hat{q}$ is set to zero in those cases. Figure 1 compares bias in the three estimators (1), (4), and (5) over a wide range of sample sizes and population allele frequencies.
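The three estimators can be written out directly from eqns 1, 4 and 5. The following is a minimal Python sketch; the function names are illustrative and not from the paper, and `X` and `n` are the number of recessive homozygotes and the sample size as defined above.

```python
import math


def q_standard(X, n):
    """Eqn 1: square root of the observed proportion of recessive homozygotes."""
    return math.sqrt(X / n)


def q_proposed(X, n):
    """Eqn 4: adds the large-sample variance term (1 - x)/(4n) under the root."""
    x = X / n
    return math.sqrt(x + (1.0 - x) / (4.0 * n))


def q_lynch_milligan(X, n):
    """Eqn 5: set to zero when no recessive homozygotes are observed (x = 0)."""
    if X == 0:
        return 0.0
    x = X / n
    return math.sqrt(x) / (1.0 - x * (1.0 - x) / (8.0 * n * x * x))
```

For example, with `X = 8` recessive homozygotes among `n = 50` individuals, `q_standard` returns 0.4, while both alternatives return slightly larger values. With `X = 0` and `n = 25`, `q_proposed` returns $\sqrt{1/100} = 0.1$ rather than zero.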
For each estimator the expected value, $E(\hat{q})$, was calculated as the average estimate over all possible numbers of recessive homozygotes ($X$) observed, weighted by the probability of obtaining that number in a sample of $n$ individuals when the true proportion of recessive homozygotes in the population is $q^2$, i.e. the binomial probability $\mathrm{Bin}(X; n; q^2) = \binom{n}{X}(q^2)^X(1 - q^2)^{n-X}$. For estimator (5) we used an estimate of $\hat{q} = 0$ whenever $X = 0$ occurred in the calculations.

It is obvious from the results depicted in Fig. 1 that none of the three estimators is unbiased over all combinations of allele frequency and sample size. Bias is primarily a concern when samples are small and the recessive allele is rare, often resulting in quite erroneous allele frequency estimates. Comparing the three estimators we note that the standard one (1) and that of Lynch & Milligan (1994) (5) behave similarly with respect to sample size and population allele frequency, with maximum bias expected for small samples and intermediate to low allele frequency. Overall, the latter estimator represents only a minor improvement over the standard formula with respect to bias. Our suggested estimator (4) behaves quite differently from the other two with significant bias expected primarily for very low $q$ values. In particular, for $q = 0$ (i.e. when the recessive allele is lacking) this estimator always yields a (biased) value of $\sqrt{1/(4n)}$ (cf. eqn 4). While this particular property of (4) may in some cases be undesirable, it need not be of major concern when estimating temporal allele frequency change because such measurements must in any case be limited to polymorphic marker genes with $q > 0$. Furthermore, reasonably accurate estimates of temporal change cannot be expected unless fairly large samples are used (below) and it is not clear from Fig. 1 which estimator to prefer under those circumstances.

[Figure 1: three contour panels, (a) equation 1, (b) equation 5, (c) equation 4; axes: sample size and population frequency of recessive allele.]

Fig. 1 Expected amount of bias, $E(\hat{q}) - q$, in allele frequency estimates for dominant gene markers under various combinations of sample size ($n$) and population frequency of the recessive allele ($q$). (a) The standard estimator (1). (b) Lynch and Milligan’s estimator (5). (c) The proposed estimator (4). Lines represent areas of equal bias indicated by numbers (i.e. isoclines). Note that estimator (4) tends to yield estimates of $q$ that are too high (i.e. positive bias), whereas the other two are biased downwards and yield estimates that are too low.

Estimating temporal change

Temporal shifts in allele frequencies are typically estimated from two or more samples of individuals taken at different occasions. Various strategies for sampling have been devised, viz. sampling before or after the individuals have reproduced, sampling individuals with or without replacement, and with one or more generations lapsing between samples (Nei & Tajima 1981; Waples 1989). For the purpose of evaluating the usefulness of dominant gene markers to estimate effective size we consider the simplest situation and assume that generation intervals are discrete (nonoverlapping) and that there are two samples, each consisting of $n$ individuals, that are drawn exactly one generation apart before reproduction (i.e. sample plan II of Nei & Tajima 1981). The extension to other situations can easily be accommodated within the framework used here and is not discussed further (see Nei & Tajima 1981; Pollak 1983; Waples 1989, 1990; Jorde & Ryman 1995).

One commonly used measure of temporal allele frequency shifts between samples is provided by Pollak (1983). For di-allelic loci this measure can be written as

$$F_k = \frac{(\hat{q}_x - \hat{q}_y)^2}{\hat{q}_z(1 - \hat{q}_z)} \tag{6}$$

where $\hat{q}_x$ and $\hat{q}_y$ are the estimated frequencies of the recessive allele in the first and in the second generation, respectively, and $\hat{q}_z = (\hat{q}_x + \hat{q}_y)/2$ is their mean. In the case of codominantly expressed, selectively neutral genes the expected value of $F_k$ has a relatively simple relationship with the effective population size, $N_e$, namely (cf. Nei & Tajima 1981; Waples 1989): $E(F_k) \approx 1/(2N_e) + 1/n$, and estimation of $N_e$ can be done on the basis of that relationship:

$$\hat{N}_e = \frac{1}{2(F_k - 1/\tilde{n})} \tag{7}$$

where $\tilde{n}$ is the harmonic mean of the two sample sizes. The term $1/\tilde{n}$ may be viewed as a ‘sample-correction’ to $F_k$, accounting for the fact that $F_k$ (eqn 6) provides an upward biased estimate of temporal allele frequency change.

With dominant gene expression the statistical properties of the allele frequency estimates are more complicated, as discussed above, and we need to find the expected relationship between $F_k$ and $N_e$ in this situation. We address this problem by finding the expected value of the observed allele frequency shift between the two samples under dominant gene expression. Allowing $E_\delta$ to designate the expected value operator for the change in allele frequency (i.e. genetic drift) during the generation lapsing between the two samples, and $E_\sigma$ that for sampling from the population, we have:

$$E[(\hat{q}_x - \hat{q}_y)^2] = E[(\hat{q}_x - q_x + q_x - \hat{q}_y)^2] = E\{[(\hat{q}_x - q_x) - (\hat{q}_y - q_x)]^2\} = E_\sigma[(\hat{q}_x - q_x)^2] - 2E_\sigma[(\hat{q}_x - q_x)E_\delta(\hat{q}_y - q_x)] + E_\sigma E_\delta[(\hat{q}_y - q_x)^2].$$
Here, $E_\sigma(\hat{q}_x - q_x) = 0$ and $E_\sigma[(\hat{q}_x - q_x)^2] = (1 - q_x^2)/(4n)$ is the sample variance of $\hat{q}_x$ (eqn 3). Further, $E_\sigma E_\delta[(\hat{q}_y - q_x)^2] = E_\sigma[(\hat{q}_y - q_y)^2] + E_\delta[(q_y - q_x)^2]$, where the first term is the sample variance of $\hat{q}_y$ and the second term is the variance of the temporal allele frequency change (i.e. genetic drift) in the population over a generation, or $q_x(1 - q_x)/(2N_e)$ (cf. Crow & Kimura 1970; eqn 7.3.2). Putting this together we obtain the expected squared difference in observed allele frequencies for samples drawn one generation apart and scored for a dominant locus:

$$E[(\hat{q}_x - \hat{q}_y)^2] = \frac{q_x(1 - q_x)}{2N_e} + \frac{1 - q_x^2}{4n} + \frac{1 - q_y^2}{4n} \approx \frac{q_x(1 - q_x)}{2N_e} + \frac{1 - q_x^2}{2\tilde{n}} \tag{8}$$

where the approximation holds when the true shift in allele frequency over one generation is not very large, so that $q_y^2 \approx q_x^2$, as would be the case when $N_e$ is not extremely small. Even when $N_e$ is very small this simplifying assumption is reasonable because then the term involving $1/(2N_e)$ in equation 8 is likely to dominate over terms in $1/(4n)$ and the entire expression becomes rather insensitive to the numerators of the latter. Making this simplification, the expectation of the denominator of (6) becomes approximately equal to $q_x(1 - q_x)$ and, to the extent that the expectation of a ratio can be taken as the ratio of the expectations of its numerator and denominator, we have the expectation for $F_k$ for dominant gene markers:

$$E(F_k) \approx \frac{1}{2N_e} + \frac{1 - q_x^2}{2\tilde{n}q_x(1 - q_x)} \tag{9}$$

Ideally, we would like to have a measure of temporal allele frequency change that is related directly to $1/(2N_e)$ and that is independent of sample size ($n$) and allele frequency ($q_x$). Equation 9 suggests that the modified measure $F_k'$ satisfies this criterion:

$$F_k' = F_k - \frac{1 - \hat{q}_x^2}{2\tilde{n}\hat{q}_x(1 - \hat{q}_x)} \tag{10}$$

Here, the estimated frequency $\hat{q}_x$ is substituted for the unknown true frequency $q_x$ in (9).
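As an illustration of eqns 6 and 10, the following Python sketch computes $F_k$ and the sample-corrected $F_k'$ from two estimated allele frequencies. The function and parameter names are hypothetical, and the correction term uses $\hat{q}_x$ exactly as written in eqn 10.

```python
def harmonic_mean(n_x, n_y):
    """Harmonic mean of the two sample sizes (n-tilde in the text)."""
    return 2.0 * n_x * n_y / (n_x + n_y)


def f_k(q_x, q_y):
    """Eqn 6: squared frequency shift standardized by q_z(1 - q_z)."""
    q_z = 0.5 * (q_x + q_y)
    return (q_x - q_y) ** 2 / (q_z * (1.0 - q_z))


def f_k_prime(q_x, q_y, n_x, n_y):
    """Eqn 10: F_k minus the expected contribution of sampling error."""
    n_tilde = harmonic_mean(n_x, n_y)
    correction = (1.0 - q_x ** 2) / (2.0 * n_tilde * q_x * (1.0 - q_x))
    return f_k(q_x, q_y) - correction
```

With estimated frequencies 0.5 and 0.4 from two samples of 50 individuals each, $F_k = 0.01/0.2475 \approx 0.0404$, the correction term is $0.75/25 = 0.03$, and $F_k' \approx 0.0104$.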
Note that, when sample sizes are small and $N_e$ is reasonably large, the average allele frequency over the two samples, $\hat{q}_z$, may be a more precise estimate of $q_x$ and we use this average in place of $\hat{q}_x$ in (10) when calculating $F_k'$ in the following. In analogy with the codominant case (cf. eqn 7), the expression subtracted from $F_k$ in (10) represents the contribution to $F_k$ that is expected because of sampling errors in the allele frequency estimates. Hence, $F_k'$ should represent an unbiased estimate of genetic drift, assuming that the gene markers are selectively neutral, and can be applied directly to estimate the effective size of the population:

$$\hat{N}_e = \frac{1}{2F_k'} = \frac{1}{2\left[F_k - \dfrac{1 - \hat{q}_z^2}{2\tilde{n}\hat{q}_z(1 - \hat{q}_z)}\right]} \tag{11}$$

where $F_k$ is calculated from eqn 6 using allele frequencies estimated from one of the three alternative estimators (1), (4) or (5).

Bias

In order to check the amount of bias that can be expected in the proposed estimator for $N_e$ (eqn 11) we consider the situation of an infinitely large population being sampled at two occasions. With an infinite size there is no genetic drift between the sampling events and $F_k'$ should be zero, corresponding to an infinite estimate of $N_e$. We check this when $F_k'$ is calculated from sample allele frequencies that are estimated using any of the three estimators above. With two samples, each consisting of $n$ individuals, there are $(n + 1)^2$ possible combinations of the observed number of recessive homozygotes ($X$ and $Y$, respectively) in the two samples, with $X$ and $Y$ ranging from 0 to $n$. Summing the values of $F_k'$ that result for each combination of $X$ and $Y$ and weighting by the probability, $\mathrm{Bin}(X; n; q^2) \times \mathrm{Bin}(Y; n; q^2)$, of that observation yields the expected value of $F_k'$. (In these calculations we ignore cases with no observed variation, i.e. when $X = Y = 0$ or $X = Y = n$, because these observations yield no information about genetic drift.)
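The enumeration just described can be sketched in Python as follows, using estimator (4) for the allele frequencies and $\hat{q}_z$ in the correction term. All names are illustrative. Two choices here are assumptions rather than details given in the text: the weighted sum is renormalized by the total probability of the retained $(X, Y)$ combinations, and a non-positive $F_k'$ is reported as an infinite $\hat{N}_e$. The sketch assumes $0 < q < 1$.

```python
import math


def q_hat(X, n):
    """Eqn 4 estimator of the recessive allele frequency."""
    x = X / n
    return math.sqrt(x + (1.0 - x) / (4.0 * n))


def f_k_prime(q_x, q_y, n):
    """Eqns 6 and 10 with q_z in the correction term; equal sample sizes n."""
    q_z = 0.5 * (q_x + q_y)
    f_k = (q_x - q_y) ** 2 / (q_z * (1.0 - q_z))
    return f_k - (1.0 - q_z ** 2) / (2.0 * n * q_z * (1.0 - q_z))


def ne_hat(fkp):
    """Eqn 11; a non-positive F'k is reported as infinity (assumption)."""
    return math.inf if fkp <= 0 else 1.0 / (2.0 * fkp)


def binom_pmf(k, n, p):
    """Binomial probability Bin(k; n; p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def expected_f_k_prime(n, q):
    """Expected F'k when both samples are drawn from one infinite population."""
    p = q * q  # Hardy-Weinberg frequency of recessive homozygotes
    total = 0.0
    weight = 0.0
    for X in range(n + 1):
        for Y in range(n + 1):
            if X == Y and X in (0, n):
                continue  # no observed variation: carries no drift information
            w = binom_pmf(X, n, p) * binom_pmf(Y, n, p)
            total += w * f_k_prime(q_hat(X, n), q_hat(Y, n), n)
            weight += w
    return total / weight  # renormalized over retained cases (assumption)
```

Calling `expected_f_k_prime(50, 0.5)` evaluates all retained combinations of $X$ and $Y$ for samples of 50 individuals; swapping `q_hat` for an implementation of eqn 1 or eqn 5 would require guarding against $\hat{q}_z = 0$, which cannot occur with eqn 4.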
The resulting expected values of $F_k'$ for various sample sizes ($n$) and true allele frequencies ($q$) are presented in Fig. 2. Recalling that $F_k'$ should be zero in the case of an infinite population, the values depicted in Fig. 2 indicate that bias in $F_k'$ may be a problem for dominant markers. In particular, bias can be enormous when allele frequencies are estimated using either eqn 1 or 5, with maximum bias of 0.6 and 0.8, respectively, yielding an estimate of $N_e$ (eqn 11) that is less than one! In comparison, our proposed estimator (4) has a maximum amount of bias of an order of magnitude below that of the other two (Fig. 2c). Also, bias arising from this estimator is generally negative, so that $N_e$ would be judged too high, if anything, rather than too low. (Note that a negative estimate of $N_e$ is typically interpreted as an infinite estimate.)

For all three estimators, however, bias drops rapidly with increasing sample size and increasing frequency of the recessive allele, implying that most of the bias can be eliminated if analyses of temporal change are restricted to large samples and highly polymorphic marker loci. This restricted range corresponds to the area in Figs 2a–c described by the dashed line: above this line bias is less than 0.001 and the effective size is estimated to be 500 or larger, which should probably be considered acceptable. Relaxing the criteria for acceptable bias to, for example, 0.005 (corresponding to the dotted lines in Fig. 2), an even wider domain of ‘unbiasedness’ results, in particular for estimator (4). Unless sample sizes are very small, most gene markers should lie within this domain and yield approximately unbiased estimates as judged by this criterion.

[Figure 2: three surface plots of $E(F_k')$ against sample size ($n$) and allele frequency ($q$); panels (a) equation 1, (b) equation 5, (c) equation 4.]

Fig. 2 Expected estimated amount of temporal allele frequency shift (bias) when sampling from an infinite population and scoring a dominant locus. The surfaces depict the expected value of $F_k'$ (eqn 10) under various sample sizes ($n$) and population frequencies ($q$) of the recessive allele. (a) Using the standard expression (1) for allele frequency estimation. (b) Using estimator (5). (c) Using estimator (4). Note the difference in scale of the graphs. The dashed and the dotted line along the figure base represent the isoclines $F_k' = 0.001$ and 0.005, respectively, and describe areas of fairly low bias. ($F_k'$ for an infinite population should be zero.)

Precision

An important consideration, in addition to bias, when estimating $N_e$ from temporal shift in allele frequency is the precision of the estimate. Because genetic drift is a stochastic process, a single observation of temporal allele frequency shift is insufficient for achieving any degree of reasonable accuracy in the estimated $N_e$. Instead, estimation of $N_e$ should be based on multiple observations, typically from several independently segregating gene markers, and an average $F_k'$ over observations (gene loci and/or generations) should be used when estimating effective size from $\hat{N}_e = 1/(2F_k')$ (eqn 11). We performed computer simulations to check on the precision of the estimates for $N_e$ under various conditions, using uniformly distributed random population allele frequencies, $0.1 \le q \le 0.9$, in the first generation.
Random binomial drawings of $2N_e$ genes for reproduction and $n$ phenotypes (recessive and dominant types) for sampling were performed over one generation, using various true effective sizes ($N_e$ = 20; 100; 500), sample sizes ($n$ = 20; 100), and number of independently segregating dominant marker loci ($k$ = 5; 20; 100). Average $F_k'$ (eqns 6 and 10), its standard deviation, $s_{F_k'}$, and the corresponding estimate for effective size ($\hat{N}_e$; eqn 11) were calculated over 100 000 computer runs using allele frequency estimates provided by eqns 1, 4, and 5. In all cases pairs of samples that were both monomorphic for either phenotype were excluded from the calculations. For comparison, we also performed an analogous set of simulations using codominantly expressed di-allelic loci.

Table 1 shows the estimates $F_k'$ and $\hat{N}_e$ resulting from the computer simulations. Comparing the estimates of $N_e$ with the true values (leftmost column) when based on different estimators for allele frequencies reveals large amounts of bias in many cases. In particular, estimates of $N_e$ based on allele frequencies calculated from (1) or (5) are generally too low, often seriously so. The point estimates of $N_e$ based on these equations are also highly sensitive to sample size and appear quite misleading under almost all conditions. This finding is not unexpected in view of the large upward bias in $F_k'$ noted in Fig. 2 for these estimators. In contrast, calculating allele frequencies from estimator (4) yields estimates of $N_e$ that, while not necessarily unbiased, appear more reasonable. In brief, this estimator tends to yield estimates that are too large when based on small samples and estimates that are somewhat too small for large samples. This latter downward bias for large samples is not very pronounced, however, and should in most cases be judged acceptable. As a side issue, we note that when using codominant markers (e.g. eqn 7) there is a certain downward bias in $\hat{N}_e$ for small samples (cf. rightmost column in Table 1 for $n$ = 20).

Table 1 Estimated amount of genetic drift over one generation ($F_k'$: eqn 10) and effective size ($\hat{N}_e$: eqn 11) in computer simulations (each based on 100 000 replicates). Initial allele frequencies ($q_x$) varied uniformly between 0.1 and 0.9 among simulations and we used three different estimators when assessing the frequencies of the recessive gene from samples of individuals, viz. the standard estimator (eqn 1), Lynch and Milligan’s estimator (5), and our new estimator (4). Estimates based on codominant loci are included for comparison. $N_e$ is the true effective size, $n$ is the sample size (number of diploid individuals) in each generation, and $k$ is the number of loci

| $N_e$ | $n$ | $k$ | Estimator (1) $F_k'$ | Estimator (1) $s_{F_k'}$ | Estimator (1) $\hat{N}_e$ | Estimator (5) $F_k'$ | Estimator (5) $s_{F_k'}$ | Estimator (5) $\hat{N}_e$ | Estimator (4) $F_k'$ | Estimator (4) $s_{F_k'}$ | Estimator (4) $\hat{N}_e$ | Codominant loci $F_k'$ | Codominant loci $s_{F_k'}$ | Codominant loci $\hat{N}_e$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 20 | 5 | 0.0922 | 0.1058 | 5 | 0.1045 | 0.1139 | 5 | 0.0211 | 0.0618 | 24 | 0.0258 | 0.0461 | 19 |
| 20 | 20 | 20 | 0.0919 | 0.0530 | 5 | 0.1041 | 0.0570 | 5 | 0.0213 | 0.0310 | 24 | 0.0257 | 0.0231 | 19 |
| 20 | 20 | 100 | 0.0916 | 0.0236 | 5 | 0.1037 | 0.0254 | 5 | 0.0211 | 0.0138 | 24 | 0.0256 | 0.0103 | 19 |
| 20 | 100 | 5 | 0.0342 | 0.0372 | 15 | 0.0343 | 0.0380 | 15 | 0.0254 | 0.0279 | 20 | 0.0256 | 0.0222 | 20 |
| 20 | 100 | 20 | 0.0341 | 0.0186 | 15 | 0.0342 | 0.0190 | 15 | 0.0253 | 0.0140 | 20 | 0.0255 | 0.0111 | 20 |
| 20 | 100 | 100 | 0.0340 | 0.0083 | 15 | 0.0342 | 0.0085 | 15 | 0.0253 | 0.0062 | 20 | 0.0256 | 0.0050 | 20 |
| 100 | 20 | 5 | 0.0684 | 0.0963 | 7 | 0.0808 | 0.1055 | 6 | 0.0035 | 0.0518 | 144 | 0.0056 | 0.0345 | 90 |
| 100 | 20 | 20 | 0.0687 | 0.0482 | 7 | 0.0811 | 0.0528 | 6 | 0.0037 | 0.0260 | 135 | 0.0055 | 0.0171 | 91 |
| 100 | 20 | 100 | 0.0687 | 0.0215 | 7 | 0.0811 | 0.0236 | 6 | 0.0037 | 0.0116 | 136 | 0.0055 | 0.0077 | 90 |
| 100 | 100 | 5 | 0.0116 | 0.0255 | 43 | 0.0119 | 0.0266 | 42 | 0.0054 | 0.0162 | 93 | 0.0050 | 0.0094 | 100 |
| 100 | 100 | 20 | 0.0116 | 0.0128 | 43 | 0.0119 | 0.0133 | 42 | 0.0054 | 0.0081 | 93 | 0.0051 | 0.0047 | 99 |
| 100 | 100 | 100 | 0.0117 | 0.0057 | 43 | 0.0120 | 0.0060 | 42 | 0.0054 | 0.0036 | 92 | 0.0050 | 0.0022 | 99 |
| 500 | 20 | 5 | 0.0636 | 0.0944 | 8 | 0.0761 | 0.1036 | 7 | −0.0001 | 0.0500 | inf | 0.0016 | 0.0321 | 305 |
| 500 | 20 | 20 | 0.0637 | 0.0469 | 8 | 0.0762 | 0.0515 | 7 | 0.0000 | 0.0249 | inf | 0.0015 | 0.0160 | 324 |
| 500 | 20 | 100 | 0.0637 | 0.0211 | 8 | 0.0762 | 0.0232 | 7 | 0.0000 | 0.0111 | inf | 0.0017 | 0.0071 | 297 |
| 500 | 100 | 5 | 0.0072 | 0.0235 | 69 | 0.0076 | 0.0248 | 66 | 0.0015 | 0.0141 | 344 | 0.0010 | 0.0069 | 506 |
| 500 | 100 | 20 | 0.0072 | 0.0118 | 69 | 0.0076 | 0.0124 | 65 | 0.0015 | 0.0071 | 338 | 0.0010 | 0.0035 | 495 |
| 500 | 100 | 100 | 0.0072 | 0.0053 | 69 | 0.0076 | 0.0056 | 66 | 0.0014 | 0.0032 | 341 | 0.0010 | 0.0014 | 489 |

Estimates of temporal allele frequency shifts based on (4) also have considerably lower standard deviation ($s_{F_k'}$) than those obtained from the other two estimators for dominant alleles, implying higher precision of the point estimate. However, comparing the standard deviations to their means, it is evident that the precision is not very high in any of the estimates, regardless of which formula is used to calculate allele frequencies. This fairly low precision is, albeit to a lesser extent, also observed for codominant genes (cf. Table 1), and is only partially attributable to dominant gene expression. For estimates calculated on the basis of eqn 4, using dominant marker genes reduces precision (as measured by $s_{F_k'}$) by a factor of about 2/3 relative to what can be obtained if the same number of di-allelic codominant marker genes was used.

Discussion

Analysis of genetic drift from dominantly expressed gene markers has a number of disadvantages relative to using codominantly expressed ones.
From the statistical point of view, problems with dominant markers arise because only phenotypes are observed, rather than the genes directly. One consequence of this is that, with the same number of diploid individuals screened, the number of observations (phenotypes or genes) is only half that for codominant markers, resulting in reduced precision. Further, when the genotypes occur in Hardy–Weinberg proportions (as assumed herein) the frequencies estimated for dominant and recessive alleles are not necessarily unbiased and neither are estimates of temporal allele frequency shift and effective population size.

We have presented an estimator (eqn 4) for allele frequencies that, while not always unbiased, does minimize bias in measures of temporal allele frequency change and effective size. This estimator also yields more precise estimates of these quantities (i.e. having a lower standard deviation) than either the standard estimator (eqn 1) or the one proposed by Lynch & Milligan (1994) (eqn 5). We conclude that for measurements of temporal change, allele frequencies should preferably be calculated from eqn 4.

While using the proposed estimator for allele frequencies should yield the least biased and most precise estimates of effective size for dominant gene markers, it is clear from the large standard deviations of $F_k'$ reported in Table 1 that the precision is often quite poor: in many of the simulations the standard deviation is larger than the mean value, implying particularly uncertain point estimates in those situations. However, it is also clear from the table that precision can be increased by using larger sample sizes and scoring more marker loci. The standard deviation of $F_k'$ declines approximately inversely with the square root of the number of loci or individuals (cf. Table 1). This implies that in order to increase the precision for dominant markers to match that for codominant ones, i.e. reducing $s_{F_k'}$ by approximately 2/3, about twice the number of loci or individuals need to be sampled [$(3/2)^2 = 2.25$].

As when using codominant markers, precision can be increased further by sampling more than two generations (Nei & Tajima 1981; Waples 1989) or age classes (Waples 1990; Jorde & Ryman 1995). With samples taken from multiple generations or age classes the expected temporal shifts are larger than over a single generation and, hence, more easily measured in the presence of sampling errors.

Measures of temporal allele frequency change calculated from dominant loci can further be combined with measures based on other kinds of gene markers, including codominantly expressed allozymes or microsatellites. Because these latter markers often segregate for multiple alleles, $F_k'$ for each marker locus should be weighted by the number of ‘independent’ alleles used for that locus, or $K - 1$ for a locus with $K$ alleles. This assures that $F$ for each allele is given equal weight in the average $F_k'$ and implies that each di-allelic marker locus with dominance is given a weight of 1 (one), whereas codominant loci with multiple alleles receive a higher weight. At this point it does not seem appropriate, without examining this question specifically, to reduce the weight for dominant markers further on account of their lower precision. This is because giving unduly low (or high) weight to the temporal shift that happened to occur at one or a few alleles may compromise rather than improve precision of the average. For similar reasons, $F_k'$ values are generally not weighted by sample size either.

Acknowledgements

This study was supported through grants to N.R. from the Swedish Natural Science Research Council and the Swedish research program on Sustainable Coastal Zone Management (SUCOZOMA), funded by the Foundation for Strategic Environmental Research (MISTRA). P.E.J. was supported by a Marie Curie postdoctoral fellowship from the European Commission.
References

Allendorf FW, Ryman N (1987) Genetic management of hatchery stocks. In: Population Genetics and Fishery Management (eds Ryman N, Utter F), pp. 141–159. Washington Sea Grant Program, University of Washington Press, Seattle.
Begon M, Krimbas CB, Loukas M (1980) The genetics of Drosophila subobscura populations. XV. The effective size of a natural population estimated by three independent methods. Heredity, 43, 335–350.
Crow JF, Kimura M (1970) An Introduction to Population Genetics Theory. Harper & Row, New York.
Elandt-Johnson RC (1971) Probability Models and Statistical Methods in Genetics. John Wiley & Sons, Inc., New York.
Hedgecock D, Chow V, Waples RS (1992) Effective population numbers of shellfish broodstocks estimated from temporal variance in allelic frequencies. Aquaculture, 108, 215–232.
Hedgecock D, Sly F (1990) Genetic drift and effective population sizes of hatchery propagated stocks of the Pacific oyster Crassostrea gigas. Aquaculture, 88, 21–28.
Jorde PE, Ryman N (1990) Allele frequency estimation at loci with incomplete codominant expression. Heredity, 65, 429–433.
Jorde PE, Ryman N (1995) Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics, 139, 1077–1090.
Jorde PE, Ryman N (1996) Demographic genetics of brown trout (Salmo trutta) and estimation of effective population size from temporal change of allele frequencies. Genetics, 143, 1369–1381.
Krimbas CB, Tsakas S (1971) The genetics of Dacus oleae. V. Changes of esterase polymorphism in a natural population following insecticide control - selection or drift? Evolution, 25, 454–460.
Laikre L, Jorde PE, Ryman N (1998) Temporal change of mitochondrial DNA haplotype frequencies and female effective size in a brown trout (Salmo trutta) population. Evolution, 52, 910–915.
Lande R, Barrowclough GF (1987) Effective population size, genetic variation, and their use in population management. In: Viable Populations for Conservation (ed. Soulé ME), pp. 87–123. Cambridge University Press, Cambridge, UK.
Lynch M, Milligan BG (1994) Analysis of population genetic structure with RAPD markers. Molecular Ecology, 3, 91–99.
Miller LM, Kapuscinski AR (1997) Historical analysis of genetic variation reveals low effective population size in a northern pike (Esox lucius) population. Genetics, 147, 1249–1258.
Nei M, Tajima F (1981) Genetic drift and estimation of effective population size. Genetics, 98, 625–640.
Nielsen EE, Hansen MM, Loeschcke V (1997) Analysis of microsatellite DNA from old scale samples of Atlantic salmon Salmo salar: a comparison of genetic composition over 60 years. Molecular Ecology, 6, 487–492.
Pollak E (1983) A new method for estimating the effective population size from allele frequency changes. Genetics, 104, 531–548.
Scribner KT, Arntzen JW, Burke T (1997) Effective number of breeding adults in Bufo bufo estimated from age-specific variation at minisatellite loci. Molecular Ecology, 6, 701–712.
Vos P, Hogers R, Bleeker M, Reijans M et al. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research, 23, 4407–4414.
Waples RS (1989) A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics, 121, 379–391.
Waples RS (1990) Conservation genetics of Pacific salmon. III. Estimating effective population size. Journal of Heredity, 81, 277–289.
Williams JGK, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research, 18, 6531–6535.

This study was conducted at the Division of Population Genetics, Stockholm University, headed by Nils Ryman. The Division’s research relates to the genetic structure of natural populations, genetic conservation, and to the genetic effects of human interference with natural populations. This paper will be a part of Stefan Palm’s PhD thesis. Per Erik Jorde is presently a postdoctoral fellow at the University of Oslo, working on genetic monitoring.
