Abstract:
Posters

COMPARISON OF SEGREGATION ANALYSIS FOR NORMAL AND BINARY DISEASES TRAITS

H. Ilahi and H. N. Kadarmideen

Statistical Animal Genetics Group, Institute of Animal Sciences, Swiss Federal Institute of Technology, ETH Zurich, CH-8092, Switzerland

SUMMARY
Segregation analysis was performed on normal and binary data in order to compare the accuracy and bias of parameters estimated on the two scales. Two kinds of data were simulated and analysed: data on the underlying normally distributed liability (NDL), and binary data created by truncating the NDL data at two thresholds corresponding to two incidences. For the normal trait, the parameters estimated under mixed inheritance (H1) were similar to the true values used in the simulation, although the major gene variance was underestimated. Under polygenic inheritance (H0), in contrast, the parameters were overestimated, especially the genetic and the permanent environmental variances. For the binary trait under H0, the estimates of heritability and repeatability were high and the same for both incidences. Under H1, however, these estimates were very low, and slightly higher for the 40% incidence. With a low incidence (15%), the frequency of the unfavorable genotype A1A1 was overestimated and the frequencies of the A1A2 and A2A2 genotypes were underestimated. With a high incidence (40%), the frequency of the favorable genotype A2A2 was overestimated and that of the unfavorable genotype A1A1 was underestimated. For the normal trait, the estimated heritabilities and repeatabilities decreased from polygenic (H0) to mixed inheritance (H1). For the binary trait, these estimates decreased dramatically from H0 to H1 for both incidences. From these preliminary results, it could be concluded that the power to detect a major gene is higher for NDL than for 0/1 data, and that estimates are more biased for 0/1 than for NDL data.
Keywords: Normal trait, binary trait, incidence, major gene, mixed model, segregation analysis

INTRODUCTION
In domestic animal populations, genetic analyses of quantitative traits have been thoroughly addressed for traits whose phenotypes are controlled by many genes (polygenes), each having a small effect, and which follow a continuous (normal) distribution. However, in many cases phenotypes, and especially disease traits, are expressed in two or more categories, representing binary and categorical traits, respectively. Several of these traits are controlled by a few genes with a large effect (major genes or quantitative trait loci, QTLs) together with polygenes. The objective of this study was to compare segregation analysis applied to a trait on the normal scale versus the binary scale with two incidences (15 and 40%), using simulated data. In a first step, we test whether the transformation of normally distributed liability data to binary data has an influence (bias) on the estimation of the genetic parameters of the population. In a second step, we compare the segregation analysis for binary traits using two different incidences.

AAABG Vol 15

MATERIALS AND METHODS
Simulation of the normally distributed liability (NDL) data. The data were simulated using a mixed inheritance model (polygenes + major gene) and according to a hierarchical and balanced family structure: one population consists of 20 sire families with 100 dams per sire, which resulted in 2000 dams. A total of 3 records/phenotypes per dam was simulated (i.e. 300 records per sire). Therefore the total number of records in the population is 6000. We assumed more than one phenotype per dam because, in most cases, livestock data sets consist of repeated records.
The phenotypic data were simulated as follows:

$y_{ij} = \mu_i + a_{ij} + pe_{ij} + e_{ij}$

where $y_{ij}$ is the phenotype, $\mu_i$ is the effect of the ith genotype at the major gene, $a_{ij}$ is the polygenic effect of the jth individual bearing the ith genotype, $a_{ij} \sim N(0, \sigma^2_g)$, $pe_{ij}$ is the permanent environmental effect, $pe_{ij} \sim N(0, \sigma^2_{pe})$, and $e_{ij}$ is the residual effect, $e_{ij} \sim N(0, \sigma^2_e)$, where $\sigma^2_g$, $\sigma^2_{pe}$ and $\sigma^2_e$ are the polygenic, permanent environmental and residual variances, respectively.

The single major gene is assumed to be an additive, biallelic (A1 and A2), autosomal locus with Mendelian transmission probabilities. We consider here that $p_1 = 0.6$ and $p_2$ ($= 1 - p_1$) are the frequencies of alleles A1 and A2. Three genotypes can be encountered: A1A1, A1A2 and A2A2, with frequencies $p_1^2$, $2 p_1 p_2$ and $p_2^2$, respectively. The A2 allele is assumed to increase the trait value, and is called the favorable allele. Further, we assume no dominance, and the additive allele effect a was 3.7 phenotypic standard deviation units of the trait. The phenotypic data were simulated using a heritability $h^2$ of 0.41 and a repeatability r of 0.52. The genotype of the offspring was determined according to the Mendelian transmission probabilities. The polygenic effect of the offspring was determined as the sum of the mean of the parents' polygenic effects and the Mendelian sampling effect. The true values of the parameters (major gene and polygenes) used in the simulation of the population are given in Table 1.

Simulation of binary (0/1) data. Liability models for the analysis of binary data were first proposed by Wright (1934) and have been thoroughly investigated (e.g. Kadarmideen et al. 2000 applied liability models to QTL mapping). The simulated normal trait was standardized using the average, $\mu$, and the standard deviation, $\sigma_P$, of the trait as $y^* = (y - \mu) / \sigma_P$, where $y^*$ is the standardized normal data with $N(0,1)$.
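The simulation and standardization steps can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the program used for the study: dams are treated as unrelated individuals whose genotypes are drawn from the Hardy-Weinberg frequencies, so the sire-family structure and the Mendelian transmission from parents are ignored. The genotype means and variances are the true values listed in Table 1; all function and constant names are hypothetical.

```python
import random
import statistics

# True values from Table 1 (assumed here to be on the simulated trait scale).
P1 = 0.6                                            # frequency of allele A1
GENOTYPE_MEANS = {"A1A1": -3.5, "A1A2": 0.0, "A2A2": 3.5}
VAR_G, VAR_PE, VAR_E = 1.44, 0.36, 1.69             # polygenic, perm. env., residual
N_DAMS, N_RECORDS = 2000, 3                         # 20 sires x 100 dams, 3 records each


def genotype_frequencies(p1):
    """Hardy-Weinberg genotype frequencies p1^2, 2*p1*p2, p2^2."""
    p2 = 1.0 - p1
    return {"A1A1": p1 * p1, "A1A2": 2.0 * p1 * p2, "A2A2": p2 * p2}


def simulate_liability(seed=1):
    """Simulate y = mu_i + a + pe + e with repeated records per dam.

    Simplification: dams are unrelated, so no pedigree is generated.
    """
    rng = random.Random(seed)
    freqs = genotype_frequencies(P1)
    names = list(freqs)
    weights = [freqs[g] for g in names]
    records = []
    for _ in range(N_DAMS):
        genotype = rng.choices(names, weights=weights)[0]
        a = rng.gauss(0.0, VAR_G ** 0.5)            # polygenic effect, one per dam
        pe = rng.gauss(0.0, VAR_PE ** 0.5)          # permanent environment, one per dam
        for _ in range(N_RECORDS):
            e = rng.gauss(0.0, VAR_E ** 0.5)        # residual, one per record
            records.append(GENOTYPE_MEANS[genotype] + a + pe + e)
    return records


def standardise(values):
    """Return y* = (y - mean) / sd using the mean and sd of the data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


liability = simulate_liability()
y_star = standardise(liability)
```

Keeping the polygenic and permanent environmental effects fixed within a dam while redrawing the residual for each record is what produces repeated records with a repeatability below one.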
Then, based on the liability concept, $y^*$ could be transformed into binary data $y_b$ as follows: if $y^* > t$ then $y_b = 1$, and if $y^* \le t$ then $y_b = 0$, where t is the threshold point. Here $y_b$ taking the value `1' could be considered as diseased and `0' as healthy, thus representing a liability model for complex diseases. The threshold values t were chosen to represent two scenarios: a less common disease with 15% incidence and a more common disease with 40% incidence. The corresponding values of t were therefore t = 1.036 for 15% and t = 0.253 for 40% (Falconer and Mackay, 1996). Both the underlying normally distributed liability (NDL) data and the binary (0/1) data resulting from truncating the same NDL data were kept for segregation analyses.

Statistical analyses. There were 3 types of data sets: the original NDL data and two binary data sets with 15% and 40% incidences. The same segregation analysis method was applied to all the normal and binary data sets. Simulations and analyses were replicated 100 times for each combination of parameters. Different values of parameters were used as initial values for the calculation of the estimated parameters. The segregation analysis method used in this study was based on the comparison of the likelihoods under 2 inheritance hypotheses (Le Roy et al., 1990, Ilahi et al., 2000 and Bodin et al., 2002):

Mixed inheritance hypothesis (H1). This model describes the genetic transmission of the simulated trait by polygenic effects and a single major gene effect. The parameters to be estimated are: the mean of each genotype ($\mu_{A_1A_1}$, $\mu_{A_1A_2}$, $\mu_{A_2A_2}$), the three variance components ($\sigma^2_g$, $\sigma^2_{pe}$, $\sigma^2_e$) and the genotypic frequencies $f(A_1A_1)$, $f(A_2A_2)$ and ($f(A_1A_2) = 1 - f(A_1A_1) - f(A_2A_2)$). These estimated parameters allowed the computation of the `residual' heritability and the repeatability.

Polygenic inheritance hypothesis (H0).
This model, which is a sub-model of the H1 mixed inheritance hypothesis, is given by $\mu_{A_1A_1} = \mu_{A_1A_2} = \mu_{A_2A_2} = \mu$. In this case the parameters to be estimated are $\mu$, $\sigma^2_g$, $\sigma^2_{pe}$ and $\sigma^2_e$, from which we can compute the `total' heritability and the repeatability.

The likelihoods $l_0$ and $l_1$ were computed under hypotheses H0 and H1, respectively, and the likelihood ratio is given by $LR = -2 \log(l_0 / l_1)$. This likelihood ratio is compared to the value of $\chi^2_d$ with degrees of freedom d equal to the difference in the number of parameters between the mixed and polygenic inheritance hypotheses (Le Roy et al. 1990, Kadarmideen et al. 2000). In this analysis, d = 4. The estimation of the parameters maximising the likelihoods was carried out using the Gauss-Hermite quadrature (D01BAF) and optimization (E04JBF) subroutines of the NAG Fortran Library, with a quasi-Newton algorithm in which the derivatives were estimated by finite differences.

RESULTS AND DISCUSSION
The results of parameter estimates by segregation analyses for normal and binary traits under both polygenic and mixed inheritance models are given in Tables 1 and 2, respectively. The mean of the likelihood ratio (LR) for the normal trait, comparing the mixed and polygenic models, was about 165, greatly exceeding 13.3, the tabulated value of the $\chi^2_4$ distribution at the 1% significance level. This confirmed the true mixed genetic determinism of the simulated trait. For the normal trait, the estimated parameters under mixed inheritance (H1) were similar to the true values of the parameters used in the simulation. The major gene variance was however underestimated (Table 1). On the other hand, the estimated parameters under polygenic inheritance (H0) were overestimated, especially the genetic and the permanent environmental variances. This is explained by the genetic model used in the simulation of the data set: the major gene has a large additive effect on the trait.
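The comparison of the likelihood ratio with the tabulated value can be checked with a short Python sketch. It assumes the statistic is taken as $-2 \log(l_0 / l_1)$, so that it is positive when H1 fits better, and uses the closed-form upper tail of the chi-square distribution that holds for 4 degrees of freedom, $P(X > x) = e^{-x/2}(1 + x/2)$. The function names are hypothetical and the log-likelihood values in the usage line are invented for illustration only.

```python
import math


def likelihood_ratio(loglik_h0, loglik_h1):
    """LR = -2 * log(l0 / l1), computed from the two maximised log-likelihoods."""
    return -2.0 * (loglik_h0 - loglik_h1)


def chi2_sf_4df(x):
    """Upper-tail probability P(X > x) of a chi-square variable with 4 df.

    For an even number of degrees of freedom the tail has a closed form;
    with 4 df it is exp(-x/2) * (1 + x/2).
    """
    if x <= 0.0:
        return 1.0
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Tail probability at the tabulated 1% value and at the mean LR reported for the normal trait.
p_at_critical = chi2_sf_4df(13.3)
p_at_mean_lr = chi2_sf_4df(165.0)

# Illustrative log-likelihoods only (not values from the study).
example_lr = likelihood_ratio(loglik_h0=-100.0, loglik_h1=-17.5)
```

The tail probability at 13.3 is close to 0.01, which agrees with the tabulated value quoted in the text, while the tail probability at 165 is vanishingly small.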
Moreover, under H0, the major gene effect was not taken into account to explain the genetic variability of the analysed trait, which resulted in overestimation of the genetic and permanent environmental variances.

For the binary trait with both incidences (15 and 40%) under H0, the estimates of heritabilities and repeatabilities were the same and high. In the case of H1, however, the segregation analysis method used in this study did not allow the estimation of the permanent environmental variance. This may be due to the loss of genetic variability and information when normally distributed data were truncated to the 0/1 binary form (Kadarmideen et al. 2000). In a recent study using segregation analysis for binary traits, Miyake et al. (2002) also reported similar problems in estimating variance components and in obtaining good convergence to the true values.

With a low incidence (15%), the results of segregation analysis for the binary trait show an overestimation of the frequency of the unfavorable genotype A1A1 and an underestimation of the frequencies of the A1A2 and A2A2 genotypes. With a high incidence (40%), however, there is an overestimation of the frequency of the favorable genotype A2A2 and an underestimation of the frequency of the unfavorable genotype A1A1 (Table 2). This corresponds to earlier findings that statistical power is lower and bias is higher for low incidence than for intermediate incidence (Kadarmideen et al. 2000).

For the normal trait, the estimated heritabilities and repeatabilities decreased from H0 to H1, from 0.54 to 0.38 and from 0.80 to 0.51, respectively. This was expected, because the major gene effect is taken into account in H1. For the binary trait, however, these estimates decreased dramatically from H0 to H1 for both incidences. Heritability and repeatability decreased from 0.38 to 0.01 and from 0.60 to 0.01, respectively, for the 15% incidence, and from 0.39 to 0.012 and from 0.60 to 0.012, respectively, for the 40% incidence.
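The heritabilities and repeatabilities quoted above can be related to the variance components with a short Python sketch. The formulas are an assumption, since the paper does not write them out: heritability is taken as $\sigma^2_g / (\sigma^2_g + \sigma^2_{pe} + \sigma^2_e)$ and repeatability as $(\sigma^2_g + \sigma^2_{pe}) / (\sigma^2_g + \sigma^2_{pe} + \sigma^2_e)$, which reproduce the true values 0.41 and 0.52 from the simulated variances. The inputs are the average variance component estimates for the binary trait in Table 2, and the names are hypothetical.

```python
def heritability(var_g, var_pe, var_e):
    """Polygenic variance as a fraction of the sum of the three components."""
    return var_g / (var_g + var_pe + var_e)


def repeatability(var_g, var_pe, var_e):
    """Polygenic plus permanent environmental variance over the same sum."""
    return (var_g + var_pe) / (var_g + var_pe + var_e)


# (var_g, var_pe, var_e): average estimates for the binary trait (Table 2).
ESTIMATES = {
    ("15%", "H0"): (0.051, 0.030, 0.053),
    ("15%", "H1"): (0.0005, 0.000, 0.049),
    ("40%", "H0"): (0.091, 0.050, 0.091),
    ("40%", "H1"): (0.001, 0.000, 0.084),
}

# Heritability and repeatability implied by the averaged components.
RATIOS = {
    key: (heritability(*comps), repeatability(*comps))
    for key, comps in ESTIMATES.items()
}

for (incidence, hypothesis), (h2, rep) in RATIOS.items():
    print(incidence, hypothesis, round(h2, 3), round(rep, 3))
```

Because these ratios are computed from averaged components, they may differ in the last digit from the tabulated values, which are presumably averages of the per-replicate ratios. Under H1 the two ratios coincide because the permanent environmental variance was not estimated.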
We can observe that the estimate of the residual variance changed little from H0 to H1, that the genetic variance was underestimated, especially for the binary trait with low incidence, and that the permanent environmental variance was not estimated. This indicates that the parameters estimated on the binary scale are biased.

This paper showed the possibility of applying segregation analysis to binary traits with intermediate incidence under mixed inheritance. However, more research is needed to apply and investigate more appropriate statistical methods and software for detecting major genes segregating in binary or categorical traits. The method used in this study for segregation analysis of binary traits showed weaknesses in the estimation of all the parameters and in giving the expected likelihood values under both the polygenic and mixed inheritance models of the population. Further analyses and alternative methods using Bayesian methodology (e.g. Janss et al. 1995, 1998) are required.

REFERENCES
Bodin, L., SanCristobal-Gaudy, M., Lecerf, F., Mulsant, P., Bibé, B., Lajous, D., Belloc, J.P., Eychenne, F., Amigues, Y. and Elsen, J.M. (2002) Genet. Sel. Evol. 34:447.
Falconer, D.S. and Mackay, T.F.C. (1996) Introduction to quantitative genetics 4th ed. Long. London.
Ilahi, H., Manfredi, E., Chastin, P., Monod, F., Elsen, J.M. and Le Roy, P. (2000) Genet. Res. 75:315.
Janss, L.L.G. (1998) 6th WCGAPL. 27:459.
Janss, L.L.G. and Van Arendonk, J.A.M. (1995) Theor. Appl. Genet. 91:1137.
Kadarmideen, H.N., Janss, L.L.G. and Dekkers, J.C.M. (2000) Genet. Res. 76:305.
Le Roy, P., Naveau, J., Elsen, J.M. and Sellier, P. (1990) Genet. Res. 55:33.
Miyake, T., Sasaki, Y., Dolf, G. and Gaillard, C. (2002) 7th WCGAPL. Communication N° 21-05.
Wright, S. (1934) Genetics. 19:506.

Table 1.
True values of parameters and parameter estimates by segregation analyses for normal trait (averages and standard deviations of 100 replicates)

Parameter            True value   Polygenic inheritance (H0)   Mixed inheritance (H1)
$\mu$                 0.00        -0.10                        -
$\mu_{A_1A_1}$       -3.50        -                            -3.47
$\mu_{A_1A_2}$        0.00        -                             0.04
$\mu_{A_2A_2}$        3.50        -                             3.55
$f(A_1A_1)$           0.36        -                             0.35
$f(A_1A_2)$           0.48        -                             0.48
$f(A_2A_2)$           0.16        -                             0.17
$\sigma^2_g$          1.44         4.74                         1.30
$\sigma^2_{pe}$       0.36         2.38                         0.45
$\sigma^2_e$          1.69         1.68                         1.68
$\sigma^2_m$          5.88        -                             3.45
Heritability          0.41         0.54                         0.38
Repeatability         0.52         0.80                         0.51

Standard deviations (±), listed in the order in which they appear in the source; their pairing with individual estimates is not recoverable from the source layout: 0.28, 0.18, 0.20, 0.05, 0.03, 0.18, 0.80, 0.06, 0.03, 0.26, 0.09, 0.19, 0.77, 0.66, 0.03, 0.07, 0.01, 0.02.

Table 2. Parameter estimates by segregation analyses for binary trait using two incidences (averages and standard deviations of 100 replicates)

                     Incidence = 15%                          Incidence = 40%
Parameter            Polygenic inher. (H0)  Mixed inher. (H1)  Polygenic inher. (H0)  Mixed inher. (H1)
$\mu$                0.190 (± 0.022)        -                  0.410 (± 0.072)        -
$\mu_{A_1A_1}$       -                      0.023 (± 0.004)    -                      0.053 (± 0.023)
$\mu_{A_1A_2}$       -                      0.050 (± 0.005)    -                      0.096 (± 0.018)
$\mu_{A_2A_2}$       -                      0.852 (± 0.011)    -                      0.882 (± 0.019)
$f(A_1A_1)$          -                      0.53 (± 0.08)      -                      0.19 (± 0.09)
$f(A_1A_2)$          -                      0.38 (± 0.10)      -                      0.51 (± 0.14)
$f(A_2A_2)$          -                      0.09 (± 0.05)      -                      0.30 (± 0.12)
$\sigma^2_g$         0.051 (± 0.010)        0.0005 (± 0.000)   0.091 (± 0.021)        0.001 (± 0.00)
$\sigma^2_{pe}$      0.030 (± 0.010)        0.000              0.050 (± 0.018)        0.00
$\sigma^2_e$         0.053 (± 0.003)        0.049 (± 0.002)    0.091 (± 0.004)        0.084 (± 0.009)
$\sigma^2_m$         -                      0.050 (± 0.010)    -                      0.090 (± 0.022)
Heritability         0.38 (± 0.07)          0.010 (± 0.004)    0.39 (± 0.08)          0.012 (± 0.007)
Repeatability        0.60 (± 0.02)          0.010 (± 0.004)    0.60 (± 0.02)          0.012 (± 0.007)
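As a check on the thresholds used to create the two binary data sets summarised in Table 2, the sketch below recovers them as standard normal quantiles and applies the truncation rule from Materials and Methods. It is a minimal Python illustration; the function names are hypothetical, and the thresholds assume that the standardized liability is treated as N(0,1).

```python
from statistics import NormalDist


def threshold_for_incidence(incidence):
    """Threshold t such that P(Z > t) = incidence for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - incidence)


def to_binary(y_star, t):
    """Apply the liability rule: 1 (diseased) if y* > t, otherwise 0 (healthy)."""
    return [1 if value > t else 0 for value in y_star]


t15 = threshold_for_incidence(0.15)   # less common disease
t40 = threshold_for_incidence(0.40)   # more common disease

# Small hand-made example of standardized liabilities, truncated at t = 1.036.
example = to_binary([0.0, 1.036, 1.5], 1.036)
```

Rounded to three decimals, the two quantiles agree with the values t = 1.036 and t = 0.253 given in the text.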