Analysis of binary response data by generalized linear models.

dc.contributor Nicholls, PJ
dc.contributor Tyrrell, RN
dc.date.accessioned 2012-01-25T12:26:48Z
dc.date.available 2012-01-25T12:26:48Z
dc.date.issued 1980
dc.identifier.citation Proc. Aust. Soc. Anim. Prod. (1980) 13: 337-340
dc.identifier.uri http://livestocklibrary.com.au/handle/1234/7204
dc.description.abstract ANALYSIS OF BINARY RESPONSE DATA BY GENERALISED LINEAR MODELS
P.J. NICHOLLS* and R.N. TYRRELL**
SUMMARY
In a linear model approach to the analysis of binary data, parameters of the model may be estimated by the method of maximum likelihood, or by weighted or unweighted least squares. The technique of generalized linear models permits maximum likelihood estimates of parameters to be obtained by iterative weighted least squares, with tests of significance based on analysis of deviance, a generalization of the analysis of variance. The iterative method is applied to data on the mating activity of maiden ewes and the results compared with analysis by unweighted least squares.
INTRODUCTION
A response variable taking the form of a simple dichotomy, e.g. success/failure or presence/absence, is often encountered in animal production research. Such binary variables are assumed to follow the binomial distribution, a success taking the value 1 with probability p and a failure the value 0 with probability 1 - p. For a random sample of n observations with r successes, p is estimated by r/n, and the number of successes has mean μ = np and variance np(1 - p).
Several methods have been used to assess the influence of sources of variation such as treatments or attributes on binary data. If responses can be cross-classified by one or more independent factors, association between the response variable and factors can be tested by contingency table methods (Snedecor and Cochran 1967). A more general class of problems can be considered, however, from a linear model approach, comparable with regression analysis or analysis of variance for normally distributed data. For a dependent variable z and independent variables x_1, ..., x_k, the linear model is
z = b_0 + b_1 x_1 + ... + b_k x_k + e
where the b's are parameters to be estimated and e is a random error with zero mean and constant variance. The x's can be either quantitative or qualitative, or a mixture of both, as in comparison of regressions between groups.
Techniques for applying linear models to a binary response include the methods of unweighted or weighted least squares, and the method of maximum likelihood. Application of least squares has the problem that binomial variance depends on p, and homogeneity of variance can only be approximately satisfied for small ranges of p, particularly for p near 0 or 1. When the data are grouped in, say, [...], the method of weighted least squares, with weights n_i/(p_i(1 - p_i)), could be used. A further problem with linear models on the p scale is that the finite (0,1) range limits the validity of parameters; the model is unlikely to be a reliable predictor for p near 0 or 1. Stimulus-response curves for binomial responses usually have a sigmoid shape; two transformations of p which have this shape are the probit Φ⁻¹(p) and the logit ln(p/(1 - p)) (Cox 1970), both of which [...]
* Department of Agriculture, Biometrical Branch, Sydney, N.S.W.
** c/- Department of Agriculture, Division of Animal Production, Sydney, N.S.W.
Animal Production in Australia
For example, on the logit scale the difference in p from 0.90 to 0.95 is equal to that from 0.50 to 0.679. The well-known maximum likelihood method of probit analysis permits fitting models with quantitative variables, but it involves an iterative procedure. Weighted least squares is non-iterative, but the presence of quantitative variables precludes its use unless each can be satisfactorily grouped. If not, the only non-iterative procedure available for fitting linear models with quantitative variables may be unweighted least squares on the 0,1 values.
Generalized linear models
Nelder and Wedderburn (1972), extending the method of probit analysis, introduced generalized linear models for data distributed according to some exponential family, including the binomial.
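The logit-scale equivalence quoted in the introduction (a change in p from 0.90 to 0.95 matching a change from 0.50 to 0.679) can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper names logit and inv_logit are not from the paper.

```python
import math

def logit(p):
    """Logit transformation ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse of the logit: maps the real line back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# Distance on the logit scale between p = 0.90 and p = 0.95.
step = logit(0.95) - logit(0.90)

# The value of p that lies the same logit distance above p = 0.50.
p_equiv = inv_logit(logit(0.50) + step)

print(round(step, 4), round(p_equiv, 3))  # 0.7472 0.679
```

The same step of about 0.75 logit units therefore corresponds to a change of 0.05 in p near the top of the range but of about 0.18 in p around the middle.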
A generalized linear model for binomial data is defined to have three components:
i) a random component from the observation z with mean μ,
ii) a systematic component from the independent variables x, and
iii) a transformation y = f(μ), such as the logit ln(μ/(n - μ)), linking the mean μ with the linear predictor y.
With the three components specified, Nelder and Wedderburn showed that maximum likelihood estimates of the parameters b_i of the linear predictor could be obtained by iterative weighted least squares. Given a model with ordered terms, the procedure involves stepwise addition of terms to sub-models and, at each stage, determination of the deviance, i.e. minus twice the difference between the maximised log-likelihood for the current sub-model and that for the saturated model (for which all predicted values coincide with the observations). Each deviance is distributed approximately as χ² with degrees of freedom (d.f.) N - k - 1, with N the number of observations and k the number of parameters fitted to that stage. The adequacy of any sub-model may then be assessed. An analysis of deviance table, analogous to an analysis of variance table, can be constructed from first differences of the deviances, permitting χ² tests of parameters fitted at each stage. The generalized linear model method is available in the programs GLIM and GENSTAT, and has been incorporated in the N.S.W. Department of Agriculture's REG, the program used to perform the analyses discussed in the example.
APPLICATION
The data are taken from an experiment at Trangie, N.S.W., investigating the influence of level of nutrition (N), age (A), weight (W) and gain in weight (G) on the onset of oestrus of 689 maiden ewes. The binary response variate took the value 1 if a ewe mated during the first 21 days of the joining period, and zero otherwise. Ewes from three age groups (born in one of three consecutive fortnights) were kept on one of three levels of nutrition up to joining: low, high and low-then-high.
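The iterative weighted least squares fit and the deviance described under "Generalized linear models" can be sketched as follows for ungrouped 0/1 data with the logit link. This is a minimal illustration using NumPy, not the REG, GLIM or GENSTAT implementation; the function name fit_logit_irls, the convergence settings and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fit_logit_irls(X, z, max_iter=25, tol=1e-8):
    """Fit a logit model to 0/1 data by iterative weighted least squares.

    Returns the coefficient vector, the deviance and the iteration count.
    """
    b = np.zeros(X.shape[1])
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        eta = X @ b                         # linear predictor y
        p = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
        w = p * (1.0 - p)                   # iterative weights
        y_work = eta + (z - p) / w          # working dependent variate
        XtW = X.T * w                       # X'W without forming diag(W)
        b_new = np.linalg.solve(XtW @ X, XtW @ y_work)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ b))), 1e-12, 1.0 - 1e-12)
    # For ungrouped 0/1 data the saturated log-likelihood is zero, so the
    # deviance is minus twice the maximised log-likelihood.
    deviance = -2.0 * np.sum(z * np.log(p) + (1.0 - z) * np.log(1.0 - p))
    return b, deviance, n_iter

# Hypothetical data (not the Trangie ewes): 200 binary responses
# depending on one quantitative covariate x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
z = (rng.random(200) < p_true).astype(float)

# Stepwise fitting: mean only, then mean plus x.
X_null = np.ones((200, 1))
X_full = np.column_stack([np.ones(200), x])
b_null, dev_null, it_null = fit_logit_irls(X_null, z)
b_full, dev_full, it_full = fit_logit_irls(X_full, z)

print("deviance, mean only:", round(dev_null, 2))
print("deviance, mean + x :", round(dev_full, 2))
print("change in deviance :", round(dev_null - dev_full, 2))
```

The change in deviance between the two fits is the quantity that would be entered in an analysis of deviance table and referred to χ² with 1 d.f. for the added term.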
Weights at the commencement of joining and gain in weight over the four weeks to joining were recorded for each ewe. Table 1 summarizes the results for each nutrition-age group.
TABLE 1 Summary of the mating results for each nutrition-age group, with the mean and range for weight (W) and gain in weight (G) given in kg
Assuming the low-high nutrition level to be mid-way between low and high, linear and quadratic contrasts for nutrition (Nl, Nq) and for age (Al, Aq) were formed, with polynomial terms up to cubic for weight (Wl, Wq, Wc) and quadratic for gain (Gl, Gq). All two- and three-factor interactions were considered, but four-factor interactions were ignored. Unweighted least squares analyses were first performed on the (untransformed) 0,1 data, and although the conditions of normality and homogeneity of error do not hold, the usual F-tests were applied with Type I error rate P = 0.05. After several re-orderings of terms the single d.f. model reduced to that shown on the right-hand side of Table 2.
TABLE 2 Analysis of deviance and analysis of variance for the model determined by unweighted least squares
The analysis of deviance table for the same model, resulting from application of the iterative weighted least squares method with logit transformation, is given on the left-hand side of Table 2. After complete reduction of the model, no terms were found significant by the iterative method which were not also significant by unweighted least squares. The Nl×Al×Wl coefficient, significant in the unweighted analysis (P = 0.022), was not detected as significant (P = 0.14) by the iterative method (p scale: -0.0114 ± 0.0049; logit scale: -0.060 ± 0.040). The individual Wl coefficients contributing to the Nl×Al×Wl interaction are shown with their standard errors for both scales in Table 3.
TABLE 3 Individual Wl coefficients contributing to the Nl×Al×Wl interaction
The coefficient for oldest ewes on high nutrition (where 94% mated) is relatively much smaller on the p scale than on the logit scale, suggesting that the significant Nl×Al×Wl interaction is due to the finite (0,1) range of the p scale. For the unweighted analysis model, 37 fitted values exceeded 1 on the p scale; 34 occurred in three of the groups in Table 3 (given in brackets), while 3 were in the oldest low-high group. Clearly, acceptance of the p scale model suggested by the unweighted analysis poses problems of inference due to the finite range of the response and non-homogeneity of error. The logit scale model without interaction terms involving Wl was: [...]
While the iterative process leads to a less complex model, it is very time consuming. From three to seven iterations were needed to fit the individual terms of the model in Table 1, stepwise fitting of all 14 terms (mean included) taking a total of 221 sec. The experience with this example suggests that for moderate to large sets of binary response data, with complex models that include ungrouped quantitative variables, an efficient procedure for determining the most parsimonious model would be: preliminary reduction of the model by unweighted least squares, with a suitable Type I error rate, followed by iterative weighted least squares on the reduced model using the logit transformation. With data that can be satisfactorily grouped, or which are adequately replicated, preliminary reduction of the model by ordinary weighted least squares would be more efficient.
REFERENCES
COX, D.R. (1970). 'Analysis of Binary Data' (Chapman and Hall: London).
NELDER, J.A. and WEDDERBURN, R.W.M. (1972). J. R. Statist. Soc. A, 135: 370.
SNEDECOR, G.W. and COCHRAN, W.G. (1967). 'Statistical Methods' 6th ed. (Iowa State University Press: Ames).
dc.publisher ASAP
dc.source.uri http://www.asap.asn.au/livestocklibrary/1980/Nicholls80.PDF
dc.title Analysis of binary response data by generalized linear models.
dc.type Research
dc.identifier.volume 13
dc.identifier.page 337-340

