A Bayesian Test of Independence for Two-way Contingency Tables Under Cluster Sampling

Bhatta, Dilli

Etd

A Bayesian Test of Independence for Two-way Contingency Tables Under Cluster Sampling

Public

We consider a Bayesian approach to the study of independence in a two-way contingency table obtained from a two-stage cluster sampling design. We study the association between two categorical variables when (a) there are no covariates and (b) there are covariates at both unit and cluster levels. Our main idea for the Bayesian test of independence is to convert the cluster sample into an equivalent simple random sample which provides a surrogate of the original sample. Then, this surrogate sample is used to compute the Bayes factor to make an inference about independence. For the test of independence without covariates, the Rao-Scott corrections to the standard chi-squared (or likelihood ratio) statistic were developed. They are ``large sample' methods and provide appropriate inference when there are large cell counts. However, they are less successful when there are small cell counts. We have developed the methodology to overcome the limitations of Rao-Scott correction. We have used a hierarchical Bayesian model to convert the observed cluster samples to simple random samples. This provides the surrogate samples which can be used to derive the distribution of the Bayes factor to make an inference about independence. We have used a sampling-based method to fit the model. For the test of independence with covariates, we first convert the cluster sample with covariates to a cluster sample without covariates. We use multinomial logistic regression model with random effects to accommodate the cluster effects. Our idea is to fit the cluster samples to the random effect models and predict the new samples by adjusting with the covariates. This provides the cluster sample without covariates. We then use a hierarchical Bayesian model to convert this cluster sample to a simple random sample which allows us to calculate the Bayes factor to make an inference about independence. We use Markov chain Monte Carlo methods to fit our models. We apply our first method to the Third International Mathematics and Science Study (1995) for third grade U.S. students in which we study the association between the mathematics test scores and the communities the students come from, and science test scores and the communities the students come from. We also provide a simulation study which establishes our methodology as a viable alternative to the Rao-Scott approximations for relatively small two-stage cluster samples. We apply our second method to the data from the Trend in International Mathematics and Science Study (2007) for fourth grade U.S. students to assess the association between the mathematics and science scores represented as categorical variables and also provide the simulation study. The result shows that if there is strong association between two categorical variables, there is no difference between the significance of the test in using the model (a) with covariates and (b) without covariates. However, in simulation studies, there is a noticeable difference in the significance of the test between the two models when there are borderline cases (i.e., situations where there is marginal significance).

Creator