Software for ecological inference
We provide software for the R and WinBUGS statistical packages, to
implement our framework for combining ecological with individual-level
data. These are intended to be useful to practitioners wishing to use
these models in an applied context.
All the software on this page is freely available, and can be
redistributed under the terms of the GNU General Public Licence.
Overview
The software implements models for inferring the individual-level
relationships between exposures and outcomes, using aggregate data
alone, individual data alone, or a combination of aggregate and
individual data. The models are described in our article
Improving ecological inference using individual-level data (Statistics in Medicine, in press. PDF working copy).
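As a rough illustration of what combining the two levels involves, the sketch below assumes a single binary exposure and an individual-level logistic model. Under that assumption, the risk observed in an area is the average of the individual risks over the area's exposure distribution. The function names and parameter values are hypothetical and are not taken from the software on this page.

```python
import math

def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def area_risk(alpha, beta, prop_exposed):
    """Area-level risk implied by an individual-level logistic model.

    alpha, beta  : hypothetical individual-level intercept and log odds ratio
    prop_exposed : proportion of the area's population with the binary exposure
    """
    risk_unexposed = expit(alpha)
    risk_exposed = expit(alpha + beta)
    # Average the individual risks over the within-area exposure distribution.
    return (1.0 - prop_exposed) * risk_unexposed + prop_exposed * risk_exposed

def naive_area_risk(alpha, beta, prop_exposed):
    """Plugging the proportion straight into the logistic curve, for comparison."""
    return expit(alpha + beta * prop_exposed)

if __name__ == "__main__":
    for phi in (0.0, 0.25, 0.5, 1.0):
        print(phi, area_risk(-2.0, 1.5, phi), naive_area_risk(-2.0, 1.5, phi))
```

The two functions agree when everyone or no one in the area is exposed, and differ in between, which is one reason an area-level regression cannot simply be read as an individual-level one.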
To summarise, suppose we have one or both of
- An aggregate dataset with one record for each aggregate group, for
example a geographical area, or a stratum within area, for example
from a population census.
- An individual-level dataset, for example from a sample survey
study. There need not be the same number of individuals per area, and
there may be some areas in the aggregate dataset with no individuals.
These contain
- a disease outcome available on individuals as a binary response,
or from areas as a number or proportion of disease cases.
- any number of binary explanatory variables, or exposures,
available directly from individuals or as proportions over areas.
- any number of continuous exposures with a multivariate normal
distribution within areas. These are available directly from
individuals, or from the aggregate data as area-level means and,
optionally, within-area covariances.
- any number of contextual explanatory variables, that is,
characteristics specific to areas, constant within areas.
- optionally, the joint within-area distribution of the
n binary exposures, available as a matrix with
2^n columns and the same number of rows as the aggregate
data, containing the number of individuals in each area with each of
the distinct combinations of the exposures.
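To make the layout of the joint distribution matrix concrete, the following sketch tabulates individuals by area and by each of the 2^n combinations of n binary exposures. The input layout, the function name and the column order are assumptions made for illustration; the order the software expects should be checked in its own documentation.

```python
from itertools import product

def cross_classify(individuals, n_exposures):
    """Count individuals in each of the 2**n exposure combinations.

    individuals : list of (area, exposures) pairs, where exposures is a tuple
                  of n_exposures values, each 0 or 1 (hypothetical layout)
    Returns (combos, table): the column order and a dict mapping each area
    to its row of counts.
    """
    combos = list(product((0, 1), repeat=n_exposures))  # 2**n columns
    column = {combo: j for j, combo in enumerate(combos)}
    table = {}
    for area, exposures in individuals:
        row = table.setdefault(area, [0] * len(combos))
        row[column[tuple(exposures)]] += 1
    return combos, table

if __name__ == "__main__":
    data = [("A", (0, 0)), ("A", (1, 0)), ("A", (1, 0)),
            ("B", (1, 1)), ("B", (0, 1))]
    combos, table = cross_classify(data, 2)
    print(combos)
    for area in sorted(table):
        print(area, table[area])
```

In practice these counts would usually come from a census cross-tabulation rather than from individual records; the sketch only shows the shape of the matrix, with one row per area and one column per combination.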
R package ecoreg
We have developed a package
ecoreg for the R statistical
software.
ecoreg
fits a range of models for this form of ecological and individual
data, using maximum likelihood estimation. Models with random
area-level intercepts are supported, with likelihoods calculated using
Gauss-Hermite integration.
This is available from the CRAN repository of R
packages. A detailed guide to the methodology and use of
ecoreg is available with the package, in the file
ecoreg-guide.pdf in the doc subdirectory of
the installed package, and in standard R help pages, for example
help(eco). If you are new to R, please read its
documentation, beginning with the manual An Introduction to
R, to familiarise yourself with the environment.
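For readers unfamiliar with Gauss-Hermite integration, the sketch below approximates the likelihood contribution of one area under a binomial model with a normally distributed random intercept. It is an illustration of the general technique only, not the code used inside ecoreg: it uses a fixed three-point rule for brevity, and the function and argument names are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for integrals of the form
#   integral of exp(-x**2) * g(x) dx  ~  sum over k of w_k * g(x_k)
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0)

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def binomial_pmf(y, n, p):
    """Probability of y cases out of n at risk p."""
    return math.comb(n, y) * p ** y * (1.0 - p) ** (n - y)

def marginal_likelihood(y, n, mu, sigma):
    """Approximate likelihood for one area with a normal random intercept.

    Integrates Binomial(y | n, expit(mu + u)) over u ~ Normal(0, sigma**2),
    using the change of variable u = sqrt(2) * sigma * x.
    """
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        u = math.sqrt(2.0) * sigma * x
        total += w / math.sqrt(math.pi) * binomial_pmf(y, n, expit(mu + u))
    return total

if __name__ == "__main__":
    print(marginal_likelihood(y=3, n=10, mu=-1.0, sigma=0.5))
```

Three points keep the example short; more points give a more accurate approximation. With sigma set to zero the result reduces to the ordinary binomial likelihood, which is a convenient check.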
Queries or bug reports on this software should be addressed to Chris Jackson.
A short tutorial (or "vignette") with examples on how to use the package is
included in the package, and can also be downloaded
here (PDF).
- Windows installation
- Download the binary package ecoreg_0.1.1.zip, and unzip it into your R
library tree. The library tree is commonly
c:\Program Files\R\rw2011\library, where rw2011 changes
according to the current version of R.
- Linux / Unix / Mac OS X installation
- Download the source package ecoreg_0.1.1.tar.gz, unzip it into any directory, and execute from the command line
R CMD INSTALL ecoreg.
This is a pre-release, development version of ecoreg.
When officially released, ecoreg will be published at the
Comprehensive R Archive
Network (CRAN), the official repository for contributed R
packages.
WinBUGS resources
Here we provide compound documents for WinBUGS 1.4, containing model
specifications, example data, instructions and documentation for a
range of models for ecological and individual data. One document is
provided for each specific model. From these examples, users familiar
with WinBUGS should be able to adapt the format of the model
specification and data for their own specific case. See the
BUGS website for more
information about WinBUGS.
Files to accompany Improving ecological
inference using individual-level data (Statistics in Medicine, 2006)
- eco2-agg.odc
- Ecological inference with one binary and one continuous exposure, and aggregate data alone.
- eco2-indiv.odc
- Ecological inference with one binary and one continuous exposure, and individual data alone.
- eco2-agg-indiv.odc
- Ecological inference with one binary and one continuous exposure, and combined individual and aggregate data.
- eco2-agg-indiv-spatial.odc
- Ecological inference with one binary and one continuous exposure, combined individual and aggregate data, and spatially-correlated random
effects.
- eco3-agg-indiv.odc
- Ecological inference with three binary exposures, where the within-area joint distribution
of the exposures is available.
Files to accompany Hierarchical related regression for
combining aggregate and individual data in studies of socio-economic
disease risk factors
- hrr.odc
- Hierarchical related regression for hospital admission for cardiovascular disease.
Note: The files may not display properly when opened by clicking the links above. The original .odc files can be downloaded here and then opened in WinBUGS.
Queries or bug reports on this software should be addressed to Chris Jackson.
Files to accompany workshop on Introduction to methods for
analysis of combined individual and aggregate social science data
(course details including course notes can be found
here.)
WinBUGS code
for practical demonstration
Software for small-area estimation
R package SAE
Small area estimation using EBLUP estimators. (Virgilio Gómez Rubio, Nicola Salvati).
A short tutorial (or "vignette") comparing various small area
estimation methods, is included in the package, and can also be
downloaded here (PDF).
Windows installation
Download the binary package SAE_0.07.zip, and unzip into your R
library tree. The library tree is commonly c:\Program
Files\R\R-2.4.1\library, where R-2.4.1 changes
according to the current version of R.
Linux / Unix / Mac OS X installation
Download the source package SAE_0.07.tar.gz, unzip into any directory, and execute from the command line
R CMD INSTALL SAE.
Queries or bug reports on SAE should be addressed to Virgilio Gómez Rubio.
WinBUGS code for SAE
Models for full data
The following code implements some Spatial Bayesian Models for Small Area
Estimation when the target variable is Normal. See the working paper on
Bayesian Small Area Estimation for
details.
- Area Level Model
- Area Level Model with unstructured and spatially correlated random effects.
- Unit Level Model 1
- Unit Level Model with unstructured and spatially correlated random effects,
and same within area variation.
- Unit Level Model 2
- Unit Level Model with unstructured and spatially correlated random effects,
and different within area variation.
- Unit Level Model 3
- Unit Level Model with unstructured and spatially correlated random effects,
and hierarchical structure on the different within area variation.
Models with missing data
The following code is similar, but it can handle missing data; that is, it
provides estimates for areas that have not been included in the survey.
- Area Level Model (missing data)
- Area Level Model with unstructured and spatially correlated random effects.
- Unit Level Model 1 (missing data)
- Unit Level Model with unstructured and spatially correlated random effects,
and same within area variation.
- Unit Level Model 2 (missing data)
- Unit Level Model with unstructured and spatially correlated random effects,
and different within area variation.
- Unit Level Model 3 (missing data)
- Unit Level Model with unstructured and spatially correlated random effects,
and hierarchical structure on the different within area variation.
Models for the ranking of areas
The following code implements Small Area Estimation and different methods for
the ranking of areas. The values 29 and 58 correspond to the ranks of the 10% and
20% of areas with the lowest income in our examples, and should be changed to suit your own data.
- Area Level Model
- Area Level Model with unstructured and spatially correlated random effects.
- Area Level Model (missing data)
- Area Level Model with unstructured and spatially correlated random effects.
- Unit Level Model 2
- Unit Level Model with unstructured and spatially correlated random effects,
and different within area variation.
- Unit Level Model 2 (missing data)
- Unit Level Model with unstructured and spatially correlated random effects,
and different within area variation.
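The role of a rank threshold such as 29 or 58 can be sketched outside WinBUGS as follows. Given posterior draws of the area-level quantity (income in the examples here), the function estimates, for each area, the probability that its rank falls at or below the chosen cutoff. This is an illustrative sketch with hypothetical names and toy numbers, not a translation of the WinBUGS code listed above.

```python
def ranks(values):
    """Rank of each value, with 1 for the lowest (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def prob_in_lowest(draws, cutoff):
    """Share of posterior draws in which each area's rank is <= cutoff.

    draws  : list of draws, each a list with one simulated value per area
    cutoff : rank threshold chosen by the user for their own set of areas
    """
    n_areas = len(draws[0])
    hits = [0] * n_areas
    for draw in draws:
        for area, rank in enumerate(ranks(draw)):
            if rank <= cutoff:
                hits[area] += 1
    return [h / len(draws) for h in hits]

if __name__ == "__main__":
    # Toy example: 4 draws for 4 areas (made-up numbers).
    draws = [[10.0, 12.0, 9.0, 15.0],
             [11.0, 10.5, 9.5, 14.0],
             [9.0, 12.5, 10.0, 16.0],
             [10.0, 11.0, 9.0, 13.0]]
    print(prob_in_lowest(draws, cutoff=2))
```

Ranking within each draw, rather than ranking the posterior means, carries the uncertainty in the estimates through to the ranks.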
Software for comparing models in the presence of non-ignorable missing
responses
An R script is provided containing the
functions used to calculate the DIC based on the observed data
likelihood (DICO), as presented in the paper Using DIC to compare selection models with non-ignorable missing responses.
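As background, the standard DIC summary can be computed from posterior deviance values as in the sketch below. It assumes the deviances have already been evaluated; for DICO they would be based on the observed data likelihood, and that step, which is the substance of the R functions above, is not reproduced here. The function name and arguments are hypothetical.

```python
def dic_summary(deviance_draws, deviance_at_mean):
    """Standard DIC from posterior deviance values.

    deviance_draws   : deviance evaluated at each posterior draw
    deviance_at_mean : deviance evaluated at the posterior mean of the parameters
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)  # Dbar
    p_d = mean_deviance - deviance_at_mean  # effective number of parameters
    return {"Dbar": mean_deviance, "pD": p_d, "DIC": mean_deviance + p_d}

if __name__ == "__main__":
    print(dic_summary([102.0, 104.0, 106.0], deviance_at_mean=101.0))
```

Models with lower DIC are preferred, with pD acting as the penalty for model complexity.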
An illustrative example of the DICO functions
To run an example of the DICO functions, download the files below and execute the commands in the example R script, which calls the functions in DICOcodeWebV1.R.
DICO.zip contains all 4 files.