3-hour tutorial on Small Area Estimation

The R User Conference 2008

August 12-14, Technische Universität Dortmund, Germany

The tutorial will take place on 11 August 2008, 14:00 - 17:30.

Tutorial outline

This tutorial is an introduction to Small Area Estimation (SAE) and to computing a range of estimators with R. SAE is gaining popularity because of an increasing interest in providing estimates at different administrative scales, which usually involves many small areas. Examples that will be described in this tutorial include the estimation of average household income, the unemployment rate and the relative risk of a certain disease.

Data sources frequently used in SAE include survey data with direct information on the population under study and other aggregate administrative data sources. These two data components can be combined efficiently to provide reliable estimates in small areas.

First of all, direct estimators will be described. This family of estimators relies almost entirely on the survey data. Hence, it may be difficult or impossible to provide estimates in those areas that have not been included in the survey sample. The Generalised Regression Estimator will also be discussed. The survey package will be used in this part of the tutorial.
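
To make the limitation of direct estimators concrete, here is a minimal sketch of a design-weighted direct estimate of an area mean. It is written in plain Python for illustration only (the tutorial itself uses R and the survey package), and the function name, the sample values and the weights are all hypothetical.

```python
from collections import defaultdict


def direct_area_means(records):
    """Weighted direct estimate of the mean for each sampled area.

    records: iterable of (area, value, weight) tuples, where weight is the
    survey design weight of the sampled unit (hypothetical input layout).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for area, value, weight in records:
        weighted_sum[area] += weight * value
        weight_total[area] += weight
    # Only areas that appear in the sample receive an estimate.
    return {a: weighted_sum[a] / weight_total[a] for a in weighted_sum}


# Hypothetical sample: household income by area, with design weights.
sample = [
    ("A", 20000.0, 10.0),
    ("A", 30000.0, 30.0),
    ("B", 25000.0, 20.0),
]
all_areas = ["A", "B", "C"]

estimates = direct_area_means(sample)
for area in all_areas:
    # Area "C" has no sampled units, so estimates.get returns None for it.
    print(area, estimates.get(area))
```

Because the estimate for each area uses only that area's sampled units, an area missing from the sample has no direct estimate at all.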

Regression models provide a suitable framework to borrow information from data across different areas and provide good estimates for all small areas. The tutorial will focus on linear models. Synthetic and composite estimators will be used in this section.
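
As a rough illustration of how synthetic and composite estimators borrow information, the sketch below fits a simple linear regression of direct estimates on one area-level covariate, predicts every area from the fitted line (the synthetic estimate) and then mixes the two. It is plain Python rather than R; the data, the helper names and the fixed weight phi = 0.5 are assumptions made for the example. In practice the weight would normally depend on the precision of the direct estimate.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def composite(direct, synthetic, phi):
    """Weighted combination of a direct and a synthetic estimate."""
    return phi * direct + (1.0 - phi) * synthetic


# Hypothetical area-level covariate (known for every area) and direct
# estimates (available only for the sampled areas A, B and C).
covariate = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
direct = {"A": 10.0, "B": 16.0, "C": 16.0}

sampled = sorted(direct)
intercept, slope = fit_line([covariate[a] for a in sampled],
                            [direct[a] for a in sampled])

# Synthetic estimates exist for all areas, including the non-sampled area D.
synthetic = {a: intercept + slope * x for a, x in covariate.items()}

phi = 0.5  # hypothetical fixed weight
final = {
    a: composite(direct[a], synthetic[a], phi) if a in direct else synthetic[a]
    for a in covariate
}
print(final)
```

The non-sampled area D falls back to the purely synthetic prediction, which is how the regression model supplies an estimate where the survey has no data.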

Mixed-effects models also play an important role in SAE. Estimates can be improved by including random effects that accommodate between-area differences better. We will illustrate how to use the nlme package to fit mixed-effects models with R in some SAE applications. SAE will also be used to illustrate the computation of some Spatial EBLUP estimators.
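
For readers new to the idea, one common unit-level formulation (the nested error regression model) and the resulting EBLUP of an area mean can be written as follows. The notation is an assumption made here for illustration and is not taken from the tutorial materials; finite population corrections are ignored.

```latex
% Unit j in area i with covariates x_{ij}; u_i is the area-level random effect.
y_{ij} = x_{ij}^{\top}\beta + u_i + e_{ij}, \qquad
u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% EBLUP of the mean of area i: regression prediction plus a shrunken area effect.
% \bar{X}_i: population mean of the covariates; \bar{y}_i, \bar{x}_i: sample means;
% n_i: sample size in area i.
\hat{\bar{Y}}_i = \bar{X}_i^{\top}\hat{\beta} + \hat{u}_i, \qquad
\hat{u}_i = \hat{\gamma}_i \, (\bar{y}_i - \bar{x}_i^{\top}\hat{\beta}), \qquad
\hat{\gamma}_i = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_i}
```

The shrinkage factor grows towards 1 as the area sample size increases, so well-sampled areas rely more on their own data. For an area with no sampled units, the prediction reduces to the regression part alone.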

Bayesian hierarchical models have proven very useful in SAE. Important applications include disease mapping, environmental modelling of pollutants and many more. We will illustrate how to fit some of these models using R2WinBUGS and WinBUGS. Spatial random effects are a particularly interesting example because they can be used to model spatial patterns and are especially useful to improve estimation in non-sampled areas. Different spatial packages, including spdep and maptools, will be used to export geographical information to be used with WinBUGS.

Finally, some other examples of non-linear models for SAE will be illustrated. In particular, we will show how logistic regression can be used to estimate the unemployment rate using a combination of individual and administrative data.
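
The prediction step of such a model can be sketched as follows, assuming the logistic regression has already been fitted. This is illustrative Python rather than the R code used in the tutorial, and the coefficients, covariates and function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def area_rate(intercept, slopes, covariate_rows):
    """Average predicted probability over the covariate rows of one area."""
    probs = [
        inv_logit(intercept + sum(b * x for b, x in zip(slopes, row)))
        for row in covariate_rows
    ]
    return sum(probs) / len(probs)


# Hypothetical fitted coefficients: intercept and the effects of two covariates
# (for example, an individual-level indicator and an area-level
# administrative rate).
intercept = -2.0
slopes = [0.8, 3.0]

# Hypothetical covariate rows for the individuals in one small area.
rows = [
    [1.0, 0.10],
    [0.0, 0.10],
    [1.0, 0.10],
]

print(round(area_rate(intercept, slopes, rows), 3))
```

Averaging the predicted probabilities over an area gives a model-based estimate of its rate, which remains available when the area-level covariate is known even if few individuals were sampled there.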

Maps are a useful way of reporting small area estimates. When the examples involve geographical information, the sp and maptools packages will be used to illustrate how to handle and display maps in R.

Tutorial requirements

All participants must have a working knowledge of R. Prior knowledge of statistics and small area estimation methods is desirable.

Summary of course contents

Tutorial instructor

Dr. Virgilio Gómez-Rubio, Imperial College London

References

Course materials

Lecture notes

A bundle with all the course data, slides, the SAE2 package and R script is available here.

IMPORTANT: A port of the EURAREA macros to R can be found at the DACSEIS Project website. Thanks to Prof. Ralf Münnich for pointing this out.

The SAE2 package can be downloaded from here: source package / Windows binaries

You can also check some materials from a previous course on spatial data analysis here. Look for unit 9.

R

R software and packages
This is the main site to download R and the packages that we will use in the course.

An Introduction to R (HTML, PDF)
Introductory tutorial to R, ideal for beginners. It contains a description of data types, commands, etc. and how to carry out basic statistical analyses.

WinBUGS

Software for Bayesian data analysis.