SPIDA 2010: Summer Program in Data Analysis

Description of Topics

SPIDA Program: June 2 - 10, 2010
Date	Topic	Instructor
Wednesday June 2nd	Introduction to R: Workshop	Glenn Stalker

Thursday June 3rd	Review of Linear Models	John Fox
Friday June 4th	Logit and Probit Models for Categorical Response Variables	John Fox
Saturday June 5th	Generalized Linear Models (GLMs)	John Fox

Monday June 7th	Mixed Models for Hierarchical Data	Georges Monette
Tuesday June 8th	Mixed Models for Longitudinal Data	Georges Monette
Wednesday June 9th	Further Topics	Georges Monette
Thursday June 10th	Extensions of Mixed Models	Georges Monette

June 2nd
Introduction to R: Workshop

R is a statistical programming language and a highly extensible computing environment for statistical analysis. Students will learn the essentials of importing data, data management, and statistical modeling. Programming in R will be introduced, as will the software's impressive graphing capabilities. R is free, open-source software that runs on Windows, UNIX/Linux and Macintosh operating systems. The workshop assumes no previous experience with R.

Linear and Generalized Linear Models

by John Fox

Linear and generalized linear models encompass the most commonly used statistical methods in social research, and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and log-linear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The course will also cover the implementation of linear and generalized linear models in R. The course assumes that participants have previously been exposed to applied linear regression analysis. Each day there will be a lab session in which participants can apply the material covered in that day's lecture, using R to analyze datasets provided by the instructor.

June 3rd
Review of Linear Models

Multiple regression analysis
Dummy-variable regression
Principle of marginality and "effect displays" for complex models with interactions
Regression "diagnostics" for unusual data, non-constant error variance, and nonlinearity; how to correct these problems

June 4th
Logit and Probit Models for Categorical Response Variables

Why the linear model should not be applied directly to categorical data
Logit and probit models for dichotomous (two-category) responses
Logit models for unordered and ordered polytomous (many-category) responses
Effect displays for logit models
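
For orientation, the models named in this session can be written compactly as below. This is a sketch in standard textbook notation, not notation taken from the course materials: the symbols (the response y_i, the predictor vector x_i, the coefficients beta, the thresholds alpha_m) are illustrative choices, and the ordered-logit sign convention shown is only one common parameterization.

```latex
% Notation is illustrative, not from the course notes.
% Dichotomous response, with \pi_i = \Pr(y_i = 1):
\begin{align*}
  \text{linear probability model:} \quad
    & \pi_i = \mathbf{x}_i^\top \boldsymbol{\beta}
    && \text{(fitted values can fall outside } [0,1]\text{)} \\
  \text{logit model:} \quad
    & \pi_i = \frac{1}{1 + \exp(-\mathbf{x}_i^\top \boldsymbol{\beta})}
    && \Longleftrightarrow \quad
       \log\frac{\pi_i}{1-\pi_i} = \mathbf{x}_i^\top \boldsymbol{\beta} \\
  \text{probit model:} \quad
    & \pi_i = \Phi(\mathbf{x}_i^\top \boldsymbol{\beta})
    && (\Phi = \text{standard normal CDF})
\end{align*}
% Polytomous response with categories 1, \dots, M:
\begin{align*}
  \text{unordered (multinomial logit):} \quad
    & \log\frac{\Pr(y_i = m)}{\Pr(y_i = M)} = \mathbf{x}_i^\top \boldsymbol{\beta}_m,
    && m = 1, \dots, M-1 \\
  \text{ordered (proportional odds):} \quad
    & \log\frac{\Pr(y_i \le m)}{\Pr(y_i > m)} = \alpha_m - \mathbf{x}_i^\top \boldsymbol{\beta},
    && m = 1, \dots, M-1
\end{align*}
```

The first line illustrates why the linear model should not be applied directly: nothing constrains the fitted probability to lie between 0 and 1, whereas the logit and probit transformations do.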

June 5th
Generalized Linear Models (GLMs)

The structure of GLMs: linear predictors, distributional families, and link functions
How the familiar linear, logit, and probit models fit into the GLM framework
Poisson regression models for count data; handling over-dispersed count data
Time permitting: log-linear models for contingency tables
Diagnostics for GLMs
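
The three components listed above can be summarized in a few lines. What follows is the standard textbook formulation, offered as a sketch; the symbols (y_i, x_i, beta, the link g, the dispersion phi) are illustrative and are not drawn from the course notes.

```latex
% Standard GLM formulation; notation is illustrative, not from the course notes.
\begin{align*}
  \eta_i &= \mathbf{x}_i^\top \boldsymbol{\beta}
    && \text{(linear predictor)} \\
  g(\mu_i) &= \eta_i, \quad \mu_i = \mathrm{E}(y_i)
    && \text{(link function)} \\
  y_i &\sim \text{exponential-family distribution with mean } \mu_i
    && \text{(distributional family)}
\end{align*}
% Familiar special cases:
\begin{align*}
  \text{linear model:} \quad & g(\mu) = \mu, && y_i \sim \text{Normal} \\
  \text{logit model:} \quad & g(\mu) = \log\frac{\mu}{1-\mu}, && y_i \sim \text{Bernoulli/binomial} \\
  \text{probit model:} \quad & g(\mu) = \Phi^{-1}(\mu), && y_i \sim \text{Bernoulli/binomial} \\
  \text{Poisson regression:} \quad & g(\mu) = \log\mu, && y_i \sim \text{Poisson}
\end{align*}
% Over-dispersion: the Poisson family implies Var(y_i) = \mu_i;
% a quasi-Poisson model relaxes this with a dispersion parameter \phi.
\begin{equation*}
  \mathrm{Var}(y_i) = \phi\,\mu_i, \qquad \phi > 1 \text{ indicating over-dispersion}
\end{equation*}
```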

Linear and Non-linear Multilevel Models

by Georges Monette

Mixed models are useful for a wide range of data structures and research questions. They can be used to analyze nested multilevel (hierarchical) data, such as students nested in classes nested in schools, where research questions may involve relationships among student-level variables or between student-level and school-level variables. Mixed models can also be used to analyze longitudinal data, where measurements are obtained on a number of occasions for each of a number of subjects, and to fit flexible functions of time using non-parametric splines. Under some assumptions, mixed models can handle unbalanced data (measurements at different times and time-varying predictor variables) and missing data. These features make them well suited to the analysis of growth curves in accelerated longitudinal designs, such as that used by the Canadian National Population Health Survey (NPHS).

We will primarily use two R packages for mixed models, nlme and lme4. Both are needed because each offers important capabilities that the other lacks. We will learn how to use R to fit mixed models and to produce exploratory, diagnostic and presentation graphs of data. We will also see many examples of specialized functions written in R for the analysis of mixed models.

Prospective participants are encouraged to explore http://cran.r-project.org and its facilities for mixed models.

June 7th
Mixed Models for Hierarchical Data

A multilevel example
Fixed effects models for hierarchical data
Visualizing multiple fits: data space versus beta space
Random effects
Hierarchical models
Formulating hierarchical models as mixed models
Anatomy of mixed models: from the simplest to the largest
Estimation of fixed effects, inference for fixed effects
Formulating research questions as linear hypotheses
Prediction of random effects
Variance of random effects, estimation and inference
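
To illustrate what "formulating hierarchical models as mixed models" involves, here is a minimal two-level sketch. The setting (unit i within group j, for example a student within a school), the predictors x_ij and w_j, and all coefficient names are illustrative assumptions in conventional multilevel notation, not the instructor's own.

```latex
% Two-level random-coefficients model; notation is illustrative
% (e.g. student i in school j, student-level x_{ij}, school-level w_j).
\begin{align*}
  \text{Level 1:} \quad y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij} \\
  \text{Level 2:} \quad \beta_{0j} &= \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
                        \beta_{1j} &= \gamma_{10} + \gamma_{11} w_j + u_{1j}
\end{align*}
% Substituting the level-2 equations into level 1 gives the mixed-model form:
\begin{equation*}
  y_{ij} = \underbrace{\gamma_{00} + \gamma_{01} w_j + \gamma_{10} x_{ij}
           + \gamma_{11} w_j x_{ij}}_{\text{fixed effects}}
         + \underbrace{u_{0j} + u_{1j} x_{ij}}_{\text{random effects}}
         + \varepsilon_{ij}
\end{equation*}
```

In the combined form, the cross-level interaction (the product of w_j and x_ij) appears as an ordinary fixed-effect term, while the group-specific deviations u_0j and u_1j become the random effects whose variances are estimated.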

June 8th
Mixed Models for Longitudinal Data

Modeling dependency in time
"G side" versus "R side" variance models: overlapping and distinct functions
Modeling interesting functions of time: polynomial, periodic, discontinuous, flexible splines
Using mixed models for non-parametric splines
Assumptions and diagnostics for mixed models
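
The "G side" and "R side" distinction can be made concrete with the general linear mixed model. The sketch below assumes the common usage in which G denotes the covariance matrix of the random effects and R_j the covariance matrix of the within-subject errors; the symbols and the AR(1) example are illustrative and are not taken from the course materials.

```latex
% Linear mixed model for subject j; notation is illustrative.
\begin{align*}
  \mathbf{y}_j &= \mathbf{X}_j \boldsymbol{\gamma} + \mathbf{Z}_j \mathbf{u}_j
                  + \boldsymbol{\varepsilon}_j, \\
  \mathbf{u}_j &\sim N(\mathbf{0}, \mathbf{G}), \qquad
  \boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, \mathbf{R}_j), \qquad
  \mathbf{u}_j \text{ independent of } \boldsymbol{\varepsilon}_j
\end{align*}
% Marginal covariance of the responses combines both sides:
\begin{equation*}
  \mathrm{Var}(\mathbf{y}_j)
    = \underbrace{\mathbf{Z}_j \mathbf{G} \mathbf{Z}_j^\top}_{\text{G side}}
    + \underbrace{\mathbf{R}_j}_{\text{R side}}
\end{equation*}
% Example of an R-side model for equally spaced occasions s and t: AR(1) errors.
\begin{equation*}
  (\mathbf{R}_j)_{st} = \sigma^2 \rho^{|s - t|}, \qquad |\rho| < 1
\end{equation*}
```

Because both terms contribute to the same marginal covariance, some dependence patterns can be represented on either side, which is one sense in which their functions overlap.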

June 9th
Further Topics

Interpreting variance of random effects
Model selection, REML versus ML
Contextual variables: contextual effects versus compositional effects
Disentangling age-period-cohort effects
Causal inference with longitudinal data, generalized propensity score
Dealing with missing data

June 10th
Extensions of Mixed Models

Non-linear mixed models: applications, asymptotic functions of time
Non-normal responses: generalized linear mixed models for dichotomous and count responses
Review and examples
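
As an illustration of the two extensions listed above, the following sketch shows one possible asymptotic function of time with subject-specific parameters, and the general form of a generalized linear mixed model. The particular curve and all symbols (A_i, B_i, kappa_i, the link g) are assumptions chosen for illustration; the course may use different functions and notation.

```latex
% Non-linear mixed model: an asymptotic curve for subject i at time t_{ij}.
% A_i = asymptote, B_i = value at t = 0, \kappa_i > 0 = rate of approach.
\begin{align*}
  y_{ij} &= A_i + (B_i - A_i)\exp(-\kappa_i t_{ij}) + \varepsilon_{ij}, \\
  A_i &= A + a_i, \qquad a_i \sim N(0, \sigma_a^2)
    \quad \text{(random effect on the asymptote)}
\end{align*}
% Generalized linear mixed model: random effects enter the linear predictor.
\begin{equation*}
  g\bigl(\mathrm{E}(y_{ij} \mid \mathbf{u}_j)\bigr)
    = \mathbf{x}_{ij}^\top \boldsymbol{\gamma} + \mathbf{z}_{ij}^\top \mathbf{u}_j,
  \qquad \mathbf{u}_j \sim N(\mathbf{0}, \mathbf{G})
\end{equation*}
% e.g. g = logit for a dichotomous response, g = log for a count response.
```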
