June 2nd
Introduction to R: Workshop
R is a highly programmable and extensible language and computing environment for statistical analysis. Students will learn the essentials of importing data, data management and statistical modeling. Programming in R will be introduced, as will the software's impressive graphing capabilities. R is free, open-source software that is easy to obtain and is supported on Windows, UNIX/Linux and Macintosh operating systems. The workshop assumes no previous experience with R.
Linear and Generalized Linear Models
by John Fox
Linear and generalized linear models encompass the most commonly used statistical methods in social research, and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and loglinear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The course will also cover the implementation of linear and generalized linear models in R. The course assumes that participants have previously been exposed to applied linear regression analysis. Each day there will be a lab session in which participants have an opportunity to apply the material covered in that day's lecture, using R to analyze datasets provided by the instructor.
June 3rd
Review of Linear Models
 Multiple regression analysis
 Dummy-variable regression
 Principle of marginality and "effect displays" for complex models with interactions
 Regression "diagnostics" for unusual data, nonconstant error variance, and nonlinearity; how to correct these problems
June 4th
Logit and Probit Models for Categorical Response Variables
 Why the linear model should not be applied directly to categorical data
 Logit and probit models for dichotomous (two-category) responses
 Logit models for unordered and ordered polytomous (many-category) responses
 Effect displays for logit models
June 5th
Generalized Linear Models (GLMs)
 The structure of GLMs: linear predictors, distributional families, and link functions
 How the familiar linear, logit, and probit models fit into the GLM framework
 Poisson regression models for count data; handling overdispersed count data
 Time permitting: Loglinear models for contingency tables
 Diagnostics for GLMs
Linear and Nonlinear Multilevel Models
by Georges Monette
Mixed models are useful for a wide range of data structures and research questions. They can be used for the analysis of nested multilevel (hierarchical) data, such as students nested in classes nested in schools, where research questions may involve relationships among student-level variables or between student-level and school-level variables. Mixed models can also be used for the analysis of longitudinal data, where measurements are obtained on a number of occasions on each of a number of subjects, and to fit flexible functions of time using nonparametric splines. Under some assumptions, mixed models can handle unbalanced data (measurements at different times and time-varying predictor variables) and missing data. These features make them well suited to the analysis of growth curves in accelerated longitudinal designs, such as that used by the Canadian National Population Health Survey (NPHS).
We will primarily use two R packages for mixed models, nlme and lme4. Both are needed because each supports important applications that the other does not. We will learn how to use R to fit mixed models and to produce exploratory, diagnostic and presentation graphs of data. We will also see many examples of specialized functions written in R for the analysis of mixed models.
Prospective participants are encouraged to explore http://cran.r-project.org and its facilities for mixed models.
June 7th
Mixed Models for Hierarchical Data
 A multilevel example
 Fixed effects models for hierarchical data
 Visualizing multiple fits: data space versus beta space
 Random effects
 Hierarchical models
 Formulating hierarchical models as mixed models
 Anatomy of mixed models: from the simplest to the largest
 Estimation of fixed effects, inference for fixed effects
 Formulating research questions as linear hypotheses
 Prediction of random effects
 Variance of random effects, estimation and inference
June 8th
Mixed Models for Longitudinal Data
 Modeling dependency in time
 "G side" versus "R side" variance models: overlapping and distinct functions
 Modeling interesting functions of time: polynomial, periodic, discontinuous, flexible splines
 Using mixed models for nonparametric splines
 Assumptions and diagnostics for mixed models
June 9th
Further Topics
 Interpreting variance of random effects
 Model selection, REML versus ML
 Contextual variables: contextual effects versus compositional effects
 Disentangling age-period-cohort effects
 Causal inference with longitudinal data, generalized propensity score
 Dealing with missing data
June 10th
Extensions of Mixed Models
 Nonlinear mixed models: applications, asymptotic functions of time
 Nonnormal responses: generalized linear mixed models for dichotomous and count responses
 Review and examples
