Introduction to R: Workshop
R is a statistical programming language and computing environment that is highly programmable and extensible. Students will learn the essentials of importing data, data management, and statistical modeling. Programming in R will be introduced, as will the impressive graphing capabilities of the software. R is free, open-source, easy to obtain, and supported on Windows, UNIX/Linux, and Macintosh operating systems. The workshop assumes no previous experience with R.
Linear and Generalized Linear Models
by John Fox
Linear and generalized linear models encompass the most commonly used statistical methods in social research and provide the basis for most other statistical methods, such as the mixed-effects models taken up in the second part of this year's SPIDA. This workshop provides a review of applied linear regression analysis ("linear models"), followed by an introduction to generalized linear models, with an emphasis on models for categorical response variables (e.g., logit models) and for count data (e.g., Poisson regression and log-linear models). Particular attention will be paid to "diagnostic" methods for determining whether a fitted linear or generalized linear model adequately represents the data, and to the interpretation of results, including methods for visualizing models. The course will also cover the implementation of linear and generalized linear models in R. The course assumes that participants have previously been exposed to applied linear regression analysis. Each day there will be a lab session in which participants have an opportunity to apply the material covered in that day's lecture, using R to conduct analyses of datasets provided by the instructor.
Review of Linear Models
- Multiple regression analysis
- Dummy-variable regression
- Principle of marginality and "effect displays" for complex models with interactions
- Regression "diagnostics" for unusual data, non-constant error variance, and nonlinearity; how to correct these problems
Logit and Probit Models for Categorical Response Variables
- Why the linear model should not be applied directly to categorical data
- Logit and probit models for dichotomous (two-category) responses
- Logit models for unordered and ordered polytomous (many-category) responses
- Effect displays for logit models
Generalized Linear Models (GLMs)
- The structure of GLMs: linear predictors, distributional families, and link functions
- How the familiar linear, logit, and probit models fit into the GLM framework
- Poisson regression models for count data; handling over-dispersed count data
- Time permitting: log-linear models for contingency tables
- Diagnostics for GLMs
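As a rough illustration of how the linear, logit, probit, and Poisson models in the outline above share one framework in R, the sketch below fits each with lm() or glm(), changing only the family and link. The datasets (mtcars and InsectSprays) ship with R and are used here purely as an assumption for illustration; they are not the datasets provided by the instructor, and the model formulas are arbitrary.

```r
# Illustrative sketch only: mtcars and InsectSprays are built-in R datasets,
# not the workshop's own data, and the formulas are arbitrary examples.

# Linear model fitted two ways: lm() and the equivalent Gaussian GLM
# with an identity link.
fit_lm    <- lm(mpg ~ wt + hp, data = mtcars)
fit_gauss <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"),
                 data = mtcars)

# Dichotomous response (am is coded 0/1): same linear predictor,
# binomial family, logit versus probit link.
fit_logit  <- glm(am ~ wt, family = binomial(link = "logit"),  data = mtcars)
fit_probit <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)

# Count response: Poisson family with a log link.
fit_pois <- glm(count ~ spray, family = poisson(link = "log"),
                data = InsectSprays)

# A crude over-dispersion check: Pearson chi-square divided by the
# residual degrees of freedom. Values well above 1 suggest over-dispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)

# One common way to allow for over-dispersion: the quasi-Poisson family,
# which keeps the Poisson coefficient estimates but rescales their
# standard errors by the estimated dispersion.
fit_quasi <- glm(count ~ spray, family = quasipoisson(link = "log"),
                 data = InsectSprays)

print(coef(summary(fit_logit)))
print(dispersion)
```

Calling summary() on any of these fits prints coefficient tables on the scale of the link function, and plot() on a fitted object produces basic diagnostic plots. The quasi-Poisson fit is only one of several possible responses to over-dispersion.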
Linear and Non-linear Multilevel Models
by Georges Monette
Mixed models are useful for a wide range of data structures and research questions. They can be used for the analysis of nested multilevel (hierarchical) data, such as students nested in classes nested in schools, where research questions may involve relationships among student-level variables or between student-level and school-level variables. Mixed models can also be used for the analysis of longitudinal data, where measurements are obtained on a number of occasions for each of a number of subjects, and to fit flexible functions of time using non-parametric splines. Under some assumptions, mixed models can handle unbalanced data (measurements at different times and time-varying predictor variables) and missing data. These features make them well suited to the analysis of growth curves in accelerated longitudinal designs, such as that used by the Canadian National Population Health Survey (NPHS).
We will primarily use two R packages for mixed models, nlme and lme4. Both are needed because each package supports important applications that the other does not. We will learn how to use R to fit mixed models and to produce exploratory, diagnostic, and presentation graphs of data. We will also see many examples of specialized functions written in R for the analysis of mixed models.
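For readers who have not seen the two packages, the following minimal sketch shows what a basic fit looks like in each. It assumes both packages are installed and uses example datasets bundled with them (Orthodont in nlme; sleepstudy and cbpp in lme4), not the workshop's data. The comments on where the packages differ are general observations offered as examples, not a summary of what the course will cover.

```r
# Illustrative sketch only: Orthodont, sleepstudy, and cbpp are example
# datasets bundled with the packages, not the workshop's data.
# Functions are called with package prefixes to avoid name masking.

# nlme: linear mixed model with a random intercept and a random slope
# for age within Subject. lme() also accepts 'correlation' and 'weights'
# arguments for modeling the within-group residual covariance.
fit_lme <- nlme::lme(distance ~ age, random = ~ age | Subject,
                     data = nlme::Orthodont)

# lme4: the analogous model, with random effects written inside the formula.
fit_lmer <- lme4::lmer(Reaction ~ Days + (Days | Subject),
                       data = lme4::sleepstudy)

# lme4 also fits generalized linear mixed models, here a binomial model
# with a random intercept for herd.
fit_glmer <- lme4::glmer(cbind(incidence, size - incidence) ~ period +
                           (1 | herd),
                         data = lme4::cbpp, family = binomial)

# Estimated fixed effects from each fit.
print(nlme::fixef(fit_lme))
print(lme4::fixef(fit_lmer))
print(lme4::fixef(fit_glmer))
```

The two packages express random effects differently: nlme takes a separate random argument, while lme4 embeds terms such as (Days | Subject) in the model formula.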
Prospective participants are encouraged to explore http://cran.r-project.org and its facilities for mixed models.
Mixed Models for Hierarchical Data
- A multilevel example
- Fixed effects models for hierarchical data
- Visualizing multiple fits: data space versus beta space
- Random effects
- Hierarchical models
- Formulating hierarchical models as mixed models
- Anatomy of mixed models: from the simplest to the largest
- Estimation of fixed effects, inference for fixed effects
- Formulating research questions as linear hypotheses
- Prediction of random effects
- Variance of random effects, estimation and inference
Mixed Models for Longitudinal Data
- Modeling dependency in time
- "G side" versus "R side" variance models: overlapping and distinct functions
- Modeling interesting functions of time: polynomial, periodic, discontinuous, flexible splines
- Using mixed models for non-parametric splines
- Assumptions and diagnostics for mixed models
- Interpreting variance of random effects
- Model selection, REML versus ML
- Contextual variables: contextual effects versus compositional effects
- Disentangling age-period-cohort effects
- Causal inference with longitudinal data, generalized propensity score
- Dealing with missing data
Extensions of Mixed Models
- Non-linear mixed models: applications, asymptotic functions of time
- Non-normal responses: generalized linear mixed models for dichotomous and count responses
- Review and examples