Preface
A Chapter by Chapter Summary

A Brief Introduction to R
A Short R Session
R must be installed!
Using the console (or command line) window
Reading data from a file
Entry of data at the command line
Online help
Quitting R
The Uses of R
The R Language
R objects
Retaining objects between sessions
Vectors in R
Concatenation--joining vector objects
Subsets of vectors
Patterned data
Missing values
Factors
Data Frames
Variable names
Applying a function to the columns of a data frame
Data frames and matrices
Identification of rows that include missing values
R Packages
Data sets that accompany R packages
Looping
R Graphics
The function plot() and allied functions
Identification and location on the figure region
Plotting mathematical symbols
Row by column layouts of plots
Graphs--additional notes
Additional Points on the Use of R in This Book
Further Reading
Exercises

Styles of Data Analysis
Revealing Views of the Data
Views of a single sample
Patterns in grouped data
Patterns in bivariate data--the scatterplot
Multiple variables and times
Lattice (trellis style) graphics
What to look for in plots
Data Summary
Mean and median
Standard deviation and inter-quartile range
Correlation
Statistical Analysis Strategies
Helpful and unhelpful questions
Planning the formal analysis
Changes to the intended plan of analysis
Recap
Further Reading
Exercises

Statistical Models
Regularities
Mathematical models
Models that include a random component
Smooth and rough
The construction and use of models
Model formulae
Distributions: Models for the Random Component
Discrete distributions
Continuous distributions
The Uses of Random Numbers
Simulation
Sampling from populations
Model Assumptions
Random sampling assumptions--independence
Checks for normality
Checking other model assumptions
Are non-parametric methods the answer?
Why models matter--adding across contingency tables
Recap
Further Reading
Exercises

An Introduction to Formal Inference
Standard Errors
Population parameters and sample statistics
Assessing accuracy--the standard error
Standard errors for differences of means
The standard error of the median
Resampling to estimate standard errors: bootstrapping
Calculations Involving Standard Errors: the t-Distribution
Confidence Intervals and Hypothesis Tests
One- and two-sample intervals and tests for means
Confidence intervals and tests for proportions
Confidence intervals for the correlation
Contingency Tables
Rare and endangered plant species
Additional notes
One-Way Unstructured Comparisons
Displaying means for the one-way layout
Multiple comparisons
Data with a two-way structure
Presentation issues
Response Curves
Data with a Nested Variation Structure
Degrees of freedom considerations
General multi-way analysis of variance designs
Resampling Methods for Tests and Confidence Intervals
The one-sample permutation test
The two-sample permutation test
Bootstrap estimates of confidence intervals
Further Comments on Formal Inference
Confidence intervals versus hypothesis tests
If there is strong prior information, use it!
Recap
Further Reading
Exercises

Regression with a Single Predictor
Fitting a Line to Data
Lawn roller example
Calculating fitted values and residuals
Residual plots
The analysis of variance table
Outliers, Influence and Robust Regression
Standard Errors and Confidence Intervals
Confidence intervals and tests for the slope
SEs and confidence intervals for predicted values
Implications for design
Regression versus Qualitative ANOVA Comparisons
Assessing Predictive Accuracy
Training/test sets, and cross-validation
Cross-validation--an example
Bootstrapping
A Note on Power Transformations
Size and Shape Data
Allometric growth
There are two regression lines!
The Model Matrix in Regression
Recap
Methodological References
Exercises

Multiple Linear Regression
Basic Ideas: Book Weight and Brain Weight Examples
Omission of the intercept term
Diagnostic plots
Further investigation of influential points
Example: brain weight
Multiple Regression Assumptions and Diagnostics
Influential outliers and Cook's distance
Component plus residual plots
Further types of diagnostic plot
Robust and resistant methods
A Strategy for Fitting Multiple Regression Models
Preliminaries
Model fitting
An example--the Scottish hill race data
Measures for the Comparison of Regression Models
R² and adjusted R²
AIC and related statistics
How accurately does the equation predict?
An external assessment of predictive accuracy
Interpreting Regression Coefficients--the Labor Training Data
Problems with Many Explanatory Variables
Variable selection issues
Principal components summaries
Multicollinearity
A contrived example
The variance inflation factor (VIF)
Remedying multicollinearity
Multiple Regression Models--Additional Points
Confusion between explanatory and dependent variables
Missing explanatory variables
The use of transformations
Non-linear methods--an alternative to transformation?
Further Reading
Exercises

Exploiting the Linear Model Framework
Levels of a Factor--Using Indicator Variables
Example--sugar weight
Different choices for the model matrix when there are factors
Polynomial Regression
Issues in the choice of model
Fitting Multiple Lines
Methods for Passing Smooth Curves through Data
Scatterplot smoothing--regression splines
Other smoothing methods
Generalized additive models
Smoothing Terms in Multiple Linear Models
Further Reading
Exercises

Logistic Regression and Other Generalized Linear Models
Generalized Linear Models
Transformation of the expected value on the left
Noise terms need not be normal
Log odds in contingency tables
Logistic regression with a continuous explanatory variable
Logistic Multiple Regression
A plot of contributions of explanatory variables
Cross-validation estimates of predictive accuracy
Logistic Models for Categorical Data--an Example
Poisson and Quasi-Poisson Regression
Data on aberrant crypt foci
Moth habitat example
Residuals, and estimating the dispersion
Ordinal Regression Models
Exploratory analysis
Proportional odds logistic regression
Other Related Models
Loglinear models
Survival analysis
Transformations for Count Data
Further Reading
Exercises

Multi-level Models, Time Series and Repeated Measures
Introduction
Example--Survey Data, with Clustering
Alternative models
Instructive, though faulty, analyses
Predictive accuracy
A Multi-level Experimental Design
The ANOVA table
Expected values of mean squares
The sums of squares breakdown
The variance components
The mixed model analysis
Predictive accuracy
Different sources of variance--complication or focus of interest?
Within and between Subject Effects--an Example
Time Series--Some Basic Ideas
Preliminary graphical explorations
The autocorrelation function
Autoregressive (AR) models
Autoregressive moving average (ARMA) models--theory
Regression Modeling with Moving Average Errors--an Example
Repeated Measures in Time--Notes on the Methodology
The theory of repeated measures modeling
Correlation structure
Different approaches to repeated measures analysis
Further Notes on Multi-level Modeling
An historical perspective on multi-level models
Meta-analysis
Further Reading
Exercises

Tree-based Classification and Regression
The Uses of Tree-based Methods
Problems for which tree-based regression may be used
Tree-based regression versus parametric approaches
Summary of pluses and minuses
Detecting Email Spam--an Example
Choosing the number of splits
Terminology and Methodology
Choosing the split--regression trees
Within and between sums of squares
Choosing the split--classification trees
The mechanics of tree-based regression--a trivial example
Assessments of Predictive Accuracy
Cross-validation
The training/test set methodology
Predicting the future
A Strategy for Choosing the Optimal Tree
Cost-complexity pruning
Prediction error versus tree size
Detecting Email Spam--the Optimal Tree
The one-standard-deviation rule
Interpretation and Presentation of the rpart Output
Data for female heart attack patients
Printed Information on Each Split
Additional Notes
Further Reading
Exercises

Multivariate Data Exploration and Discrimination
Multivariate Exploratory Data Analysis
Scatterplot matrices
Principal components analysis
Discriminant Analysis
Example--plant architecture
Classical Fisherian discriminant analysis
Logistic discriminant analysis
An example with more than two groups
Principal Component Scores in Regression
Propensity Scores in Regression Comparisons--Labor Training Data
Further Reading
Exercises

The R System--Additional Topics
Graphs in R
Functions--Some Further Details
Common useful functions
User-written R functions
Functions for working with dates
Data input and output
Input
Data output
Factors--Additional Comments
Missing Values
Lists and Data Frames
Data frames as lists
Reshaping data frames; reshape()
Joining data frames and vectors--cbind()
Conversion of tables and arrays into data frames
Merging data frames--merge()
The function sapply() and related functions
Splitting vectors and data frames into lists--split()
Matrices and Arrays
Outer products
Arrays
Classes and Methods
Printing and summarizing model objects
Extracting information from model objects
Data-bases and Environments
Workspace management
Function environments, and lazy evaluation
Manipulation of Language Constructs
Further Reading
Exercises

Epilogue--Models
S-PLUS Differences
References
Index of R Symbols and Functions
Index of Terms
Index of Names