Preface
A Chapter by Chapter Summary

A Brief Introduction to R
A Short R Session
R must be installed!
Using the console (or command line) window
Reading data from a file
Entry of data at the command line
Online help
Quitting R
The Uses of R
The R Language
R objects
Retaining objects between sessions
Vectors in R
Concatenation--joining vector objects
Subsets of vectors
Patterned data
Missing values
Factors
Data Frames
Variable names
Applying a function to the columns of a data frame
Data frames and matrices
Identification of rows that include missing values
R Packages
Data sets that accompany R packages
Looping
R Graphics
The function plot() and allied functions
Identification and location on the figure region
Plotting mathematical symbols
Row by column layouts of plots
Graphs--additional notes
Additional Points on the Use of R in This Book
Further Reading
Exercises

Styles of Data Analysis
Revealing Views of the Data
Views of a single sample
Patterns in grouped data
Patterns in bivariate data--the scatterplot
Multiple variables and times
Lattice (trellis style) graphics
What to look for in plots
Data Summary
Mean and median
Standard deviation and inter-quartile range
Correlation
Statistical Analysis Strategies
Helpful and unhelpful questions
Planning the formal analysis
Changes to the intended plan of analysis
Recap
Further Reading
Exercises

Statistical Models
Regularities
Mathematical models
Models that include a random component
Smooth and rough
The construction and use of models
Model formulae
Distributions: Models for the Random Component
Discrete distributions
Continuous distributions
The Uses of Random Numbers
Simulation
Sampling from populations
Model Assumptions
Random sampling assumptions--independence
Checks for normality
Checking other model assumptions
Are non-parametric methods the answer?
Why models matter--adding across contingency tables
Recap
Further Reading
Exercises

An Introduction to Formal Inference
Standard Errors
Population parameters and sample statistics
Assessing accuracy--the standard error
Standard errors for differences of means
The standard error of the median
Resampling to estimate standard errors: bootstrapping
Calculations Involving Standard Errors: the t-Distribution
Confidence Intervals and Hypothesis Tests
One- and two-sample intervals and tests for means
Confidence intervals and tests for proportions
Confidence intervals for the correlation
Contingency Tables
Rare and endangered plant species
Additional notes
One-Way Unstructured Comparisons
Displaying means for the one-way layout
Multiple comparisons
Data with a two-way structure
Presentation issues
Response Curves
Data with a Nested Variation Structure
Degrees of freedom considerations
General multi-way analysis of variance designs
Resampling Methods for Tests and Confidence Intervals
The one-sample permutation test
The two-sample permutation test
Bootstrap estimates of confidence intervals
Further Comments on Formal Inference
Confidence intervals versus hypothesis tests
If there is strong prior information, use it!
Recap
Further Reading
Exercises

Regression with a Single Predictor
Fitting a Line to Data
Lawn roller example
Calculating fitted values and residuals
Residual plots
The analysis of variance table
Outliers, Influence and Robust Regression
Standard Errors and Confidence Intervals
Confidence intervals and tests for the slope
SEs and confidence intervals for predicted values
Implications for design
Regression versus Qualitative ANOVA Comparisons
Assessing Predictive Accuracy
Training/test sets, and cross-validation
Cross-validation--an example
Bootstrapping
A Note on Power Transformations
Size and Shape Data
Allometric growth
There are two regression lines!
The Model Matrix in Regression
Recap
Methodological References
Exercises

Multiple Linear Regression
Basic Ideas: Book Weight and Brain Weight Examples
Omission of the intercept term
Diagnostic plots
Further investigation of influential points
Example: brain weight
Multiple Regression Assumptions and Diagnostics
Influential outliers and Cook's distance
Component plus residual plots
Further types of diagnostic plot
Robust and resistant methods
A Strategy for Fitting Multiple Regression Models
Preliminaries
Model fitting
An example--the Scottish hill race data
Measures for the Comparison of Regression Models
R² and adjusted R²
AIC and related statistics
How accurately does the equation predict?
An external assessment of predictive accuracy
Interpreting Regression Coefficients--the Labor Training Data
Problems with Many Explanatory Variables
Variable selection issues
Principal components summaries
Multicollinearity
A contrived example
The variance inflation factor (VIF)
Remedying multicollinearity
Multiple Regression Models--Additional Points
Confusion between explanatory and dependent variables
Missing explanatory variables
The use of transformations
Non-linear methods--an alternative to transformation?
Further Reading
Exercises

Exploiting the Linear Model Framework
Levels of a Factor--Using Indicator Variables
Example--sugar weight
Different choices for the model matrix when there are factors
Polynomial Regression
Issues in the choice of model
Fitting Multiple Lines
Methods for Passing Smooth Curves through Data
Scatterplot smoothing--regression splines
Other smoothing methods
Generalized additive models
Smoothing Terms in Multiple Linear Models
Further Reading
Exercises

Logistic Regression and Other Generalized Linear Models
Generalized Linear Models
Transformation of the expected value on the left
Noise terms need not be normal
Log odds in contingency tables
Logistic regression with a continuous explanatory variable
Logistic Multiple Regression
A plot of contributions of explanatory variables
Cross-validation estimates of predictive accuracy
Logistic Models for Categorical Data--an Example
Poisson and Quasi-Poisson Regression
Data on aberrant crypt foci
Moth habitat example
Residuals, and estimating the dispersion
Ordinal Regression Models
Exploratory analysis
Proportional odds logistic regression
Other Related Models
Loglinear models
Survival analysis
Transformations for Count Data
Further Reading
Exercises

Multi-level Models, Time Series and Repeated Measures
Introduction
Example--Survey Data, with Clustering
Alternative models
Instructive, though faulty, analyses
Predictive accuracy
A Multi-level Experimental Design
The ANOVA table
Expected values of mean squares
The sums of squares breakdown
The variance components
The mixed model analysis
Predictive accuracy
Different sources of variance--complication or focus of interest?
Within and between Subject Effects--an Example
Time Series--Some Basic Ideas
Preliminary graphical explorations
The autocorrelation function
Autoregressive (AR) models
Autoregressive moving average (ARMA) models--theory
Regression Modeling with Moving Average Errors--an Example
Repeated Measures in Time--Notes on the Methodology
The theory of repeated measures modeling
Correlation structure
Different approaches to repeated measures analysis
Further Notes on Multi-level Modeling
An historical perspective on multi-level models
Meta-analysis
Further Reading
Exercises

Tree-based Classification and Regression
The Uses of Tree-based Methods
Problems for which tree-based regression may be used
Tree-based regression versus parametric approaches
Summary of pluses and minuses
Detecting Email Spam--an Example
Choosing the number of splits
Terminology and Methodology
Choosing the split--regression trees
Within and between sums of squares
Choosing the split--classification trees
The mechanics of tree-based regression--a trivial example
Assessments of Predictive Accuracy
Cross-validation
The training/test set methodology
Predicting the future
A Strategy for Choosing the Optimal Tree
Cost-complexity pruning
Prediction error versus tree size
Detecting Email Spam--the Optimal Tree
The one-standard-deviation rule
Interpretation and Presentation of the rpart Output
Data for female heart attack patients
Printed Information on Each Split
Additional Notes
Further Reading
Exercises

Multivariate Data Exploration and Discrimination
Multivariate Exploratory Data Analysis
Scatterplot matrices
Principal components analysis
Discriminant Analysis
Example--plant architecture
Classical Fisherian discriminant analysis
Logistic discriminant analysis
An example with more than two groups
Principal Component Scores in Regression
Propensity Scores in Regression Comparisons--Labor Training Data
Further Reading
Exercises

The R System--Additional Topics
Graphs in R
Functions--Some Further Details
Common useful functions
User-written R functions
Functions for working with dates
Data input and output
Input
Data output
Factors--Additional Comments
Missing Values
Lists and Data Frames
Data frames as lists
Reshaping data frames; reshape()
Joining data frames and vectors--cbind()
Conversion of tables and arrays into data frames
Merging data frames--merge()
The function sapply() and related functions
Splitting vectors and data frames into lists--split()
Matrices and Arrays
Outer products
Arrays
Classes and Methods
Printing and summarizing model objects
Extracting information from model objects
Data-bases and Environments
Workspace management
Function environments, and lazy evaluation
Manipulation of Language Constructs
Further Reading
Exercises

Epilogue--Models
S-PLUS Differences
References
Index of R Symbols and Functions
Index of Terms
Index of Names