Data Analysis and Graphics Using R An Example-Based Approach

ISBN-10: 0521813360

ISBN-13: 9780521813365

Edition: 2003

Authors: John Maindonald, John A. Braun, R. Gill, B. D. Ripley, S. Ross

List price: $94.99

Description:

Modern statistical software systems provide sophisticated tools for researchers who need to manipulate and display their data. Using such systems requires training both in the software itself and in the statistical methods that it relies on. Concentrating on the freely available R system, this book demonstrates recently implemented approaches and methods in statistical analysis. The authors introduce elementary concepts in statistics through examples of real-world data analysis drawn from the authors' experience, both as teachers and as consultants. R code and data sets for all examples are available on the Internet. This emphasis on practical methodology combined with a tutorial approach makes the book accessible to anyone with a knowledge of undergraduate statistics, whether an upper-graduate student, a researcher, or a practising scientist or statistician. The methods demonstrated are suitable for use in a wide variety of disciplines, from social sciences to medicine, engineering and science.

Book details

Copyright year: 2003
Publisher: Cambridge University Press
Publication date: 8/4/2003
Binding: Hardcover
Pages: 386
Size: 7.25" wide x 10.25" long x 1.25" tall
Weight: 1.870
Language: English

John Maindonald is Visiting Fellow at the Mathematical Sciences Institute at the Australian National University. He has collaborated extensively with scientists in a wide range of application areas, from medicine and public health to population genetics, machine learning, economic history, and forensic linguistics.

Preface
A Chapter by Chapter Summary
A Brief Introduction to R
A Short R Session
R must be installed!
Using the console (or command line) window
Reading data from a file
Entry of data at the command line
Online help
Quitting R
The Uses of R
The R Language
R objects
Retaining objects between sessions
Vectors in R
Concatenation--joining vector objects
Subsets of vectors
Patterned data
Missing values
Factors
Data Frames
Variable names
Applying a function to the columns of a data frame
Data frames and matrices
Identification of rows that include missing values
R Packages
Data sets that accompany R packages
Looping
R Graphics
The function plot() and allied functions
Identification and location on the figure region
Plotting mathematical symbols
Row by column layouts of plots
Graphs--additional notes
Additional Points on the Use of R in This Book
Further Reading
Exercises
Styles of Data Analysis
Revealing Views of the Data
Views of a single sample
Patterns in grouped data
Patterns in bivariate data--the scatterplot
Multiple variables and times
Lattice (trellis style) graphics
What to look for in plots
Data Summary
Mean and median
Standard deviation and inter-quartile range
Correlation
Statistical Analysis Strategies
Helpful and unhelpful questions
Planning the formal analysis
Changes to the intended plan of analysis
Recap
Further Reading
Exercises
Statistical Models
Regularities
Mathematical models
Models that include a random component
Smooth and rough
The construction and use of models
Model formulae
Distributions: Models for the Random Component
Discrete distributions
Continuous distributions
The Uses of Random Numbers
Simulation
Sampling from populations
Model Assumptions
Random sampling assumptions--independence
Checks for normality
Checking other model assumptions
Are non-parametric methods the answer?
Why models matter--adding across contingency tables
Recap
Further Reading
Exercises
An Introduction to Formal Inference
Standard Errors
Population parameters and sample statistics
Assessing accuracy--the standard error
Standard errors for differences of means
The standard error of the median
Resampling to estimate standard errors: bootstrapping
Calculations Involving Standard Errors: the t-Distribution
Confidence Intervals and Hypothesis Tests
One- and two-sample intervals and tests for means
Confidence intervals and tests for proportions
Confidence intervals for the correlation
Contingency Tables
Rare and endangered plant species
Additional notes
One-Way Unstructured Comparisons
Displaying means for the one-way layout
Multiple comparisons
Data with a two-way structure
Presentation issues
Response Curves
Data with a Nested Variation Structure
Degrees of freedom considerations
General multi-way analysis of variance designs
Resampling Methods for Tests and Confidence Intervals
The one-sample permutation test
The two-sample permutation test
Bootstrap estimates of confidence intervals
Further Comments on Formal Inference
Confidence intervals versus hypothesis tests
If there is strong prior information, use it!
Recap
Further Reading
Exercises
Regression with a Single Predictor
Fitting a Line to Data
Lawn roller example
Calculating fitted values and residuals
Residual plots
The analysis of variance table
Outliers, Influence and Robust Regression
Standard Errors and Confidence Intervals
Confidence intervals and tests for the slope
SEs and confidence intervals for predicted values
Implications for design
Regression versus Qualitative ANOVA Comparisons
Assessing Predictive Accuracy
Training/test sets, and cross-validation
Cross-validation--an example
Bootstrapping
A Note on Power Transformations
Size and Shape Data
Allometric growth
There are two regression lines!
The Model Matrix in Regression
Recap
Methodological References
Exercises
Multiple Linear Regression
Basic Ideas: Book Weight and Brain Weight Examples
Omission of the intercept term
Diagnostic plots
Further investigation of influential points
Example: brain weight
Multiple Regression Assumptions and Diagnostics
Influential outliers and Cook's distance
Component plus residual plots
Further types of diagnostic plot
Robust and resistant methods
A Strategy for Fitting Multiple Regression Models
Preliminaries
Model fitting
An example--the Scottish hill race data
Measures for the Comparison of Regression Models
R² and adjusted R²
AIC and related statistics
How accurately does the equation predict?
An external assessment of predictive accuracy
Interpreting Regression Coefficients--the Labor Training Data
Problems with Many Explanatory Variables
Variable selection issues
Principal components summaries
Multicollinearity
A contrived example
The variance inflation factor (VIF)
Remedying multicollinearity
Multiple Regression Models--Additional Points
Confusion between explanatory and dependent variables
Missing explanatory variables
The use of transformations
Non-linear methods--an alternative to transformation?
Further Reading
Exercises
Exploiting the Linear Model Framework
Levels of a Factor--Using Indicator Variables
Example--sugar weight
Different choices for the model matrix when there are factors
Polynomial Regression
Issues in the choice of model
Fitting Multiple Lines
Methods for Passing Smooth Curves through Data
Scatterplot smoothing--regression splines
Other smoothing methods
Generalized additive models
Smoothing Terms in Multiple Linear Models
Further Reading
Exercises
Logistic Regression and Other Generalized Linear Models
Generalized Linear Models
Transformation of the expected value on the left
Noise terms need not be normal
Log odds in contingency tables
Logistic regression with a continuous explanatory variable
Logistic Multiple Regression
A plot of contributions of explanatory variables
Cross-validation estimates of predictive accuracy
Logistic Models for Categorical Data--an Example
Poisson and Quasi-Poisson Regression
Data on aberrant crypt foci
Moth habitat example
Residuals, and estimating the dispersion
Ordinal Regression Models
Exploratory analysis
Proportional odds logistic regression
Other Related Models
Loglinear models
Survival analysis
Transformations for Count Data
Further Reading
Exercises
Multi-level Models, Time Series and Repeated Measures
Introduction
Example--Survey Data, with Clustering
Alternative models
Instructive, though faulty, analyses
Predictive accuracy
A Multi-level Experimental Design
The ANOVA table
Expected values of mean squares
The sums of squares breakdown
The variance components
The mixed model analysis
Predictive accuracy
Different sources of variance--complication or focus of interest?
Within and between Subject Effects--an Example
Time Series--Some Basic Ideas
Preliminary graphical explorations
The autocorrelation function
Autoregressive (AR) models
Autoregressive moving average (ARMA) models--theory
Regression Modeling with Moving Average Errors--an Example
Repeated Measures in Time--Notes on the Methodology
The theory of repeated measures modeling
Correlation structure
Different approaches to repeated measures analysis
Further Notes on Multi-level Modeling
An historical perspective on multi-level models
Meta-analysis
Further Reading
Exercises
Tree-based Classification and Regression
The Uses of Tree-based Methods
Problems for which tree-based regression may be used
Tree-based regression versus parametric approaches
Summary of pluses and minuses
Detecting Email Spam--an Example
Choosing the number of splits
Terminology and Methodology
Choosing the split--regression trees
Within and between sums of squares
Choosing the split--classification trees
The mechanics of tree-based regression--a trivial example
Assessments of Predictive Accuracy
Cross-validation
The training/test set methodology
Predicting the future
A Strategy for Choosing the Optimal Tree
Cost-complexity pruning
Prediction error versus tree size
Detecting Email Spam--the Optimal Tree
The one-standard-deviation rule
Interpretation and Presentation of the rpart Output
Data for female heart attack patients
Printed Information on Each Split
Additional Notes
Further Reading
Exercises
Multivariate Data Exploration and Discrimination
Multivariate Exploratory Data Analysis
Scatterplot matrices
Principal components analysis
Discriminant Analysis
Example--plant architecture
Classical Fisherian discriminant analysis
Logistic discriminant analysis
An example with more than two groups
Principal Component Scores in Regression
Propensity Scores in Regression Comparisons--Labor Training Data
Further Reading
Exercises
The R System--Additional Topics
Graphs in R
Functions--Some Further Details
Common useful functions
User-written R functions
Functions for working with dates
Data input and output
Input
Data output
Factors--Additional Comments
Missing Values
Lists and Data Frames
Data frames as lists
Reshaping data frames; reshape()
Joining data frames and vectors--cbind()
Conversion of tables and arrays into data frames
Merging data frames--merge()
The function sapply() and related functions
Splitting vectors and data frames into lists--split()
Matrices and Arrays
Outer products
Arrays
Classes and Methods
Printing and summarizing model objects
Extracting information from model objects
Data-bases and Environments
Workspace management
Function environments, and lazy evaluation
Manipulation of Language Constructs
Further Reading
Exercises
Epilogue--Models
S-PLUS Differences
References
Index of R Symbols and Functions
Index of Terms
Index of Names