Preface
Acknowledgments
List of Figures
List of Tables

Introduction
How to Read This Book?
A Short Introduction to R
Starting with R
R Objects
Vectors
Vectorization
Factors
Generating Sequences
Sub-Setting
Matrices and Arrays
Lists
Data Frames
Creating New Functions
Objects, Classes, and Methods
Managing Your Sessions
A Short Introduction to MySQL

Predicting Algae Blooms
Problem Description and Objectives
Data Description
Loading the Data into R
Data Visualization and Summarization
Unknown Values
Removing the Observations with Unknown Values
Filling in the Unknowns with the Most Frequent Values
Filling in the Unknown Values by Exploring Correlations
Filling in the Unknown Values by Exploring Similarities between Cases
Obtaining Prediction Models
Multiple Linear Regression
Regression Trees
Model Evaluation and Selection
Predictions for the Seven Algae
Summary

Predicting Stock Market Returns
Problem Description and Objectives
The Available Data
Handling Time-Dependent Data in R
Reading the Data from the CSV File
Getting the Data from the Web
Reading the Data from a MySQL Database
Loading the Data into R Running on Windows
Loading the Data into R Running on Linux
Defining the Prediction Tasks
What to Predict?
Which Predictors?
The Prediction Tasks
Evaluation Criteria
The Prediction Models
How Will the Training Data Be Used?
The Modeling Tools
Artificial Neural Networks
Support Vector Machines
Multivariate Adaptive Regression Splines
From Predictions into Actions
How Will the Predictions Be Used?
Trading-Related Evaluation Criteria
Putting Everything Together: A Simulated Trader
Model Evaluation and Selection
Monte Carlo Estimates
Experimental Comparisons
Results Analysis
The Trading System
Evaluation of the Final Test Data
An Online Trading System
Summary

Detecting Fraudulent Transactions
Problem Description and Objectives
The Available Data
Loading the Data into R
Exploring the Dataset
Data Problems
Unknown Values
Few Transactions of Some Products
Defining the Data Mining Tasks
Different Approaches to the Problem
Unsupervised Techniques
Supervised Techniques
Semi-Supervised Techniques
Evaluation Criteria
Precision and Recall
Lift Charts and Precision/Recall Curves
Normalized Distance to Typical Price
Experimental Methodology
Obtaining Outlier Rankings
Unsupervised Approaches
The Modified Box Plot Rule
Local Outlier Factors (LOF)
Clustering-Based Outlier Rankings (OR<sub>h</sub>)
Supervised Approaches
The Class Imbalance Problem
Naive Bayes
AdaBoost
Semi-Supervised Approaches
Summary

Classifying Microarray Samples
Problem Description and Objectives
Brief Background on Microarray Experiments
The ALL Dataset
The Available Data
Exploring the Dataset
Gene (Feature) Selection
Simple Filters Based on Distribution Properties
ANOVA Filters
Filtering Using Random Forests
Filtering Using Feature Clustering Ensembles
Predicting Cytogenetic Abnormalities
Defining the Prediction Task
The Evaluation Metric
The Experimental Procedure
The Modeling Techniques
Random Forests
k-Nearest Neighbors
Comparing the Models
Summary

Bibliography
Subject Index
Index of Data Mining Topics
Index of R Functions