List of Figures
List of Abbreviations
Introduction
Data Mining
R
Datasets
The Iris Dataset
The Bodyfat Dataset
Data Import and Export
Save and Load R Data
Import from and Export to .CSV Files
Import Data from SAS
Import/Export via ODBC
Read from Databases
Output to and Input from EXCEL Files
Data Exploration
Have a Look at Data
Explore Individual Variables
Explore Multiple Variables
More Explorations
Save Charts into Files
Decision Trees and Random Forest
Decision Trees with Package party
Decision Trees with Package rpart
Random Forest
Regression
Linear Regression
Logistic Regression
Generalized Linear Regression
Non-Linear Regression
Clustering
The k-Means Clustering
The k-Medoids Clustering
Hierarchical Clustering
Density-Based Clustering
Outlier Detection
Univariate Outlier Detection
Outlier Detection with LOF
Outlier Detection by Clustering
Outlier Detection from Time Series
Discussions
Time Series Analysis and Mining
Time Series Data in R
Time Series Decomposition
Time Series Forecasting
Time Series Clustering
Dynamic Time Warping
Synthetic Control Chart Time Series Data
Hierarchical Clustering with Euclidean Distance
Hierarchical Clustering with DTW Distance
Time Series Classification
Classification with Original Data
Classification with Extracted Features
k-NN Classification
Discussions
Further Readings
Association Rules
Basics of Association Rules
The Titanic Dataset
Association Rule Mining
Removing Redundancy
Interpreting Rules
Visualizing Association Rules
Discussions and Further Readings
Text Mining
Retrieving Text from Twitter
Transforming Text
Stemming Words
Building a Term-Document Matrix
Frequent Terms and Associations
Word Cloud
Clustering Words
Clustering Tweets
Clustering Tweets with the k-Means Algorithm
Clustering Tweets with the k-Medoids Algorithm
Packages, Further Readings, and Discussions
Social Network Analysis
Network of Terms
Network of Tweets
Two-Mode Network
Discussions and Further Readings
Case Study I: Analysis and Forecasting of House Price Indices
Importing HPI Data
Exploration of HPI Data
Trend and Seasonal Components of HPI
HPI Forecasting
The Estimated Price of a Property
Discussion
Case Study II: Customer Response Prediction and Profit Optimization
Introduction
The Data of KDD Cup 1998
Data Exploration
Training Decision Trees
Model Evaluation
Selecting the Best Tree
Scoring
Discussions and Conclusions
Case Study III: Predictive Modeling of Big Data with Limited Memory
Introduction
Methodology
Data and Variables
Random Forest
Memory Issue
Train Models on Sample Data
Build Models with Selected Variables
Scoring
Print Rules
Print Rules in Text
Print Rules for Scoring with SAS
Conclusions and Discussion
Online Resources
R Reference Cards
R
Data Mining
Data Mining with R
Classification/Prediction with R
Time Series Analysis with R
Association Rule Mining with R
Spatial Data Analysis with R
Text Mining with R
Social Network Analysis with R
Data Cleansing and Transformation with R
Big Data and Parallel Computing with R
R Reference Card for Data Mining
Bibliography
General Index
Package Index
Function Index