Preface

What Is Data Mining?
Big Data
The Data Warehouse
Timelines
Types of Data-Mining Problems
The Pedigree of Data Mining
Databases
Statistics
Machine Learning
Is Big Better?
Strong Statistical Evaluation
More Intensive Search
More Controlled Experiments
Is Big Necessary?
The Tasks of Predictive Data Mining
Data Preparation
Data Reduction
Data Modeling and Prediction
Case and Solution Analyses
Data Mining: Art or Science?
An Overview of the Book
Bibliographic and Historical Remarks

Statistical Evaluation for Big Data
The Idealized Model
Classical Statistical Comparison and Evaluation
It's Big but Is It Biased?
Objective Versus Survey Data
Significance and Predictive Value
Too Many Comparisons?
Classical Types of Statistical Prediction
Predicting True-or-False: Classification
Error Rates
Forecasting Numbers: Regression
Distance Measures
Measuring Predictive Performance
Independent Testing
Random Training and Testing
How Accurate Is the Error Estimate?
Comparing Results for Error Measures
Ideal or Real-World Sampling?
Training and Testing from Different Time Periods
Too Much Searching and Testing?
Why Are Errors Made?
Bibliographic and Historical Remarks

Preparing the Data
A Standard Form
Standard Measurements
Goals
Data Transformations
Normalizations
Data Smoothing
Differences and Ratios
Missing Data
Time-Dependent Data
Time Series
Composing Features from Time Series
Current Values
Moving Averages
Trends
Seasonal Adjustments
Hybrid Time-Dependent Applications
Multivariate Time Series
Classification and Time Series
Standard Cases with Time-Series Attributes
Text Mining
Bibliographic and Historical Remarks

Data Reduction
Selecting the Best Features
Feature Selection from Means and Variances
Independent Features
Distance-Based Optimal Feature Selection
Heuristic Feature Selection
Principal Components
Feature Selection by Decision Trees
How Many Measured Values?
Reducing and Smoothing Values
Rounding
K-Means Clustering
Class Entropy
How Many Cases?
A Single Sample
Incremental Samples
Average Samples
Specialized Case-Reduction Techniques
Sequential Sampling over Time
Strategic Sampling of Key Events
Adjusting Prevalence
Bibliographic and Historical Remarks

Looking for Solutions
Overview
Math Solutions
Linear Scoring
Nonlinear Scoring: Neural Nets
Advanced Statistical Methods
Distance Solutions
Logic Solutions
Decision Trees
Decision Rules
What Do the Answers Mean?
Is It Safe to Edit Solutions?
Which Solution Is Preferable?
Combining Different Answers
Multiple Prediction Methods
Multiple Samples
Bibliographic and Historical Remarks

What's Best for Data Reduction and Mining?
Let's Analyze Some Real Data
The Experimental Methods
The Empirical Results
Significance Testing
So What Did We Learn?
Feature Selection
Value Reduction
Subsampling or All Cases?
Graphical Trend Analysis
Incremental Case Analysis
Incremental Complexity Analysis
Maximum Data Reduction
Are There Winners and Losers in Performance?
Getting the Best Results
Bibliographic and Historical Remarks

Art or Science? Case Studies in Data Mining
Why These Case Studies?
A Summary of Tasks for Predictive Data Mining
A Checklist for Data Preparation
A Checklist for Data Reduction
A Checklist for Data Modeling and Prediction
A Checklist for Case and Solution Analyses
The Case Studies
Transaction Processing
Text Mining
Outcomes Analysis
Process Control
Marketing and User Profiling
Exploratory Analysis
Looking Ahead
Bibliographic and Historical Remarks

Data-Miner Software Kit
References
Author Index
Subject Index