Discovering Knowledge in Data: An Introduction to Data Mining

ISBN-10: 0471666572

ISBN-13: 9780471666578

Edition: 2005

Authors: Daniel T. Larose

List price: $120.00

The term "data mining" was coined to describe indirect, automatic data analysis techniques that use more complex and sophisticated tools than those used in the past. This volume offers hands-on experience with real-world conditions and case studies.

Book details

Copyright year: 2005
Publisher: John Wiley & Sons, Incorporated
Publication date: 11/18/2004
Binding: Hardcover
Pages: 222
Size: 6.35" wide x 9.60" long x 0.70" tall
Weight: 1.144

Table of contents

Preface
Introduction to Data Mining
  What Is Data Mining?
  Why Data Mining?
  Need for Human Direction of Data Mining
  Cross-Industry Standard Process: CRISP-DM
  Analyzing Automobile Warranty Claims: Example of the CRISP-DM Industry Standard Process in Action
  Fallacies of Data Mining
  What Tasks Can Data Mining Accomplish?
  Description
  Estimation
  Prediction
  Classification
  Clustering
  Association
  Predicting Abnormal Stock Market Returns Using Neural Networks
  Mining Association Rules from Legal Databases
  Predicting Corporate Bankruptcies Using Decision Trees
  Profiling the Tourism Market Using k-Means Clustering Analysis
  References
  Exercises
Data Preprocessing
  Why Do We Need to Preprocess the Data?
  Data Cleaning
  Handling Missing Data
  Identifying Misclassifications
  Graphical Methods for Identifying Outliers
  Data Transformation
  Min-Max Normalization
  Z-Score Standardization
  Numerical Methods for Identifying Outliers
  References
  Exercises
Exploratory Data Analysis
  Hypothesis Testing versus Exploratory Data Analysis
  Getting to Know the Data Set
  Dealing with Correlated Variables
  Exploring Categorical Variables
  Using EDA to Uncover Anomalous Fields
  Exploring Numerical Variables
  Exploring Multivariate Relationships
  Selecting Interesting Subsets of the Data for Further Investigation
  Binning
  Summary
  References
  Exercises
Statistical Approaches to Estimation and Prediction
  Data Mining Tasks in Discovering Knowledge in Data
  Statistical Approaches to Estimation and Prediction
  Univariate Methods: Measures of Center and Spread
  Statistical Inference
  How Confident Are We in Our Estimates?
  Confidence Interval Estimation
  Bivariate Methods: Simple Linear Regression
  Dangers of Extrapolation
  Confidence Intervals for the Mean Value of y Given x
  Prediction Intervals for a Randomly Chosen Value of y Given x
  Multiple Regression
  Verifying Model Assumptions
  References
  Exercises
k-Nearest Neighbor Algorithm
  Supervised versus Unsupervised Methods
  Methodology for Supervised Modeling
  Bias-Variance Trade-Off
  Classification Task
  k-Nearest Neighbor Algorithm
  Distance Function
  Combination Function
  Simple Unweighted Voting
  Weighted Voting
  Quantifying Attribute Relevance: Stretching the Axes
  Database Considerations
  k-Nearest Neighbor Algorithm for Estimation and Prediction
  Choosing k
  Reference
  Exercises
Decision Trees
  Classification and Regression Trees
  C4.5 Algorithm
  Decision Rules
  Comparison of the C5.0 and CART Algorithms Applied to Real Data
  References
  Exercises
Neural Networks
  Input and Output Encoding
  Neural Networks for Estimation and Prediction
  Simple Example of a Neural Network
  Sigmoid Activation Function
  Back-Propagation
  Gradient Descent Method
  Back-Propagation Rules
  Example of Back-Propagation
  Termination Criteria
  Learning Rate
  Momentum Term
  Sensitivity Analysis
  Application of Neural Network Modeling
  References
  Exercises
Hierarchical and k-Means Clustering
  Clustering Task
  Hierarchical Clustering Methods
  Single-Linkage Clustering
  Complete-Linkage Clustering
  k-Means Clustering
  Example of k-Means Clustering at Work
  Application of k-Means Clustering Using SAS Enterprise Miner
  Using Cluster Membership to Predict Churn
  References
  Exercises
Kohonen Networks
  Self-Organizing Maps
  Kohonen Networks
  Example of a Kohonen Network Study
  Cluster Validity
  Application of Clustering Using Kohonen Networks
  Interpreting the Clusters
  Cluster Profiles
  Using Cluster Membership as Input to Downstream Data Mining Models
  References
  Exercises
Association Rules
  Affinity Analysis and Market Basket Analysis
  Data Representation for Market Basket Analysis
  Support, Confidence, Frequent Itemsets, and the A Priori Property
  How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets
  How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules
  Extension from Flag Data to General Categorical Data
  Information-Theoretic Approach: Generalized Rule Induction Method
  J-Measure
  Application of Generalized Rule Induction
  When Not to Use Association Rules
  Do Association Rules Represent Supervised or Unsupervised Learning?
  Local Patterns versus Global Models
  References
  Exercises
Model Evaluation Techniques
  Model Evaluation Techniques for the Description Task
  Model Evaluation Techniques for the Estimation and Prediction Tasks
  Model Evaluation Techniques for the Classification Task
  Error Rate, False Positives, and False Negatives
  Misclassification Cost Adjustment to Reflect Real-World Concerns
  Decision Cost/Benefit Analysis
  Lift Charts and Gains Charts
  Interweaving Model Evaluation with Model Building
  Confluence of Results: Applying a Suite of Models
  Reference
  Exercises
Epilogue: "We've Only Just Begun"
Index