Discovering Knowledge in Data: An Introduction to Data Mining

ISBN-10: 0471666572

ISBN-13: 9780471666578

Edition: 2005

Authors: Daniel T. Larose

The term data mining was coined to describe indirect, automatic data analysis techniques that use more complex and sophisticated tools than those used in the past. This volume offers hands-on experience under real-world conditions, along with case studies.

Book details

List price: $120.00
Copyright year: 2005
Publisher: John Wiley & Sons, Incorporated
Publication date: 11/18/2004
Binding: Hardcover
Pages: 222
Size: 6.35" wide x 9.60" long x 0.70" tall
Weight: 1.144

Contents

Preface

Introduction to Data Mining
  What Is Data Mining?
  Why Data Mining?
  Need for Human Direction of Data Mining
  Cross-Industry Standard Process: CRISP-DM
    Analyzing Automobile Warranty Claims: Example of the CRISP-DM Industry Standard Process in Action
  Fallacies of Data Mining
  What Tasks Can Data Mining Accomplish?
    Description
    Estimation
    Prediction
    Classification
    Clustering
    Association
    Predicting Abnormal Stock Market Returns Using Neural Networks
    Mining Association Rules from Legal Databases
    Predicting Corporate Bankruptcies Using Decision Trees
    Profiling the Tourism Market Using k-Means Clustering Analysis
  References
  Exercises

Data Preprocessing
  Why Do We Need to Preprocess the Data?
  Data Cleaning
  Handling Missing Data
  Identifying Misclassifications
  Graphical Methods for Identifying Outliers
  Data Transformation
    Min-Max Normalization
    Z-Score Standardization
  Numerical Methods for Identifying Outliers
  References
  Exercises

Exploratory Data Analysis
  Hypothesis Testing versus Exploratory Data Analysis
  Getting to Know the Data Set
  Dealing with Correlated Variables
  Exploring Categorical Variables
  Using EDA to Uncover Anomalous Fields
  Exploring Numerical Variables
  Exploring Multivariate Relationships
  Selecting Interesting Subsets of the Data for Further Investigation
  Binning
  Summary
  References
  Exercises

Statistical Approaches to Estimation and Prediction
  Data Mining Tasks in Discovering Knowledge in Data
  Statistical Approaches to Estimation and Prediction
  Univariate Methods: Measures of Center and Spread
  Statistical Inference
  How Confident Are We in Our Estimates?
  Confidence Interval Estimation
  Bivariate Methods: Simple Linear Regression
  Dangers of Extrapolation
  Confidence Intervals for the Mean Value of y Given x
  Prediction Intervals for a Randomly Chosen Value of y Given x
  Multiple Regression
  Verifying Model Assumptions
  References
  Exercises

k-Nearest Neighbor Algorithm
  Supervised versus Unsupervised Methods
  Methodology for Supervised Modeling
  Bias-Variance Trade-Off
  Classification Task
  k-Nearest Neighbor Algorithm
  Distance Function
  Combination Function
    Simple Unweighted Voting
    Weighted Voting
  Quantifying Attribute Relevance: Stretching the Axes
  Database Considerations
  k-Nearest Neighbor Algorithm for Estimation and Prediction
  Choosing k
  Reference
  Exercises

Decision Trees
  Classification and Regression Trees
  C4.5 Algorithm
  Decision Rules
  Comparison of the C5.0 and CART Algorithms Applied to Real Data
  References
  Exercises

Neural Networks
  Input and Output Encoding
  Neural Networks for Estimation and Prediction
  Simple Example of a Neural Network
  Sigmoid Activation Function
  Back-Propagation
  Gradient Descent Method
  Back-Propagation Rules
  Example of Back-Propagation
  Termination Criteria
  Learning Rate
  Momentum Term
  Sensitivity Analysis
  Application of Neural Network Modeling
  References
  Exercises

Hierarchical and k-Means Clustering
  Clustering Task
  Hierarchical Clustering Methods
    Single-Linkage Clustering
    Complete-Linkage Clustering
  k-Means Clustering
  Example of k-Means Clustering at Work
  Application of k-Means Clustering Using SAS Enterprise Miner
  Using Cluster Membership to Predict Churn
  References
  Exercises

Kohonen Networks
  Self-Organizing Maps
  Kohonen Networks
  Example of a Kohonen Network Study
  Cluster Validity
  Application of Clustering Using Kohonen Networks
  Interpreting the Clusters
  Cluster Profiles
  Using Cluster Membership as Input to Downstream Data Mining Models
  References
  Exercises

Association Rules
  Affinity Analysis and Market Basket Analysis
  Data Representation for Market Basket Analysis
  Support, Confidence, Frequent Itemsets, and the A Priori Property
  How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets
  How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules
  Extension from Flag Data to General Categorical Data
  Information-Theoretic Approach: Generalized Rule Induction Method
  J-Measure
  Application of Generalized Rule Induction
  When Not to Use Association Rules
  Do Association Rules Represent Supervised or Unsupervised Learning?
  Local Patterns versus Global Models
  References
  Exercises

Model Evaluation Techniques
  Model Evaluation Techniques for the Description Task
  Model Evaluation Techniques for the Estimation and Prediction Tasks
  Model Evaluation Techniques for the Classification Task
  Error Rate, False Positives, and False Negatives
  Misclassification Cost Adjustment to Reflect Real-World Concerns
  Decision Cost/Benefit Analysis
  Lift Charts and Gains Charts
  Interweaving Model Evaluation with Model Building
  Confluence of Results: Applying a Suite of Models
  Reference
  Exercises

Epilogue: "We've Only Just Begun"

Index