Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner

Price: 9.71 USD
Availability: InStock

ISBN-10: 0470084855

ISBN-13: 9780470084854

Edition: 2007

Authors: Nitin R. Patel, Galit Shmueli, Peter C. Bruce

List price: $126.50

Description:

Providing a theoretical and practical understanding of the key methods of classification, prediction, reduction and exploration that are at the heart of data mining, this book also presents a business decision-making context for these methods and uses cases and data to illustrate their application.

Book details

Copyright year: 2007
Publisher: John Wiley & Sons, Incorporated
Publication date: 12/11/2006
Binding: Hardcover
Pages: 298
Size: 7.25" wide x 10.25" long x 1.00" tall
Weight: 1.496
Language: English



Foreword


Preface


Acknowledgments


Introduction


What Is Data Mining?


Where Is Data Mining Used?


The Origins of Data Mining


The Rapid Growth of Data Mining


Why Are There So Many Different Methods?


Terminology and Notation


Road Maps to This Book


Overview of the Data Mining Process


Introduction


Core Ideas in Data Mining


Supervised and Unsupervised Learning


The Steps in Data Mining


Preliminary Steps


Building a Model: Example with Linear Regression


Using Excel for Data Mining


Problems


Data Exploration and Dimension Reduction


Introduction


Practical Considerations


House Prices in Boston


Data Summaries


Data Visualization


Correlation Analysis


Reducing the Number of Categories in Categorical Variables


Principal Components Analysis


Breakfast Cereals


Principal Components


Normalizing the Data


Using Principal Components for Classification and Prediction


Problems


Evaluating Classification and Predictive Performance


Introduction


Judging Classification Performance


Accuracy Measures


Cutoff for Classification


Performance in Unequal Importance of Classes


Asymmetric Misclassification Costs


Oversampling and Asymmetric Costs


Classification Using a Triage Strategy


Evaluating Predictive Performance


Problems


Multiple Linear Regression


Introduction


Explanatory vs. Predictive Modeling


Estimating the Regression Equation and Prediction


Example: Predicting the Price of Used Toyota Corolla Automobiles


Variable Selection in Linear Regression


Reducing the Number of Predictors


How to Reduce the Number of Predictors


Problems


Three Simple Classification Methods


Introduction


Predicting Fraudulent Financial Reporting


Predicting Delayed Flights


The Naive Rule


Naive Bayes


Conditional Probabilities and Pivot Tables


A Practical Difficulty


A Solution: Naive Bayes


Advantages and Shortcomings of the Naive Bayes Classifier


k-Nearest Neighbors


Riding Mowers


Choosing k


k-NN for a Quantitative Response


Advantages and Shortcomings of k-NN Algorithms


Problems


Classification and Regression Trees


Introduction


Classification Trees


Recursive Partitioning


Example 1: Riding Mowers


Measures of Impurity


Evaluating the Performance of a Classification Tree


Acceptance of Personal Loan


Avoiding Overfitting


Stopping Tree Growth: CHAID


Pruning the Tree


Classification Rules from Trees


Regression Trees


Prediction


Measuring Impurity


Evaluating Performance


Advantages, Weaknesses, and Extensions


Problems


Logistic Regression


Introduction


The Logistic Regression Model


Example: Acceptance of Personal Loan


Model with a Single Predictor


Estimating the Logistic Model from Data: Computing Parameter Estimates


Interpreting Results in Terms of Odds


Why Linear Regression Is Inappropriate for a Categorical Response


Evaluating Classification Performance


Variable Selection


Evaluating Goodness of Fit


Example of Complete Analysis: Predicting Delayed Flights


Data Preprocessing


Model Fitting and Estimation


Model Interpretation


Model Performance


Goodness of Fit


Variable Selection


Logistic Regression for More Than Two Classes


Ordinal Classes


Nominal Classes


Problems


Neural Nets


Introduction


Concept and Structure of a Neural Network


Fitting a Network to Data


Tiny Dataset


Computing Output of Nodes


Preprocessing the Data


Training the Model


Classifying Accident Severity


Avoiding Overfitting


Using the Output for Prediction and Classification


Required User Input


Exploring the Relationship Between Predictors and Response


Advantages and Weaknesses of Neural Networks


Problems


Discriminant Analysis


Introduction


Example 1: Riding Mowers


Example 2: Personal Loan Acceptance


Distance of an Observation from a Class


Fisher's Linear Classification Functions


Classification Performance of Discriminant Analysis


Prior Probabilities


Unequal Misclassification Costs


Classifying More Than Two Classes


Medical Dispatch to Accident Scenes


Advantages and Weaknesses


Problems


Association Rules


Introduction


Discovering Association Rules in Transaction Databases


Example 1: Synthetic Data on Purchases of Phone Faceplates


Generating Candidate Rules


The Apriori Algorithm


Selecting Strong Rules


Support and Confidence


Lift Ratio


Data Format


The Process of Rule Selection


Interpreting the Results


Statistical Significance of Rules


Example 2: Rules for Similar Book Purchases


Summary


Problems


Cluster Analysis


Introduction


Example: Public Utilities


Measuring Distance Between Two Records


Euclidean Distance


Normalizing Numerical Measurements


Other Distance Measures for Numerical Data


Distance Measures for Categorical Data


Distance Measures for Mixed Data


Measuring Distance Between Two Clusters


Hierarchical (Agglomerative) Clustering


Minimum Distance (Single Linkage)


Maximum Distance (Complete Linkage)


Group Average (Average Linkage)


Dendrograms: Displaying Clustering Process and Results


Validating Clusters


Limitations of Hierarchical Clustering


Nonhierarchical Clustering: The k-Means Algorithm


Initial Partition into k Clusters


Problems


Cases


Charles Book Club


German Credit


Tayko Software Cataloger


Segmenting Consumers of Bath Soap


Direct-Mail Fundraising


Catalog Cross-Selling


Predicting Bankruptcy


References


Index