Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

ISBN-10: 1449361323

ISBN-13: 9781449361327

Edition: 2013

Authors: Foster Provost, Tom Fawcett

Description:

Data Science for Business is intended for those who need to understand data science/data mining, and those who want to develop their skill at data-analytic thinking. This is not a book about algorithms. Instead, it presents a set of fundamental principles for extracting useful knowledge from data. These fundamental principles are the foundation for many algorithms and techniques for data mining, but they also underlie the processes and methods for approaching business problems data-analytically, evaluating particular data science solutions, and evaluating general data science plans. After reading the book, the reader should be able to: Envision data science opportunities; Discuss data science…

Book details

List price: $39.99
Copyright year: 2013
Publisher: O'Reilly Media, Incorporated
Publication date: 8/16/2013
Binding: Paperback
Pages: 408
Size: 7.09" wide x 9.13" long x 0.86" tall
Weight: 1.694
Language: English

Foster Provost is Professor and NEC Faculty Fellow at the NYU Stern School of Business, where he teaches in the MBA, Business Analytics, and Data Science programs. Former Editor-in-Chief for the journal Machine Learning, Professor Provost has co-founded several successful companies focusing on data science for marketing.

Tom Fawcett holds a Ph.D. in machine learning and has worked in industry R&D for more than two decades for companies such as GTE Laboratories, NYNEX/Verizon Labs, and HP Labs.



Preface



Introduction: Data-Analytic Thinking


The Ubiquity of Data Opportunities


Example: Hurricane Frances


Example: Predicting Customer Churn


Data Science, Engineering, and Data-Driven Decision Making


Data Processing and "Big Data"


From Big Data 1.0 to Big Data 2.0


Data and Data Science Capability as a Strategic Asset


Data-Analytic Thinking


This Book


Data Mining and Data Science, Revisited


Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist


Summary



Business Problems and Data Science Solutions


Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining.


From Business Problems to Data Mining Tasks


Supervised Versus Unsupervised Methods


Data Mining and Its Results


The Data Mining Process


Business Understanding


Data Understanding


Data Preparation


Modeling


Evaluation


Deployment


Implications for Managing the Data Science Team


Other Analytics Techniques and Technologies


Statistics


Database Querying


Data Warehousing


Regression Analysis


Machine Learning and Data Mining


Answering Business Questions with These Techniques


Summary



Introduction to Predictive Modeling: From Correlation to Supervised Segmentation


Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection.


Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction.


Models, Induction, and Prediction


Supervised Segmentation


Selecting Informative Attributes


Example: Attribute Selection with Information Gain


Supervised Segmentation with Tree-Structured Models


Visualizing Segmentations


Trees as Sets of Rules


Probability Estimation


Example: Addressing the Churn Problem with Tree Induction


Summary



Fitting a Model to Data


Fundamental concepts: Finding "optimal" model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions.


Exemplary techniques: Linear regression; Logistic regression; Support-vector machines.


Classification via Mathematical Functions


Linear Discriminant Functions


Optimizing an Objective Function


An Example of Mining a Linear Discriminant from Data


Linear Discriminant Functions for Scoring and Ranking Instances


Support Vector Machines, Briefly


Regression via Mathematical Functions


Class Probability Estimation and Logistic "Regression"


Logistic Regression: Some Technical Details


Example: Logistic Regression versus Tree Induction


Nonlinear Functions, Support Vector Machines, and Neural Networks


Summary



Overfitting and Its Avoidance


Fundamental concepts: Generalization; Fitting and overfitting; Complexity control.


Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization.


Generalization


Overfitting


Overfitting Examined


Holdout Data and Fitting Graphs


Overfitting in Tree Induction


Overfitting in Mathematical Functions


Example: Overfitting Linear Functions


Example: Why Is Overfitting Bad?


From Holdout Evaluation to Cross-Validation


The Churn Dataset Revisited


Learning Curves


Overfitting Avoidance and Complexity Control


Avoiding Overfitting with Tree Induction


A General Method for Avoiding Overfitting


Avoiding Overfitting for Parameter Optimization


Summary



Similarity, Neighbors, and Clusters


Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation.


Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity.


Similarity and Distance


Nearest-Neighbor Reasoning


Example: Whiskey Analytics


Nearest Neighbors for Predictive Modeling


How Many Neighbors and How Much Influence?


Geometric Interpretation, Overfitting, and Complexity Control


Issues with Nearest-Neighbor Methods


Some Important Technical Details Relating to Similarities and Neighbors


Heterogeneous Attributes


Other Distance Functions


Combining Functions: Calculating Scores from Neighbors


Clustering


Example: Whiskey Analytics Revisited


Hierarchical Clustering


Nearest Neighbors Revisited: Clustering Around Centroids


Example: Clustering Business News Stories


Understanding the Results of Clustering


Using Supervised Learning to Generate Cluster Descriptions


Stepping Back: Solving a Business Problem Versus Data Exploration


Summary



Decision Analytic Thinking I: What Is a Good Model?


Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines.


Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison.


Evaluating Classifiers


Plain Accuracy and Its Problems


The Confusion Matrix


Problems with Unbalanced Classes


Problems with Unequal Costs and Benefits


Generalizing Beyond Classification


A Key Analytical Framework: Expected Value


Using Expected Value to Frame Classifier Use


Using Expected Value to Frame Classifier Evaluation


Evaluation, Baseline Performance, and Implications for Investments in Data


Summary



Visualizing Model Performance


Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results.


Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves.


Ranking Instead of Classifying


Profit Curves


ROC Graphs and Curves


The Area Under the ROC Curve (AUC)


Cumulative Response and Lift Curves


Example: Performance Analytics for Churn Modeling


Summary



Evidence and Probabilities


Fundamental concepts: Explicit evidence combination with Bayes' Rule; Probabilistic reasoning via assumptions of conditional independence.


Exemplary techniques: Naive Bayes classification; Evidence lift.


Example: Targeting Online Consumers With Advertisements


Combining Evidence Probabilistically


Joint Probability and Independence


Bayes' Rule


Applying Bayes' Rule to Data Science


Conditional Independence and Naive Bayes


Advantages and Disadvantages of Naive Bayes


A Model of Evidence "Lift"


Example: Evidence Lifts from Facebook "Likes"


Evidence in Action: Targeting Consumers with Ads


Summary



Representing and Mining Text


Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining.


Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models.


Why Text Is Important


Why Text Is Difficult


Representation


Bag of Words


Term Frequency


Measuring Sparseness: Inverse Document Frequency


Combining Them: TFIDF


Example: Jazz Musicians


The Relationship of IDF to Entropy


Beyond Bag of Words


N-gram Sequences


Named Entity Extraction


Topic Models


Example: Mining News Stories to Predict Stock Price Movement


The Task


The Data


Data Preprocessing


Results


Summary



Decision Analytic Thinking II: Toward Analytical Engineering


Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available.


Exemplary technique: Expected value as a framework for data science solution design.


Targeting the Best Prospects for a Charity Mailing


The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces


A Brief Digression on Selection Bias


Our Churn Example Revisited with Even More Sophistication


The Expected Value Framework: Structuring a More Complicated Business Problem


Assessing the Influence of the Incentive


From an Expected Value Decomposition to a Data Science Solution


Summary



Other Data Science Tasks and Techniques


Fundamental concepts: Our fundamental concepts as the basis of many common data science techniques; The importance of familiarity with the building blocks of data science.


Exemplary techniques: Association and co-occurrences; Behavior profiling; Link prediction; Data reduction; Latent information mining; Movie recommendation; Bias-variance decomposition of error; Ensembles of models; Causal reasoning from data.


Co-occurrences and Associations: Finding Items That Go Together


Measuring Surprise: Lift and Leverage


Example: Beer and Lottery Tickets


Associations Among Facebook Likes


Profiling: Finding Typical Behavior


Link Prediction and Social Recommendation


Data Reduction, Latent Information, and Movie Recommendation


Bias, Variance, and Ensemble Methods


Data-Driven Causal Explanation and a Viral Marketing Example


Summary



Data Science and Business Strategy


Fundamental concepts: Our principles as the basis of success for a data-driven business; Acquiring and sustaining competitive advantage via data science; The importance of careful curation of data science capability.


Thinking Data-Analytically, Redux


Achieving Competitive Advantage with Data Science


Sustaining Competitive Advantage with Data Science


Formidable Historical Advantage


Unique Intellectual Property


Unique Intangible Collateral Assets


Superior Data Scientists


Superior Data Science Management


Attracting and Nurturing Data Scientists and Their Teams


Examine Data Science Case Studies


Be Ready to Accept Creative Ideas from Any Source


Be Ready to Evaluate Proposals for Data Science Projects


Example Data Mining Proposal


Flaws in the Big Red Proposal


A Firm's Data Science Maturity



Conclusion


The Fundamental Concepts of Data Science


Applying Our Fundamental Concepts to a New Problem: Mining Mobile Device Data


Changing the Way We Think about Solutions to Business Problems


What Data Can't Do: Humans in the Loop, Revisited


Privacy, Ethics, and Mining Data About Individuals


Is There More to Data Science?


Final Example: From Crowd-Sourcing to Cloud-Sourcing


Final Words



Proposal Review Guide



Another Sample Proposal


Glossary


Bibliography


Index