Data Mining with R: Learning with Case Studies

Name: Data Mining with R: Learning with Case Studies
Price: 5.70 USD
Availability: In Stock
ISBN-10: 1439810184

ISBN-13: 9781439810187

Edition: 2010

Authors: Luis Torgo

Description:

This book provides a self-contained introduction to the use of R for exploratory data mining and machine learning. Employing a practical, learn-by-doing approach, the author presents a series of representative case studies from ecology, financial prediction, fraud detection, and bioinformatics, including all of the necessary steps, code, and data. These examples demonstrate how to address important data mining issues, such as handling data sets with too many variables, and illustrate key concepts, including outlier detection and semi-supervised learning. A supporting web page provides additional code and data for further study.

Book details

List price: $69.99
Copyright year: 2010
Publisher: Taylor & Francis Group
Publication date: 11/19/2010
Binding: Mixed Media
Pages: 305
Size: 6.25" wide x 9.25" long x 0.75" tall
Weight: 1.254



Preface
Acknowledgments
List of Figures
List of Tables

Introduction
  How to Read This Book?
  A Short Introduction to R
  Starting with R
  R Objects
  Vectors
  Vectorization
  Factors
  Generating Sequences
  Sub-Setting
  Matrices and Arrays
  Lists
  Data Frames
  Creating New Functions
  Objects, Classes, and Methods
  Managing Your Sessions
  A Short Introduction to MySQL

Predicting Algae Blooms
  Problem Description and Objectives
  Data Description
  Loading the Data into R
  Data Visualization and Summarization
  Unknown Values
  Removing the Observations with Unknown Values
  Filling in the Unknowns with the Most Frequent Values
  Filling in the Unknown Values by Exploring Correlations
  Filling in the Unknown Values by Exploring Similarities between Cases
  Obtaining Prediction Models
  Multiple Linear Regression
  Regression Trees
  Model Evaluation and Selection
  Predictions for the Seven Algae
  Summary

Predicting Stock Market Returns
  Problem Description and Objectives
  The Available Data
  Handling Time-Dependent Data in R
  Reading the Data from the CSV File
  Getting the Data from the Web
  Reading the Data from a MySQL Database
  Loading the Data into R Running on Windows
  Loading the Data into R Running on Linux
  Defining the Prediction Tasks
  What to Predict?
  Which Predictors?
  The Prediction Tasks
  Evaluation Criteria
  The Prediction Models
  How Will the Training Data Be Used?
  The Modeling Tools
  Artificial Neural Networks
  Support Vector Machines
  Multivariate Adaptive Regression Splines
  From Predictions into Actions
  How Will the Predictions Be Used?
  Trading-Related Evaluation Criteria
  Putting Everything Together: A Simulated Trader
  Model Evaluation and Selection
  Monte Carlo Estimates
  Experimental Comparisons
  Results Analysis
  The Trading System
  Evaluation of the Final Test Data
  An Online Trading System
  Summary

Detecting Fraudulent Transactions
  Problem Description and Objectives
  The Available Data
  Loading the Data into R
  Exploring the Dataset
  Data Problems
  Unknown Values
  Few Transactions of Some Products
  Defining the Data Mining Tasks
  Different Approaches to the Problem
  Unsupervised Techniques
  Supervised Techniques
  Semi-Supervised Techniques
  Evaluation Criteria
  Precision and Recall
  Lift Charts and Precision/Recall Curves
  Normalized Distance to Typical Price
  Experimental Methodology
  Obtaining Outlier Rankings
  Unsupervised Approaches
  The Modified Box Plot Rule
  Local Outlier Factors (LOF)
  Clustering-Based Outlier Rankings (OR_h)
  Supervised Approaches
  The Class Imbalance Problem
  Naive Bayes
  AdaBoost
  Semi-Supervised Approaches
  Summary

Classifying Microarray Samples
  Problem Description and Objectives
  Brief Background on Microarray Experiments
  The ALL Dataset
  The Available Data
  Exploring the Dataset
  Gene (Feature) Selection
  Simple Filters Based on Distribution Properties
  ANOVA Filters
  Filtering Using Random Forests
  Filtering Using Feature Clustering Ensembles
  Predicting Cytogenetic Abnormalities
  Defining the Prediction Task
  The Evaluation Metric
  The Experimental Procedure
  The Modeling Techniques
  Random Forests
  k-Nearest Neighbors
  Comparing the Models
  Summary

Bibliography
Subject Index
Index of Data Mining Topics
Index of R Functions