Data Mining with R: Learning with Case Studies

ISBN-10: 1439810184

ISBN-13: 9781439810187

Edition: 2010

Authors: Luis Torgo

List price: $69.99

Description:

This book provides a self-contained introduction to the use of R for exploratory data mining and machine learning. Employing a practical, learn-by-doing approach, the author presents a series of representative case studies from ecology, financial prediction, fraud detection, and bioinformatics, including all of the necessary steps, code, and data. These examples demonstrate how to address important data mining issues, such as handling data sets with too many variables, and illustrate key concepts, including outlier detection and semisupervised learning. A supporting web page provides additional code and data for further study.

Book details

Copyright year: 2010
Publisher: Taylor & Francis Group
Publication date: 11/19/2010
Binding: Mixed Media
Pages: 305
Size: 6.25" wide x 9.25" long x 0.75" tall
Weight: 1.254

Preface
Acknowledgments
List of Figures
List of Tables
Introduction
How to Read This Book?
A Short Introduction to R
Starting with R
R Objects
Vectors
Vectorization
Factors
Generating Sequences
Sub-Setting
Matrices and Arrays
Lists
Data Frames
Creating New Functions
Objects, Classes, and Methods
Managing Your Sessions
A Short Introduction to MySQL
Predicting Algae Blooms
Problem Description and Objectives
Data Description
Loading the Data into R
Data Visualization and Summarization
Unknown Values
Removing the Observations with Unknown Values
Filling in the Unknowns with the Most Frequent Values
Filling in the Unknown Values by Exploring Correlations
Filling in the Unknown Values by Exploring Similarities between Cases
Obtaining Prediction Models
Multiple Linear Regression
Regression Trees
Model Evaluation and Selection
Predictions for the Seven Algae
Summary
Predicting Stock Market Returns
Problem Description and Objectives
The Available Data
Handling Time-Dependent Data in R
Reading the Data from the CSV File
Getting the Data from the Web
Reading the Data from a MySQL Database
Loading the Data into R Running on Windows
Loading the Data into R Running on Linux
Defining the Prediction Tasks
What to Predict?
Which Predictors?
The Prediction Tasks
Evaluation Criteria
The Prediction Models
How Will the Training Data Be Used?
The Modeling Tools
Artificial Neural Networks
Support Vector Machines
Multivariate Adaptive Regression Splines
From Predictions into Actions
How Will the Predictions Be Used?
Trading-Related Evaluation Criteria
Putting Everything Together: A Simulated Trader
Model Evaluation and Selection
Monte Carlo Estimates
Experimental Comparisons
Results Analysis
The Trading System
Evaluation of the Final Test Data
An Online Trading System
Summary
Detecting Fraudulent Transactions
Problem Description and Objectives
The Available Data
Loading the Data into R
Exploring the Dataset
Data Problems
Unknown Values
Few Transactions of Some Products
Defining the Data Mining Tasks
Different Approaches to the Problem
Unsupervised Techniques
Supervised Techniques
Semi-Supervised Techniques
Evaluation Criteria
Precision and Recall
Lift Charts and Precision/Recall Curves
Normalized Distance to Typical Price
Experimental Methodology
Obtaining Outlier Rankings
Unsupervised Approaches
The Modified Box Plot Rule
Local Outlier Factors (LOF)
Clustering-Based Outlier Rankings (OR_h)
Supervised Approaches
The Class Imbalance Problem
Naive Bayes
AdaBoost
Semi-Supervised Approaches
Summary
Classifying Microarray Samples
Problem Description and Objectives
Brief Background on Microarray Experiments
The ALL Dataset
The Available Data
Exploring the Dataset
Gene (Feature) Selection
Simple Filters Based on Distribution Properties
ANOVA Filters
Filtering Using Random Forests
Filtering Using Feature Clustering Ensembles
Predicting Cytogenetic Abnormalities
Defining the Prediction Task
The Evaluation Metric
The Experimental Procedure
The Modeling Techniques
Random Forests
k-Nearest Neighbors
Comparing the Models
Summary
Bibliography
Subject Index
Index of Data Mining Topics
Index of R Functions