Foreword
Preface

What's it all about?
    Data mining and machine learning
        Describing structural patterns
        Machine learning
        Data mining
    Simple examples: The weather problem and others
        The weather problem
        Contact lenses: An idealized problem
        Irises: A classic numeric dataset
        CPU performance: Introducing numeric prediction
        Labor negotiations: A more realistic example
        Soybean classification: A classic machine learning success
    Fielded applications
        Decisions involving judgment
        Screening images
        Load forecasting
        Diagnosis
        Marketing and sales
    Machine learning and statistics
    Generalization as search
        Enumerating the concept space
        Bias
    Data mining and ethics
    Further reading

Input: Concepts, instances, attributes
    What's a concept?
    What's in an example?
    What's in an attribute?
    Preparing the input
        Gathering the data together
        ARFF format
        Attribute types
        Missing values
        Inaccurate values
        Getting to know your data
    Further reading

Output: Knowledge representation
    Decision tables
    Decision trees
    Classification rules
    Association rules
    Rules with exceptions
    Rules involving relations
    Trees for numeric prediction
    Instance-based representation
    Clusters
    Further reading

Algorithms: The basic methods
    Inferring rudimentary rules
        Missing values and numeric attributes
        Discussion
    Statistical modeling
        Missing values and numeric attributes
        Discussion
    Divide and conquer: Constructing decision trees
        Calculating information
        Highly branching attributes
        Discussion
    Covering algorithms: Constructing rules
        Rules versus trees
        A simple covering algorithm
        Rules versus decision lists
    Mining association rules
        Item sets
        Association rules
        Generating rules efficiently
        Discussion
    Linear models
        Numeric prediction
        Classification
        Discussion
    Instance-based learning
        The distance function
        Discussion
    Further reading

Credibility: Evaluating what's been learned
    Training and testing
    Predicting performance
    Cross-validation
    Other estimates
        Leave-one-out
        The bootstrap
    Comparing data mining schemes
    Predicting probabilities
        Quadratic loss function
        Informational loss function
        Discussion
    Counting the cost
        Lift charts
        ROC curves
        Cost-sensitive learning
        Discussion
    Evaluating numeric prediction
    The minimum description length principle
    Applying MDL to clustering
    Further reading

Implementations: Real machine learning schemes
    Decision trees
        Numeric attributes
        Missing values
        Pruning
        Estimating error rates
        Complexity of decision tree induction
        From trees to rules
        C4.5: Choices and options
        Discussion
    Classification rules
        Criteria for choosing tests
        Missing values, numeric attributes
        Good rules and bad rules
        Generating good rules
        Generating good decision lists
        Probability measure for rule evaluation
        Evaluating rules using a test set
        Obtaining rules from partial decision trees
        Rules with exceptions
        Discussion
    Extending linear classification: Support vector machines
        The maximum margin hyperplane
        Nonlinear class boundaries
        Discussion
    Instance-based learning
        Reducing the number of exemplars
        Pruning noisy exemplars
        Weighting attributes
        Generalizing exemplars
        Distance functions for generalized exemplars
        Generalized distance functions
        Discussion
    Numeric prediction
        Model trees
        Building the tree
        Pruning the tree
        Nominal attributes
        Missing values
        Pseudo-code for model tree induction
        Locally weighted linear regression
        Discussion
    Clustering
        Iterative distance-based clustering
        Incremental clustering
        Category utility
        Probability-based clustering
        The EM algorithm
        Extending the mixture model
        Bayesian clustering
        Discussion

Moving on: Engineering the input and output
    Attribute selection
        Scheme-independent selection
        Searching the attribute space
        Scheme-specific selection
    Discretizing numeric attributes
        Unsupervised discretization
        Entropy-based discretization
        Other discretization methods
        Entropy-based versus error-based discretization
        Converting discrete to numeric attributes
    Automatic data cleansing
        Improving decision trees
        Robust regression
        Detecting anomalies
    Combining multiple models
        Bagging
        Boosting
        Stacking
        Error-correcting output codes
    Further reading

Nuts and bolts: Machine learning algorithms in Java
    Getting started
    Javadoc and the class library
        Classes, instances, and packages
        The weka.core package
        The weka.classifiers package
        Other packages
        Indexes
    Processing datasets using the machine learning programs
        Using M5'
        Generic options
        Scheme-specific options
        Classifiers
        Meta-learning schemes
        Filters
        Association rules
        Clustering
    Embedded machine learning
        A simple message classifier
    Writing new learning schemes
        An example classifier
        Conventions for implementing classifiers
        Writing filters
        An example filter
        Conventions for writing filters

Looking forward
    Learning from massive datasets
    Visualizing machine learning
        Visualizing the input
        Visualizing the output
    Incorporating domain knowledge
    Text mining
        Finding key phrases for documents
        Finding information in running text
        Soft parsing
    Mining the World Wide Web
    Further reading

References
Index
About the authors