Skip to content

Minimum Description Length Principle

ISBN-10: 0262072815

ISBN-13: 9780262072816

Edition: 2007

Authors: Peter D. Gr�nwald, Jorma Rissanen

List price: $55.00
Blue ribbon 30 day, 100% satisfaction guarantee!
what's this?
Rush Rewards U
Members Receive:
Carrot Coin icon
XP icon
You have reached 400 XP and carrot coins. That is the daily max!


The minimum description length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern. This extensive, step-by-step introduction to the MDL Principle provides a comprehensive reference (with an emphasis on conceptual issues) that is accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection, including biology, econometrics, and experimental psychology. Part I provides a basic introduction to MDL and an overview of the concepts in statistics and information theory needed to understand MDL. Part II treats universal coding, the information-theoretic notion on which MDL is built, and part III gives a formal treatment of MDL theory as a theory of inductive inference based on universal coding. Part IV provides a comprehensive overview of the statistical theory of exponential families with an emphasis on their information-theoretic properties. The text includes a number of summaries, paragraphs offering the reader a "fast track" through the material, and boxes highlighting the most important concepts.
Customers also bought

Book details

List price: $55.00
Copyright year: 2007
Publisher: MIT Press
Publication date: 3/23/2007
Binding: Hardcover
Pages: 736
Size: 7.00" wide x 9.00" long x 1.50" tall
Weight: 2.684
Language: English

Peter D. Gr�nwald is a researcher at CWI, the National Research Institute for Mathematics andComputer Science, Amsterdam, the Netherlands. He is also affiliated with EURANDOM, the EuropeanResearch Institute for the Study of Stochastic Phenomena, Eindhoven, the Netherlands.

List of Figures
Series Foreword
Introductory Material
Learning, Regularity, and Compression
Regularity and Learning
Regularity and Compression
Solomonoff's Breakthrough - Kolmogorov Complexity
Making the Idea Applicable
Crude MDL, Refined MDL and Universal Coding
From Crude to Refined MDL
Universal Coding and Refined MDL
Refined MDL for Model Selection
Refined MDL for Prediction and Hypothesis Selection
Some Remarks on Model Selection
Model Selection among Non-Nested Models
Goals of Model vs. Point Hypothesis Selection
The MDL Philosophy
MDL, Occam's Razor, and the "True Model"
Answer to Criticism No. 1
Answer to Criticism No. 2
History and Forms of MDL
What Is MDL?
MDL Literature
Summary and Outlook
Probabilistic and Statistical Preliminaries
General Mathematical Preliminaries
Probabilistic Preliminaries
Definitions; Notational Conventions
Probabilistic Sources
Limit Theorems and Statements
Probabilistic Models
Probabilistic Model Classes
Kinds of Probabilistic Models
Terminological Preliminaries
Modeling Preliminaries: Goals and Methods for Inductive Inference
Basic Concepts of Bayesian Statistics
Summary and Outlook
Information-Theoretic Preliminaries
Coding Preliminaries
Restriction to Prefix Coding Systems; Descriptions as Messages
Different Kinds of Codes
Assessing the Efficiency of Description Methods
The Most Important Section of This Book: Probabilities and Code Lengths
The Kraft Inequality
Code Lengths "Are" Probabilities
Immediate Insights and Consequences
Probabilities and Code Lengths, Part II
(Relative) Entropy and the Information Inequality
Uniform Codes, Maximum Entropy, and Minimax Codelength
Summary, Outlook, Further Reading
Information-Theoretic Properties of Statistical Models
Likelihood and Observed Fisher Information
KL Divergence and Expected Fisher Information
Maximum Likelihood: Data vs. Parameters
Summary and Outlook
Crude Two-Part Code MDL
Introduction: Making Two-Part MDL Precise
Two-Part Code MDL for Markov Chain Selection
The Code C[subscript 2]
The Code C[subscript 1]
Crude Two-Part Code MDL for Markov Chains
Simplistic Two-Part Code MDL Hypothesis Selection
Two-Part MDL for Tasks Other Than Hypothesis Selection
Behavior of Two-Part Code MDL
Two-Part Code MDL and Maximum Likelihood
The Maximum Likelihood Principle
MDL vs. ML
MDL as a Maximum Probability Principle
Computing and Approximating Two-Part MDL in Practice
Justifying Crude MDL: Consistency and Code Design
A General Consistency Result
Code Design for Two-Part Code MDL
Summary and Outlook
Appendix: Proof of Theorem 5.1
Universal Coding
Universal Coding with Countable Models
Universal Coding: The Basic Idea
Two-part Codes as Simple Universal Codes
From Universal Codes to Universal Models
Formal Definition of Universality
The Finite Case
Minimax Regret and Normalized ML
NML vs. Two-Part vs. Bayes
The Countably Infinite Case
The Two-Part and Bayesian Codes
The NML Code
Prequential Universal Models
Distributions as Prediction Strategies
Bayes Is Prequential; NML and Two-part Are Not
The Prequential Plug-In Model
Individual vs. Stochastic Universality
Stochastic Redundancy
Uniformly Universal Models
Summary, Outlook and Further Reading
Parametric Models: Normalized Maximum Likelihood
Asymptotic Expansion of Parametric Complexity
The Meaning of [Characters not reproducible]
Complexity and Functional Form
KL Divergence and Distinguishability
Complexity and Volume
Complexity and the Number of Distinguishable Distributions
Explicit and Simplified Computations
Parametric Models: Bayes
The Bayesian Regret
Basic Interpretation of Theorem 8.1
Bayes Meets Minimax - Jeffreys' Prior
Jeffreys' Prior and the Boundary
How to Prove the Bayesian and NML Regret Theorems
Proof Sketch of Theorem 8.1
Beyond Exponential Families
Proof Sketch of Theorem 7.1
Stochastic Universality
Appendix: Proofs of Theorem 8.1 and Theorem 8.2
Parametric Models: Prequential Plug-in
Prequential Plug-in for Exponential Families
The Plug-in vs. the Bayes Universal Model
More Precise Asymptotics
Parametric Models: Two-Part
The Ordinary Two-Part Universal Model
Derivation of the Two-Part Code Regret
Proof Sketch of Theorem 10.1
The Conditional Two-Part Universal Code
Conditional Two-Part Codes for Discrete Exponential Families
Distinguishability and the Phase Transition
Summary and Outlook
NML With Infinite Complexity
Examples of Undefined NML Distribution
Examples of Undefined Jeffreys' Prior
Metauniversal Codes
Constrained Parametric Complexity
Meta-Two-Part Coding
Renormalized Maximum Likelihood
NML with Luckiness
Asymptotic Expansion of LNML
Conditional Universal Models
Bayesian Approach with Jeffreys' Prior
Conditional NML
Liang and Barron's Approach
Summary and Remarks
Appendix: Proof of Theorem 11.4
Linear Regression
Prelude: The Normal Location Family
Least-Squares Estimation
The Normal Equations
Composition of Experiments
Penalized Least-Squares
The Linear Model
Bayesian Linea Model M[superscript X] with Gaussian Prior
Bayesian Linear Models M[superscript X] and S[superscript X] with Noninformative Priors
Universal Models for Linear Regression
Bayes and LNML
Bayes-Jeffreys and CNML
Beyond Parametrics
CUP: Unions of Parametric Models
CUP vs. Parametric Models
Universal Codes Based on Histograms
Redundancy of Universal CUP Histogram Codes
Nonparametric Redundancy
Standard CUP Universal Codes
Minimax Nonparametric Redundancy
Gaussian Process Regression
Kernelization of Bayesian Linear Regression
Gaussian Processes
Gaussian Processes as Universal Models
Conclusion and Further Reading
Refined MDL
MDL Model Selection
Simple Refined MDL Model Selection
Compression Interpretation
Counting Interpretation
Bayesian Interpretation
Prequential Interpretation
General Parametric Model Selection
Models with Infinite Complexities
Comparing Many or Infinitely Many Models
The General Picture
Practical Issues in MDL Model Selection
Calculating Universal Codelengths
Computational Efficiency and Practical Quality of Non-NML Universal Codes
Model Selection with Conditional NML and Plug-in Codes
General Warnings about Model Selection
MDL Model Selection for Linear Regression
Rissanen's RNML Approach
Hansen and Yu's gMDL Approach
Liang and Barron's Approach
Worst Case vs. Average Case
MDL Prediction and Estimation
MDL for Prediction and Predictive Estimation
Prequential MDL Estimators
Prequential MDL Estimators Are Consistent
Parametric and Nonparametric Examples
Cesaro KL consistency vs. KL consistency
Two-Part Code MDL for Point Hypothesis Selection
Discussion of Two-Part Consistency Theorem
MDL Parameter Estimation
MDL Estimators vs. Luckiness ML Estimators
What Estimator To Use?
Comparison to Bayesian Estimators
Summary and Outlook
Appendix: Proof of Theorem 15.3
MDL Consistency and Convergence
The Scenarios Considered
Consistency: Prequential and Two-Part MDL Estimators
Consistency: MDL Model Selection
Selection between a Union of Parametric Models
Nonparametric Model Selection Based on CUP Model Class
MDL Consistency Peculiarities
Risks and Rates
Relations between Divergences and Risk Measures
Minimax Rates
MDL Rates of Convergence
Prequential and Two-Part MDL Estimators
MDL Model Selection
MDL in Context
MDL and Frequentist Paradigms
Sanity Check or Design Principle?
The Weak Prequential Principle
MDL vs. Frequentist Principles: Remaining Issues
MDL and Bayesian Inference
Luckiness Functions vs. Prior Distributions
MDL, Bayes, and Occam
MDL and Brands of Bayesian Statistics
Conclusion: a Common Future after All?
Combining the Best of AIC and BIC
Strict Minimum Message Length
Comparison to MDL
The Wallace-Freeman Estimator
MDL and Prequential Analysis
MDL and Cross-Validation
MDL and Maximum Entropy
Kolmogorov Complexity and Structure Function
MDL and Individual Sequence Prediction
MDL and Statistical Learning Theory
Structural Risk Minimization
PAC-Bayesian Approaches
PAC-Bayesand MDL
The Road Ahead
Additional Background
The Exponential or "Maximum Entropy" Families
Definition and Overview
Basic Properties
Mean-Value, Canonical, and Other Parameterizations
The Mean Value Parameterization
Other Parameterizations
Relating Mean-Value and Canonical Parameters
Exponential Families of General Probabilistic Sources
Fisher Information Definitions and Characterizations
Information-Theoretic Properties of Exponential Families
Robustness of Exponential Family Codes
If [Characters not reproducible][subscript mean] Does Not Contain the Mean
Behavior at the ML Estimate [Beta]
Behavior of the ML Estimate [Beta]
Central Limit Theorem
Large Deviations
Maximum Entropy and Minimax Codelength
Exponential Families and Maximum Entropy
Exponential Families and Minimax Codelength
The Compression Game
Likelihood Ratio Families and Renyi Divergences
The Likelihood Ratio Family
List of Symbols
Subject Index