Minimum Description Length Principle

Name: Minimum Description Length Principle
Price: 80.25 USD
Availability: InStock
ISBN: 9780262072816

ISBN-10: 0262072815

ISBN-13: 9780262072816

Edition: 2007

Authors: Peter D. Grï¿½nwald, Jorma Rissanen

List price: $55.00

30 day, 100% satisfaction guarantee!

Marketplace

3 new & used from $80.25

what's this?

Rush Rewards U
Members Receive:

You have reached 400 XP and carrot coins. That is the daily max!

Description:

The minimum description length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern. This extensive, step-by-step introduction to the MDL Principle provides a comprehensive reference (with an emphasis on conceptual issues) that is…

Book details

List price: $55.00
Copyright year: 2007
Publisher: MIT Press
Publication date: 3/23/2007
Binding: Hardcover
Pages: 504
Size: 7.00" wide x 9.00" long x 1.50" tall
Weight: 2.684
Language: English

Peter D. Grï¿½nwald is a researcher at CWI, the National Research Institute for Mathematics andComputer Science, Amsterdam, the Netherlands. He is also affiliated with EURANDOM, the EuropeanResearch Institute for the Study of Stochastic Phenomena, Eindhoven, the Netherlands.



List of Figures


Series Foreword


Foreword


Preface



Introductory Material



Learning, Regularity, and Compression



Regularity and Learning



Regularity and Compression



Solomonoff's Breakthrough - Kolmogorov Complexity



Making the Idea Applicable



Crude MDL, Refined MDL and Universal Coding



From Crude to Refined MDL



Universal Coding and Refined MDL



Refined MDL for Model Selection



Refined MDL for Prediction and Hypothesis Selection



Some Remarks on Model Selection



Model Selection among Non-Nested Models



Goals of Model vs. Point Hypothesis Selection



The MDL Philosophy



MDL, Occam's Razor, and the "True Model"



Answer to Criticism No. 1



Answer to Criticism No. 2



History and Forms of MDL



What Is MDL?



MDL Literature



Summary and Outlook



Probabilistic and Statistical Preliminaries



General Mathematical Preliminaries



Probabilistic Preliminaries



Definitions; Notational Conventions



Probabilistic Sources



Limit Theorems and Statements



Probabilistic Models



Probabilistic Model Classes



Kinds of Probabilistic Models



Terminological Preliminaries



Modeling Preliminaries: Goals and Methods for Inductive Inference



Consistency



Basic Concepts of Bayesian Statistics



Summary and Outlook



Information-Theoretic Preliminaries



Coding Preliminaries



Restriction to Prefix Coding Systems; Descriptions as Messages



Different Kinds of Codes



Assessing the Efficiency of Description Methods



The Most Important Section of This Book: Probabilities and Code Lengths



The Kraft Inequality



Code Lengths "Are" Probabilities



Immediate Insights and Consequences



Probabilities and Code Lengths, Part II



(Relative) Entropy and the Information Inequality



Uniform Codes, Maximum Entropy, and Minimax Codelength



Summary, Outlook, Further Reading



Information-Theoretic Properties of Statistical Models



Introduction



Likelihood and Observed Fisher Information



KL Divergence and Expected Fisher Information



Maximum Likelihood: Data vs. Parameters



Summary and Outlook



Crude Two-Part Code MDL



Introduction: Making Two-Part MDL Precise



Two-Part Code MDL for Markov Chain Selection



The Code C[subscript 2]



The Code C[subscript 1]



Crude Two-Part Code MDL for Markov Chains



Simplistic Two-Part Code MDL Hypothesis Selection



Two-Part MDL for Tasks Other Than Hypothesis Selection



Behavior of Two-Part Code MDL



Two-Part Code MDL and Maximum Likelihood



The Maximum Likelihood Principle



MDL vs. ML



MDL as a Maximum Probability Principle



Computing and Approximating Two-Part MDL in Practice



Justifying Crude MDL: Consistency and Code Design



A General Consistency Result



Code Design for Two-Part Code MDL



Summary and Outlook



Appendix: Proof of Theorem 5.1



Universal Coding



Universal Coding with Countable Models



Universal Coding: The Basic Idea



Two-part Codes as Simple Universal Codes



From Universal Codes to Universal Models



Formal Definition of Universality



The Finite Case



Minimax Regret and Normalized ML



NML vs. Two-Part vs. Bayes



The Countably Infinite Case



The Two-Part and Bayesian Codes



The NML Code



Prequential Universal Models



Distributions as Prediction Strategies



Bayes Is Prequential; NML and Two-part Are Not



The Prequential Plug-In Model



Individual vs. Stochastic Universality



Stochastic Redundancy



Uniformly Universal Models



Summary, Outlook and Further Reading



Parametric Models: Normalized Maximum Likelihood



Introduction



Preliminaries



Asymptotic Expansion of Parametric Complexity



The Meaning of [Characters not reproducible]



Complexity and Functional Form



KL Divergence and Distinguishability



Complexity and Volume



Complexity and the Number of Distinguishable Distributions



Explicit and Simplified Computations



Parametric Models: Bayes



The Bayesian Regret



Basic Interpretation of Theorem 8.1



Bayes Meets Minimax - Jeffreys' Prior



Jeffreys' Prior and the Boundary



How to Prove the Bayesian and NML Regret Theorems



Proof Sketch of Theorem 8.1



Beyond Exponential Families



Proof Sketch of Theorem 7.1



Stochastic Universality



Appendix: Proofs of Theorem 8.1 and Theorem 8.2



Parametric Models: Prequential Plug-in



Prequential Plug-in for Exponential Families



The Plug-in vs. the Bayes Universal Model



More Precise Asymptotics



Summary



Parametric Models: Two-Part



The Ordinary Two-Part Universal Model



Derivation of the Two-Part Code Regret



Proof Sketch of Theorem 10.1



Discussion



The Conditional Two-Part Universal Code



Conditional Two-Part Codes for Discrete Exponential Families



Distinguishability and the Phase Transition



Summary and Outlook



NML With Infinite Complexity



Introduction



Examples of Undefined NML Distribution



Examples of Undefined Jeffreys' Prior



Metauniversal Codes



Constrained Parametric Complexity



Meta-Two-Part Coding



Renormalized Maximum Likelihood



NML with Luckiness



Asymptotic Expansion of LNML



Conditional Universal Models



Bayesian Approach with Jeffreys' Prior



Conditional NML



Liang and Barron's Approach



Summary and Remarks



Appendix: Proof of Theorem 11.4



Linear Regression



Introduction



Prelude: The Normal Location Family



Least-Squares Estimation



The Normal Equations



Composition of Experiments



Penalized Least-Squares



The Linear Model



Bayesian Linea Model M[superscript X] with Gaussian Prior



Bayesian Linear Models M[superscript X] and S[superscript X] with Noninformative Priors



Universal Models for Linear Regression



NML



Bayes and LNML



Bayes-Jeffreys and CNML



Beyond Parametrics



Introduction



CUP: Unions of Parametric Models



CUP vs. Parametric Models



Universal Codes Based on Histograms



Redundancy of Universal CUP Histogram Codes



Nonparametric Redundancy



Standard CUP Universal Codes



Minimax Nonparametric Redundancy



Gaussian Process Regression



Kernelization of Bayesian Linear Regression



Gaussian Processes



Gaussian Processes as Universal Models



Conclusion and Further Reading



Refined MDL



MDL Model Selection



Introduction



Simple Refined MDL Model Selection



Compression Interpretation



Counting Interpretation



Bayesian Interpretation



Prequential Interpretation



General Parametric Model Selection



Models with Infinite Complexities



Comparing Many or Infinitely Many Models



The General Picture



Practical Issues in MDL Model Selection



Calculating Universal Codelengths



Computational Efficiency and Practical Quality of Non-NML Universal Codes



Model Selection with Conditional NML and Plug-in Codes



General Warnings about Model Selection



MDL Model Selection for Linear Regression



Rissanen's RNML Approach



Hansen and Yu's gMDL Approach



Liang and Barron's Approach



Discussion



Worst Case vs. Average Case



MDL Prediction and Estimation



Introduction



MDL for Prediction and Predictive Estimation



Prequential MDL Estimators



Prequential MDL Estimators Are Consistent



Parametric and Nonparametric Examples



Cesaro KL consistency vs. KL consistency



Two-Part Code MDL for Point Hypothesis Selection



Discussion of Two-Part Consistency Theorem



MDL Parameter Estimation



MDL Estimators vs. Luckiness ML Estimators



What Estimator To Use?



Comparison to Bayesian Estimators



Summary and Outlook



Appendix: Proof of Theorem 15.3



MDL Consistency and Convergence



Introduction



The Scenarios Considered



Consistency: Prequential and Two-Part MDL Estimators



Consistency: MDL Model Selection



Selection between a Union of Parametric Models



Nonparametric Model Selection Based on CUP Model Class



MDL Consistency Peculiarities



Risks and Rates



Relations between Divergences and Risk Measures



Minimax Rates



MDL Rates of Convergence



Prequential and Two-Part MDL Estimators



MDL Model Selection



MDL in Context



MDL and Frequentist Paradigms



Sanity Check or Design Principle?



The Weak Prequential Principle



MDL vs. Frequentist Principles: Remaining Issues



MDL and Bayesian Inference



Luckiness Functions vs. Prior Distributions



MDL, Bayes, and Occam



MDL and Brands of Bayesian Statistics



Conclusion: a Common Future after All?



MDL, AIC and BIC



BIC



AIC



Combining the Best of AIC and BIC



MDL and MML



Strict Minimum Message Length



Comparison to MDL



The Wallace-Freeman Estimator



MDL and Prequential Analysis



MDL and Cross-Validation



MDL and Maximum Entropy



Kolmogorov Complexity and Structure Function



MDL and Individual Sequence Prediction



MDL and Statistical Learning Theory



Structural Risk Minimization



PAC-Bayesian Approaches



PAC-Bayesand MDL



The Road Ahead



Additional Background



The Exponential or "Maximum Entropy" Families



Introduction



Definition and Overview



Basic Properties



Mean-Value, Canonical, and Other Parameterizations



The Mean Value Parameterization



Other Parameterizations



Relating Mean-Value and Canonical Parameters



Exponential Families of General Probabilistic Sources



Fisher Information Definitions and Characterizations



Information-Theoretic Properties of Exponential Families



Introduction



Robustness of Exponential Family Codes



If [Characters not reproducible][subscript mean] Does Not Contain the Mean



Behavior at the ML Estimate [Beta]



Behavior of the ML Estimate [Beta]



Central Limit Theorem



Large Deviations



Maximum Entropy and Minimax Codelength



Exponential Families and Maximum Entropy



Exponential Families and Minimax Codelength



The Compression Game



Likelihood Ratio Families and Renyi Divergences



The Likelihood Ratio Family



Summary


References


List of Symbols


Subject Index