Preface

Acknowledgments

About the Author
| |
| |
| |
From Data to Models: Complexity and Challenges in Understanding Biological, Ecological, and Natural Systems

Introduction

Layout of the Book

References
| |
| |
| |
Fundamentals of Neural Networks and Models for Linear Data Analysis

Introduction and Overview

Neural Networks and Their Capabilities

Inspirations from Biology

Modeling Information Processing in Neurons

Neuron Models and Learning Strategies

Threshold Neuron as a Simple Classifier

Learning Models for Neurons and Neural Assemblies

Hebbian Learning

Unsupervised or Competitive Learning

Supervised Learning

Perceptron with Supervised Learning as a Classifier

Perceptron Learning Algorithm

A Practical Example of Perceptron on a Larger Realistic Data Set: Identifying the Origin of Fish from the Growth-Ring Diameter of Scales

Comparison of Perceptron with Linear Discriminant Function Analysis in Statistics

Multi-Output Perceptron for Multicategory Classification

Higher-Dimensional Classification Using Perceptron

Perceptron Summary

Linear Neuron for Linear Classification and Prediction

Learning with the Delta Rule

Linear Neuron as a Classifier

Classification Properties of a Linear Neuron as a Subset of Predictive Capabilities

Example: Linear Neuron as a Predictor

A Practical Example of Linear Prediction: Predicting the Heat Influx in a Home

Comparison of Linear Neuron Model with Linear Regression

Example: Multiple-Input Linear Neuron Model for Improving the Prediction Accuracy of Heat Influx in a Home

Comparison of a Multiple-Input Linear Neuron with Multiple Linear Regression

Multiple Linear Neuron Models

Comparison of a Multiple Linear Neuron Network with Canonical Correlation Analysis

Linear Neuron and Linear Network Summary

Summary

Problems

References
| |
| |
| |
Neural Networks for Nonlinear Pattern Recognition

Overview and Introduction

Multilayer Perceptron

Nonlinear Neurons

Neuron Activation Functions

Sigmoid Functions

Gaussian Functions

Example: Population Growth Modeling Using a Nonlinear Neuron

Comparison of Nonlinear Neuron with Nonlinear Regression Analysis

One-Input Multilayer Nonlinear Networks

Processing with a Single Nonlinear Hidden Neuron

Examples: Modeling Cyclical Phenomena with Multiple Nonlinear Neurons

Example 1: Approximating a Square Wave

Example 2: Modeling Seasonal Species Migration

Two-Input Multilayer Perceptron Network

Processing of Two-Dimensional Inputs by Nonlinear Neurons

Network Output

Examples: Two-Dimensional Prediction and Classification

Example 1: Two-Dimensional Nonlinear Function Approximation

Example 2: Two-Dimensional Nonlinear Classification Model

Multidimensional Data Modeling with Nonlinear Multilayer Perceptron Networks

Summary

Problems

References
| |
| |
| |
Learning of Nonlinear Patterns by Neural Networks

Introduction and Overview

Supervised Training of Networks for Nonlinear Pattern Recognition

Gradient Descent and Error Minimization

Backpropagation Learning

Example: Backpropagation Training (Hand Computation)

Error Gradient with Respect to Output Neuron Weights

The Error Gradient with Respect to the Hidden-Neuron Weights

Application of Gradient Descent in Backpropagation Learning

Batch Learning

Learning Rate and Weight Update

Example-by-Example (Online) Learning

Momentum

Example: Backpropagation Learning Computer Experiment

Single-Input Single-Output Network with Multiple Hidden Neurons

Multiple-Input, Multiple-Hidden Neuron, and Single-Output Network

Multiple-Input, Multiple-Hidden Neuron, Multiple-Output Network

Example: Backpropagation Learning Case Study on a Complex Classification Problem

Delta-Bar-Delta Learning (Adaptive Learning Rate) Method

Example: Network Training with Delta-Bar-Delta (Hand Computation)

Example: Delta-Bar-Delta with Momentum (Hand Computation)

Network Training with Delta-Bar-Delta (Computer Experiment)

Comparison of Delta-Bar-Delta Method with Backpropagation

Example: Network Training with Delta-Bar-Delta (Case Study)

Steepest Descent Method

Example: Network Training with Steepest Descent (Hand Computation)

Example: Network Training with Steepest Descent (Computer Experiment)

Second-Order Methods of Error Minimization and Weight Optimization

QuickProp

Example: Network Training with QuickProp (Hand Computation)

Example: Network Training with QuickProp (Computer Experiment)

Comparison of QuickProp with Steepest Descent, Delta-Bar-Delta, and Backpropagation

General Concept of Second-Order Methods of Error Minimization

Gauss-Newton Method

Network Training with the Gauss-Newton Method (Hand Computation)

Example: Network Training with the Gauss-Newton Method (Computer Experiment)

The Levenberg-Marquardt Method

Example: Network Training with the LM Method (Hand Computation)

Network Training with the LM Method (Computer Experiment)

Comparison of the Efficiency of the First-Order and Second-Order Methods in Minimizing Error

Comparison of the Convergence Characteristics of First-Order and Second-Order Learning Methods

Backpropagation

Steepest Descent Method

Gauss-Newton Method

Levenberg-Marquardt Method

Summary

Problems

References
| |
| |
| |
Implementation of Neural Network Models for Extracting Reliable Patterns from Data

Introduction and Overview

Bias-Variance Tradeoff

Improving Generalization of Neural Networks

Illustration of Early Stopping

Effect of Initial Random Weights

Weight Structure of the Trained Networks

Effect of Random Sampling

Effect of Model Complexity: Number of Hidden Neurons

Summary on Early Stopping

Regularization

Reducing Structural Complexity of Networks by Pruning

Optimal Brain Damage

Example of Network Pruning with Optimal Brain Damage

Network Pruning Based on Variance of Network Sensitivity

Illustration of Application of Variance Nullity in Pruning Weights

Pruning Hidden Neurons Based on Variance Nullity of Sensitivity

Robustness of a Network to Perturbation of Weights

Confidence Intervals for Weights

Summary

Problems

References
| |
| |
| |
Data Exploration, Dimensionality Reduction, and Feature Extraction

Introduction and Overview

Example: Thermal Conductivity of Wood in Relation to Correlated Input Data

Data Visualization

Correlation Scatter Plots and Histograms

Parallel Visualization

Projecting Multidimensional Data onto a Two-Dimensional Plane

Correlation and Covariance between Variables

Normalization of Data

Standardization

Simple Range Scaling

Whitening: Normalization of Correlated Multivariate Data

Selecting Relevant Inputs

Statistical Tools for Variable Selection

Partial Correlation

Multiple Regression and Best-Subsets Regression

Dimensionality Reduction and Feature Extraction

Multicollinearity

Principal Component Analysis (PCA)

Partial Least-Squares Regression

Outlier Detection

Noise

Case Study: Illustrating Input Selection and Dimensionality Reduction for a Practical Problem

Data Preprocessing and Preliminary Modeling

PCA-Based Neural Network Modeling

Effect of Hidden Neurons for Non-PCA- and PCA-Based Approaches

Case Study Summary

Summary

Problems

References
| |
| |
| |
Assessment of Uncertainty of Neural Network Models Using Bayesian Statistics

Introduction and Overview

Estimating Weight Uncertainty Using Bayesian Statistics

Quality Criterion

Incorporating Bayesian Statistics to Estimate Weight Uncertainty

Squared Error

Intrinsic Uncertainty of Targets for Multivariate Output

Probability Density Function of Weights

Example Illustrating Generation of Probability Distribution of Weights

Estimation of Geophysical Parameters from Remote Sensing: A Case Study

Assessing Uncertainty of Neural Network Outputs Using Bayesian Statistics

Example Illustrating Uncertainty Assessment of Output Errors

Total Network Output Errors

Error Correlation and Covariance Matrices

Statistical Analysis of Error Covariance

Decomposition of Total Output Error into Model Error and Intrinsic Noise

Assessing the Sensitivity of Network Outputs to Inputs

Approaches to Determine the Influence of Inputs on Outputs in Feedforward Networks

Methods Based on Magnitude of Weights

Sensitivity Analysis

Example: Comparison of Methods to Assess the Influence of Inputs on Outputs

Uncertainty of Sensitivities

Example Illustrating Uncertainty Assessment of Network Sensitivity to Inputs

PCA Decomposition of Inputs and Outputs

PCA-Based Neural Network Regression

Neural Network Sensitivities

Uncertainty of Input Sensitivity

PCA-Regularized Jacobians

Case Study Summary

Summary

Problems

References
| |
| |
| |
Discovering Unknown Clusters in Data with Self-Organizing Maps

Introduction and Overview

Structure of Unsupervised Networks

Learning in Unsupervised Networks

Implementation of Competitive Learning

Winner Selection Based on Neuron Activation

Winner Selection Based on Distance to Input Vector

Other Distance Measures

Competitive Learning Example

Recursive Versus Batch Learning

Illustration of the Calculations Involved in Winner Selection

Network Training

Self-Organizing Feature Maps

Learning in Self-Organizing Map Networks

Selection of Neighborhood Geometry

Training of Self-Organizing Maps

Neighbor Strength

Example: Training Self-Organizing Networks with a Neighbor Feature

Neighbor Matrix and Distance to Neighbors from the Winner

Shrinking Neighborhood Size with Iterations

Learning Rate Decay

Weight Update Incorporating Learning Rate and Neighborhood Decay

Recursive and Batch Training and Relation to K-Means Clustering

Two Phases of Self-Organizing Map Training

Example: Illustrating Self-Organizing Map Learning with a Hand Calculation

SOM Case Study: Determination of Mastitis Health Status of Dairy Herd from Combined Milk Traits

Example of Two-Dimensional Self-Organizing Maps: Clustering Canadian and Alaskan Salmon Based on the Diameter of Growth Rings of the Scales

Map Structure and Initialization

Map Training

U-Matrix

Map Initialization

Example: Training Two-Dimensional Maps on Multidimensional Data

Data Visualization

Map Structure and Training

U-Matrix

Point Estimates of Probability Density of Inputs Captured by the Map

Quantization Error

Accuracy of Retrieval of Input Data from the Map

Forming Clusters on the Map

Approaches to Clustering

Example Illustrating Clustering on a Trained Map

Finding Optimum Clusters on the Map with the Ward Method

Finding Optimum Clusters by K-Means Clustering

Validation of a Trained Map

n-Fold Cross Validation

Evolving Self-Organizing Maps

Growing Cell Structure of Map

Centroid Method for Mapping Input Data onto Positions between Neurons on the Map

Dynamic Self-Organizing Maps with Controlled Growth (GSOM)

Example: Application of Dynamic Self-Organizing Maps

Evolving Tree

Summary

Problems

References
| |
| |
| |
Neural Networks for Time-Series Forecasting

Introduction and Overview

Linear Forecasting of Time Series with Statistical and Neural Network Models

Example Case Study: Regulating Temperature of a Furnace

Multistep-Ahead Linear Forecasting

Neural Networks for Nonlinear Time-Series Forecasting

Focused Time-Lagged and Dynamically Driven Recurrent Networks

Focused Time-Lagged Feedforward Networks

Spatio-Temporal Time-Lagged Networks

Example: Spatio-Temporal Time-Lagged Network for Regulating Temperature in a Furnace

Single-Step Forecasting with Neural NARx Model

Multistep Forecasting with Neural NARx Model

Case Study: River Flow Forecasting

Linear Model for River Flow Forecasting

Nonlinear Neural (NARx) Model for River Flow Forecasting

Input Sensitivity

Hybrid Linear (ARIMA) and Nonlinear Neural Network Models

Case Study: Forecasting the Annual Number of Sunspots

Automatic Generation of Network Structure Using Simplest Structure Concept

Case Study: Forecasting Air Pollution with Automatic Neural Network Model Generation

Generalized Neuron Network

Case Study: Short-Term Load Forecasting with a Generalized Neuron Network

Dynamically Driven Recurrent Networks

Recurrent Networks with Hidden Neuron Feedback

Encapsulating Long-Term Memory

Structure and Operation of the Elman Network

Training Recurrent Networks

Network Training Example: Hand Calculation

Recurrent Learning Network Application Case Study: Rainfall-Runoff Modeling

Two-Step-Ahead Forecasting with Recurrent Networks

Real-Time Recurrent Learning Case Study: Two-Step-Ahead Stream Flow Forecasting

Recurrent Networks with Output Feedback

Encapsulating Long-Term Memory in Recurrent Networks with Output Feedback

Application of a Recurrent Net with Output and Error Feedback and Exogenous Inputs (NARIMAx), Case Study: Short-Term Temperature Forecasting

Training of Recurrent Nets with Output Feedback

Fully Recurrent Network

Fully Recurrent Network Practical Application Case Study: Short-Term Electricity Load Forecasting

Bias and Variance in Time-Series Forecasting

Decomposition of Total Error into Bias and Variance Components

Example Illustrating Bias-Variance Decomposition

Long-Term Forecasting

Case Study: Long-Term Forecasting with Multiple Neural Networks (MNNs)

Input Selection for Time-Series Forecasting

Input Selection from Nonlinearly Dependent Variables

Partial Mutual Information Method

Generalized Regression Neural Network

Self-Organizing Maps for Input Selection

Genetic Algorithms for Input Selection

Practical Application of Input Selection Methods for Time-Series Forecasting

Input Selection Case Study: Selecting Inputs for Forecasting River Salinity

Summary

Problems

References
| |
| |
Appendix

Index