Preface

Acknowledgments

About the Author

From Data to Models: Complexity and Challenges in Understanding Biological, Ecological, and Natural Systems

Introduction

Layout of the Book

References

Fundamentals of Neural Networks and Models for Linear Data Analysis

Introduction and Overview

Neural Networks and Their Capabilities

Inspirations from Biology

Modeling Information Processing in Neurons

Neuron Models and Learning Strategies

Threshold Neuron as a Simple Classifier

Learning Models for Neurons and Neural Assemblies

Hebbian Learning

Unsupervised or Competitive Learning

Supervised Learning

Perceptron with Supervised Learning as a Classifier

Perceptron Learning Algorithm

A Practical Example of Perceptron on a Larger Realistic Data Set: Identifying the Origin of Fish from the Growth-Ring Diameter of Scales

Comparison of Perceptron with Linear Discriminant Function Analysis in Statistics

Multi-Output Perceptron for Multicategory Classification

Higher-Dimensional Classification Using Perceptron

Perceptron Summary

Linear Neuron for Linear Classification and Prediction

Learning with the Delta Rule

Linear Neuron as a Classifier

Classification Properties of a Linear Neuron as a Subset of Predictive Capabilities

Example: Linear Neuron as a Predictor

A Practical Example of Linear Prediction: Predicting the Heat Influx in a Home

Comparison of Linear Neuron Model with Linear Regression

Example: Multiple-Input Linear Neuron Model - Improving the Prediction Accuracy of Heat Influx in a Home

Comparison of a Multiple-Input Linear Neuron with Multiple Linear Regression

Multiple Linear Neuron Models

Comparison of a Multiple Linear Neuron Network with Canonical Correlation Analysis

Linear Neuron and Linear Network Summary

Summary

Problems

References

Neural Networks for Nonlinear Pattern Recognition

Overview and Introduction

Multilayer Perceptron

Nonlinear Neurons

Neuron Activation Functions

Sigmoid Functions

Gaussian Functions

Example: Population Growth Modeling Using a Nonlinear Neuron

Comparison of Nonlinear Neuron with Nonlinear Regression Analysis

One-Input Multilayer Nonlinear Networks

Processing with a Single Nonlinear Hidden Neuron

Examples: Modeling Cyclical Phenomena with Multiple Nonlinear Neurons

Example 1: Approximating a Square Wave

Example 2: Modeling Seasonal Species Migration

Two-Input Multilayer Perceptron Network

Processing of Two-Dimensional Inputs by Nonlinear Neurons

Network Output

Examples: Two-Dimensional Prediction and Classification

Example 1: Two-Dimensional Nonlinear Function Approximation

Example 2: Two-Dimensional Nonlinear Classification Model

Multidimensional Data Modeling with Nonlinear Multilayer Perceptron Networks

Summary

Problems

References

Learning of Nonlinear Patterns by Neural Networks

Introduction and Overview

Supervised Training of Networks for Nonlinear Pattern Recognition

Gradient Descent and Error Minimization

Backpropagation Learning

Example: Backpropagation Training - A Hand Computation

Error Gradient with Respect to Output Neuron Weights

The Error Gradient with Respect to the Hidden-Neuron Weights

Application of Gradient Descent in Backpropagation Learning

Batch Learning

Learning Rate and Weight Update

Example-by-Example (Online) Learning

Momentum

Example: Backpropagation Learning Computer Experiment

Single-Input Single-Output Network with Multiple Hidden Neurons

Multiple-Input, Multiple-Hidden Neuron, and Single-Output Network

Multiple-Input, Multiple-Hidden Neuron, Multiple-Output Network

Example: Backpropagation Learning Case Study - Solving a Complex Classification Problem

Delta-Bar-Delta Learning (Adaptive Learning Rate) Method

Example: Network Training with Delta-Bar-Delta - A Hand Computation

Example: Delta-Bar-Delta with Momentum - A Hand Computation

Network Training with Delta-Bar-Delta - A Computer Experiment

Comparison of Delta-Bar-Delta Method with Backpropagation

Example: Network Training with Delta-Bar-Delta - A Case Study

Steepest Descent Method

Example: Network Training with Steepest Descent - Hand Computation

Example: Network Training with Steepest Descent - A Computer Experiment

Second-Order Methods of Error Minimization and Weight Optimization

QuickProp

Example: Network Training with QuickProp - A Hand Computation

Example: Network Training with QuickProp - A Computer Experiment

Comparison of QuickProp with Steepest Descent, Delta-Bar-Delta, and Backpropagation

General Concept of Second-Order Methods of Error Minimization

Gauss-Newton Method

Network Training with the Gauss-Newton Method - A Hand Computation

Example: Network Training with Gauss-Newton Method - A Computer Experiment

The Levenberg-Marquardt Method

Example: Network Training with LM Method - A Hand Computation

Network Training with the LM Method - A Computer Experiment

Comparison of the Efficiency of the First-Order and Second-Order Methods in Minimizing Error

Comparison of the Convergence Characteristics of First-Order and Second-Order Learning Methods

Backpropagation

Steepest Descent Method

Gauss-Newton Method

Levenberg-Marquardt Method

Summary

Problems

References

Implementation of Neural Network Models for Extracting Reliable Patterns from Data

Introduction and Overview

Bias-Variance Tradeoff

Improving Generalization of Neural Networks

Illustration of Early Stopping

Effect of Initial Random Weights

Weight Structure of the Trained Networks

Effect of Random Sampling

Effect of Model Complexity: Number of Hidden Neurons

Summary on Early Stopping

Regularization

Reducing Structural Complexity of Networks by Pruning

Optimal Brain Damage

Example of Network Pruning with Optimal Brain Damage

Network Pruning Based on Variance of Network Sensitivity

Illustration of Application of Variance Nullity in Pruning Weights

Pruning Hidden Neurons Based on Variance Nullity of Sensitivity

Robustness of a Network to Perturbation of Weights

Confidence Intervals for Weights

Summary

Problems

References

Data Exploration, Dimensionality Reduction, and Feature Extraction

Introduction and Overview

Example: Thermal Conductivity of Wood in Relation to Correlated Input Data

Data Visualization

Correlation Scatter Plots and Histograms

Parallel Visualization

Projecting Multidimensional Data onto Two-Dimensional Plane

Correlation and Covariance between Variables

Normalization of Data

Standardization

Simple Range Scaling

Whitening - Normalization of Correlated Multivariate Data

Selecting Relevant Inputs

Statistical Tools for Variable Selection

Partial Correlation

Multiple Regression and Best-Subsets Regression

Dimensionality Reduction and Feature Extraction

Multicollinearity

Principal Component Analysis (PCA)

Partial Least-Squares Regression

Outlier Detection

Noise

Case Study: Illustrating Input Selection and Dimensionality Reduction for a Practical Problem

Data Preprocessing and Preliminary Modeling

PCA-Based Neural Network Modeling

Effect of Hidden Neurons for Non-PCA- and PCA-Based Approaches

Case Study Summary

Summary

Problems

References

Assessment of Uncertainty of Neural Network Models Using Bayesian Statistics

Introduction and Overview

Estimating Weight Uncertainty Using Bayesian Statistics

Quality Criterion

Incorporating Bayesian Statistics to Estimate Weight Uncertainty

Square Error

Intrinsic Uncertainty of Targets for Multivariate Output

Probability Density Function of Weights

Example Illustrating Generation of Probability Distribution of Weights

Estimation of Geophysical Parameters from Remote Sensing: A Case Study

Assessing Uncertainty of Neural Network Outputs Using Bayesian Statistics

Example Illustrating Uncertainty Assessment of Output Errors

Total Network Output Errors

Error Correlation and Covariance Matrices

Statistical Analysis of Error Covariance

Decomposition of Total Output Error into Model Error and Intrinsic Noise

Assessing the Sensitivity of Network Outputs to Inputs

Approaches to Determine the Influence of Inputs on Outputs in Feedforward Networks

Methods Based on Magnitude of Weights

Sensitivity Analysis

Example: Comparison of Methods to Assess the Influence of Inputs on Outputs

Uncertainty of Sensitivities

Example Illustrating Uncertainty Assessment of Network Sensitivity to Inputs

PCA Decomposition of Inputs and Outputs

PCA-Based Neural Network Regression

Neural Network Sensitivities

Uncertainty of Input Sensitivity

PCA-Regularized Jacobians

Case Study Summary

Summary

Problems

References

Discovering Unknown Clusters in Data with Self-Organizing Maps

Introduction and Overview

Structure of Unsupervised Networks

Learning in Unsupervised Networks

Implementation of Competitive Learning

Winner Selection Based on Neuron Activation

Winner Selection Based on Distance to Input Vector

Other Distance Measures

Competitive Learning Example

Recursive Versus Batch Learning

Illustration of the Calculations Involved in Winner Selection

Network Training

Self-Organizing Feature Maps

Learning in Self-Organizing Map Networks

Selection of Neighborhood Geometry

Training of Self-Organizing Maps

Neighbor Strength

Example: Training Self-Organizing Networks with a Neighbor Feature

Neighbor Matrix and Distance to Neighbors from the Winner

Shrinking Neighborhood Size with Iterations

Learning Rate Decay

Weight Update Incorporating Learning Rate and Neighborhood Decay

Recursive and Batch Training and Relation to K-Means Clustering

Two Phases of Self-Organizing Map Training

Example: Illustrating Self-Organizing Map Learning with a Hand Calculation

SOM Case Study: Determination of Mastitis Health Status of Dairy Herd from Combined Milk Traits

Example of Two-Dimensional Self-Organizing Maps: Clustering Canadian and Alaskan Salmon Based on the Diameter of Growth Rings of the Scales

Map Structure and Initialization

Map Training

U-Matrix

Map Initialization

Example: Training Two-Dimensional Maps on Multidimensional Data

Data Visualization

Map Structure and Training

U-Matrix

Point Estimates of Probability Density of Inputs Captured by the Map

Quantization Error

Accuracy of Retrieval of Input Data from the Map

Forming Clusters on the Map

Approaches to Clustering

Example Illustrating Clustering on a Trained Map

Finding Optimum Clusters on the Map with the Ward Method

Finding Optimum Clusters by K-Means Clustering

Validation of a Trained Map

n-Fold Cross Validation

Evolving Self-Organizing Maps

Growing Cell Structure of Map

Centroid Method for Mapping Input Data onto Positions between Neurons on the Map

Dynamic Self-Organizing Maps with Controlled Growth (GSOM)

Example: Application of Dynamic Self-Organizing Maps

Evolving Tree

Summary

Problems

References

Neural Networks for Time-Series Forecasting

Introduction and Overview

Linear Forecasting of Time-Series with Statistical and Neural Network Models

Example Case Study: Regulating Temperature of a Furnace

Multistep-Ahead Linear Forecasting

Neural Networks for Nonlinear Time-Series Forecasting

Focused Time-Lagged and Dynamically Driven Recurrent Networks

Focused Time-Lagged Feedforward Networks

Spatio-Temporal Time-Lagged Networks

Example: Spatio-Temporal Time-Lagged Network - Regulating Temperature in a Furnace

Single-Step Forecasting with Neural NARx Model

Multistep Forecasting with Neural NARx Model

Case Study: River Flow Forecasting

Linear Model for River Flow Forecasting

Nonlinear Neural (NARx) Model for River Flow Forecasting

Input Sensitivity

Hybrid Linear (ARIMA) and Nonlinear Neural Network Models

Case Study: Forecasting the Annual Number of Sunspots

Automatic Generation of Network Structure Using Simplest Structure Concept

Case Study: Forecasting Air Pollution with Automatic Neural Network Model Generation

Generalized Neuron Network

Case Study: Short-Term Load Forecasting with a Generalized Neuron Network

Dynamically Driven Recurrent Networks

Recurrent Networks with Hidden Neuron Feedback

Encapsulating Long-Term Memory

Structure and Operation of the Elman Network

Training Recurrent Networks

Network Training Example: Hand Calculation

Recurrent Learning Network Application Case Study: Rainfall Runoff Modeling

Two-Step-Ahead Forecasting with Recurrent Networks

Real-Time Recurrent Learning Case Study: Two-Step-Ahead Stream Flow Forecasting

Recurrent Networks with Output Feedback

Encapsulating Long-Term Memory in Recurrent Networks with Output Feedback

Application of a Recurrent Net with Output and Error Feedback and Exogenous Inputs: (NARIMAx) Case Study: Short-Term Temperature Forecasting

Training of Recurrent Nets with Output Feedback

Fully Recurrent Network

Fully Recurrent Network Practical Application Case Study: Short-Term Electricity Load Forecasting

Bias and Variance in Time-Series Forecasting

Decomposition of Total Error into Bias and Variance Components

Example Illustrating Bias-Variance Decomposition

Long-Term Forecasting

Case Study: Long-Term Forecasting with Multiple Neural Networks (MNNs)

Input Selection for Time-Series Forecasting

Input Selection from Nonlinearly Dependent Variables

Partial Mutual Information Method

Generalized Regression Neural Network

Self-Organizing Maps for Input Selection

Genetic Algorithms for Input Selection

Practical Application of Input Selection Methods for Time-Series Forecasting

Input Selection Case Study: Selecting Inputs for Forecasting River Salinity

Summary

Problems

References

Appendix

Index