| |
| |
| |
Introduction | |
| |
| |
| |
The Genesis of Bioinformatics | |
| |
| |
| |
Bioinformatics Versus Other Disciplines | |
| |
| |
| |
Further Developments: from Linear Information to Multidimensional Structure Organization | |
| |
| |
| |
Mathematical and Computational Methods | |
| |
| |
| |
Why Mathematical Modeling? | |
| |
| |
| |
Fitting Models to Data | |
| |
| |
| |
Computer Software | |
| |
| |
| |
Applications | |
| |
| |
| |
Mathematical and Computational Methods | |
| |
| |
| |
Probability and Statistics | |
| |
| |
| |
The Rules of Probability Calculus | |
| |
| |
| |
Independence, Conditional Probabilities and Bayes' Rules | |
| |
| |
| |
Random Variables | |
| |
| |
| |
Vector Random Variables | |
| |
| |
| |
Marginal Distributions | |
| |
| |
| |
Operations on Random Variables | |
| |
| |
| |
Notation | |
| |
| |
| |
Expectation and Moments of Random Variables | |
| |
| |
| |
Probability-Generating Functions and Characteristic Functions | |
| |
| |
| |
A Collection of Discrete and Continuous Distributions | |
| |
| |
| |
Bernoulli Trials and the Binomial Distribution | |
| |
| |
| |
The Geometric Distribution | |
| |
| |
| |
The Negative Binomial Distribution | |
| |
| |
| |
The Poisson Distribution | |
| |
| |
| |
The Multinomial Distribution | |
| |
| |
| |
The Hypergeometric Distribution | |
| |
| |
| |
The Normal (Gaussian) Distribution | |
| |
| |
| |
The Exponential Distribution | |
| |
| |
| |
The Gamma Distribution | |
| |
| |
| |
The Beta Distribution | |
| |
| |
| |
Likelihood maximization | |
| |
| |
| |
Binomial Distribution | |
| |
| |
| |
Multinomial distribution | |
| |
| |
| |
Poisson Distribution | |
| |
| |
| |
Geometric Distribution | |
| |
| |
| |
Normal Distribution | |
| |
| |
| |
Exponential Distribution | |
| |
| |
| |
Other Methods of Estimating Parameters: a Comparison | |
| |
| |
| |
Example 1. Uniform Distribution | |
| |
| |
| |
Example 2. Cauchy Distribution | |
| |
| |
| |
Minimum Variance Parameter Estimation | |
| |
| |
| |
The Expectation Maximization Method | |
| |
| |
| |
The Derivations of the Algorithm | |
| |
| |
| |
Examples of Recursive Estimation of Parameters by Using the EM Algorithm | |
| |
| |
| |
Statistical Tests | |
| |
| |
| |
The Idea | |
| |
| |
| |
Parametric Tests | |
| |
| |
| |
Nonparametric Tests | |
| |
| |
| |
Type I and II statistical errors | |
| |
| |
| |
Markov Chains | |
| |
| |
| |
Transition Probability Matrix and State Transition Graph | |
| |
| |
| |
Time Evolution of Probability Distributions of States | |
| |
| |
| |
Classification of States | |
| |
| |
| |
Ergodicity | |
| |
| |
| |
Stationary Distribution | |
| |
| |
| |
Reversible Markov Chains | |
| |
| |
| |
Time-Continuous Markov Chains | |
| |
| |
| |
Markov Chain Monte Carlo (MCMC) Methods | |
| |
| |
| |
Acceptance-Rejection Rule | |
| |
| |
| |
Applications of the Metropolis-Hastings Algorithm | |
| |
| |
| |
Simulated Annealing and MC3 | |
| |
| |
| |
Hidden Markov Models | |
| |
| |
| |
Probability of Occurrence of a Sequence of Symbols | |
| |
| |
| |
Backward Algorithm | |
| |
| |
| |
Forward Algorithm | |
| |
| |
| |
Viterbi Algorithm | |
| |
| |
| |
The Baum-Welch algorithm | |
| |
| |
| |
Exercises | |
| |
| |
| |
Computer Science Algorithms | |
| |
| |
| |
Algorithms | |
| |
| |
| |
Sorting and Quicksort | |
| |
| |
| |
Simple Sort | |
| |
| |
| |
Quicksort | |
| |
| |
| |
String Searches. Fast Search | |
| |
| |
| |
Easy Search | |
| |
| |
| |
Fast Search | |
| |
| |
| |
Index Structures for Strings. Search Tries. Suffix Trees | |
| |
| |
| |
A Treelike Structure in Computer Memory | |
| |
| |
| |
Search Tries | |
| |
| |
| |
Compact Search Tries | |
| |
| |
| |
Suffix Tries and Suffix Trees | |
| |
| |
| |
Suffix Arrays | |
| |
| |
| |
Algorithms for Searching Tries | |
| |
| |
| |
Building Tries | |
| |
| |
| |
Remarks on the Efficiency of the Algorithms | |
| |
| |
| |
The Burrows-Wheeler Transform | |
| |
| |
| |
Inverse transform | |
| |
| |
| |
BW Transform as a Compression Tool | |
| |
| |
| |
BW Transform as a Search Tool for Patterns | |
| |
| |
| |
BW Transform as an Associative, Compressed Memory | |
| |
| |
| |
Computational Complexity of BW Transform | |
| |
| |
| |
Hashing | |
| |
| |
| |
Hashing functions for addressing variables | |
| |
| |
| |
Collisions | |
| |
| |
| |
Statistics of Memory Access Time with Hashing | |
| |
| |
| |
Inquiring About Repetitive Structure of Sequences, Comparing Sequences and Detecting Sequence Overlap by Hashing | |
| |
| |
| |
Exercises | |
| |
| |
| |
Pattern Analysis | |
| |
| |
| |
Feature Extraction | |
| |
| |
| |
Classification | |
| |
| |
| |
Linear Classifiers | |
| |
| |
| |
Linear Classifier Functions and Artificial Neurons | |
| |
| |
| |
Artificial Neural Networks | |
| |
| |
| |
Support Vector Machines | |
| |
| |
| |
Clustering | |
| |
| |
| |
K-means Clustering | |
| |
| |
| |
Hierarchical Clustering | |
| |
| |
| |
Dimensionality Reduction, Principal Component Analysis | |
| |
| |
| |
Singular-Value Decomposition (SVD) | |
| |
| |
| |
Geometric Interpretation of SVD | |
| |
| |
| |
Partial-Least-Squares (PLS) Method | |
| |
| |
| |
Parametric Transformations | |
| |
| |
| |
Hough Transform | |
| |
| |
| |
Generalized Hough Transforms | |
| |
| |
| |
Geometric Hashing | |
| |
| |
| |
Exercises | |
| |
| |
| |
Optimization | |
| |
| |
| |
Static Optimization | |
| |
| |
| |
Convexity and Concavity | |
| |
| |
| |
Constrained Optimization with Equality Constraints | |
| |
| |
| |
Constrained Optimization with Inequality Constraints | |
| |
| |
| |
Sufficiency of Optimality Conditions for Constrained Problems | |
| |
| |
| |
Computing Solutions to Optimization Problems | |
| |
| |
| |
Linear Programming | |
| |
| |
| |
Quadratic Programming | |
| |
| |
| |
Recursive Optimization Algorithms | |
| |
| |
| |
Dynamic Programming | |
| |
| |
| |
Dynamic Programming Algorithm for a Discrete-Time System | |
| |
| |
| |
Tracing a Path in a Plane | |
| |
| |
| |
Shortest Paths in Arrays and Graphs | |
| |
| |
| |
Combinatorial Optimization | |
| |
| |
| |
Examples of Combinatorial Optimization Problems | |
| |
| |
| |
Time Complexity | |
| |
| |
| |
Decision and Optimization Problems | |
| |
| |
| |
Classes of Problems and Algorithms | |
| |
| |
| |
Suboptimal Algorithms | |
| |
| |
| |
Unsolved Problems | |
| |
| |
| |
Exercises | |
| |
| |
| |
Applications | |
| |
| |
| |
Sequence Alignment | |
| |
| |
| |
Number of Possible Alignments | |
| |
| |
| |
Dot Matrices | |
| |
| |
| |
Scoring Correspondences and Mismatches | |
| |
| |
| |
Developing Scoring Functions | |
| |
| |
| |
Estimating Probabilities of Nucleotide Substitution | |
| |
| |
| |
Parametric Models of Nucleotide Substitution | |
| |
| |
| |
Computing Transition Probabilities | |
| |
| |
| |
Fitting Nucleotide Substitution Models to Data | |
| |
| |
| |
Breaking the Loop of Dependencies | |
| |
| |
| |
Scaling Substitution Probabilities | |
| |
| |
| |
Amino Acid Substitution Matrices | |
| |
| |
| |
Gaps | |
| |
| |
| |
Sequence Alignment by Dynamic Programming | |
| |
| |
| |
The Needleman-Wunsch Alignment Algorithm | |
| |
| |
| |
The Smith-Waterman Algorithm | |
| |
| |
| |
Aligning Sequences Against Databases | |
| |
| |
| |
Methods of Multiple Alignment | |
| |
| |
| |
Exercises | |
| |
| |
| |
Molecular Phylogenetics | |
| |
| |
| |
Trees: Vocabulary and Methods | |
| |
| |
| |
The Vocabulary of Trees | |
| |
| |
| |
Overview of Tree-Building Methodologies | |
| |
| |
| |
Distance-Based Trees | |
| |
| |
| |
Tree-Derived Distance | |
| |
| |
| |
Ultrametric Distances and Molecular-Clock Trees | |
| |
| |
| |
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) Algorithm | |
| |
| |
| |
Neighbor-Joining Trees | |
| |
| |
| |
Maximum Likelihood (Felsenstein) Trees | |
| |
| |
| |
Hypotheses and Steps | |
| |
| |
| |
The Pulley Principle | |
| |
| |
| |
Estimating Branch Lengths | |
| |
| |
| |
Estimating the Tree Topology | |
| |
| |
| |
Maximum-Parsimony Trees | |
| |
| |
| |
Minimal Number of Evolutionary Events for a Given Tree | |
| |
| |
| |
Searching for the Optimal Tree Topology | |
| |
| |
| |
Miscellaneous Topics in Phylogenetic Tree Models | |
| |
| |
| |
The Nonparametric Bootstrap Method | |
| |
| |
| |
Variable Substitution Rates, the Felsenstein-Churchill Algorithm and Related Methods | |
| |
| |
| |
The Evolutionary Trace Method and Functional Sites in Proteins | |
| |
| |
| |
Coalescence Theory | |
| |
| |
| |
Neutral Evolution: Interaction of Genetic Drift and Mutation | |
| |
| |
| |
Modeling Genetic Drift | |
| |
| |
| |
Modeling Mutation | |
| |
| |
| |
Coalescence Under Different Demographic Scenarios | |
| |
| |
| |
Statistical Inference on Demographic Hypotheses and Parameters | |
| |
| |
| |
Markov Chain Monte Carlo (MCMC) Methods | |
| |
| |
| |
Approximate Approaches | |
| |
| |
| |
Exercises | |
| |
| |
| |
Genomics | |
| |
| |
| |
The DNA Molecule and the Central Dogma of Molecular Biology | |
| |
| |
| |
Genome Structure | |
| |
| |
| |
Genome Sequencing | |
| |
| |
| |
Restriction Enzymes | |
| |
| |
| |
Electrophoresis | |
| |
| |
| |
Southern Blot | |
| |
| |
| |
The Polymerase Chain Reaction | |
| |
| |
| |
DNA Cloning | |
| |
| |
| |
Chain Termination DNA Sequencing | |
| |
| |
| |
Genome Shotgun Sequencing | |
| |
| |
| |
Genome Assembly Algorithms | |
| |
| |
| |
Growing Contigs from Fragments | |
| |
| |
| |
Detection of Overlaps Between Reads | |
| |
| |
| |
Repetitive Structure of DNA | |
| |
| |
| |
The Shortest Superstring Problem | |
| |
| |
| |
Overlap Graphs and the Hamiltonian Path Problem | |
| |
| |
| |
Sequencing by Hybridization | |
| |
| |
| |
De Bruijn Graphs | |
| |
| |
| |
All l-mers in the Reads | |
| |
| |
| |
The Euler Superpath Problem | |
| |
| |
| |
Further Aspects of DNA Assembly Algorithms | |
| |
| |
| |
Statistics of the Genome Coverage | |
| |
| |
| |
Contigs, Gaps and Anchored Contigs | |
| |
| |
| |
Statistics with Minimum Overlaps Between Fragments, Anchored Contigs | |
| |
| |
| |
Genome Length and Structure Estimation by Sampling l-mers | |
| |
| |
| |
Polymorphisms | |
| |
| |
| |
Genome Annotation | |
| |
| |
| |
Research Tools for Genome Annotation | |
| |
| |
| |
Gene Identification | |
| |
| |
| |
DNA Motifs | |
| |
| |
| |
Annotation by Words and Comparisons of Genome Assemblies | |
| |
| |
| |
Human Chromosome 14 | |
| |
| |
| |
Exercises | |
| |
| |
| |
Proteomics | |
| |
| |
| |
Protein Structure | |
| |
| |
| |
Amino Acids | |
| |
| |
| |
Peptide Bonds | |
| |
| |
| |
Primary Structure | |
| |
| |
| |
Secondary Structure | |
| |
| |
| |
Tertiary Structure | |
| |
| |
| |
Quaternary Structure | |
| |
| |
| |
Experimental Determination of Amino Acid Sequences and Protein Structures | |
| |
| |
| |
Electrophoresis | |
| |
| |
| |
Protein 2D Gels | |
| |
| |
| |
Protein Western Blots | |
| |
| |
| |
Mass Spectrometry | |
| |
| |
| |
Chemical Identification of Amino Acids in Peptides | |
| |
| |
| |
Analysis of Protein 3D Structure by X Ray Diffraction and NMR | |
| |
| |
| |
Other Assays for Protein Compositions and Interactions | |
| |
| |
| |
Computational Methods for Modeling Molecular Structures | |
| |
| |
| |
Molecular-Force-Field Model | |
| |
| |
| |
Molecular Dynamics | |
| |
| |
| |
Hydrogen Bonds | |
| |
| |
| |
Computation and Minimization of RMSD | |
| |
| |
| |
Solutions to the Problem of Minimization of RMSD over Rotations | |
| |
| |
| |
Solutions to the Problem of Minimization of RMSD over Rotations and Translations | |
| |
| |
| |
Solvent-Accessible Surface of a Protein | |
| |
| |
| |
Computational Prediction of Protein Structure and Function | |
| |
| |
| |
Inferring Structures of Proteins | |
| |
| |
| |
Protein Annotation | |
| |
| |
| |
De Novo Methods | |
| |
| |
| |
Comparative Modeling | |
| |
| |
| |
Protein-Ligand Binding Analysis | |
| |
| |
| |
Classification Based on Proteomic Assays | |
| |
| |
| |
Exercises | |
| |
| |
| |
RNA | |
| |
| |
| |
The RNA World Hypothesis | |
| |
| |
| |
The Functions of RNA | |
| |
| |
| |
Reverse Transcription, Sequencing RNA Chains | |
| |
| |
| |
The Northern Blot | |
| |
| |
| |
RNA Primary Structure | |
| |
| |
| |
RNA Secondary Structure | |
| |
| |
| |
RNA Tertiary Structure | |
| |
| |
| |
Computational Prediction of RNA Secondary Structure | |
| |
| |
| |
Nested Structure | |
| |
| |
| |
Maximizing the Number of Pairings Between Bases | |
| |
| |
| |
Minimizing the Energy of RNA Secondary Structure | |
| |
| |
| |
Pseudoknots | |
| |
| |
| |
Prediction of RNA Structure by Comparative Sequence Analysis | |
| |
| |
| |
Exercises | |
| |
| |
| |
DNA Microarrays | |
| |
| |
| |
Design of DNA Microarrays | |
| |
| |
| |
Kinetics of the Binding Process | |
| |
| |
| |
Data Preprocessing and Normalization | |
| |
| |
| |
Normalization Procedures for Single Microarrays | |
| |
| |
| |
Normalization Based on Spiked-in Control RNA | |
| |
| |
| |
RMA Normalization Procedure | |
| |
| |
| |
Correction of Ratio-Intensity Plots for cDNA | |
| |
| |
| |
Statistics of Gene Expression Profiles | |
| |
| |
| |
Modeling Probability Distributions of Gene Expressions | |
| |
| |
| |
Class Prediction and Class Discovery | |
| |
| |
| |
Dimensionality Reduction | |
| |
| |
| |
Example of Application of PCA to Microarray Data | |
| |
| |
| |
Class Discovery | |
| |
| |
| |
Hierarchical Clustering | |
| |
| |
| |
Class Prediction. Differentially Expressed Genes | |
| |
| |
| |
Multiple Testing, and Analysis of False Discovery Rate (FDR) | |
| |
| |
| |
FDR analysis in ALL versus AML gene expression data | |
| |
| |
| |
The Gene Ontology Database | |
| |
| |
| |
Structure of GO | |
| |
| |
| |
Other Vocabularies of Terms | |
| |
| |
| |
Supporting Results of DNA Microarray Analyses with GO and other Vocabulary Terms | |
| |
| |
| |
Exercises | |
| |
| |
| |
Bioinformatic Databases and Bioinformatic Internet Resources | |
| |
| |
| |
Genomic Databases | |
| |
| |
| |
Proteomic Databases | |
| |
| |
| |
RNA Databases | |
| |
| |
| |
Gene Expression Databases | |
| |
| |
| |
Ontology Databases | |
| |
| |
| |
Databases of Genetic and Proteomic Pathways | |
| |
| |
| |
Programs and Services | |
| |
| |
| |
Clinical Databases | |
| |
| |
References | |
| |
| |
Index | |