Preface

Acknowledgments

Introduction
Basic Terminology
The Central Dogma of Molecular Biology
Genome
Proteome
DNA (Deoxyribonucleic Acid)
RNA (Ribonucleic Acid)
mRNA (messenger RNA)
Genetic Code
Gene
Gene Expression and the Gene Expression Level
Protein
Overlapping Areas of Research
Genomics
Proteomics
Bioinformatics
Transcriptomics and Other -omics
Data Mining

Basic Analysis of Gene Expression Microarray Data
Introduction
Microarray Technology
Spotted Microarrays
Affymetrix GeneChip® Microarrays
Bead-Based Microarrays
Low-Level Preprocessing of Affymetrix Microarrays
MAS5
RMA
GCRMA
PLIER
Public Repositories of Microarray Data
Microarray Gene Expression Data Society (MGED) Standards
Public Databases
Gene Expression Omnibus (GEO)
ArrayExpress
Gene Expression Matrix
Elements of Gene Expression Microarray Data Analysis
Additional Preprocessing, Quality Assessment, and Filtering
Quality Assessment
Filtering
Basic Exploratory Data Analysis
t Test
t Test for Equal Variances
t Test for Unequal Variances
ANOVA F Test
SAM t Statistic
Limma
Adjustment for Multiple Comparisons
Single-Step Bonferroni Procedure
Single-Step Sidak Procedure
Step-Down Holm Procedure
Step-Up Benjamini and Hochberg Procedure
Permutation-Based Multiplicity Adjustment
Unsupervised Learning (Taxonomy-Related Analysis)
Cluster Analysis
Measures of Similarity or Distance
K-Means Clustering
Hierarchical Clustering
Two-Way Clustering and Related Methods
Principal Component Analysis
Self-Organizing Maps
Exercises

Biomarker Discovery and Classification
Overview
Gene Expression Matrix ... Again
Biomarker Discovery
Classification Systems
Parametric and Nonparametric Learning Algorithms
Terms Associated with Common Assumptions Underlying Parametric Learning Algorithms
Visualization of Classification Results
Validation of the Classification Model
Reclassification
Leave-One-Out and K-Fold Cross-Validation
External and Internal Cross-Validation
Holdout Method of Validation
Ensemble-Based Validation (Using Out-of-Bag Samples)
Validation on an Independent Data Set
Reporting Validation Results
Binary Classifiers
Multiclass Classifiers
Identifying Biological Processes Underlying the Class Differentiation
Feature Selection
Introduction
Univariate Versus Multivariate Approaches
Supervised Versus Unsupervised Methods
Taxonomy of Feature Selection Methods
Filters, Wrappers, Hybrid, and Embedded Models
Strategy: Exhaustive, Complete, Sequential, Random, and Hybrid Searches
Subset Evaluation Criteria
Search-Stopping Criteria
Feature Selection for Multiclass Discrimination
Regularization and Feature Selection
Stability of Biomarkers
Discriminant Analysis
Introduction
Learning Algorithm
A Stepwise Hybrid Feature Selection with T²
Support Vector Machines
Hard-Margin Support Vector Machines
Soft-Margin Support Vector Machines
Kernels
SVMs and Multiclass Discrimination
One-Versus-the-Rest Approach
Pairwise Approach
All-Classes-Simultaneously Approach
SVMs and Feature Selection: Recursive Feature Elimination
Summary
Random Forests
Introduction
Random Forests Learning Algorithm
Random Forests and Feature Selection
Summary
Ensemble Classifiers, Bootstrap Methods, and the Modified Bagging Schema
Ensemble Classifiers
Parallel Approach
Serial Approach
Ensemble Classifiers and Biomarker Discovery
Bootstrap Methods
Bootstrap and Linear Discriminant Analysis
The Modified Bagging Schema
Other Learning Algorithms
k-Nearest Neighbor Classifiers
Artificial Neural Networks
Perceptron
Multilayer Feedforward Neural Networks
Training the Network (Supervised Learning)
Eight Commandments of Gene Expression Analysis (for Biomarker Discovery)
Exercises

The Informative Set of Genes
Introduction
Definitions
The Method
Identification of the Informative Set of Genes
Primary Expression Patterns of the Informative Set of Genes
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Biomarkers
Summary
Exercises

Analysis of Protein Expression Data
Introduction
Protein Chip Technology
Antibody Microarrays
Peptide Microarrays
Protein Microarrays
Reverse Phase Microarrays
Two-Dimensional Gel Electrophoresis
MALDI-TOF and SELDI-TOF Mass Spectrometry
MALDI-TOF Mass Spectrometry
SELDI-TOF Mass Spectrometry
Preprocessing of Mass Spectrometry Data
Introduction
Elements of Preprocessing of SELDI-TOF Mass Spectrometry Data
Quality Assessment
Calibration
Baseline Correction
Noise Reduction and Smoothing
Peak Detection
Intensity Normalization
Peak Alignment Across Spectra
Analysis of Protein Expression Data
Additional Preprocessing
Basic Exploratory Data Analysis
Unsupervised Learning
Supervised Learning: Feature Selection and Biomarker Discovery
Supervised Learning: Classification Systems
Associating Biomarker Peaks with Proteins
Introduction
The Universal Protein Resource (UniProt)
Search Programs
Tandem Mass Spectrometry
Summary

Sketches for Selected Exercises
Introduction
Multiclass Discrimination (Exercise 3.2)
Data Set Selection, Downloading, and Consolidation
Filtering Probe Sets
Designing a Multistage Classification Schema
Identifying the Informative Set of Genes (Exercises 4.2-4.6)
The Informative Set of Genes
Primary Expression Patterns of the Informative Set
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Markers (Exercise 4.8)
Validating Biomarkers on an Independent Test Data Set (Exercise 4.8)
Using a Training Set that Combines More than One Data Set (Exercises 3.5 and 4.1-4.8)
Combining the Two Data Sets into a Single Training Set
Filtering Probe Sets of the Combined Data
Assessing the Discriminatory Power of the Biomarkers and Their Generalization
Identifying the Informative Set of Genes
Primary Expression Patterns of the Informative Set of Genes
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Markers
Validating Biomarkers on an Independent Test Data Set

References

Index