Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data

ISBN-10: 0470163739

ISBN-13: 9780470163733

Edition: 2010

Author: Darius M. Dziuda

List price: $189.95

Description:

Data Mining for Genomics and Proteomics uses pragmatic examples and a complete case study to demonstrate step-by-step how biomedical studies can be used to maximize the chance of extracting new and useful biomedical knowledge from data. It is an excellent resource for students and professionals involved with gene or protein expression data in a variety of settings.

Book details

Copyright year: 2010
Publisher: John Wiley & Sons, Incorporated
Publication date: 6/17/2010
Binding: Hardcover
Pages: 336
Size: 6.40" wide x 9.50" long x 0.90" tall
Weight: 1.386 lb
Language: English

Darius M. Dziuda, PhD, is Associate Professor of Data Mining and Statistics in the Department of Mathematical Sciences at Central Connecticut State University (CCSU). His research and professional activities have been focused on efficient data mining of biomedical data and on methods for identification of parsimonious multivariate biomarkers for medical diagnosis, prognosis, personalized medicine, and drug discovery. For CCSU's data mining program, Dr. Dziuda developed and teaches graduate-level courses on Data Mining for Genomics and Proteomics and on Biomarker Discovery.



Preface


Acknowledgments



Introduction



Basic Terminology



The Central Dogma of Molecular Biology



Genome



Proteome



DNA (Deoxyribonucleic Acid)



RNA (Ribonucleic Acid)



mRNA (messenger RNA)



Genetic Code



Gene



Gene Expression and the Gene Expression Level



Protein



Overlapping Areas of Research



Genomics



Proteomics



Bioinformatics



Transcriptomics and Other -omics



Data Mining



Basic Analysis of Gene Expression Microarray Data



Introduction



Microarray Technology



Spotted Microarrays



Affymetrix GeneChip™ Microarrays



Bead-Based Microarrays



Low-Level Preprocessing of Affymetrix Microarrays



MAS5



RMA



GCRMA



PLIER



Public Repositories of Microarray Data



Microarray Gene Expression Data Society (MGED) Standards



Public Databases



Gene Expression Omnibus (GEO)



ArrayExpress



Gene Expression Matrix



Elements of Gene Expression Microarray Data Analysis



Additional Preprocessing, Quality Assessment, and Filtering



Quality Assessment



Filtering



Basic Exploratory Data Analysis



t Test



t Test for Equal Variances



t Test for Unequal Variances



ANOVA F Test



SAM t Statistic



Limma



Adjustment for Multiple Comparisons



Single-Step Bonferroni Procedure



Single-Step Sidak Procedure



Step-Down Holm Procedure



Step-Up Benjamini and Hochberg Procedure



Permutation-Based Multiplicity Adjustment



Unsupervised Learning (Taxonomy-Related Analysis)



Cluster Analysis



Measures of Similarity or Distance



K-Means Clustering



Hierarchical Clustering



Two-Way Clustering and Related Methods



Principal Component Analysis



Self-Organizing Maps


Exercises



Biomarker Discovery and Classification



Overview



Gene Expression Matrix ... Again



Biomarker Discovery



Classification Systems



Parametric and Nonparametric Learning Algorithms



Terms Associated with Common Assumptions Underlying Parametric Learning Algorithms



Visualization of Classification Results



Validation of the Classification Model



Reclassification



Leave-One-Out and K-Fold Cross-Validation



External and Internal Cross-Validation



Holdout Method of Validation



Ensemble-Based Validation (Using Out-of-Bag Samples)



Validation on an Independent Data Set



Reporting Validation Results



Binary Classifiers



Multiclass Classifiers



Identifying Biological Processes Underlying the Class Differentiation



Feature Selection



Introduction



Univariate Versus Multivariate Approaches



Supervised Versus Unsupervised Methods



Taxonomy of Feature Selection Methods



Filters, Wrappers, Hybrid, and Embedded Models



Strategy: Exhaustive, Complete, Sequential, Random, and Hybrid Searches



Subset Evaluation Criteria



Search-Stopping Criteria



Feature Selection for Multiclass Discrimination



Regularization and Feature Selection



Stability of Biomarkers



Discriminant Analysis



Introduction



Learning Algorithm



A Stepwise Hybrid Feature Selection with T²



Support Vector Machines



Hard-Margin Support Vector Machines



Soft-Margin Support Vector Machines



Kernels



SVMs and Multiclass Discrimination



One-Versus-the-Rest Approach



Pairwise Approach



All-Classes-Simultaneously Approach



SVMs and Feature Selection: Recursive Feature Elimination



Summary



Random Forests



Introduction



Random Forests Learning Algorithm



Random Forests and Feature Selection



Summary



Ensemble Classifiers, Bootstrap Methods, and the Modified Bagging Schema



Ensemble Classifiers



Parallel Approach



Serial Approach



Ensemble Classifiers and Biomarker Discovery



Bootstrap Methods



Bootstrap and Linear Discriminant Analysis



The Modified Bagging Schema



Other Learning Algorithms



k-Nearest Neighbor Classifiers



Artificial Neural Networks



Perceptron



Multilayer Feedforward Neural Networks



Training the Network (Supervised Learning)



Eight Commandments of Gene Expression Analysis (for Biomarker Discovery)


Exercises



The Informative Set of Genes



Introduction



Definitions



The Method



Identification of the Informative Set of Genes



Primary Expression Patterns of the Informative Set of Genes



The Most Frequently Used Genes of the Primary Expression Patterns



Using the Informative Set of Genes to Identify Robust Multivariate Biomarkers



Summary


Exercises



Analysis of Protein Expression Data



Introduction



Protein Chip Technology



Antibody Microarrays



Peptide Microarrays



Protein Microarrays



Reverse Phase Microarrays



Two-Dimensional Gel Electrophoresis



MALDI-TOF and SELDI-TOF Mass Spectrometry



MALDI-TOF Mass Spectrometry



SELDI-TOF Mass Spectrometry



Preprocessing of Mass Spectrometry Data



Introduction



Elements of Preprocessing of SELDI-TOF Mass Spectrometry Data



Quality Assessment



Calibration



Baseline Correction



Noise Reduction and Smoothing



Peak Detection



Intensity Normalization



Peak Alignment Across Spectra



Analysis of Protein Expression Data



Additional Preprocessing



Basic Exploratory Data Analysis



Unsupervised Learning



Supervised Learning: Feature Selection and Biomarker Discovery



Supervised Learning: Classification Systems



Associating Biomarker Peaks with Proteins



Introduction



The Universal Protein Resource (UniProt)



Search Programs



Tandem Mass Spectrometry



Summary



Sketches for Selected Exercises



Introduction



Multiclass Discrimination (Exercise 3.2)



Data Set Selection, Downloading, and Consolidation



Filtering Probe Sets



Designing a Multistage Classification Schema



Identifying the Informative Set of Genes (Exercises 4.2-4.6)



The Informative Set of Genes



Primary Expression Patterns of the Informative Set



The Most Frequently Used Genes of the Primary Expression Patterns



Using the Informative Set of Genes to Identify Robust Multivariate Markers (Exercise 4.8)



Validating Biomarkers on an Independent Test Data Set (Exercise 4.8)



Using a Training Set that Combines More than One Data Set (Exercises 3.5 and 4.1-4.8)



Combining the Two Data Sets into a Single Training Set



Filtering Probe Sets of the Combined Data



Assessing the Discriminatory Power of the Biomarkers and Their Generalization



Identifying the Informative Set of Genes



Primary Expression Patterns of the Informative Set of Genes



The Most Frequently Used Genes of the Primary Expression Patterns



Using the Informative Set of Genes to Identify Robust Multivariate Markers



Validating Biomarkers on an Independent Test Data Set


References


Index