Data Mining for Genomics and Proteomics Analysis of Gene and Protein Expression Data

ISBN-10: 0470163739

ISBN-13: 9780470163733

Edition: 2010

Author: Darius M. Dziuda

List price: $189.95

Description:

Data Mining for Genomics and Proteomics uses pragmatic examples and a complete case study to demonstrate step-by-step how biomedical studies can be used to maximize the chance of extracting new and useful biomedical knowledge from data. It is an excellent resource for students and professionals involved with gene or protein expression data in a variety of settings.

Book details

Copyright year: 2010
Publisher: John Wiley & Sons, Incorporated
Publication date: 6/17/2010
Binding: Hardcover
Pages: 336
Size: 6.40" wide x 9.50" long x 0.90" tall
Weight: 1.386
Language: English

Darius M. Dziuda, PhD, is Associate Professor of Data Mining and Statistics in the Department of Mathematical Sciences at Central Connecticut State University (CCSU). His research and professional activities have been focused on efficient data mining of biomedical data and on methods for identification of parsimonious multivariate biomarkers for medical diagnosis, prognosis, personalized medicine, and drug discovery. For CCSU's data mining program, Dr. Dziuda developed and teaches graduate-level courses on Data Mining for Genomics and Proteomics and on Biomarker Discovery.

Preface
Acknowledgments
Introduction
Basic Terminology
The Central Dogma of Molecular Biology
Genome
Proteome
DNA (Deoxyribonucleic Acid)
RNA (Ribonucleic Acid)
mRNA (messenger RNA)
Genetic Code
Gene
Gene Expression and the Gene Expression Level
Protein
Overlapping Areas of Research
Genomics
Proteomics
Bioinformatics
Transcriptomics and Other -omics
Data Mining
Basic Analysis of Gene Expression Microarray Data
Introduction
Microarray Technology
Spotted Microarrays
Affymetrix GeneChip® Microarrays
Bead-Based Microarrays
Low-Level Preprocessing of Affymetrix Microarrays
MAS5
RMA
GCRMA
PLIER
Public Repositories of Microarray Data
Microarray Gene Expression Data Society (MGED) Standards
Public Databases
Gene Expression Omnibus (GEO)
ArrayExpress
Gene Expression Matrix
Elements of Gene Expression Microarray Data Analysis
Additional Preprocessing, Quality Assessment, and Filtering
Quality Assessment
Filtering
Basic Exploratory Data Analysis
t Test
t Test for Equal Variances
t Test for Unequal Variances
ANOVA F Test
SAM t Statistic
Limma
Adjustment for Multiple Comparisons
Single-Step Bonferroni Procedure
Single-Step Sidak Procedure
Step-Down Holm Procedure
Step-Up Benjamini and Hochberg Procedure
Permutation-Based Multiplicity Adjustment
Unsupervised Learning (Taxonomy-Related Analysis)
Cluster Analysis
Measures of Similarity or Distance
K-Means Clustering
Hierarchical Clustering
Two-Way Clustering and Related Methods
Principal Component Analysis
Self-Organizing Maps
Exercises
Biomarker Discovery and Classification
Overview
Gene Expression Matrix ... Again
Biomarker Discovery
Classification Systems
Parametric and Nonparametric Learning Algorithms
Terms Associated with Common Assumptions Underlying Parametric Learning Algorithms
Visualization of Classification Results
Validation of the Classification Model
Reclassification
Leave-One-Out and K-Fold Cross-Validation
External and Internal Cross-Validation
Holdout Method of Validation
Ensemble-Based Validation (Using Out-of-Bag Samples)
Validation on an Independent Data Set
Reporting Validation Results
Binary Classifiers
Multiclass Classifiers
Identifying Biological Processes Underlying the Class Differentiation
Feature Selection
Introduction
Univariate Versus Multivariate Approaches
Supervised Versus Unsupervised Methods
Taxonomy of Feature Selection Methods
Filters, Wrappers, Hybrid, and Embedded Models
Strategy: Exhaustive, Complete, Sequential, Random, and Hybrid Searches
Subset Evaluation Criteria
Search-Stopping Criteria
Feature Selection for Multiclass Discrimination
Regularization and Feature Selection
Stability of Biomarkers
Discriminant Analysis
Introduction
Learning Algorithm
A Stepwise Hybrid Feature Selection with T²
Support Vector Machines
Hard-Margin Support Vector Machines
Soft-Margin Support Vector Machines
Kernels
SVMs and Multiclass Discrimination
One-Versus-the-Rest Approach
Pairwise Approach
All-Classes-Simultaneously Approach
SVMs and Feature Selection: Recursive Feature Elimination
Summary
Random Forests
Introduction
Random Forests Learning Algorithm
Random Forests and Feature Selection
Summary
Ensemble Classifiers, Bootstrap Methods, and the Modified Bagging Schema
Ensemble Classifiers
Parallel Approach
Serial Approach
Ensemble Classifiers and Biomarker Discovery
Bootstrap Methods
Bootstrap and Linear Discriminant Analysis
The Modified Bagging Schema
Other Learning Algorithms
k-Nearest Neighbor Classifiers
Artificial Neural Networks
Perceptron
Multilayer Feedforward Neural Networks
Training the Network (Supervised Learning)
Eight Commandments of Gene Expression Analysis (for Biomarker Discovery)
Exercises
The Informative Set of Genes
Introduction
Definitions
The Method
Identification of the Informative Set of Genes
Primary Expression Patterns of the Informative Set of Genes
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Biomarkers
Summary
Exercises
Analysis of Protein Expression Data
Introduction
Protein Chip Technology
Antibody Microarrays
Peptide Microarrays
Protein Microarrays
Reverse Phase Microarrays
Two-Dimensional Gel Electrophoresis
MALDI-TOF and SELDI-TOF Mass Spectrometry
MALDI-TOF Mass Spectrometry
SELDI-TOF Mass Spectrometry
Preprocessing of Mass Spectrometry Data
Introduction
Elements of Preprocessing of SELDI-TOF Mass Spectrometry Data
Quality Assessment
Calibration
Baseline Correction
Noise Reduction and Smoothing
Peak Detection
Intensity Normalization
Peak Alignment Across Spectra
Analysis of Protein Expression Data
Additional Preprocessing
Basic Exploratory Data Analysis
Unsupervised Learning
Supervised Learning: Feature Selection and Biomarker Discovery
Supervised Learning: Classification Systems
Associating Biomarker Peaks with Proteins
Introduction
The Universal Protein Resource (UniProt)
Search Programs
Tandem Mass Spectrometry
Summary
Sketches for Selected Exercises
Introduction
Multiclass Discrimination (Exercise 3.2)
Data Set Selection, Downloading, and Consolidation
Filtering Probe Sets
Designing a Multistage Classification Schema
Identifying the Informative Set of Genes (Exercises 4.2-4.6)
The Informative Set of Genes
Primary Expression Patterns of the Informative Set
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Markers (Exercise 4.8)
Validating Biomarkers on an Independent Test Data Set (Exercise 4.8)
Using a Training Set that Combines More than One Data Set (Exercises 3.5 and 4.1-4.8)
Combining the Two Data Sets into a Single Training Set
Filtering Probe Sets of the Combined Data
Assessing the Discriminatory Power of the Biomarkers and Their Generalization
Identifying the Informative Set of Genes
Primary Expression Patterns of the Informative Set of Genes
The Most Frequently Used Genes of the Primary Expression Patterns
Using the Informative Set of Genes to Identify Robust Multivariate Markers
Validating Biomarkers on an Independent Test Data Set
References
Index