Skip to content

Fault-Tolerant Systems

Best in textbook rentals since 2012!

ISBN-10: 0120885255

ISBN-13: 9780120885251

Edition: 2007

Authors: Israel Koren, C. Mani Krishna

List price: $89.95
Blue ribbon 30 day, 100% satisfaction guarantee!
what's this?
Rush Rewards U
Members Receive:
Carrot Coin icon
XP icon
You have reached 400 XP and carrot coins. That is the daily max!

Description:

There are many applications in which the reliability of the overall system must be far higher than the reliability of its individual components. In such cases, designers devise mechanisms and architectures that allow the system to either completely mask the effects of a component failure or recover from it so quickly that the application is not seriously affected. This is the work of fault-tolerant designers and their work is increasingly important and complex not only because of the increasing number of mission critical applications, but also because the diminishing reliability of hardware means that even systems for non-critical applications will need to be designed with fault-tolerance…    
Customers also bought

Book details

List price: $89.95
Copyright year: 2007
Publisher: Elsevier Science & Technology
Publication date: 4/13/2007
Binding: Hardcover
Pages: 400
Size: 7.50" wide x 9.21" long x 1.00" tall
Weight: 2.046
Language: English

Foreword
Preface
Acknowledgements
About the Authors
Preliminaries
Fault Classification
Types of Redundancy
Basic Measures of Fault Tolerance
Traditional Measures
Network Measures
Outline of This Book
Further Reading
References
Hardware Fault Tolerance
The Rate of Hardware Failures
Failure Rate, Reliability, and Mean Time to Failure
Canonical and Resilient Structures
Series and Parallel Systems
Non-Series/Parallel Systems
M-of-N Systems
Voters
Variations on N-Modular Redundancy
Duplex Systems
Other Reliability Evaluation Techniques
Poisson Processes
Markov Models
Fault-Tolerance Processor-Level Techniques
Watchdog Processor
Simultaneous Multithreading for Fault Tolerance
Byzantine Failures
Byzantine Agreement with Message Authentication
Further Reading
Exercises
References
Information Redundancy
Coding
Parity Codes
Checksum
M-of-N Codes
Berger Code
Cyclic Codes
Arithmetic Codes
Resilient Disk Systems
Raid Level 1
Raid Level 2
Raid Level 3
Raid Level 4
Raid Level 5
Modeling Correlated Failures
Data Replication
Voting: Non-Hierarchical Organization
Voting: Hierarchical Organization
Primary-Backup Approach
Algorithm-Based Fault Tolerance
Further Reading
Exercises
References
Fault-Tolerant Networks
Measures of Resilience
Graph-Theoretical Measures
Computer Networks Measures
Common Network Topologies and Their Resilience
Multistage and Extra-Stage Networks
Crossbar Networks
Rectangular Mesh and Interstitial Mesh
Hypercube Network
Cube-Connected Cycles Networks
Loop Networks
Ad hoc Point-to-Point Networks
Fault-Tolerant Routing
Hypercube Fault-Tolerant Routing
Origin-Based Routing in the Mesh
Further Reading
Exercises
References
Software Fault Tolerance
Acceptance Tests
Single-Version Fault Tolerance
Wrappers
Software Rejuvenation
Data Diversity
Software Implemented Hardware Fault Tolerance (SIHFT)
N-Version Programming
Consistent Comparison Problem
Version Independence
Recovery Block Approach
Basic Principles
Success Probability Calculation
Distributed Recovery Blocks
Preconditions, Postconditions, and Assertions
Exception-Handling
Requirements from Exception-Handlers
Basics of Exceptions and Exception-Handling
Language Support
Software Reliability Models
Jelinski-Moranda Model
Littlewood-Verrall Model
Musa-Okumoto Model
Model Selection and Parameter Estimation
Fault-Tolerant Remote Procedure Calls
Primary-Backup Approach
The Circus Approach
Further Reading
Exercises
References
Checkpointing
What is Checkpointing?
Why is Checkpointing Nontrivial?
Checkpoint Level
Optimal Checkpointing-An Analytical Model
Time Between Checkpoints-A First-Order Approximation
Optimal Checkpoint Placement
Time Between Checkpoints-A More Accurate Model
Reducing Overhead
Reducing Latency
Cache-Aided Rollback Error Recovery (CARER)
Checkpointing in Distributed Systems
The Domino Effect and Livelock
A Coordinated Checkpointing Algorithm
Time-Based Synchronization
Diskless Checkpointing
Message Logging
Checkpointing in Shared-Memory Systems
Bus-Based Coherence Protocol
Directory-Based Protocol
Checkpointing in Real-Time Systems
Other Uses of Checkpointing
Further Reading
Exercises
References
Case Studies
NonStop Systems
Architecture
Maintenance and Repair Aids
Software
Modifications to the NonStop Architecture
Stratus Systems
Cassini Command and Data Subsystem
IBM G5
IBM Sysplex
Itanium
Further Reading
References
Defect Tolerance in VLSI Circuits
Manufacturing Defects and Circuit Faults
Probability of Failure and Critical Area
Basic Yield Models
The Poisson and Compound Poisson Yield Models
Variations on the Simple Yield Models
Yield Enhancement Through Redundancy
Yield Projection for Chips with Redundancy
Memory Arrays with Redundancy
Logic Integrated Circuits with Redundancy
Modifying the Floorplan
Further Reading
Exercises
References
Fault Detection in Cryptographic Systems
Overview of Ciphers
Symmetric Key Ciphers
Public Key Ciphers
Security Attacks Through Fault Injection
Fault Attacks on Symmetric Key Ciphers
Fault Attacks on Public (Asymmetric) Key Ciphers
Countermeasures
Spatial and Temporal Duplication
Error-Detecting Codes
Are These Countermeasures Sufficient?
Final Comment
Further Reading
Exercises
References
Simulation Techniques
Writing a Simulation Program
Parameter Estimation
Point Versus Interval Estimation
Method of Moments
Method of Maximum Likelihood
The Bayesian Approach to Parameter Estimation
Confidence Intervals
Variance Reduction Methods
Antithetic Variables
Using Control Variables
Stratified Sampling
Importance Sampling
Random Number Generation
Uniformly Distributed Random Number Generators
Testing Uniform Random Number Generators
Generating Other Distributions
Fault Injection
Types of Fault Injection Techniques
Fault Injection Application and Tools
Further Reading
Exercises
References
Subject Index