| |
| |
Preface | |
| |
| |
Introduction | |
| |
| |
HDFS | |
| |
| |
Goals and Motivation | |
| |
| |
Design | |
| |
| |
Daemons | |
| |
| |
Reading and Writing Data | |
| |
| |
The Read Path | |
| |
| |
The Write Path | |
| |
| |
Managing Filesystem Metadata | |
| |
| |
Namenode High Availability | |
| |
| |
Namenode Federation | |
| |
| |
Access and Integration | |
| |
| |
Command-Line Tools | |
| |
| |
FUSE | |
| |
| |
REST Support | |
| |
| |
| |
MapReduce | |
| |
| |
The Stages of MapReduce | |
| |
| |
Introducing Hadoop MapReduce | |
| |
| |
Daemons | |
| |
| |
When It All Goes Wrong | |
| |
| |
YARN | |
| |
| |
| |
Planning a Hadoop Cluster | |
| |
| |
Picking a Distribution and Version of Hadoop | |
| |
| |
Apache Hadoop | |
| |
| |
Cloudera's Distribution Including Apache Hadoop | |
| |
| |
Versions and Features | |
| |
| |
What Should I Use? | |
| |
| |
Hardware Selection | |
| |
| |
Master Hardware Selection | |
| |
| |
Worker Hardware Selection | |
| |
| |
Cluster Sizing | |
| |
| |
Blades, SANs, and Virtualization | |
| |
| |
Operating System Selection and Preparation | |
| |
| |
Deployment Layout | |
| |
| |
Software | |
| |
| |
Hostnames, DNS, and Identification | |
| |
| |
Users, Groups, and Privileges | |
| |
| |
Kernel Tuning | |
| |
| |
vm.swappiness | |
| |
| |
vm.overcommit_memory | |
| |
| |
Disk Configuration | |
| |
| |
Choosing a Filesystem | |
| |
| |
Mount Options | |
| |
| |
Network Design | |
| |
| |
Network Usage in Hadoop: A Review | |
| |
| |
1 Gb versus 10 Gb Networks | |
| |
| |
Typical Network Topologies | |
| |
| |
| |
Installation and Configuration | |
| |
| |
Installing Hadoop | |
| |
| |
Apache Hadoop | |
| |
| |
CDH | |
| |
| |
Configuration: An Overview | |
| |
| |
The Hadoop XML Configuration Files | |
| |
| |
Environment Variables and Shell Scripts | |
| |
| |
Logging Configuration | |
| |
| |
HDFS | |
| |
| |
Identification and Location | |
| |
| |
Optimization and Tuning | |
| |
| |
Formatting the Namenode | |
| |
| |
Creating a /tmp Directory | |
| |
| |
Namenode High Availability | |
| |
| |
Fencing Options | |
| |
| |
Basic Configuration | |
| |
| |
Automatic Failover Configuration | |
| |
| |
Format and Bootstrap the Namenodes | |
| |
| |
Namenode Federation | |
| |
| |
MapReduce | |
| |
| |
Identification and Location | |
| |
| |
Optimization and Tuning | |
| |
| |
Rack Topology | |
| |
| |
Security | |
| |
| |
| |
Identity, Authentication, and Authorization | |
| |
| |
Identity | |
| |
| |
Kerberos and Hadoop | |
| |
| |
Kerberos: A Refresher | |
| |
| |
Kerberos Support in Hadoop | |
| |
| |
Authorization | |
| |
| |
HDFS | |
| |
| |
MapReduce | |
| |
| |
Other Tools and Systems | |
| |
| |
Tying It Together | |
| |
| |
| |
Resource Management | |
| |
| |
What Is Resource Management? | |
| |
| |
HDFS Quotas | |
| |
| |
MapReduce Schedulers | |
| |
| |
The FIFO Scheduler | |
| |
| |
The Fair Scheduler | |
| |
| |
The Capacity Scheduler | |
| |
| |
The Future | |
| |
| |
| |
Cluster Maintenance | |
| |
| |
Managing Hadoop Processes | |
| |
| |
Starting and Stopping Processes with Init Scripts | |
| |
| |
Starting and Stopping Processes Manually | |
| |
| |
HDFS Maintenance Tasks | |
| |
| |
Adding a Datanode | |
| |
| |
Decommissioning a Datanode | |
| |
| |
Checking Filesystem Integrity with fsck | |
| |
| |
Balancing HDFS Block Data | |
| |
| |
Dealing with a Failed Disk | |
| |
| |
MapReduce Maintenance Tasks | |
| |
| |
Adding a Tasktracker | |
| |
| |
Decommissioning a Tasktracker | |
| |
| |
Killing a MapReduce Job | |
| |
| |
Killing a MapReduce Task | |
| |
| |
Dealing with a Blacklisted Tasktracker | |
| |
| |
| |
Troubleshooting | |
| |
| |
Differential Diagnosis Applied to Systems | |
| |
| |
Common Failures and Problems | |
| |
| |
Humans (You) | |
| |
| |
Misconfiguration | |
| |
| |
Hardware Failure | |
| |
| |
Resource Exhaustion | |
| |
| |
Host Identification and Naming | |
| |
| |
Network Partitions | |
| |
| |
"Is the Computer Plugged In?" | |
| |
| |
E-SPORE | |
| |
| |
Treatment and Care | |
| |
| |
War Stories | |
| |
| |
A Mystery Bottleneck | |
| |
| |
There's No Place Like 127.0.0.1 | |
| |
| |
| |
Monitoring | |
| |
| |
An Overview | |
| |
| |
Hadoop Metrics | |
| |
| |
Apache Hadoop 0.20.0 and CDH3 (metrics1) | |
| |
| |
Apache Hadoop 0.20.203 and Later, and CDH4 (metrics2) | |
| |
| |
What about SNMP? | |
| |
| |
Health Monitoring | |
| |
| |
Host-Level Checks | |
| |
| |
All Hadoop Processes | |
| |
| |
HDFS Checks | |
| |
| |
MapReduce Checks | |
| |
| |
| |
Backup and Recovery | |
| |
| |
Data Backup | |
| |
| |
Distributed Copy (distcp) | |
| |
| |
Parallel Data Ingestion | |
| |
| |
Namenode Metadata | |
| |
| |
Appendix: Deprecated Configuration Properties | |
| |
| |
Index | |