Last updated: 2021-06-10 19:19:10
Cover Page
Title Page
Dedication
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Code in action
Conventions used
Get in touch
Reviews
Hadoop 3.0 - Background and Introduction
How it all started
What Hadoop is and why it is important
How Apache Hadoop works
Resource Manager
Node Manager
YARN Timeline Service version 2
NameNode
DataNode
Hadoop 3.0 releases and new features
Choosing the right Hadoop distribution
Cloudera Hadoop distribution
Hortonworks Hadoop distribution
MapR Hadoop distribution
Summary
Planning and Setting Up Hadoop Clusters
Technical requirements
Prerequisites for Hadoop setup
Preparing hardware for Hadoop
Readying your system
Installing the prerequisites
Working across nodes without passwords (keyless SSH)
Downloading Hadoop
Running Hadoop in standalone mode
Setting up a pseudo Hadoop cluster
Planning and sizing clusters
Initial load of data
Organizational data growth
Workload and computational requirements
High availability and fault tolerance
Velocity of data and other factors
Setting up Hadoop in cluster mode
Installing and configuring HDFS in cluster mode
Setting up YARN in cluster mode
Diagnosing the Hadoop cluster
Working with log files
Cluster debugging and tuning tools
JPS (Java Virtual Machine Process Status)
JStack
Deep Dive into the Hadoop Distributed File System
How HDFS works
Key features of HDFS
Achieving multi-tenancy in HDFS
Snapshots of HDFS
Safe mode
Hot swapping
Federation
Intra-DataNode balancer
Data flow patterns of HDFS
HDFS as primary storage with cache
HDFS as archival storage
HDFS as historical storage
HDFS as a backbone
HDFS configuration files
Hadoop filesystem CLIs
Working with HDFS user commands
Working with Hadoop shell commands
Working with data structures in HDFS
Understanding SequenceFile
MapFile and its variants
Developing MapReduce Applications
How MapReduce works
What is MapReduce?
An example of MapReduce
Configuring a MapReduce environment
Working with mapred-site.xml
Working with the Job History Server
RESTful APIs for the Job History Server
Understanding Hadoop APIs and packages
Setting up a MapReduce project
Setting up an Eclipse project
Deep diving into MapReduce APIs
Configuring MapReduce jobs
Understanding input formats
Understanding output formats
Working with the Mapper API
Working with the Reducer API