Apache Hadoop 3 Quick Start Guide

Updated: 2021-06-10 19:19:10


Cover Page

Title Page

Dedication

Packt Upsell

Why subscribe?

Packt.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Code in action

Conventions used

Get in touch

Reviews

Hadoop 3.0 - Background and Introduction

How it all started

What Hadoop is and why it is important

How Apache Hadoop works

Resource Manager

Node Manager

YARN Timeline Service version 2

NameNode

DataNode

Hadoop 3.0 releases and new features

Choosing the right Hadoop distribution

Cloudera Hadoop distribution

Hortonworks Hadoop distribution

MapR Hadoop distribution

Summary

Planning and Setting Up Hadoop Clusters

Technical requirements

Prerequisites for Hadoop setup

Preparing hardware for Hadoop

Readying your system

Installing the prerequisites

Working across nodes without passwords (keyless SSH)

Downloading Hadoop

Running Hadoop in standalone mode

Setting up a pseudo-distributed Hadoop cluster

Planning and sizing clusters

Initial load of data

Organizational data growth

Workload and computational requirements

High availability and fault tolerance

Velocity of data and other factors

Setting up Hadoop in cluster mode

Installing and configuring HDFS in cluster mode

Setting up YARN in cluster mode

Diagnosing the Hadoop cluster

Working with log files

Cluster debugging and tuning tools

JPS (Java Virtual Machine Process Status)

JStack

Summary

Deep Dive into the Hadoop Distributed File System

Technical requirements

How HDFS works

Key features of HDFS

Achieving multi-tenancy in HDFS

Snapshots of HDFS

Safe mode

Hot swapping

Federation

Intra-DataNode balancer

Data flow patterns of HDFS

HDFS as primary storage with cache

HDFS as archival storage

HDFS as historical storage

HDFS as a backbone

HDFS configuration files

Hadoop filesystem CLIs

Working with HDFS user commands

Working with Hadoop shell commands

Working with data structures in HDFS

Understanding SequenceFile

MapFile and its variants

Summary

Developing MapReduce Applications

Technical requirements

How MapReduce works

What is MapReduce?

An example of MapReduce

Configuring a MapReduce environment

Working with mapred-site.xml

Working with the Job History Server

RESTful APIs for the Job History Server

Understanding Hadoop APIs and packages

Setting up a MapReduce project

Setting up an Eclipse project

Deep diving into MapReduce APIs

Configuring MapReduce jobs

Understanding input formats

Understanding output formats

Working with Mapper APIs

Working with the Reducer API