
# Apache Hadoop Docker Image

An Apache Hadoop container image. Running a single container on its own is of limited use; instead, create a cluster with Docker Compose or Docker Swarm.

## Setup

Configure Git to check out files with LF line endings, so the shell scripts inside the image run correctly on Linux:

```sh
git config core.eol lf
git config core.autocrlf input
```

## Custom commands

This image contains a script named `start-hadoop` (included in the `PATH`). The script initializes the NameNode, DataNodes, the ResourceManager, and NodeManagers. It can also run as a daemon if `daemon` is passed as the last argument, which is useful when another command must run in the foreground or when the image is used as the base for another image.
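For example, a derived image could start Hadoop in the background before launching its own workload. A minimal sketch of such an entrypoint (the wrapper script itself is hypothetical; only `start-hadoop` comes from this image):

```sh
#!/bin/bash
# Hypothetical entrypoint for a derived image: start the DataNode/NodeManager
# as a daemon, then hand control to whatever command the container was given.
start-hadoop datanode daemon
exec "$@"
```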

### Starting the NameNode

To start a NameNode, run the following command:

```sh
start-hadoop namenode [daemon]
```

### Starting the ResourceManager

To start a ResourceManager, run the following command:

```sh
start-hadoop resourcemanager [daemon]
```

### Starting a DataNode and NodeManager

To start a DataNode and a NodeManager in the same container, run the following command:

```sh
start-hadoop datanode [daemon]
```
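Without Compose or Swarm, the roles can also be wired together manually on a user-defined network. A hedged sketch, assuming the image is tagged `hadoop` and that its configuration resolves the NameNode by hostname:

```sh
# All names below (network, container names, image tag) are placeholders.
docker network create hadoop-net
docker run -d --name namenode        --network hadoop-net hadoop start-hadoop namenode
docker run -d --name resourcemanager --network hadoop-net hadoop start-hadoop resourcemanager
docker run -d --name datanode1       --network hadoop-net hadoop start-hadoop datanode
```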

## Creating a standalone cluster with Docker Compose

The easiest way to create a standalone cluster from this image is Docker Compose with the provided `docker-compose.yml`.
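The repository's `docker-compose.yml` is the authoritative definition; the sketch below only illustrates the general shape. The service names match the commands used in this README, while the image name and ports are assumptions:

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yml for the real file.
version: "3"
services:
  namenode:
    image: hadoop            # placeholder image name
    command: start-hadoop namenode
    ports:
      - "9870:9870"          # NameNode web UI (Hadoop 3.x default)
  resourcemanager:
    image: hadoop
    command: start-hadoop resourcemanager
    ports:
      - "8088:8088"          # YARN ResourceManager web UI
  datanode_nodemanager:
    image: hadoop
    command: start-hadoop datanode
```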

### Start

To start the cluster, run the following command:

```sh
docker-compose up --remove-orphans --scale datanode_nodemanager=3
```

A cluster should contain at least 3 DataNodes/NodeManagers. New nodes register themselves with the NameNode automatically. To increase the number of DataNodes/NodeManagers, change the `--scale datanode_nodemanager` value and run the start command again; the command can be run multiple times to scale dynamically.
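For example, growing an already running cluster from 3 to 5 workers:

```sh
# Re-running "up" with a higher --scale value adds containers without
# restarting the ones that are already part of the cluster.
docker-compose up -d --remove-orphans --scale datanode_nodemanager=5
```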

### Connect to NameNode

```sh
docker-compose exec --user hadoop namenode bash
```
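Inside the container the usual Hadoop CLI is available, which makes it easy to verify that the DataNodes have registered:

```sh
# Report cluster capacity and the list of live DataNodes.
hdfs dfsadmin -report

# Basic HDFS round trip as a smoke test.
hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -ls /tmp
```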

### Visit

- Hadoop Web UI (served by the NameNode; by default on port 9870 in Hadoop 3.x)
- YARN Web UI (served by the ResourceManager; by default on port 8088)

### Stop

To stop the whole cluster, run the following command:

```sh
docker-compose down --remove-orphans
```

## Creating a multi-host cluster with Docker Swarm

### Start

To start the cluster, run the following commands:

```sh
docker swarm init
docker stack deploy --compose-file docker-stack.yml ba_stack
```
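Swarm can also scale services after deployment; `ba_stack_datanode` below matches the service name used in the log commands that follow:

```sh
# Run three DataNode/NodeManager replicas across the swarm.
docker service scale ba_stack_datanode=3
```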

### Check services

```sh
# listings
docker stack services ba_stack

# logs
docker service logs ba_stack_namenode
docker service logs ba_stack_datanode
docker service logs ba_stack_resourcemanager
docker service logs ba_stack_spark
```

### Connect

```sh
docker exec -it --user hadoop <ContainerID> bash
```
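The container ID can be looked up with `docker ps`, for example:

```sh
# Find the NameNode container of the stack, then open a shell inside it.
docker ps --filter name=ba_stack_namenode --format '{{.ID}}'
```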

### Stop

To stop the whole cluster, run the following commands:

```sh
docker stack rm ba_stack
docker swarm leave --force
```

## Persistence

The image declares a volume at `/opt/hdfs`. To preserve state between restarts, mount a volume at this location for the NameNode and for each DataNode.
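For a single container this is just a named volume, for example (the image name is a placeholder):

```sh
# Keep HDFS state across container restarts via a named volume.
docker run -d --name namenode \
  -v namenode_hdfs:/opt/hdfs \
  hadoop start-hadoop namenode
```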