An Apache Hadoop container image. Running a single container is of little use on its own; instead, create a cluster with Docker Compose or Docker Swarm.
Before building the image, configure Git to check out files with LF line endings (important on Windows, since the scripts inside the image must use LF):
git config core.eol lf
git config core.autocrlf input
This image contains a script named start-hadoop (included in the PATH). This script is used to initialize the NameNode, DataNodes, ResourceManager and NodeManagers.
The script supports running as a daemon if daemon is passed as the last argument. This is useful when another command must run afterwards, or when the image is used as the base for another image.
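For example, a derived image could start a DataNode in the background and then hand control to its own process. This is only a sketch; the base image name and custom-entrypoint.sh are placeholders, not part of this repository:

FROM hadoop:latest
COPY custom-entrypoint.sh /usr/local/bin/custom-entrypoint.sh
# In daemon mode start-hadoop returns after starting the services,
# so the custom entrypoint becomes the container's main process.
CMD ["sh", "-c", "start-hadoop datanode daemon && exec custom-entrypoint.sh"]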
To start a NameNode run the following command:
start-hadoop namenode [daemon]
To start a ResourceManager run the following command:
start-hadoop resourcemanager [daemon]
To start a DataNode and a NodeManager in the same container run the following command:
start-hadoop datanode [daemon]
The easiest way to create a standalone cluster with this image is to use Docker Compose with the provided docker-compose.yml.
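The compose file defines the service names used in the commands below (namenode, datanode_nodemanager). As a rough sketch of the shape such a file could take, with the image name and published ports being assumptions rather than the repository's actual values:

version: "3"
services:
  namenode:
    image: hadoop                   # assumed image name
    command: start-hadoop namenode
    ports:
      - "9870:9870"                 # default NameNode web UI port in Hadoop 3
  resourcemanager:
    image: hadoop
    command: start-hadoop resourcemanager
    ports:
      - "8088:8088"                 # default ResourceManager web UI port
  datanode_nodemanager:
    image: hadoop
    command: start-hadoop datanode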
To start the cluster, run the following command:
docker-compose up --remove-orphans --scale datanode_nodemanager=3
A cluster should contain at least 3 DataNodes/NodeManagers. New nodes automatically register themselves with the NameNode.
To change the number of DataNodes/NodeManagers, adjust the --scale datanode_nodemanager value and run the start command again. The command can be run multiple times to scale the cluster dynamically.
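For example, to grow a running cluster to five DataNodes/NodeManagers:

docker-compose up --remove-orphans --scale datanode_nodemanager=5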
To open a shell as the hadoop user on the NameNode, run the following command:
docker-compose exec --user hadoop namenode bash
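Inside this shell the standard Hadoop CLI should be available, for example to verify that all nodes have registered:

hdfs dfsadmin -report   # lists the DataNodes known to the NameNode
yarn node -list         # lists the NodeManagers known to the ResourceManager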
To stop the whole cluster, run the following command:
docker-compose down --remove-orphans
To start the cluster with Docker Swarm, run the following commands:
docker swarm init
docker stack deploy --compose-file docker-stack.yml ba_stack
# list the stack's services
docker stack services ba_stack
# logs
docker service logs ba_stack_namenode
docker service logs ba_stack_datanode
docker service logs ba_stack_resourcemanager
docker service logs ba_stack_spark
To open a shell as the hadoop user in one of the stack's containers, run the following command:
docker exec -it --user hadoop <ContainerID> bash
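The container ID can be found with docker ps; for example, to list the stack's containers running on the current node:

docker ps --filter name=ba_stack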
To stop the whole cluster and leave the swarm, run the following commands:
docker stack rm ba_stack
docker swarm leave --force
The image declares a volume at /opt/hdfs. To persist state between restarts, mount a volume at this location. This should be done for both the NameNode and the DataNodes.
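As a sketch, a named volume could be attached to the NameNode in the compose file (the volume name is a placeholder; note that scaled DataNode replicas each need their own volume, so a single named volume should not be shared among them):

services:
  namenode:
    volumes:
      - namenode_data:/opt/hdfs
volumes:
  namenode_data: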