Home

About

This Python project automates the deployment of a Hadoop cluster in our private cloud infrastructure. It was developed to support the experiments of the BTJ project. However, anyone can fork it and use it as a base for other projects running on Hadoop.

How To Run

Set the "install_dir" parameter in the "Coordinator.properties" file, then run the command "python Coordinator.py start" from a terminal inside the project directory.
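
The two steps above could also be scripted. The following is only a minimal sketch: it assumes that "Coordinator.properties" uses plain key=value lines (the file format is not described on this page), and the helper names set_property and configure_and_start are hypothetical, not part of the project.

```python
import re
import subprocess
from pathlib import Path


def set_property(text, key, value):
    """Return `text` with `key` set to `value`, assuming key=value lines."""
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # Replace the first existing assignment and keep everything else.
        return pattern.sub(lambda _match: new_line, text, count=1)
    # Key not present: append it on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def configure_and_start(project_dir, install_dir):
    """Set install_dir, then run 'python Coordinator.py start' in project_dir."""
    props = Path(project_dir) / "Coordinator.properties"
    props.write_text(set_property(props.read_text(), "install_dir", install_dir))
    subprocess.run(["python", "Coordinator.py", "start"], cwd=project_dir, check=True)
```

Calling configure_and_start(".", "/opt/hadoop") from the project directory would then perform both steps; the install path shown here is just a placeholder.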

Important Notice

You should be connected to the physical router of the cloud so that the necessary files can be transferred through the local network to the VMs.

Cluster Configuration

Inside the project you will find a file called "Coordinator.properties", which holds all the cluster parameters (for both the VMs and Hadoop). Every parameter in the file is explained by a comment.

Passwordless SSH login to VMs

In the templates/ssh_keys/authorized_keys file you can add as many public SSH keys as you like. All the keys are copied to every VM so that you can access them without a password.
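
Adding a key by hand works fine. As an illustration, a small helper could append a key only when it is not already listed. This is a sketch, not part of the project, and the function name add_public_key is hypothetical.

```python
from pathlib import Path


def add_public_key(authorized_keys, public_key):
    """Append `public_key` to the file unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    path = Path(authorized_keys)
    key = public_key.strip()
    content = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(separator + key + "\n")
    return True
```

For example, add_public_key("templates/ssh_keys/authorized_keys", Path.home().joinpath(".ssh", "id_rsa.pub").read_text()) would register your own key, assuming it is stored in the usual ~/.ssh/id_rsa.pub location.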

BTJ configuration

Every file inside the templates/btj directory is copied to the master VM.

Datasets

The shared dataset directory hosted on the "leopard" file server is automatically mounted on the master VM, so you can upload any dataset you want to HDFS over the 1 Gbps network.
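
The upload itself can be done with the standard "hadoop fs -put" command on the master VM. The sketch below wraps that command in Python. The mount point of the shared directory is not stated on this page, so both paths are passed in as arguments, and the function names are hypothetical.

```python
import subprocess


def hdfs_put_command(local_path, hdfs_path):
    """Build the 'hadoop fs -put' command that copies local_path into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def upload_dataset(local_path, hdfs_path):
    """Run the upload on the master VM; raises CalledProcessError on failure."""
    subprocess.run(hdfs_put_command(local_path, hdfs_path), check=True)
```

Here local_path would point to a file under the mounted shared directory and hdfs_path to the destination inside HDFS.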

Hadoop web monitoring

The w3m text-based browser is installed on the master VM to support Hadoop web monitoring. To monitor your cluster, run w3m http://localhost:50030 or w3m http://localhost:50070.
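
Before opening the pages, it can be handy to check whether the two ports accept connections at all. The following is a minimal sketch to be run on the master VM. It only tests TCP reachability, not the page content, and the function names are hypothetical. In the Hadoop 1.x defaults, port 50030 serves the JobTracker page and port 50070 the NameNode page; a customized cluster may differ.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitoring_status(host="localhost", ports=(50030, 50070)):
    """Map each monitoring URL to whether its port accepts connections."""
    return {f"http://{host}:{port}": port_is_open(host, port) for port in ports}
```

Calling monitoring_status() returns a dictionary keyed by URL, so a False value points to a web interface that is not up yet.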

How do I know that my cluster is ready?

Follow the logs/Coordinator.log file continuously (tail -f logs/Coordinator.log); you will be informed there when everything is ready.
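
If you would rather have a script wait for you, the log can be polled in the same way that tail -f follows it. This is a sketch under one important assumption: the exact text of the "ready" message is not given on this page, so the marker is a parameter that you must fill in after checking a real log. The function name is hypothetical.

```python
import time


def wait_for_log_line(log_path, marker, timeout=600.0, poll_interval=1.0):
    """Return the first log line containing `marker`, or None on timeout.

    Reads the file from the beginning, then keeps polling for new lines,
    similar to 'tail -f'. Raises FileNotFoundError if the log is missing.
    """
    deadline = time.monotonic() + timeout
    with open(log_path, "r") as log:
        while True:
            line = log.readline()
            if line:
                if marker in line:
                    return line.rstrip("\n")
                continue
            # No new data yet: give up on timeout, otherwise wait and retry.
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll_interval)
```

A call such as wait_for_log_line("logs/Coordinator.log", marker) blocks until the marker appears or the timeout expires.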

Last edited by Thanasis Naskos over 9 years ago