I just set up my first Hadoop cluster on AWS EC2 (RHEL 6.5) and want to share the steps I followed, along with the common errors to avoid during setup. I hope you find this helpful!
I collated instructions and how-tos from around the web, and I want to take the time to acknowledge the authors of the guides referenced in this post.
This will be a two-part post. This first part covers step 1: installing and deploying Hadoop on EC2.
Ready? Let's start!
Step 1: Install, Set Up and Deploy a Hadoop Cluster on Amazon EC2 with HDP2
I followed the tutorial on the Hortonworks website: http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/
Follow that tutorial to set up the EC2 instances and install the Ambari server.
As of writing, the latest version of Ambari is 1.7.0. Please refer to http://docs.hortonworks.com for instructions on installing the latest version of Ambari.
*Install Ambari on the server you wish to use as the main server to manage the cluster.
Potential Issues Encountered in Step 1
- Issue #1: Unable to start Ambari because ntpd is not running
Although the above tutorial does not explicitly mention it, ntpd must be running on every node in the cluster so that the nodes stay synchronized with each other when executing Hadoop jobs. Make sure ntpd is installed, enabled at boot, and started on all the nodes.
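A minimal sketch of this on RHEL 6.x might look like the following. It assumes the stock `ntp` package and the `chkconfig` and `service` tools that ship with RHEL 6; the `run` helper and the `DRY_RUN` variable are illustrative additions so that the script only prints the commands unless you ask it to execute them.

```shell
#!/bin/bash
# Sketch: install, enable, and start ntpd on a RHEL 6.x node.
# DRY_RUN defaults to 1, so the commands are only printed; use DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_ntpd() {
    run sudo yum install -y ntp    # install the NTP package if it is missing
    run sudo chkconfig ntpd on     # start ntpd automatically at boot
    run sudo service ntpd start    # start ntpd now
}

setup_ntpd
```

Run it once per node, for example as `DRY_RUN=0 bash setup_ntpd.sh` (the file name is just a placeholder).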
- Issue #2: Error setting up Ambari cluster – Please login as the user “ec2-user” rather than the user “root”. scp /usr/lib/python2.6/site-packages/ambari_commons
When setting up the Ambari cluster, make sure the SSH user you enter is “ec2-user”, not “root”.
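Before registering the hosts, it can help to confirm that key-based login as ec2-user actually works from the Ambari server. This is only a sketch: the key file path and host names in `KEY` and `HOSTS` are hypothetical placeholders, and the `run` helper prints the ssh commands rather than executing them unless `DRY_RUN=0` is set.

```shell
#!/bin/bash
# Sketch: check key-based SSH login as ec2-user on each node.
# KEY and HOSTS are hypothetical placeholders; substitute your own values.
# DRY_RUN defaults to 1, so the commands are only printed; use DRY_RUN=0 to execute them.
KEY="${KEY:-$HOME/.ssh/my-ec2-key.pem}"
HOSTS="${HOSTS:-node1.example.com node2.example.com}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_login() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # a successful login prints "ec2-user".
    run ssh -i "$KEY" -o BatchMode=yes "ec2-user@$1" whoami
}

for host in $HOSTS; do
    check_login "$host"
done
```

If any host fails this check, Ambari will most likely fail to register it as well, so it is worth fixing the key or security group settings first.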
- Issue #3: Error setting up Ambari cluster – Some warnings were encountered while performing checks against the 4 registered hosts above Click here to see the warnings.
Resolve all of the warnings before continuing. Proceeding with warnings still in place is likely to cause problems later when starting up the Ambari cluster.
Overall, setting up and installing Hadoop through Ambari went relatively smoothly.
In the next post, I will cover the setup and installation of R and RStudio on the cluster.