How To Set Up RStudio With Hadoop Cluster On AWS EC2 RHEL 6.5 – Part 1

I just set up my first Hadoop cluster on AWS EC2 (RHEL 6.5) and want to share the steps I followed, along with the common errors I ran into, so that you can avoid them during your own setup. I hope you find this helpful!

I collated instructions and how-tos from around the web, and I want to acknowledge the following sources:

Hortonworks

The Coatless Professor

This will be a two-part post. In this first part, we will cover step 1: installing and deploying Hadoop on EC2.

Ready? Let's start!

Step 1: Install, Set Up and Deploy Hadoop Cluster on Amazon EC2 with HDP2

I followed the tutorial on the Hortonworks website: http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/

Follow the instructions above to set up the EC2 instances and install the Ambari server.

As of writing, the latest version of Ambari is 1.7.0. Please refer to http://docs.hortonworks.com for instructions on installing the latest version of Ambari.

*Install Ambari on the server you wish to use as the main server to manage the cluster.

Potential Issues Encountered in Step 1

  • Issue #1: Unable to start Ambari because of ntpd not running

Although the above tutorial does not explicitly mention it, ntpd must be running on every node in the cluster so that the nodes can stay synchronized with each other when executing Hadoop jobs. Ensure that you run the following commands on all the nodes.

[Image: Step1-Issue1-pic1]
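For reference, here is a rough sketch of what enabling ntpd typically looks like on RHEL 6.x. It is an assumption based on the standard RHEL 6 tools (yum, chkconfig, service), not a transcript of the original commands. The run and setup_ntpd helpers and the DRY_RUN variable are hypothetical names used only for illustration. By default the script only prints the commands, so you can review them first.

```shell
#!/bin/sh
# Sketch: install, enable and start ntpd on a RHEL 6.x node.
# Assumption: the current user (e.g. ec2-user) has sudo rights.
# DRY_RUN defaults to 1, which only prints the commands.
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: the ntpd steps for a single node.
setup_ntpd() {
    run sudo yum install -y ntp      # install the NTP package if missing
    run sudo chkconfig ntpd on       # start ntpd automatically at boot
    run sudo service ntpd start      # start ntpd now
    run sudo service ntpd status     # confirm that it is running
}

setup_ntpd
```

Run it on each node in the cluster. Once the printed commands look right for your environment, rerun it with DRY_RUN=0.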

  • Issue #2: Error setting up Ambari cluster – Please login as the user “ec2-user” rather than the user “root”. scp /usr/lib/python2.6/site-packages/ambari_commons

When setting up the Ambari cluster, ensure that the SSH user entry is “ec2-user”.

[Image: Step1-Issue2-pic1]

  • Issue #3: Error setting up Ambari cluster – Some warnings were encountered while performing checks against the 4 registered hosts above Click here to see the warnings.

Ensure that all warnings are resolved. Try not to proceed with warnings still in place, as they can cause issues when the Ambari cluster is started later.

Overall, the process of setting up and installing Hadoop on the Ambari cluster was relatively smooth.

In the next post I will cover the setup and installation of R and RStudio on the cluster.
