How To Set Up RStudio With A Hadoop Cluster On AWS EC2 RHEL 6.5 – Part 2

This is the second part of the how-to for setting up RStudio on Hadoop and AWS EC2. In this post, I will share the steps for installing R and RStudio, as well as how to resolve the issues I encountered during setup. I want to acknowledge the following site, which I used as part of the instructions in this post:

The Coatless Professor

Ready? Let's start…

Step 2: Install R and RStudio

Follow the instructions at the link below.

http://www.thecoatlessprofessor.com/programming/installing-r-studio-server-on-hortonworks-virtual-box-image-and-rmr2-a-k-a-rhadoop-r-package

Skip the steps for installing and setting up Oracle VirtualBox and for loading the Hortonworks VirtualBox image into it. Head straight to the section “Installing RStudio Server on Hortonwork’s image (based on CENT OS 6)” and follow the instructions to the end.

Be sure to run the R test script and confirm that the jobs are being split across multiple nodes of the Hadoop cluster.

Potential Issues Encountered in Step 2

However, as with all tutorials, you are likely to run into issues. The following section details the issues I encountered during my installation and the resolutions I found for them. Hopefully they will help you as well.

  • Issue #1: -bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

This error indicates that the OS locale is either incorrect or set to an unrecognized value. If it is not fixed, you will not be able to start RStudio. To fix it, we need to change the system locale settings.

First, check the current locale settings on your system by running the following command:

locale

To fix this, we need to set LC_CTYPE and LC_ALL to “en_US.UTF-8” by editing the /etc/sysconfig/i18n file:

sudo vi /etc/sysconfig/i18n

LC_CTYPE=en_US.UTF-8
LC_ALL=en_US.UTF-8

Add LC_CTYPE=en_US.UTF-8 and LC_ALL=en_US.UTF-8 to the file, then save it.
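If you would rather script this change than edit the file in vi, the following is a minimal sketch (not part of the original steps). set_config_var is a hypothetical helper; it assumes GNU sed, as shipped with RHEL, and values that contain no “|” or “&” characters. It replaces an existing KEY= line or appends a new one, so running it twice does not create duplicate entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a shell-style config file such as
# /etc/sysconfig/i18n. An existing "KEY=..." line is replaced; otherwise
# the assignment is appended. Assumes GNU sed and a VALUE without | or &.
set_config_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # Overwrite the existing assignment in place.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # No assignment yet: append one at the end of the file.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, as root (e.g. after “sudo -i”), run “set_config_var /etc/sysconfig/i18n LC_CTYPE en_US.UTF-8” and then the same call for LC_ALL.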


Log out and log back in to reload the i18n file, then run “locale” to verify that the settings are now correct.


The next time you log in, you should no longer see any errors related to LC_CTYPE.
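To check the result from a script instead of by eye, one option is to rely on the fact that glibc's “locale” command prints warnings such as “Cannot set LC_CTYPE to default locale” on stderr when a configured locale is not installed. The sketch below treats any such output as a failure; check_locale is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a locale setting is usable.
# With an argument, that locale is tested via LC_ALL; without one, the
# current environment is tested. glibc's `locale` prints "Cannot set ..."
# warnings to stderr for missing locales, so any stderr output is a failure.
check_locale() {
  local warnings
  if [ "$#" -gt 0 ]; then
    # Capture only stderr: 2>&1 goes to the capture, stdout to /dev/null.
    warnings=$(LC_ALL="$1" locale 2>&1 >/dev/null)
  else
    warnings=$(locale 2>&1 >/dev/null)
  fi
  if [ -n "$warnings" ]; then
    printf 'Locale problem:\n%s\n' "$warnings" >&2
    return 1
  fi
  echo "Locale OK"
}
```

After logging back in, “check_locale” with no arguments should print “Locale OK”; “check_locale en_US.UTF-8” tests that specific locale.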

  • Issue #2: Ensure that R is successfully installed

When you execute the “sudo yum -y install R git wget openssl098e vim curl” command, it may appear that the command succeeded and installed all of the components.


However, scroll back through the output and look out for the following line:

No package R available.

This means that yum could not find a package named R in any of the configured repositories. R is provided by the Extra Packages for Enterprise Linux (EPEL) repository rather than the base RHEL repositories, so we need to add EPEL before installing it. Execute the following commands:

sudo su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
sudo yum update
sudo yum install R

When prompted with [y/N], enter “y”. The update may take some time. Once it is complete, run the original command again:

sudo yum -y install R git wget openssl098e vim curl
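Because yum may carry on with the remaining packages when one name cannot be found, the “No package R available.” line is easy to miss in a long transcript. One way to catch it, sketched here as an assumption rather than part of the original instructions, is to save yum's output to a log file and scan it. unavailable_packages and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the package names that yum reported as
# unavailable ("No package NAME available.") in a saved log file.
# ANSI highlighting codes are stripped first in case yum added any.
unavailable_packages() {
  sed -e 's/\x1b\[[0-9;]*m//g' "$1" \
    | sed -n 's/^No package \(.*\) available\.$/\1/p'
}
```

For example: “sudo yum -y install R git wget openssl098e vim curl 2>&1 | tee yum-install.log”, then “unavailable_packages yum-install.log”. Empty output means yum reported no missing package names.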

To verify that R was successfully installed, execute the following command:

R

The R console should appear without any errors.
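The interactive console is the simplest check (type q() to leave it). To verify from a script instead, for example over SSH, Rscript, which is installed along with R, can evaluate an expression and exit. This is a minimal sketch; check_r is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that R is installed and can run code
# non-interactively by printing its version string via Rscript.
check_r() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH; R does not appear to be installed" >&2
    return 1
  fi
  Rscript -e 'cat(R.version.string, "\n")'
}
```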

  • Issue #3: Missing packages (lapack, blas, texinfo and libicu)

When installing R, you may encounter errors caused by missing packages (especially on RHEL 6.6 and above). These packages need to be installed before you proceed with installing R.


At the console, execute the following commands to install the missing packages.

wget http://mirror.centos.org/centos/6/os/x86_64/Packages/lapack-devel-3.2.1-4.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/blas-devel-3.2.1-4.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/texinfo-tex-4.13a-8.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/libicu-devel-4.2.1-9.1.el6_2.x86_64.rpm
sudo yum localinstall *.rpm

Once the packages have been successfully installed, you can proceed to install R.
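Since “sudo yum localinstall *.rpm” installs every RPM in the current directory, it is safest to run the commands above in an empty directory. The sketch below does that automatically: it downloads the same four files from the same mirror path into a fresh temporary directory and installs only those. The function and variable names are hypothetical, and CENTOS6_PKG_BASE is kept as a variable because the CentOS 6 packages may no longer be hosted on mirror.centos.org.

```shell
#!/usr/bin/env bash
# Hypothetical names; the package files and mirror path are the ones used
# in the steps above. Change CENTOS6_PKG_BASE if the mirror has moved.
CENTOS6_PKG_BASE="http://mirror.centos.org/centos/6/os/x86_64/Packages"
R_DEP_RPMS=(
  lapack-devel-3.2.1-4.el6.x86_64.rpm
  blas-devel-3.2.1-4.el6.x86_64.rpm
  texinfo-tex-4.13a-8.el6.x86_64.rpm
  libicu-devel-4.2.1-9.1.el6_2.x86_64.rpm
)

# Print the full download URL of each dependency, one per line.
r_dep_urls() {
  local rpm
  for rpm in "${R_DEP_RPMS[@]}"; do
    printf '%s/%s\n' "$CENTOS6_PKG_BASE" "$rpm"
  done
}

# Download the RPMs into a fresh temporary directory and install only
# those files, so stray RPMs in the current directory are not picked up.
install_r_deps() {
  local dir status
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" &&
      r_dep_urls | xargs -n 1 wget -q &&
      sudo yum -y localinstall ./*.rpm
  )
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

Run it with “install_r_deps”; it returns a non-zero status if a download or the installation fails.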

  • Issue #4: R must be installed on all nodes in the cluster

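Although the linked instructions do not state it explicitly, make sure that R is installed on every node in the cluster. Otherwise, you will encounter errors when the MapReduce job runs on the Hadoop cluster, because the map and reduce tasks execute R on whichever nodes they are scheduled on.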
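Before submitting jobs, you can confirm this by asking each node whether Rscript is on its PATH. The sketch below assumes passwordless SSH from the machine you run it on to every cluster node, which Hadoop setups often already have. check_r_on_nodes, the REMOTE_CMD override and the host names in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report, for each host given as an argument, whether
# Rscript is available there. Returns non-zero if any host lacks it.
# REMOTE_CMD (hypothetical) overrides the remote-execution command. In the
# default, -n keeps ssh from reading stdin and BatchMode makes a missing
# key fail instead of prompting for a password.
check_r_on_nodes() {
  local host failed=0
  local runner="${REMOTE_CMD:-ssh -n -o BatchMode=yes}"
  for host in "$@"; do
    # $runner is intentionally unquoted so the default splits into words.
    if $runner "$host" 'command -v Rscript' >/dev/null 2>&1; then
      printf 'OK       %s\n' "$host"
    else
      printf 'MISSING  %s\n' "$host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: “check_r_on_nodes datanode1 datanode2 datanode3”. Any host reported as MISSING still needs R installed.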

That concludes this two-part tutorial on installing RStudio with a Hadoop cluster on AWS EC2 RHEL 6.5. I hope you find it helpful!


Link to Part 1
