This is the second part of the how-to for setting up RStudio on Hadoop and AWS EC2. In this post, I will share the steps for installing R and RStudio, as well as how to resolve the issues I encountered during setup. I want to acknowledge the following site, whose instructions this post builds on:
Ready? Let's start…
Step 2: Install R and RStudio
Follow the instructions on the link below.
Skip the steps for installing and setting up Oracle VirtualBox and loading the Hortonworks image into it. Head straight to the section “Installing RStudio Server on Hortonwork’s image (based on CENT OS 6)” and follow the instructions to the end.
Be sure to run the R test script and observe that the job is being split across multiple nodes in the Hadoop cluster.
Potential Issues Encountered in Step 2
As with all tutorials, you may run into issues along the way. The following section details the issues I encountered during my installation and the resolutions that addressed them. Hopefully it will help you as well.
- Issue #1: -bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
This error indicates that the OS locale is either not set correctly or set to a locale the system does not recognize. If this is not fixed, you will not be able to start RStudio. To fix it, we need to change the system locale settings.
First verify the locales on your system by running the following command:
The issue here is that we need to set LC_CTYPE and LC_ALL to “en_US.UTF-8”. To do this, we need to edit the /etc/sysconfig/i18n file:
sudo vi /etc/sysconfig/i18n
LC_CTYPE=en_US.UTF-8
LC_ALL=en_US.UTF-8
Add LC_CTYPE=en_US.UTF-8 and LC_ALL=en_US.UTF-8 into the file and save it.
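If you would rather script this edit than open vi (for example, to repeat it on several nodes), a minimal sketch could look like the following. The `set_kv` helper is hypothetical and not part of the original instructions. It assumes the file uses plain KEY=VALUE lines and that you run it as root.

```shell
#!/bin/sh
# set_kv FILE KEY VALUE
# Hypothetical helper (not from the original instructions): makes sure FILE
# contains the assignment KEY=VALUE, replacing an existing KEY= line or
# appending a new one if the key is not present yet.
set_kv() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Rewrite via a temporary copy so this does not rely on `sed -i`.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            cat "${file}.tmp" > "$file" &&
            rm -f "${file}.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage (run as root, since /etc/sysconfig/i18n is root-owned):
# set_kv /etc/sysconfig/i18n LC_CTYPE en_US.UTF-8
# set_kv /etc/sysconfig/i18n LC_ALL en_US.UTF-8
```

Running the helper twice with the same arguments leaves the file unchanged, which makes it safe to re-run.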
Reload the i18n file by logging out and logging in again, then verify that the locale setting has been applied.
When you next log in, you should no longer see any errors related to LC_CTYPE.
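To check the result from the command line, a sketch like the one below could be used. It relies on the standard `locale` and `locale -a` commands. The `has_locale` helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# has_locale NAME
# Hypothetical helper: reads a `locale -a` style list on stdin and succeeds if
# NAME is in it. Case and hyphens are ignored, because `locale -a` usually
# prints "en_US.utf8" for what config files spell "en_US.UTF-8".
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    tr 'A-Z' 'a-z' | tr -d '-' | grep -xF "$want" >/dev/null
}

# Is the locale we want actually installed on this machine?
if locale -a 2>/dev/null | has_locale en_US.UTF-8; then
    echo "en_US.UTF-8 is available"
else
    echo "en_US.UTF-8 is NOT available"
fi

# Show the values the current session uses for the settings edited above.
locale 2>/dev/null | grep -E '^(LANG|LC_CTYPE|LC_ALL)=' || true
```

After logging in again, LC_CTYPE and LC_ALL in the last listing should both read en_US.UTF-8.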
- Issue #2: Ensure that R is successfully installed
When executing the “sudo yum -y install R git wget openssl098e vim curl” command, it may appear that the command ran successfully and installed all the components.
You should look out for the following:
This shows that no package named R was available in the configured repositories. We need to add an additional repository that provides it: Extra Packages for Enterprise Linux (EPEL). Execute the following commands:
sudo su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
sudo yum update
sudo yum install R
When prompted with [y/N], enter “y”. The update may take some time. Once it is complete, we can execute the original command again:
sudo yum -y install R git wget openssl098e vim curl
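Because yum still installs the other packages in the list, a missing package is easy to overlook in the output. As a rough sketch, the output can be filtered for packages that yum reports as unavailable. The `missing_packages` helper is hypothetical, and it assumes yum's usual “No package NAME available.” wording.

```shell
#!/bin/sh
# missing_packages
# Hypothetical helper: reads yum output on stdin and prints the name of every
# package yum reported as unavailable. Assumes the message format
# "No package NAME available." used by yum on RHEL/CentOS 6.
missing_packages() {
    sed -n 's/^No package \(.*\) available\.*$/\1/p'
}

# Example usage (kept as a comment because it installs packages):
# sudo yum -y install R git wget openssl098e vim curl 2>&1 | tee yum.log | missing_packages
```

If the filter prints nothing, yum found every package in the list.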
To verify that R was successfully installed, execute the following command:
The R console should appear without any errors.
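For a check that does not open the interactive console, for example inside a setup script, something like the following may be enough. `have_cmd` is a hypothetical helper. `R --version` is a standard R option that prints the version and exits.

```shell
#!/bin/sh
# have_cmd NAME
# Hypothetical helper: succeeds if NAME can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd R; then
    # Print only the first line, which carries the version string.
    R --version | head -n 1
else
    echo "R is not on the PATH; the installation did not succeed"
fi
```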
- Issue #3: Missing packages (lapack, blas, texinfo and libicu)
When installing R, you may encounter the following errors due to missing packages (especially on RHEL 6.6 and above). These packages need to be installed before proceeding to set up R.
At the console, execute the following commands to install the missing packages.
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/lapack-devel-3.2.1-4.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/blas-devel-3.2.1-4.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/texinfo-tex-4.13a-8.el6.x86_64.rpm
wget http://mirror.centos.org/centos/6/os/x86_64/Packages/libicu-devel-4.2.1-9.1.el6_2.x86_64.rpm
sudo yum localinstall *.rpm
Once the packages have been successfully installed, you can proceed to install R.
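Before moving on, it may be worth confirming that all four packages are really in place. The sketch below uses `rpm -q`, which exits with a non-zero status for a package that is not installed. The `not_installed` helper and the `PKG_QUERY` override are hypothetical; the override exists only so the function can be tried on a machine without rpm.

```shell
#!/bin/sh
# not_installed PKG...
# Hypothetical helper: prints every package name for which the query command
# fails. PKG_QUERY defaults to "rpm -q"; overriding it is an assumption made
# here purely so the function can be exercised without rpm.
not_installed() {
    for pkg in "$@"; do
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Package names taken from the rpm files downloaded above.
not_installed lapack-devel blas-devel texinfo-tex libicu-devel
```

An empty result means every package is installed. Any name that is printed still needs attention.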
- Issue #4: R must be installed on all nodes in the cluster
Although the instructions do not state it explicitly, ensure that R is installed on all nodes in the cluster. Otherwise, you will encounter errors when the MapReduce job executes on the Hadoop cluster.
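One way to check every node in a single pass is a small loop over the hostnames. This is only a sketch: it assumes passwordless SSH from the machine you run it on, and a hypothetical file nodes.txt with one hostname per line. `nodes_without_r` and the `RUN_ON` override are names introduced here for illustration.

```shell
#!/bin/sh
# nodes_without_r
# Hypothetical helper: reads hostnames on stdin (one per line) and prints each
# host on which `command -v R` fails. RUN_ON defaults to ssh in batch mode;
# the override is an assumption that allows a dry run without SSH.
nodes_without_r() {
    while IFS= read -r host; do
        # Skip blank lines in the host list.
        [ -n "$host" ] || continue
        # Redirect stdin so the remote command cannot swallow the host list.
        ${RUN_ON:-ssh -o BatchMode=yes} "$host" 'command -v R' \
            </dev/null >/dev/null 2>&1 || echo "$host"
    done
}

# Example usage (nodes.txt is a hypothetical file, one hostname per line):
# nodes_without_r < nodes.txt
```

Note that a non-interactive SSH session can have a shorter PATH than a login shell, so a host listed here may have R installed in a location the check cannot see.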
That ends this 2-part tutorial on installing RStudio with a Hadoop cluster on AWS EC2 RHEL 6.5. I hope you find it helpful!
Link to Part 1