How can R and Hadoop be used together?

Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment, while R is a programming language and software environment for statistical computing and graphics.

Hadoop and R complement each other well for big data analytics and visualization: Hadoop supplies the distributed storage and processing, and R supplies the statistics and graphics.

5 Ways Hadoop and R Work Together

There are five different ways of using Hadoop and R together:

RHadoop – RHadoop is an open source collection of R packages, provided by Revolution Analytics, for managing and analyzing data with the Hadoop framework.

RHIPE – RHIPE is the R and Hadoop Integrated Programming Environment, designed for analyzing large datasets with Divide and Recombine (D&R) techniques.

ORCH – ORCH is the Oracle R Connector for Hadoop. ORCH can be used on the Oracle Big Data Appliance or on non-Oracle Hadoop clusters.

HadoopStreaming – HadoopStreaming is an R package, available on CRAN, that provides utilities for writing R scripts that run under Hadoop Streaming. It was developed by David S. Rosenberg to make Hadoop Streaming as easy as possible for R users.

Hadoop Streaming – Hadoop Streaming is a Hadoop utility that allows users to develop and run MapReduce programs in languages other than Java.
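
To make the Hadoop Streaming idea more concrete, here is a minimal sketch of the contract it relies on: the mapper reads lines on standard input and writes tab-separated key/value pairs, Hadoop sorts those pairs by key, and the reducer reads the sorted pairs. The sketch simulates this locally with a word count written as shell functions; the names mapper, reducer and wordcount are made up for illustration, and sort only stands in for Hadoop's shuffle phase. An R script run through Rscript would follow the same stdin/stdout contract.

```shell
# Local simulation of the Hadoop Streaming contract (no cluster needed).

# Mapper: read text lines on stdin, emit one "word<TAB>1" pair per word.
mapper() {
  tr -s '[:space:]' '\n' | awk 'NF { print tolower($0) "\t1" }'
}

# Reducer: read key-sorted "word<TAB>count" pairs and sum the counts per word.
reducer() {
  awk -F '\t' '
    $1 != key { if (NR > 1) print key "\t" total; key = $1; total = 0 }
    { total += $2 }
    END { if (NR > 0) print key "\t" total }
  '
}

# Hadoop sorts mapper output by key between the two phases;
# a plain sort plays that role in this local sketch.
wordcount() {
  mapper | sort | reducer
}

printf 'R and Hadoop\nhadoop and r\n' | wordcount
```

On a real cluster, the mapper and reducer would be standalone executable scripts handed to the streaming jar through its -mapper and -reducer options, rather than shell functions joined by a pipe.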

Hadoop Installation

RHadoop is a collection of three packages: rmr, rhbase and rhdfs. The rmr package provides Hadoop's MapReduce functionality in R. The rhbase package provides access to the HBase database from R, and the rhdfs package provides access to the HDFS file system from R.

The first step is to install Hadoop: download hadoop-1.2.tar.gz and unpack it. Next, you will need to set JAVA_HOME and, in conf/hadoop-env.sh, add this line:

[image 1]
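
As a rough illustration of the JAVA_HOME step, and not necessarily the exact line from the original screenshot, a setting of this kind in conf/hadoop-env.sh could look like the following. It assumes macOS, where /usr/libexec/java_home prints the path of the active JDK; the fallback path is only a placeholder to adjust for your own system.

```shell
# conf/hadoop-env.sh (sketch): tell Hadoop which Java installation to use.
# Assumption: on macOS, /usr/libexec/java_home prints the active JDK path.
# The fallback /Library/Java/Home is a placeholder; adjust it for your machine.
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home)"
else
  export JAVA_HOME="${JAVA_HOME:-/Library/Java/Home}"
fi
```

Running `echo "$JAVA_HOME"` afterwards is a quick way to check that the variable points at a real Java directory.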

After this step, you will need to enable self-login by turning on remote login. Go to System Preferences, then under the network and internet section, click Sharing. In the services list, check ‘Remote Login’. For extra security, you can also select ‘Only these users’ and then choose the Hadoop user.

You can also set up self-login and remote desktop by adding this line to conf/hadoop-env.sh:

[image 2]

R Installation

With the method below, you can install multiple R versions on a Mac. This is especially useful if you have a newer R version installed and want to try an older one such as v2.15.2. With Hadoop, you can successfully run R v2.15.1 and R v2.15.2 using the procedure below.

Assume that on a Mac, you currently have R v3.0.0. In Applications, first rename R_64bit.app to R3.0.0_64bit.app and rename R.app to R3.0.0.app. Next, install R v2.15.2, then rename the R_64bit.app and R.app that you have just installed in the same way, using the new version number.
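
The renaming step can also be done from a terminal. The following is a small sketch under stated assumptions: the app bundles are named R.app and R_64bit.app as described above, they live in the directory passed as the first argument (normally /Applications), and tag_r_apps is a hypothetical helper name, not part of R or macOS.

```shell
# Sketch: tag an installed R with its version number so that another
# version can be installed next to it.
# Assumptions: the bundles are called R.app and R_64bit.app and sit in
# the directory given as the first argument (normally /Applications).
tag_r_apps() {
  app_dir=$1
  version=$2
  for suffix in "" "_64bit"; do
    src="$app_dir/R${suffix}.app"
    dst="$app_dir/R${version}${suffix}.app"
    # Rename only if the source exists and the target name is still free.
    if [ -e "$src" ] && [ ! -e "$dst" ]; then
      mv "$src" "$dst"
    fi
  done
}

# Example (may need sudo on a real system):
#   tag_r_apps /Applications 3.0.0
```

After installing the second R version, calling the helper again with that version number tags the new bundles the same way.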

With R and Hadoop set up this way, R users are not required to learn a new language (e.g., Java) or a new environment (e.g., cluster software and hardware) to work with Hadoop. Moreover, functionality from open source R packages can be used when writing mapper and reducer functions.

As the combined R and Hadoop platform grows in popularity, I think big data analytics will keep gaining ground as a trend. With the help of this parallel data analytics platform, large organizations can more easily derive insights and gain a greater advantage from big data analytics.
