Initialize Pig Server & Communicate Using Java
Just like SQL we can be embed Pig Script inside Java
Application. Thus this is the way I initialize my Pig Server and can
communicate the same by using Java code. Here I will explain you the steps, how
anybody can perform the action item.
Here I am assuming Hadoop & Pig are up and running onto the machine.
You can download below explained project and data-sets from the link EmbededPigInsideJavaPOC or from GitHub download link
Create a new Java Project and make sure below libraries should be added in the class path of the project.
i) $HADOOP_HOME/*.jar
ii) $HADOOP_HOME/lib/*.jar
iii) $PIG_HOME/*.jar or
pig.jar
iv) $ZOOKEEPER_HOME/*.jar
The following action items
need to be performed.
Ø Load input data into the HDFS.
Ø Create an
instance of Pig Server class in Java
Ø Create
an instance of Pig Server class in Java
Ø Execute
Pig Script command by using PigServer class
by using PigServer.registerQuery().
Ø To
retrieve the result and store use PigServer.openIterator()
or PigServer.store().
Ø Use PigServer.registerJar() for registering UDF.
Load input data into the HDFS
Here I have the data-sets of Bollywood Super hit Movies. Place the bollywood_hit_movies.csv inside your project directory. You can get the data-sets which is in the project folder from the link EmbededPigInsideJavaPOC or from GitHub download link.
$ hadoop fs -copyFromLocal
/home/training/.eclipse_workspace/PigEmbedded/bollywood_hit_movies.csv
/home/training/.eclipse_workspace/PigEmbedded/bollywood_hit_movies.csv
EmbededPigScriptInsideJava Class
project In this class, doing the analysis of Bollywood Super Hit Movies by Leading Actor Using Apache Pig Script embedded inside the Java Application.
package com.pig.embeded;
import java.io.IOException;
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecException;
/**
*
* @author Iqubal Mustafa Kaki
*
*/
public class EmbededPigScriptInsideJava {
public static void main(String args[]) throws ExecException {
Properties p = new Properties();
p.setProperty("fs.default.name", "hdfs://localhost:8020/");
p.setProperty("mapred.job.tracker", "localhost:8021");
PigServer pigServer = new PigServer(ExecType.LOCAL, p);
try {
runMyQuery(pigServer, "bollywood_hit_movies.csv");
} catch (IOException e) {
e.printStackTrace();
}
}
private static void runMyQuery(PigServer pigServer, String inputfile)
throws IOException {
pigServer.setJobName("Embedded PIG Statements in JAVA");
pigServer.registerQuery("bollywood_hit = LOAD '"+ inputfile
+ "' Using PigStorage(',') AS (year: int, movie: chararray,lead_actor:chararray);");
pigServer.registerQuery("actor_superhits = GROUP bollywood_hit BY lead_actor;");
pigServer.registerQuery("number_of_hits = FOREACH actor_superhits GENERATE group, COUNT(bollywood_hit);");
pigServer.store("number_of_hits", "actor_superhit_dir");
}
}
|
Run the above class, it will create the output actor_superhit_dir inside current project directory.
Input(s):
Successfully read records from: "file:///home/training/.eclipse_workspace/EmbededPigInsideJavaPOC/bollywood_hit_movies.csv"
Output(s):
Successfully stored records in: "actor_superhit_dir"
Job DAG:
job_local_0001
16/02/08 04:59:18 INFO mapReduceLayer.MapReduceLauncher: Success!
Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
Author : Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
This is really an awesome article. Thank you for sharing this.It is worth reading for everyone.
ReplyDeleteBigdata Hadoop Training
Very nice Blog,keep sharing more information with us.
ReplyDeletethank you.....
big data hadoop course
hadoop admin course