Playing with Snake – Writing Apache Pig UDFs using Java
Apache Pig can execute Java, Python, or Ruby code inside a Pig script as UDFs (User Defined Functions), so you can use them to load, aggregate, or perform sophisticated data analysis. Here I will explain how to write Apache Pig UDFs using Java. Make sure you have installed Eclipse and Apache Maven on your machine.
Create a Maven project for the Apache Pig UDF by following the steps below:
File > New > Maven Project > check "Create a simple project"
Next, add the Pig 0.15.0 dependency to the pom.xml:
<build>
<sourceDirectory>src</sourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.3</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<version>0.15.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>0.20.2</version>
</dependency>
</dependencies>
Write Java UDF Class
To write an Apache Pig UDF in Java, we extend the abstract class EvalFunc and override its exec method. The example UDF below returns the uppercase form of the given column.
UpperCaseAttribute Java Class
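import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Eval UDF that returns the first field of the input tuple in upper case.
public class UpperCaseAttribute extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // No input tuple or no fields: nothing to convert.
        if (input == null || input.size() == 0) {
            return null;
        }
        // A null field (for example, a missing value) stays null instead of throwing.
        String str = (String) input.get(0);
        if (str == null) {
            return null;
        }
        return str.toUpperCase();
    }
}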
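The null-in, null-out behaviour of exec can be checked locally before packaging the JAR. The following is only a minimal sketch under stated assumptions: a java.util.List<Object> stands in for Pig's Tuple so that the code runs without Pig on the classpath, and the names UpperCaseLogic and upper are hypothetical, not part of the Pig API. It also uses Locale.ROOT so the result does not depend on the machine's locale, which the UDF itself does not do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class UpperCaseLogic {
    // Hypothetical stand-in for exec(Tuple): the List plays the role of the Tuple.
    static String upper(List<Object> input) {
        // Missing or empty input: return null, as a Pig UDF would.
        if (input == null || input.isEmpty()) {
            return null;
        }
        // A null first field also maps to null rather than throwing.
        Object field = input.get(0);
        if (field == null) {
            return null;
        }
        return field.toString().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(upper(Arrays.<Object>asList("Bill Gates")));  // BILL GATES
        System.out.println(upper(Collections.<Object>emptyList()));      // null
        System.out.println(upper(Arrays.<Object>asList((Object) null))); // null
    }
}
```

Once the stand-in behaves as expected, the same branches can be carried over to the real exec method that takes a Tuple.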
Export class file as JAR
Right click > Export > JAR file > Next
Step 1: Registering the JAR file
Apache Pig works on top of Hadoop, so make sure that Hadoop is up and running. Then start Pig in MapReduce mode.
$ pig -x mapreduce
The first step is to register the exported JAR by executing the command below. Here I am assuming the JAR is located at /home/training/Desktop.
grunt> REGISTER '/home/training/Desktop/UpperCaseAttribute_UDF.jar'
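Step 2: Defining Alias
After registering the JAR, we can define an alias for the UDF using the DEFINE operator. Because UpperCaseAttribute is in the default package, it can also be called directly by its class name, which is what the script below does.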
Step 3: Load Data into Apache Pig For Using UDF
Suppose we have the following Person data:
PersonDetails
Bill Gates,CEO,Microsoft
Iqubal,CEO,SZIAS
Steve Jobs,CEO,Apple
Create Directory and Load PersonDetails Data into the HDFS
Navigate to the directory that contains the PersonDetails file.
$ hadoop fs -mkdir /user/training/PIGDATA
$ hadoop fs -mkdir /user/training/PIGDATA/PIG_UDF_DATA
$ hadoop fs -copyFromLocal PersonDetails /user/training/PIGDATA/PIG_UDF_DATA
$ pig -x mapreduce
Processing Data using Apache Pig
grunt> PERSONDETAILS= LOAD '/user/training/PIGDATA/PIG_UDF_DATA/PersonDetails' Using PigStorage(',') AS (name:chararray, designation:chararray, company:chararray);
Let us use the UDF we created to convert the names of the persons to upper case.
grunt> PersonNameUpperCase = FOREACH PERSONDETAILS GENERATE UpperCaseAttribute(name);
grunt> dump PersonNameUpperCase;
2016-02-03 07:07:25,512 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-02-03 07:07:25,516 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-03 07:07:25,516 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(BILL GATES)
(IQUBAL)
(STEVE JOBS)
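To sanity-check the expected output without a cluster, the same transformation can be mimicked in plain Java. This is only a rough sketch and not how Pig executes the script: splitting each line on a comma approximates what PigStorage(',') does for this simple data, and the class name PersonNameUpperCaseSketch and the method upperCaseNames are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PersonNameUpperCaseSketch {
    // Approximates LOAD ... PigStorage(',') followed by
    // FOREACH ... GENERATE UpperCaseAttribute(name).
    static List<String> upperCaseNames(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Fields in order: name, designation, company.
            String[] fields = line.split(",", -1);
            // Wrap in parentheses to resemble how dump prints a one-field tuple.
            result.add("(" + fields[0].toUpperCase(Locale.ROOT) + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Bill Gates,CEO,Microsoft",
                "Iqubal,CEO,SZIAS",
                "Steve Jobs,CEO,Apple");
        for (String row : upperCaseNames(lines)) {
            System.out.println(row);
        }
    }
}
```

Running it on the three PersonDetails rows prints the same three tuples that the dump above shows.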
Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
Hello, I'm trying to develop a UDF in Java for Pig but I've got some issues.
My principal goal is to merge different values using some policies. These policies are defined in an external text file that I want to read when I execute the script, but it seems that the script does not read it. To check this, I tried to save some results obtained by the execution of the script in another file, but here too the UDF does nothing.
My question is: do UDFs not support reading from and writing to an external file?
Thanks for your attention