Tuesday, March 22, 2016

Elephant Bird API - Loading JSON data Using JsonLoader into Apache Pig

Here, I will explain how to use the Elephant Bird API for loading JSON data into Apache Pig. 

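Suppose we have the following blog data in JSON format. It is shown on several lines here for readability; Elephant Bird's JsonLoader reads its input one line at a time, so in the actual Blog.json file the whole object should sit on a single line.

Blog JSON:

{
"Blog" : "IMU008390900000",
"BlogType" : "Big Data Blog",
"AuthorDisplayName" : "Iqubal Mustafa Kaki",
"Title" : "Elephant Bird API - Loading JSON data into Apache Pig",
"AuthorEmailId" : "iqubal.kaki@gmail.com"
}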

Steps for loading JSON Data into Apache Pig


First, register the following JARs in the Grunt shell:

grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-core-4.13.jar'
grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-pig-4.13.jar'
grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-hadoop-compat-4.5.jar'
grunt> REGISTER '/home/training/Desktop/ElephantBird/json-simple-1.1.jar'

Load JSON into Pig Using JsonLoader

grunt> records = load '/user/training/Blog.json' using com.twitter.elephantbird.pig.load.JsonLoader('Blog:chararray,BlogType:chararray,AuthorDisplayName:chararray,Title:chararray,AuthorEmailId:chararray');

Verify the load by dumping the relation

grunt> dump records;

The output will look like this

Successfully stored 1 records (203 bytes) in: "hdfs://localhost/tmp/temp-726203488/tmp-918944215"

Counters:
Total records written : 1
Total bytes written : 203
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201603232110_0003

2016-03-23 21:48:54,132 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-03-23 21:48:54,192 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-03-23 21:48:54,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
([AuthorEmailId#iqubal.kaki@gmail.com,AuthorDisplayName#Iqubal Mustafa Kaki,BlogType#Big Data Blog,Title#Elephant Bird API Loading JSON data into Apache Pig,Blog#IMU008390900000])
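
The dump above shows each record as a single Pig map (the [key#value,...] notation) rather than as five separate columns. A minimal sketch of how the individual fields could be pulled out of that map, assuming the records relation loaded above; the relation name blog_fields and the column aliases are illustrative names, not part of the original steps:

```pig
-- Assumes `records` was loaded with the Elephant Bird JsonLoader as shown above,
-- so that the first field ($0) of every tuple is the map seen in the dump.
-- `blog_fields` and the AS aliases below are hypothetical names chosen for this sketch.
blog_fields = FOREACH records GENERATE
    (chararray) $0#'Blog'              AS blog_id,
    (chararray) $0#'BlogType'          AS blog_type,
    (chararray) $0#'AuthorDisplayName' AS author_name,
    (chararray) $0#'Title'             AS title,
    (chararray) $0#'AuthorEmailId'     AS author_email;

-- Print the resulting schema, then the rows themselves.
DESCRIBE blog_fields;
DUMP blog_fields;
```

The # operator looks up a key in a Pig map, and the (chararray) cast gives each value an explicit type. Referring to the map by position ($0) keeps the sketch independent of whatever name the loader assigns to that field.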

Reference: https://pig.apache.org/docs/r0.11.1/func.html#jsonloadstore

Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist

Want to connect with me?
Please reach me by email at iqubal.kaki@gmail.com