Tuesday, March 22, 2016

Elephant Bird API - Loading JSON data Using JsonLoader into Apache Pig


Here, I will explain how to use the Elephant Bird API for loading JSON data into Apache Pig. 

Suppose we have the following blog data in JSON format.

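Blog JSON:

{
"Blog" : "IMU008390900000",
"BlogType" : "Big Data Blog",
"AuthorDisplayName" : "Iqubal Mustafa Kaki",
"Title" : "Elephant Bird API - Loading JSON data into Apache Pig",
"AuthorEmailId" : "iqubal.kaki@gmail.com"
}

The record is pretty-printed here for readability. In Blog.json itself, each JSON object should sit on a single line, because the loader reads its input one line at a time.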
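If your data starts out pretty-printed like the record above, it has to be flattened to one object per line before it is copied to HDFS. The following is a minimal Python sketch of that step, assuming the corrected record above and a local output file named Blog.json; the helper names `to_json_line` and `write_json_lines` are hypothetical and not part of Elephant Bird or Pig.

```python
import json

# The corrected blog record from above.
blog = {
    "Blog": "IMU008390900000",
    "BlogType": "Big Data Blog",
    "AuthorDisplayName": "Iqubal Mustafa Kaki",
    "Title": "Elephant Bird API - Loading JSON data into Apache Pig",
    "AuthorEmailId": "iqubal.kaki@gmail.com",
}


def to_json_line(record):
    """Serialize one record as a single line of JSON.

    Without an indent argument, json.dumps emits no newline characters;
    newlines inside string values are escaped as \\n.
    """
    return json.dumps(record, separators=(",", ":"))


def write_json_lines(records, path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(to_json_line(record) + "\n")


if __name__ == "__main__":
    write_json_lines([blog], "Blog.json")
```

The resulting file can then be copied into HDFS, for example with `hdfs dfs -put Blog.json /user/training/Blog.json`.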

Steps for loading JSON Data into Apache Pig


First, register the following JARs:

grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-core-4.13.jar';
grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-pig-4.13.jar';
grunt> REGISTER '/home/training/Desktop/ElephantBird/elephant-bird-hadoop-compat-4.5.jar';
grunt> REGISTER '/home/training/Desktop/ElephantBird/json-simple-1.1.jar';

Load JSON into Pig Using JsonLoader

grunt> records = load '/user/training/Blog.json' using com.twitter.elephantbird.pig.load.JsonLoader('Blog:chararray,BlogType:chararray,AuthorDisplayName:chararray,Title:chararray,AuthorEmailId:chararray');
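Conceptually, the loader turns each input line into one Pig tuple whose only field is a map from the JSON object's top-level keys to their values. That is why the dump output in the next step shows a single map rather than five separate columns. The Python sketch below emulates that behaviour for flat records only. It is not Elephant Bird's code, the function name `load_json_records` is hypothetical, and skipping blank or unparseable lines is an assumption of this sketch.

```python
import json


def load_json_records(lines):
    """Emulate a line-oriented JSON loader for flat records.

    Each non-blank line is parsed as one JSON object and yielded as a
    1-tuple holding a dict (standing in for the Pig map). Lines that are
    not valid JSON objects are skipped, which is an assumption of this
    sketch rather than documented Elephant Bird behaviour.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict):
            continue
        # One tuple per record; its only field is the key -> value map.
        yield (obj,)
```

For example, `list(load_json_records(open("Blog.json")))` would yield one 1-tuple per line of the file.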

Verify the loaded data:

grunt> dump records;

The output will be:

Successfully stored 1 records (203 bytes) in: "hdfs://localhost/tmp/temp-726203488/tmp-918944215"

Counters:
Total records written : 1
Total bytes written : 203
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201603232110_0003

2016-03-23 21:48:54,132 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-03-23 21:48:54,192 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-03-23 21:48:54,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
([AuthorEmailId#iqubal.kaki@gmail.com,AuthorDisplayName#Iqubal Mustafa Kaki,BlogType#Big Data Blog,Title#Elephant Bird API Loading JSON data into Apache Pig,Blog#IMU008390900000])
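In the dumped line above, the outer parentheses mark a Pig tuple, the square brackets mark a map, and `#` separates each key from its value. The following is a small Python sketch that renders a record in that notation, assuming flat string values. The helper names `format_pig_map` and `format_pig_tuple` are hypothetical. Pig's own key order in the dump is not guaranteed, and in this sketch keys simply follow the dict's insertion order.

```python
def format_pig_map(record):
    """Render a flat dict in Pig's map notation: [key#value,key#value]."""
    return "[" + ",".join(f"{k}#{v}" for k, v in record.items()) + "]"


def format_pig_tuple(*fields):
    """Render already-formatted fields as a Pig tuple: (field1,field2)."""
    return "(" + ",".join(fields) + ")"


record = {"Blog": "IMU008390900000", "BlogType": "Big Data Blog"}
print(format_pig_tuple(format_pig_map(record)))
# prints: ([Blog#IMU008390900000,BlogType#Big Data Blog])
```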

Reference: https://pig.apache.org/docs/r0.11.1/func.html#jsonloadstore

I hope you enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist

Want to connect with me?
If you want to connect with me, please reach me by email at iqubal.kaki@gmail.com.
