Big Data Analysis Using Apache Pig
Pig was developed by Yahoo. It is an engine built on top of MapReduce that
converts Pig Latin scripts into MapReduce code: its component known as the Pig
Engine accepts Pig Latin scripts as input and converts them into MapReduce jobs.
Pig Latin, the scripting language Pig uses, is well suited to operations such as
ETL (Extract, Transform and Load). Apache Pig can therefore be defined as a
high-level scripting platform used with the Apache Hadoop ecosystem; in other
words, it is a tool for analyzing large data sets by representing them as data
flows.
The main motivation behind developing Pig was to reduce development time through its multi-query data flow language.
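To illustrate the multi-query idea, here is a rough Python analogy, not Pig's implementation. It assumes a script that loads one input and stores two results derived from it (the Pig Latin version is shown in the comments). With multi-query execution, Pig can compute both outputs from a single scan of the input instead of one scan per output. The output names ceos and per_company, as well as the Python class and function names, are hypothetical.

```python
# Pig Latin being imitated (output names are hypothetical):
#   A = LOAD 'PersonDetails' USING PigStorage(',')
#       AS (name:chararray, designation:chararray, company:chararray);
#   B = FILTER A BY designation == 'CEO';
#   STORE B INTO 'ceos';
#   C = GROUP A BY company;
#   D = FOREACH C GENERATE group, COUNT(A);
#   STORE D INTO 'per_company';

ROWS = [
    ("Bill Gates", "CEO", "Microsoft"),
    ("Iqubal", "CEO", "SZIAS"),
    ("Steve Jobs", "CEO", "Apple"),
]


class CountingSource:
    """Wraps the input rows and counts how many times they are scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


def separate_jobs(source):
    """One pass over the input per output, like two independent jobs."""
    ceos = [row for row in source.scan() if row[1] == "CEO"]
    per_company = {}
    for row in source.scan():
        per_company[row[2]] = per_company.get(row[2], 0) + 1
    return ceos, per_company


def multi_query(source):
    """Both outputs are fed from a single pass over the input."""
    ceos, per_company = [], {}
    for row in source.scan():
        if row[1] == "CEO":
            ceos.append(row)
        per_company[row[2]] = per_company.get(row[2], 0) + 1
    return ceos, per_company
```

Both functions produce the same results, but the second reads the input once. That saving is what matters when the input is a large HDFS file rather than three rows.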
The key features of Apache Pig are as follows:
- It has a procedural data flow language, Pig Latin.
- It is mainly used for programming data-processing tasks.
- It can handle all kinds of data, both structured and unstructured.
- With Pig's multi-query approach, many operations can run together in a
single data flow, which reduces the number of times the data is scanned.
Compared with equivalent Java MapReduce code, a Pig script needs roughly 1/20th
of the code and 1/16th of the development time.
- It provides a rich set of operators for filtering, joining, sorting and so on.
- It provides complex data types such as tuples, bags and maps.
- It is generally used by researchers and programmers.
- It operates on the client side of a cluster.
- It does not have a dedicated metadata database; the schema and data types are
defined in the script itself.
- Through Pig's User Defined Function (UDF) facility, you can write functions in
other languages such as Java, Python and Ruby and execute them from a Pig script.
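As a minimal sketch of the UDF facility, assuming Pig's Jython support: a Python function marked with Pig's outputSchema decorator can be registered and called from Pig Latin. The file name person_udfs.py, the namespace myfuncs and the function to_upper are hypothetical. The fallback definition of outputSchema exists only so the file can also be imported and tested with plain Python outside Pig. Inside Pig, the decorator is expected to come from the Jython runtime.

```python
# person_udfs.py (hypothetical file name): a Jython UDF for Apache Pig.
try:
    outputSchema  # expected to be provided by Pig's Jython runtime
except NameError:
    # Fallback for running or testing this file outside Pig: a minimal
    # decorator that only records the declared schema on the function.
    def outputSchema(schema):
        def decorate(func):
            func.output_schema = schema
            return func
        return decorate


@outputSchema("name_upper:chararray")
def to_upper(value):
    """Return the upper-cased string; Pig passes None for null fields."""
    if value is None:
        return None
    return value.upper()
```

In a Pig script it could then be used as follows, with the PERSONDETAILS relation loaded later in this article:
grunt> REGISTER 'person_udfs.py' USING jython AS myfuncs;
grunt> UPPERNAMES = FOREACH PERSONDETAILS GENERATE myfuncs.to_upper(name);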
Installing Apache Pig
Here I assume Hadoop is already up and running on the machine.
Download Apache Pig 0.15.0 (pig-0.15.0.tar.gz) from the Apache Pig download page.
Go to the location where you want to install Pig, then extract the archive and rename the extracted folder:
$ tar -xzf pig-0.15.0.tar.gz
$ mv pig-0.15.0 pig
Apache Pig Configuration
Set Up Environment Variables
Set the environment variables by editing the ~/.bashrc file: append the
following lines and save it.
export PIG_HOME=/home/training/pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
Then run the following command to reload ~/.bashrc:
$ source ~/.bashrc
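A common mistake at this step is a PIG_HOME that does not match the real folder name, for example Pig instead of pig, so the bin directory never reaches PATH. The following small Python check is a hypothetical helper, not part of Pig. It assumes a Linux-style PATH and reports missing or inconsistent Pig variables in an environment mapping.

```python
import os


def check_pig_env(env):
    """Return a list of problems with the Pig variables in an environment mapping."""
    problems = []
    pig_home = env.get("PIG_HOME")
    if not pig_home:
        problems.append("PIG_HOME is not set")
    else:
        # $PIG_HOME/bin must appear as an exact entry on PATH.
        pig_bin = os.path.join(pig_home, "bin")
        path_entries = env.get("PATH", "").split(os.pathsep)
        if pig_bin not in path_entries:
            problems.append(pig_bin + " is not on PATH")
    if not env.get("PIG_CLASSPATH"):
        problems.append("PIG_CLASSPATH is not set")
    return problems


if __name__ == "__main__":
    # Check the current shell's variables after running `source ~/.bashrc`.
    for problem in check_pig_env(os.environ) or ["Pig environment looks OK"]:
        print(problem)
```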
Verifying the Installation
Verify the installation of Apache Pig by typing the pig command. If the installation is successful, you will get the Grunt shell of Apache Pig, which shows the grunt> prompt.
Apache Pig Execution Modes
Local Mode
In local mode, Hadoop and HDFS are not needed: all files are read from, and all
processing runs on, the local system. This mode is generally used for testing.
$ pig -x local
MapReduce Mode
MapReduce mode is where we load or process the data that exists
in the Hadoop File System (HDFS) using Apache Pig. In this mode, whenever we
execute the Pig Latin statements to process the data, a MapReduce job is
invoked in the back-end to perform a particular operation on the data that
exists in the HDFS.
$ pig -x mapreduce
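Both modes can also run a saved script non-interactively by passing the script file to the pig command, for example pig -x local person.pig, where person.pig is a hypothetical script name. The following small Python wrapper is a sketch that assumes the pig launcher is on PATH. The helper names are hypothetical, and it accepts only the two modes covered here.

```python
import subprocess

# The two execution modes described above, as accepted by `pig -x`.
MODES = ("local", "mapreduce")


def pig_command(script, mode="mapreduce"):
    """Build the argument list for running a Pig Latin script in the given mode."""
    if mode not in MODES:
        raise ValueError("unsupported execution mode: " + mode)
    return ["pig", "-x", mode, script]


def run_pig(script, mode="mapreduce"):
    """Run the script; raises CalledProcessError if pig exits with an error."""
    subprocess.run(pig_command(script, mode), check=True)
```

Building the argument list separately keeps it easy to inspect or log before anything is executed.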
Load/Read and Store Data Using Apache Pig
Apache Pig works on top of Hadoop, so make sure that Hadoop is up and running.
Then start Pig in MapReduce mode.
$ pig -x mapreduce
Suppose we have the following person data in a local file named PersonDetails:
Bill Gates,CEO,Microsoft
Iqubal,CEO,SZIAS
Steve Jobs,CEO,Apple
Create Directory and Load PersonDetails Data into HDFS
Navigate to the directory containing the PersonDetails file, create the target directories in HDFS and copy the file into them:
$ hadoop fs -mkdir /user/training/PIGDATA
$ hadoop fs -mkdir /user/training/PIGDATA/PIG_UDF_DATA
$ hadoop fs -copyFromLocal PersonDetails /user/training/PIGDATA/PIG_UDF_DATA
$ pig -x mapreduce
Processing Data using Apache Pig
grunt> PERSONDETAILS = LOAD
'/user/training/PIGDATA/PIG_UDF_DATA/PersonDetails' USING PigStorage(',') AS
(name:chararray, designation:chararray, company:chararray);
Reading Data using DUMP command
grunt> dump PERSONDETAILS;
(Bill Gates,CEO,Microsoft)
(Iqubal,CEO,SZIAS)
(Steve Jobs,CEO,Apple)
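To make the LOAD and DUMP steps above concrete, here is a rough Python emulation, not Pig's implementation, of what happens to each line. PigStorage(',') splits the line on commas, the AS clause gives names to the three chararray fields, and DUMP prints each tuple inside parentheses. The helper names are hypothetical.

```python
# Field names from the AS clause; all three are declared as chararray (strings).
SCHEMA = ("name", "designation", "company")


def load_pigstorage(lines, delimiter=","):
    """Split each input line on the delimiter, as PigStorage(',') does, into a tuple."""
    return [tuple(line.rstrip("\n").split(delimiter)) for line in lines]


def field(record, name):
    """Look up a field by its schema name, e.g. field(t, 'company')."""
    return record[SCHEMA.index(name)]


def dump_format(record):
    """Render a tuple the way DUMP prints it: comma-separated inside parentheses."""
    return "(" + ",".join(record) + ")"


if __name__ == "__main__":
    person_details = [
        "Bill Gates,CEO,Microsoft\n",
        "Iqubal,CEO,SZIAS\n",
        "Steve Jobs,CEO,Apple\n",
    ]
    for record in load_pigstorage(person_details):
        print(dump_format(record))
```

Run as a script, it prints the same three lines as the DUMP output shown above.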
Storing Data using STORE command
grunt> STORE PERSONDETAILS INTO 'store_pig_latin';
Counters:
Total records written : 3
Total bytes written : 63
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201601290600_0012
2016-02-02 07:51:23,723 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
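The counters above can be checked by hand. When STORE has no USING clause, Pig writes the relation with PigStorage and its default tab delimiter, one newline-terminated record per line. The following short Python emulation of that output format (not Pig code; the helper name is hypothetical) reproduces the 3 records and 63 bytes reported.

```python
def pigstorage_lines(records, delimiter="\t"):
    """Serialize tuples as PigStorage does by default: tab-separated, newline-terminated."""
    return [delimiter.join(record) + "\n" for record in records]


person_details = [
    ("Bill Gates", "CEO", "Microsoft"),
    ("Iqubal", "CEO", "SZIAS"),
    ("Steve Jobs", "CEO", "Apple"),
]

lines = pigstorage_lines(person_details)
records_written = len(lines)
bytes_written = sum(len(line.encode("utf-8")) for line in lines)
print("Total records written :", records_written)  # 3
print("Total bytes written :", bytes_written)  # 25 + 17 + 21 = 63
```

Each line counts its characters plus two tabs and one newline. For example, "Bill Gates", "CEO" and "Microsoft" give 10 + 3 + 9 + 3 = 25 bytes.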
Verify the result by displaying the stored output:
grunt> cat store_pig_latin;
Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com