Apache Pig - Maximum Average Salary Analytics
Here I have a large dummy data-set of employees. I will explain how to find the department with the maximum average salary in this data-set by using Apache Pig.
Salary Data Analytics
Please download the dummy data-set from Employee Dummy Data.
I am assuming that Apache Hadoop and Pig are up and running on your machine.
Create Directory and Load employee_dataset Data into HDFS
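Navigate to the directory that contains the dummy data file, create the HDFS directories, copy the file into HDFS, and start Pig in MapReduce mode: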
$ hadoop fs -mkdir /user/training/PIGDATA
$ hadoop fs -mkdir /user/training/PIGDATA/PIG_UDF_DATA
$ hadoop fs -copyFromLocal employee_dataset /user/training/PIGDATA/PIG_UDF_DATA
$ pig -x mapreduce
Processing Data using Apache Pig
Step: 1 - Load dataset with column names and datatypes
grunt> employeeData = LOAD '/user/training/PIGDATA/PIG_UDF_DATA/employee_dataset' USING PigStorage(',') AS (emp_id:int, emp_name:chararray, job_title:chararray, dept_id:int, salary:float);
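To make the schema concrete, here is a rough Python sketch of what the LOAD statement does to each line of the file: split on commas, cast each field to its declared type, and leave a null where a field is missing or cannot be cast. The helper name parse_employee and the sample rows are made up for illustration; they are not taken from the real data-set.

```python
# Hypothetical stand-in for PigStorage(',') plus the AS schema from Step 1.
SCHEMA = [("emp_id", int), ("emp_name", str), ("job_title", str),
          ("dept_id", int), ("salary", float)]

def parse_employee(line):
    """Split one CSV line and cast each field; missing or bad fields become None."""
    parts = line.rstrip("\n").split(",")
    record = {}
    for i, (name, cast) in enumerate(SCHEMA):
        try:
            record[name] = cast(parts[i])
        except (IndexError, ValueError):
            record[name] = None
    return record

# Made-up sample rows; the second one is short by two fields.
rows = ["1,Asha,Engineer,10,5000.0", "2,Ravi,Analyst"]
employee_data = [parse_employee(r) for r in rows]
print(employee_data[1])
```

Rows that are shorter than the schema still load; they simply carry nulls in the missing columns, which matters later when averages are computed.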
Step: 2 - Group records by department
grunt> group_by_dept = GROUP employeeData BY dept_id;
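GROUP does not aggregate anything by itself. It produces one row per dept_id, holding the key (named group) and a bag with all the complete records for that key. A minimal Python sketch of that shape, using made-up records:

```python
from collections import defaultdict

# Made-up (emp_id, emp_name, job_title, dept_id, salary) tuples.
employee_data = [
    (1, "Asha", "Engineer", 10, 5000.0),
    (2, "Ravi", "Analyst", 20, 4000.0),
    (3, "Mina", "Engineer", 10, 7000.0),
]

# GROUP ... BY dept_id: one entry per key, each holding a bag of whole records.
group_by_dept = defaultdict(list)
for record in employee_data:
    group_by_dept[record[3]].append(record)

for dept_id, bag in sorted(group_by_dept.items()):
    print(dept_id, bag)
```

This is why the next step refers to employeeData.salary: the bag inside each group still carries the name of the original relation.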
Step: 3 - Calculate average salary by department
grunt> average_sal = FOREACH group_by_dept GENERATE group, AVG(employeeData.salary) AS avgsalary;
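Pig's AVG ignores null values, so a record with a missing salary does not pull the department average down. The sketch below mimics that behaviour in Python; the function name avg_ignoring_nulls and the sample salaries are assumptions made for the example.

```python
def avg_ignoring_nulls(values):
    """Mean of the non-null values, or None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Made-up grouped salaries: dept_id -> salaries in that department's bag.
salaries_by_dept = {10: [5000.0, 7000.0], 20: [4000.0, None]}

average_sal = [(dept, avg_ignoring_nulls(s)) for dept, s in salaries_by_dept.items()]
print(average_sal)  # [(10, 6000.0), (20, 4000.0)]
```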
Step: 4 - Sort average salary in decreasing order and select the top 1 to get the maximum average salary
grunt> sorted_avg_sal = ORDER average_sal BY avgsalary DESC;
grunt> avg_max_sal_analytic = LIMIT sorted_avg_sal 1;
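The ORDER and LIMIT pair is a plain "sort descending, keep the first row" pattern. In Python terms, with made-up (dept_id, avgsalary) pairs, it looks roughly like this:

```python
# Made-up (dept_id, avgsalary) pairs standing in for average_sal.
average_sal = [(10, 6000.0), (20, 4000.0), (30, 6500.0)]

# ORDER ... BY avgsalary DESC
sorted_avg_sal = sorted(average_sal, key=lambda pair: pair[1], reverse=True)

# LIMIT ... 1
avg_max_sal_analytic = sorted_avg_sal[:1]
print(avg_max_sal_analytic)  # [(30, 6500.0)]
```

Keeping the whole row rather than only the maximum value is what lets the result show which department the average belongs to.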
Step: 5 - Store results to HDFS
grunt> STORE avg_max_sal_analytic INTO '/home/training/Desktop/output/pig/topdepartment';
Output:
Input(s):
Successfully read 10293 records (535750 bytes) from: "/user/training/PIGDATA/PIG_UDF_DATA/employee_dataset"
Output(s):
Successfully stored 1 records (23 bytes) in: "/home/training/Desktop/output/pig/topdepartment"
Counters:
Total records written : 1
Total bytes written : 23
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201601290600_0017 -> job_201601290600_0018,
job_201601290600_0018 -> job_201601290600_0019,
job_201601290600_0019
2016-02-04 05:09:41,199 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2016-02-04 05:09:41,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
Step: 6 - Check the output
grunt> cat /home/training/Desktop/output/pig/topdepartment
Output:
77	9.99999986991104E14
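The average printed above, 9.99999986991104E14, is exactly the value that 1.0E15 takes when it is rounded to a 32-bit float, which is the type declared for salary in Step 1. Without inspecting the rows for department 77 this is only a guess, but it hints that the salaries in that department may be placeholder values of 1.0E15 in the dummy data. The rounding itself can be checked with a short Python sketch; the helper name to_float32 is made up for the example.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

print(to_float32(1e15))  # 999999986991104.0
```

If exact salary figures matter in your own data, declaring the column as double instead of float in the LOAD schema avoids this loss of precision for values of this size.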
I hope you enjoyed the article.
Author: Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me?
Please reach me by email at iqubal.kaki@gmail.com