Wednesday, February 10, 2016

Apache Pig - Maximum Average Salary Analytics

Here I have a large dummy data set of employee records. I will explain how anyone can perform a maximum average salary analysis on this data set using Apache Pig.

Salary Data Analytics

Please download the dummy data set from Employee Dummy Data.

I am assuming that Apache Hadoop and Pig are up and running on your machine.

Create a Directory and Load the employee_dataset File into HDFS

Navigate to the directory that contains the dummy data file, then run the following commands:

$ hadoop fs -mkdir /user/training/PIGDATA

$ hadoop fs -mkdir /user/training/PIGDATA/PIG_UDF_DATA

$ hadoop fs -copyFromLocal employee_dataset /user/training/PIGDATA/PIG_UDF_DATA

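$ pig -x mapreduce

The last command opens Pig's interactive Grunt shell in MapReduce mode, so the paths used in the steps below refer to HDFS.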



Processing Data using Apache Pig

Step: 1 - Load the data set with column names and data types

grunt> employeeData = LOAD '/user/training/PIGDATA/PIG_UDF_DATA/employee_dataset' USING PigStorage(',') AS (emp_id:int, emp_name:chararray, job_title:chararray, dept_id:int, salary:float);
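
To make the LOAD step concrete, here is a minimal plain-Python sketch of the same parsing, handy for checking a few lines of the file locally. It assumes, as the LOAD statement does, comma-separated lines with the five fields in schema order; the names SCHEMA, cast_or_none and parse_employee and the sample rows are hypothetical. Like Pig, it turns a missing trailing field, or a value that cannot be cast to the declared type, into null (None here); extra fields are simply ignored by this sketch.

```python
# Hypothetical local sketch of the LOAD step; not part of Pig.
# Field names and target types follow the LOAD schema:
# (emp_id:int, emp_name:chararray, job_title:chararray, dept_id:int, salary:float)
SCHEMA = [
    ("emp_id", int),
    ("emp_name", str),
    ("job_title", str),
    ("dept_id", int),
    ("salary", float),  # note: Python float is 64-bit, Pig's float is 32-bit
]


def cast_or_none(convert, text):
    """Convert text to the target type, or return None (Pig's null) on failure."""
    try:
        return convert(text)
    except ValueError:
        return None


def parse_employee(line):
    """Split one comma-separated line and cast each field per SCHEMA."""
    fields = line.rstrip("\n").split(",")
    record = {}
    for i, (name, convert) in enumerate(SCHEMA):
        # A field missing from a short row becomes None instead of an error.
        record[name] = cast_or_none(convert, fields[i]) if i < len(fields) else None
    return record


print(parse_employee("1,John,Engineer,77,5000.0"))
print(parse_employee("2,Jane,Analyst"))  # dept_id and salary become None
```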

Step: 2 - Group records by department

grunt> group_by_dept = GROUP employeeData BY dept_id;
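
GROUP produces one tuple per distinct dept_id: a field named group that holds the key, and a bag named employeeData that holds every matching record. A minimal Python sketch of that idea, using a dictionary of lists in place of Pig's bags (group_by_dept and the sample records are hypothetical):

```python
from collections import defaultdict

# Hypothetical local sketch of GROUP employeeData BY dept_id; not Pig.


def group_by_dept(records):
    """Collect records into one list per dept_id (the list stands in for a bag)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["dept_id"]].append(rec)
    return dict(groups)


employees = [
    {"emp_id": 1, "dept_id": 10, "salary": 100.0},
    {"emp_id": 2, "dept_id": 20, "salary": 300.0},
    {"emp_id": 3, "dept_id": 10, "salary": 200.0},
]
grouped = group_by_dept(employees)
# Each key plays the role of Pig's group field; each value is the bag.
# Unlike a Pig bag, these lists keep the input order.
print(grouped[10])  # the two records for department 10
```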

Step: 3 - Calculate average salary by department

grunt> average_sal = FOREACH group_by_dept GENERATE group, AVG(employeeData.salary) AS avgsalary;
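
For each group, AVG(employeeData.salary) averages the salaries in that department's bag, and Pig's AVG ignores null values. The same computation as a short Python sketch (average_salary, average_by_dept and the sample data are hypothetical; returning None for a department with no usable salary is a choice of this sketch):

```python
# Hypothetical local sketch of FOREACH ... GENERATE group, AVG(...); not Pig.


def average_salary(bag):
    """Average the non-null salaries in one group; None if there are none."""
    salaries = [r["salary"] for r in bag if r["salary"] is not None]
    if not salaries:
        return None
    return sum(salaries) / len(salaries)


def average_by_dept(grouped):
    """Return (dept_id, avgsalary) pairs, one per group."""
    return [(dept, average_salary(bag)) for dept, bag in grouped.items()]


grouped = {
    10: [{"salary": 100.0}, {"salary": 200.0}, {"salary": None}],
    20: [{"salary": 300.0}],
}
print(average_by_dept(grouped))  # [(10, 150.0), (20, 300.0)]
```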

Step: 4 - Sort the departments by average salary in descending order and keep the top record to get the maximum average salary

grunt> sorted_avg_sal = ORDER average_sal BY avgsalary desc;

grunt> avg_max_sal_analytic = LIMIT sorted_avg_sal 1;
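
ORDER ... DESC followed by LIMIT 1 keeps the single department with the highest average. A rough Python equivalent, assuming (dept_id, avgsalary) pairs like those produced in Step 3; top_department and the sample values are hypothetical, and departments with no average are dropped here for simplicity:

```python
# Hypothetical local sketch of ORDER ... BY avgsalary DESC + LIMIT 1; not Pig.


def top_department(averages):
    """Sort (dept_id, avgsalary) pairs by average, highest first; keep one."""
    # This sketch drops departments whose average is None before sorting.
    ranked = sorted(
        (pair for pair in averages if pair[1] is not None),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:1]  # like LIMIT sorted_avg_sal 1


averages = [(10, 150.0), (20, 300.0), (30, None)]
print(top_department(averages))  # [(20, 300.0)]
```

In Python, max with key=lambda pair: pair[1] would give the same answer in one pass without a full sort. If two departments tie on the average, LIMIT 1 still returns only one of them, so a tie-breaking sort key would be needed if that matters.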

Step: 5 - Store the results in HDFS

grunt> STORE avg_max_sal_analytic INTO '/home/training/Desktop/output/pig/topdepartment';
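
With no USING clause, STORE writes through PigStorage with a tab as the field delimiter, and the output path becomes a directory of part files. A small Python sketch of that line layout (to_pig_storage_line is a hypothetical helper), writing null as an empty field:

```python
# Hypothetical sketch of PigStorage's default output layout; not Pig itself.


def to_pig_storage_line(fields, delimiter="\t"):
    """Join one output tuple's fields with a tab; null (None) becomes empty."""
    return delimiter.join("" if f is None else str(f) for f in fields)


line = to_pig_storage_line((20, 300.0))
print(repr(line))  # '20\t300.0'
```

Python's str() does not format large doubles the way Pig (Java) does: Python prints 1e15 as 1000000000000000.0, while Pig writes 1.0E15, so the sketch mirrors only the layout, not the exact number formatting.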




Output:

Input(s):
Successfully read 10293 records (535750 bytes) from: "/user/training/PIGDATA/PIG_UDF_DATA/employee_dataset"
Output(s):
Successfully stored 1 records (23 bytes) in: "/home/training/Desktop/output/pig/topdepartment"
Counters:
Total records written : 1
Total bytes written : 23
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201601290600_0017   ->      job_201601290600_0018,
job_201601290600_0018   ->      job_201601290600_0019,
job_201601290600_0019
2016-02-04 05:09:41,199 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2016-02-04 05:09:41,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
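
The ACCESSING_NON_EXISTENT_FIELD warning was reported twice: Pig tried to read a field that was missing from an input row (a row with fewer fields than the schema declares) and used null for it instead of failing the job.
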
Step: 6 - Check the output

grunt> cat /home/training/Desktop/output/pig/topdepartment

Output:

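77      9.99999986991104E14

The first column is the department ID (the group key) and the second is that department's average salary, printed in Java's scientific notation (9.99999986991104E14 is roughly 10^15). Department 77 therefore has the highest average salary in the data set.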
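
The average printed above, 9.99999986991104E14, is exactly the 32-bit float closest to 10^15. That suggests the underlying salary value is 10^15 and that its last digits were lost when the column was loaded as salary:float (Pig's float is 32-bit; AVG then reports the result as a double). This is an inference from the number, not something the output states. A quick way to check the rounding is to round-trip 10^15 through a 32-bit float with Python's standard struct module; the value 1e15 below is an assumption, not read from the data set:

```python
import struct


def to_float32(x):
    """Round a Python (64-bit) float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


salary = 1e15               # an assumed original salary value
stored = to_float32(salary)
print(stored)               # 999999986991104.0
print(stored == salary)     # False
```

If exact values matter, declaring salary:double in the LOAD schema would avoid this rounding.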

Hope you have enjoyed the article.

Author: Iqubal Mustafa Kaki, Technical Specialist

Want to connect with me
If you want to connect with me, please reach me by email:
iqubal.kaki@gmail.com