Friday, January 29, 2016

Apache Hive XML Data Analysis

Loading XML Data into Apache Hive

It is very interesting to do analysis of XML data using Apache Hive. Suppose we have large XML file and we need to process these data using Hadoop Ecosystem. Here I will explain about Loading of XML data into Apache Hive table.
















Here a sample XML is as following

Company.xml
<Employee>
<Name>Bill Gates</Name>
<Email>BILLGATES@Microsoft.Com</Email>
<Contact>XXXXXXXXXX</Contact>
</Employee>
<Employee>
<Name>Steve Jobs</Name>
<Email>SteveJobs@Apple.Com</Email>
<Contact> XXXXXXXXXX</Contact>
</Employee>
<Employee>
<Name>Iqubal Mustafa Kaki</Name>
<Email>IqubalMustafaKaki@SZIAS.Com</Email>
<Contact>8390900000</Contact>
</Employee>


Here we will load XML data into hive table by using XPATH() .

Thus we will explore XPATH() which will used for converting XML data to String Array.

Steps for loading XML Data into Hive Table

Step:1 Structured the XML, by executing below command

cat Company.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</Employee>|</Employee>\n|g'| grep -v '^\s*$' > company_structured_records.xml


Now we have company_structured_records.xml file for the further operation.

Step:2 Create a Hive table and load the XML file into the table

hive> create table CompanyXMLTablePOC(XMLDATA String);

hive> load data local inpath '/home/hadoopadmin/Desktop/company_structured_records.xml' into table CompanyXmlTablePOC;


Step:3 Convert the XML Data into Array Format Using XPATH()

hive> select xpath(xmldata, 'Employee/*/text()') from CompanyXmlTablePOC;

you will get the result result as below.

OK
["Bill Gates","BILLGATES@Microsoft.Com","XXXXXXXXXX"]
["Steve Jobs","SteveJobs@Apple.Com"," XXXXXXXXXX"]
["Iqubal Mustafa Kaki","IqubalMustafaKaki@SZIAS.Com","8390900000"]
Time taken: 1.503 seconds, Fetched: 3 row(s)




























Step:4  Create the HIVE Table and Insert Data from Above Created Table

hive> create table company AS select xpath_string(xmldata,'Employee/Name'),xpath_string(xmldata,'Employee/Email'),xpath_string(xmldata,'Employee/Contact')from companyxmltablepoc;







hive> select * from company;
OK
Bill Gates BILLGATES@Microsoft.Com XXXXXXXXXX
Steve Jobs SteveJobs@Apple.Com XXXXXXXXXX
Iqubal Mustafa Kaki IqubalMustafaKaki@SZIAS.Com 8390900000

Time taken: 5.616 seconds, Fetched: 3 row(s)





Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist

Want to connect with me
If you want to connect with me, please connect through my email - 
iqubal.kaki@gmail.com

10 comments:

  1. Nice post ! Thanks for sharing valuable information with us. Keep sharing..Hadoop Admin Online Training

    ReplyDelete
  2. really Good blog post. provided a helpful information. I hope that you will post more updates like this Big data hadoop online training Hyderabad

    ReplyDelete
  3. It's a great post! Thank you for sharing your knowledge to others, it was very informative and in depth one.
    Apache Pig Training in Electronic City

    ReplyDelete
  4. iOS app development training teaches learners how to create modern applications for iPhone and iPad devices. It explains development frameworks, coding techniques, and application structure clearly. This ios app development training helps students gain practical mobile development experience. Learners practice coding through exercises and assignments. Projects help them understand real-world application development. The training prepares learners for careers in iOS app development.

    ReplyDelete
  5. Tableau classes online focus on developing interactive dashboards and exploring business data efficiently. It explains filters, parameters, and calculated fields clearly. This tableau classes online enhances reporting and visualization expertise. Learners build dynamic worksheets. Case studies are included. Hands-on exercises are provided. It builds confident data professionals.

    ReplyDelete
  6. Very useful! Our ui design course online
    teaches user-centered design, visual hierarchy, interaction design, and practical tool usage for modern app and web interfaces.

    ReplyDelete
  7. Great insights! Enrolling in proper cpq training is essential, and this post clearly highlights its benefits and real-world applications.cpq training

    ReplyDelete
  8. Best ai and ml courses provide in-depth knowledge of artificial intelligence and machine learning technologies. They explain algorithms, deep learning, and data processing techniques clearly. These best ai and ml courses help learners gain practical skills through projects and assignments. Students explore real-world AI applications. The courses prepare learners for careers in AI and data science.

    ReplyDelete