Loading XML Data into Apache Hive
It is very interesting to do analysis of XML data using Apache Hive. Suppose we have large XML file and we need to process these data using Hadoop Ecosystem. Here I will explain about Loading of XML data into Apache Hive table.
Here a sample XML is as following
Company.xml
Step:4 Create the HIVE Table and Insert Data from Above Created Table
hive> create table company AS select xpath_string(xmldata,'Employee/Name'),xpath_string(xmldata,'Employee/Email'),xpath_string(xmldata,'Employee/Contact')from companyxmltablepoc;
hive> select * from company;
OK
Bill Gates BILLGATES@Microsoft.Com XXXXXXXXXX
Steve Jobs SteveJobs@Apple.Com XXXXXXXXXX
Iqubal Mustafa Kaki IqubalMustafaKaki@SZIAS.Com 8390900000
Time taken: 5.616 seconds, Fetched: 3 row(s)
Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist
It is very interesting to do analysis of XML data using Apache Hive. Suppose we have large XML file and we need to process these data using Hadoop Ecosystem. Here I will explain about Loading of XML data into Apache Hive table.
Here a sample XML is as following
Company.xml
<Employee>
<Name>Bill Gates</Name>
<Email>BILLGATES@Microsoft.Com</Email>
<Contact>XXXXXXXXXX</Contact>
</Employee>
<Employee>
<Name>Steve Jobs</Name>
<Email>SteveJobs@Apple.Com</Email>
<Contact> XXXXXXXXXX</Contact>
</Employee>
<Employee>
<Name>Iqubal Mustafa Kaki</Name>
<Email>IqubalMustafaKaki@SZIAS.Com</Email>
<Contact>8390900000</Contact>
</Employee>
|
Here we will load XML data into hive
table by using XPATH() .
Thus we will explore XPATH() which will used for converting XML data to String Array.
Steps
for loading XML Data into Hive Table
Step:1 Structured the XML, by executing below command
$ cat Company.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</Employee>|</Employee>\n|g'| grep -v '^\s*$' > company_structured_records.xml
Now we have company_structured_records.xml file for the further operation.
Step:2 Create a Hive table and load the XML file into the table
hive> create table CompanyXMLTablePOC(XMLDATA String);
hive> load data local inpath '/home/hadoopadmin/Desktop/company_structured_records.xml' into table CompanyXmlTablePOC;
Step:3 Convert the XML Data into Array Format Using XPATH()
hive> select xpath(xmldata, 'Employee/*/text()') from CompanyXmlTablePOC;
you will get the result result as below.
OK
["Bill Gates","BILLGATES@Microsoft.Com","XXXXXXXXXX"]
["Steve Jobs","SteveJobs@Apple.Com"," XXXXXXXXXX"]
["Iqubal Mustafa Kaki","IqubalMustafaKaki@SZIAS.Com","8390900000"]
Time taken: 1.503 seconds, Fetched: 3 row(s)
$ cat Company.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</Employee>|</Employee>\n|g'| grep -v '^\s*$' > company_structured_records.xml
Now we have company_structured_records.xml file for the further operation.
Step:2 Create a Hive table and load the XML file into the table
hive> create table CompanyXMLTablePOC(XMLDATA String);
hive> load data local inpath '/home/hadoopadmin/Desktop/company_structured_records.xml' into table CompanyXmlTablePOC;
Step:3 Convert the XML Data into Array Format Using XPATH()
hive> select xpath(xmldata, 'Employee/*/text()') from CompanyXmlTablePOC;
you will get the result result as below.
OK
["Bill Gates","BILLGATES@Microsoft.Com","XXXXXXXXXX"]
["Steve Jobs","SteveJobs@Apple.Com"," XXXXXXXXXX"]
["Iqubal Mustafa Kaki","IqubalMustafaKaki@SZIAS.Com","8390900000"]
Time taken: 1.503 seconds, Fetched: 3 row(s)
Step:4 Create the HIVE Table and Insert Data from Above Created Table
hive> create table company AS select xpath_string(xmldata,'Employee/Name'),xpath_string(xmldata,'Employee/Email'),xpath_string(xmldata,'Employee/Contact')from companyxmltablepoc;
hive> select * from company;
OK
Bill Gates BILLGATES@Microsoft.Com XXXXXXXXXX
Steve Jobs SteveJobs@Apple.Com XXXXXXXXXX
Iqubal Mustafa Kaki IqubalMustafaKaki@SZIAS.Com 8390900000
Time taken: 5.616 seconds, Fetched: 3 row(s)
Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
Nice post ! Thanks for sharing valuable information with us. Keep sharing..Hadoop Admin Online Training
ReplyDeletereally Good blog post. provided a helpful information. I hope that you will post more updates like this Big data hadoop online training Hyderabad
ReplyDeleteIt's a great post! Thank you for sharing your knowledge to others, it was very informative and in depth one.
ReplyDeleteApache Pig Training in Electronic City
Nice Article,keep sharing more articles with us.
ReplyDeletekeep updating.
big data online training
hadoop admin course
Say, you got a nice article post.Really thank you! Really Great.
ReplyDeletedata science online free
Best Data Science Online Training