Apache Hive
Hive was developed at Facebook. Facebook had a large pool of SQL-versed engineers, and it wanted to use their SQL knowledge rather than make every engineer learn MapReduce. So it built an engine that converts SQL queries into MapReduce code, and that is how Hive came into the picture. The definition would be: Hive is a data warehouse infrastructure built on top of HDFS. Thus every query is converted into MapReduce code and runs on the Hadoop ecosystem through the Hive shell.
The shell is the primary way that we interact with Hive, by issuing commands in HiveQL. HiveQL is Hive's query language, a dialect of SQL. It is heavily influenced by MySQL, so if you are familiar with MySQL, you should feel at home using Hive.
Limitations of Hive
Hive is not designed for online transaction processing (OLTP); it is used only for online analytical processing (OLAP). Hive supports overwriting or appending data, but not row-level updates and deletes. Subquery support in Hive is limited.
Installing Apache Hive
Download Apache Hive 1.2.0 from the Apache Hive download page.
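For example, the release can be fetched from the Apache archive (a sketch; the mirror URL may vary, and the archive names the tarball apache-hive-1.2.0-bin.tar.gz, so rename the download, and later the extracted directory, to match the hive-1.2.0 paths used below):
[hadoopadmin@kingsolomon ~]$ wget https://archive.apache.org/dist/hive/hive-1.2.0/apache-hive-1.2.0-bin.tar.gz
[hadoopadmin@kingsolomon ~]$ mv apache-hive-1.2.0-bin.tar.gz hive-1.2.0.tar.gz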
Go to the location where you want Hive installed and extract the tarball:
[hadoopadmin@kingsolomon ~]$ tar -xzf hive-1.2.0.tar.gz
Hive Configuration
Setup Environment Variables
Set up the environment variables by editing the ~/.bashrc file (/home/hadoopadmin/.bashrc) and appending the following values at the end of the file.
export HIVE_HOME=/usr/HadoopEnv/hive-1.2.0
export PATH=$PATH:$HIVE_HOME/bin
[hadoopadmin@kingsolomon ~]$ source ~/.bashrc
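You can verify that the variables took effect (hive --version is a quick check once the PATH is updated):
[hadoopadmin@kingsolomon ~]$ echo $HIVE_HOME
/usr/HadoopEnv/hive-1.2.0
[hadoopadmin@kingsolomon ~]$ hive --version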
Edit Hive Configuration Files
[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/conf
Rename these two files as follows and make the changes shown accordingly.
1. Rename hive-env.sh.template to hive-env.sh
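For example, from the conf directory entered above (cp keeps the original template around; use mv to rename outright):
[hadoopadmin@kingsolomon conf]$ cp hive-env.sh.template hive-env.sh
Then append the following settings to hive-env.sh: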
export HADOOP_HEAPSIZE=2048
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.0
export HIVE_CONF=/usr/HadoopEnv/hive-1.2.0/conf
export HIVE_AUX_JARS_PATH=/usr/HadoopEnv/hive-1.2.0/lib
2. Rename hive-default.xml.template to hive-site.xml
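Again from the conf directory:
[hadoopadmin@kingsolomon conf]$ cp hive-default.xml.template hive-site.xml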
Create /tmp and /usr/HadoopEnv/hive-1.2.0/warehouse in HDFS and set them to chmod g+w before creating a table in Hive. Use the following commands.
$ cd /usr/HadoopEnv/hive-1.2.0
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /usr/HadoopEnv/hive-1.2.0/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /usr/HadoopEnv/hive-1.2.0/warehouse
Then make the following change in hive-site.xml:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/HadoopEnv/hive-1.2.0/warehouse</value>
<description>location of default database for the
warehouse</description>
</property>
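The /tmp directory created above serves as Hive's scratch space on HDFS; the corresponding property is hive.exec.scratchdir (its default in Hive 1.2 is /tmp/hive). Setting it explicitly is optional and is shown here only for completeness:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS scratch directory used by Hive jobs</description>
</property>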
Start Apache Hive
Execute the hive command from the Unix shell:
[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/bin
[hadoopadmin@kingsolomon bin]$ hive
Test Apache Hive Shell
hive> show databases;
It will list all the databases in Hive. By default, a database named default exists.
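The hadoopdb database used below is not created anywhere earlier in this post, so create it first if it does not exist:
hive> CREATE DATABASE IF NOT EXISTS hadoopdb;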
hive> use hadoopdb;
Create Table from Stock Exchange Data Set File
First of all, copy the stock exchange file into HDFS using the commands below.
Suppose the stock exchange file small-stocks-dataset is in the stock folder; then:
[hadoopadmin@kingsolomon ~]$ hadoop fs -mkdir -p /usr/HadoopEnv/Project_Data/StockEx
[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/Project_Data/stock
Copy the small-stocks-dataset file into HDFS:
[hadoopadmin@kingsolomon stock]$ hadoop fs -copyFromLocal small-stocks-dataset /usr/HadoopEnv/Project_Data/StockEx
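To confirm the copy succeeded, you can list the target directory:
[hadoopadmin@kingsolomon stock]$ hadoop fs -ls /usr/HadoopEnv/Project_Data/StockEx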
Then execute the hive command from the Unix shell.
### CREATE EXTERNAL TABLE ###
hive> CREATE EXTERNAL TABLE IF NOT EXISTS STOCK_EXCHANGE (
EXCNGE STRING,
SYMBOL STRING,
YEAR_MON_DAY STRING,
PRICE_OPEN FLOAT,
PRICE_HIGH FLOAT,
PRICE_LOW FLOAT,
PRICE_CLOSE FLOAT,
VOLUME INT,
PRICE_ADJ_CLOSE FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/usr/HadoopEnv/Project_Data/StockEx';
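Note that LOCATION points to the directory holding the file rather than the file itself, since Hive expects a table location to be a directory. As a quick sanity check, preview a few rows:
hive> SELECT * FROM STOCK_EXCHANGE LIMIT 5;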
### CALCULATE MAX CLOSING PRICE ###
hive> SELECT SYMBOL, MAX(PRICE_CLOSE) MAX_CLOSE FROM STOCK_EXCHANGE
GROUP BY SYMBOL;
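In the same vein, a further example query (not part of the original walkthrough) that computes the average trading volume per symbol:
### CALCULATE AVERAGE VOLUME ###
hive> SELECT SYMBOL, AVG(VOLUME) AVG_VOLUME FROM STOCK_EXCHANGE
GROUP BY SYMBOL;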
Author: Iqubal Mustafa Kaki, Technical Specialist.
Want to connect with me
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com