Sunday, November 22, 2015

Apache Hive - Hadoop EcoSystem with Real Time Project

Apache Hive
Hive was developed at Facebook. Facebook had a large pool of SQL-versed engineers, and it wanted to put that SQL knowledge to work rather than make every engineer learn MapReduce. So they built an engine that converts SQL queries into MapReduce code, and that is how Hive came into the picture. The definition would be: Hive is a data warehouse infrastructure built on top of HDFS. Every query issued through the Hive shell is converted into MapReduce code and run on the Hadoop ecosystem.
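One way to see this translation is Hive's EXPLAIN command (run from the Hive shell, introduced below), which prints the plan of stages a query compiles into. A minimal sketch; the stocks table name here is hypothetical:

hive> EXPLAIN SELECT symbol, COUNT(*) FROM stocks GROUP BY symbol;

The plan output lists the map and reduce stages that Hive will submit as MapReduce jobs on the cluster.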

The shell is the primary way that we will interact with Hive, by issuing commands in HiveQL. HiveQL is Hive’s query language, a dialect of SQL. It is heavily influenced by MySQL, so if you are familiar with MySQL, you should feel at home using Hive.
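For example, familiar MySQL-style statements work largely unchanged in the Hive shell. A short sketch, where the stocks table is again hypothetical:

hive> SHOW TABLES;
hive> DESCRIBE stocks;
hive> SELECT * FROM stocks LIMIT 10;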

Limitations of Hive

Hive is not designed for online transaction processing (OLTP); it is used for online analytical processing (OLAP). Hive supports overwriting or appending data, but not row-level updates and deletes. Subqueries are supported only in limited forms.
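Because row-level UPDATE and DELETE are unavailable, the usual workaround is to rewrite a whole table (or partition) with INSERT OVERWRITE. A minimal sketch, using a hypothetical stock_quotes table with SYMBOL and PRICE columns:

### REWRITE TABLE IN PLACE (hypothetical example) ###
hive> INSERT OVERWRITE TABLE stock_quotes
SELECT SYMBOL,
CASE WHEN SYMBOL = 'IBM' THEN PRICE * 1.01 ELSE PRICE END AS PRICE
FROM stock_quotes;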

Installing Apache Hive

Download Apache Hive 1.2.0 from the Apache Hive downloads page.

Go to the location where you want to install Hive and extract the archive:
[hadoopadmin@kingsolomon ~]$ tar -xzf hive-1.2.0.tar.gz


Hive Configuration
Setup Environment Variable
Set up the environment variables by editing the ~/.bashrc file (/home/hadoopadmin/.bashrc) and appending the following lines at the end of the file.

export HIVE_HOME=/usr/HadoopEnv/hive-1.2.0
export PATH=$PATH:$HIVE_HOME/bin

[hadoopadmin@kingsolomon ~]$ source ~/.bashrc
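To verify that the variables took effect, a quick optional check:

[hadoopadmin@kingsolomon ~]$ echo $HIVE_HOME
[hadoopadmin@kingsolomon ~]$ which hive

The first command should print /usr/HadoopEnv/hive-1.2.0 and the second should resolve to the hive script under $HIVE_HOME/bin.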

Edit Hive Configuration Files

[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/conf


Rename these two files as follows and make the corresponding changes.

1. Rename hive-env.sh.template to hive-env.sh

export HADOOP_HEAPSIZE=2048
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.0
export HIVE_CONF_DIR=/usr/HadoopEnv/hive-1.2.0/conf
export HIVE_AUX_JARS_PATH=/usr/HadoopEnv/hive-1.2.0/lib

2. Rename hive-default.xml.template to hive-site.xml

Create the /tmp and /usr/HadoopEnv/hive-1.2.0/warehouse directories in HDFS and set them to chmod g+w before creating a table in Hive. Use the following commands.

$ cd /usr/HadoopEnv/hive-1.2.0
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /usr/HadoopEnv/hive-1.2.0/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /usr/HadoopEnv/hive-1.2.0/warehouse
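To confirm that the directories exist in HDFS with group write permission, an optional check:

$HADOOP_HOME/bin/hadoop fs -ls -d /tmp
$HADOOP_HOME/bin/hadoop fs -ls /usr/HadoopEnv/hive-1.2.0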

Then make the following change in hive-site.xml.

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/HadoopEnv/hive-1.2.0/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
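Hive also writes intermediate query data to a scratch directory in HDFS, which is why /tmp was created above. This is governed by the hive.exec.scratchdir property; the value shown below is the usual default and is illustrative only:

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS scratch space used by Hive for intermediate job data</description>
</property>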

Start Apache Hive

Execute the hive command from the Unix shell:

[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/bin
[hadoopadmin@kingsolomon bin]$ hive

Test Apache Hive Shell

hive> show databases;

It will list all the databases in Hive. By default, only the default database exists.
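The examples below use a database named hadoopdb; assuming it does not exist yet, create it first:

hive> create database hadoopdb;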

hive> use hadoopdb;

Create Table from Stock Exchange Data Set File

First of all, copy the stock exchange file into HDFS using the commands below.

Suppose the stock exchange file small-stocks-dataset is in the stock folder. Then:

[hadoopadmin@kingsolomon ~]$ hadoop fs -mkdir -p /usr/HadoopEnv/Project_Data/StockEx
[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/Project_Data/stock 

Copy the small-stocks-dataset file into HDFS:

[hadoopadmin@kingsolomon stock]$ hadoop fs -copyFromLocal small-stocks-dataset /usr/HadoopEnv/Project_Data/StockEx
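An optional check that the file landed in HDFS before wiring a table to it:

[hadoopadmin@kingsolomon stock]$ hadoop fs -ls /usr/HadoopEnv/Project_Data/StockEx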

Then execute the following from the Hive shell.

### CREATE EXTERNAL TABLE ###

hive> CREATE EXTERNAL TABLE IF NOT EXISTS STOCK_EXCHANGE (
EXCNGE STRING,
SYMBOL STRING,
YEAR_MON_DAY STRING,
PRICE_OPEN FLOAT,
PRICE_HIGH FLOAT,
PRICE_LOW FLOAT,
PRICE_CLOSE FLOAT,
VOLUME INT,
PRICE_ADJ_CLOSE FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/usr/HadoopEnv/Project_Data/StockEx';
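Note that LOCATION points at the StockEx directory rather than the file itself; Hive reads every file inside that directory. A quick optional sanity check that the table can see the data:

hive> SELECT * FROM STOCK_EXCHANGE LIMIT 5;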

### CALCULATE MAX CLOSING PRICE ###
hive> SELECT SYMBOL, MAX(PRICE_CLOSE) AS MAX_CLOSE FROM STOCK_EXCHANGE
GROUP BY SYMBOL;
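To rank symbols by their maximum closing price, the same aggregation can be extended with ORDER BY and LIMIT; a sketch:

hive> SELECT SYMBOL, MAX(PRICE_CLOSE) AS MAX_CLOSE FROM STOCK_EXCHANGE
GROUP BY SYMBOL
ORDER BY MAX_CLOSE DESC
LIMIT 10;

ORDER BY forces a single reducer in Hive, which is fine here because the GROUP BY has already reduced the data to one row per symbol.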

Author: Iqubal Mustafa Kaki, Technical Specialist.

Want to connect with me
If you want to connect, please reach out through my email - 
iqubal.kaki@gmail.com
