Saturday, November 28, 2015

Apache Sqoop: Real-Time Importing of Data from MySQL into Apache Hive using Sqoop

Apache Sqoop
Sqoop is an Apache project that is part of the broader Hadoop EcoSystem. It is built on top of MapReduce and takes advantage of its parallelism and fault tolerance. Sqoop was designed to move data into and out of relational databases, using a JDBC driver to connect, and its functionality is extensive. This real-time example will show how to use Sqoop to import data from MySQL into Hive. Hope you will enjoy it!

Installing Apache Sqoop

Download Apache Sqoop 1.4.5 from Apache Sqoop 1.4.5 Link

Go to the location where you want Sqoop installed (here it is /usr/lib/).
[hadoopadmin@kingsolomon ~]$ tar -xzf sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
[hadoopadmin@kingsolomon ~]$ mv sqoop-1.4.5.bin__hadoop-2.0.4-alpha sqoop

Sqoop Configuration

Setup Environment Variable
Set up environment variables by editing the ~/.bashrc file (/home/hadoopadmin/.bashrc) and appending the following values at the end of the file.

export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

[hadoopadmin@kingsolomon ~]$ source ~/.bashrc
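To confirm the installation and PATH setup, you can optionally run a quick version check (this assumes the exports above have been sourced):

[hadoopadmin@kingsolomon ~]$ sqoop version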

Edit Sqoop Configuration Files

[hadoopadmin@kingsolomon ~]$ cd /usr/lib/sqoop/conf


Make the following two changes:

1. Rename sqoop-env-template.sh to sqoop-env.sh and add the following:


export HADOOP_COMMON_HOME=/usr/HadoopEnv/hadoop-2.6.0
export HADOOP_MAPRED_HOME=/usr/HadoopEnv/hadoop-2.6.0
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk

2. Download the MySQL connector JAR (mysql-connector-java-5.1.29) and copy it into the Sqoop library folder /usr/lib/sqoop/lib.
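For example, assuming the connector JAR has been downloaded into the current directory (the exact file name depends on the version you download), the copy can be done as follows:

[hadoopadmin@kingsolomon ~]$ cp mysql-connector-java-5.1.29.jar /usr/lib/sqoop/lib/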

Installing MySQL

[hadoopadmin@localhost Desktop]$ su
Password: (enter the root user's password)

[root@localhost Desktop]# yum -y install mysql-server

[root@localhost Desktop]# service mysqld start

[root@localhost Desktop]# /usr/bin/mysql_secure_installation


By default, a MySQL installation has an anonymous user, allowing anyone to log into MySQL without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.

Then set the root password and answer the remaining prompts:

Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] n

Remove test database and access to it? [Y/n] n

Reload privilege tables now? [Y/n] y


Thanks for using MySQL!

Start MySQL Server
[root@localhost Desktop]# service mysqld start --- Start mysql server
[root@localhost Desktop]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. 

Create the database and table using the commands below.

mysql>create database hadoopdb;

mysql>use hadoopdb;

mysql>CREATE TABLE iqubalcontact (
  contact_id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(45) NOT NULL,
  email varchar(45) NOT NULL,
  address varchar(45) NOT NULL,
  telephone varchar(45) NOT NULL,
  PRIMARY KEY (contact_id)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8;

INSERT INTO iqubalcontact (name,email,address,telephone)
                       VALUES
                       ("Steve Job", "Steve.Jobs@apple.com","CA","8390900000");
INSERT INTO iqubalcontact (name,email,address,telephone)
                       VALUES
                       ("Bill Gates", "Bill.Gates@microsoft.com","CA","8390900000");
INSERT INTO iqubalcontact (name,email,address,telephone)
                       VALUES
                       ("Iqubal M Kaki", "Iqubal.MKaki@szias.com","INDIA","8390900000");

Now create the same database inside Hive using the Hive shell (for Hive installation & configuration, please go through my previous blog on Apache Hive).

Note: Ensure that Hadoop is up and running by using the jps command, and don't close the MySQL Unix shell terminal. Open another Unix terminal to perform the Hive tasks.

[hadoopadmin@kingsolomon ~]$ hive
hive> create database hadoopdb;
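You can confirm the database exists from the same Hive shell:

hive> show databases;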

Importing data from MySQL into Hive using Sqoop

Run the Sqoop command below to import the MySQL table data into Hive.
[hadoopadmin@kingsolomon ~]$ sqoop import --connect jdbc:mysql://localhost/hadoopdb \
   --username root -P \
   --table iqubalcontact \
   --hive-import \
   --hive-table hadoopdb.iqubalcontact -m 1;
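If the import fails with a connection error, a quick way to double-check the JDBC URL, the connector JAR and the credentials is Sqoop's list-tables tool, shown here with the same connection settings as above:

[hadoopadmin@kingsolomon ~]$ sqoop list-tables --connect jdbc:mysql://localhost/hadoopdb --username root -P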

After some time, it will successfully import the MySQL table data into Hive.
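As an optional check, the imported data file should now be visible under the Hive warehouse directory configured earlier (the exact path depends on your hive.metastore.warehouse.dir setting):

[hadoopadmin@kingsolomon ~]$ hadoop fs -ls /usr/HadoopEnv/hive-1.2.0/warehouse/hadoopdb.db/iqubalcontact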

Test Hive Import from Hive Shell

[hadoopadmin@kingsolomon ~]$ hive
hive> use hadoopdb;
hive> select * from iqubalcontact;
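A row count is another quick sanity check that all three rows made it across (note that this runs as a MapReduce job, so it takes a little longer than the plain select):

hive> select count(*) from iqubalcontact;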

Hope you have enjoyed this article, which explores a real-time example.

Author: Iqubal Mustafa Kaki, Technical Specialist.

Want to connect with me
If you want to connect with me, please connect through my email - 
iqubal.kaki@gmail.com

Sunday, November 22, 2015

Apache Hive - Hadoop EcoSystem with Real Time Project

Apache Hive
Hive was developed by Facebook. Facebook had a large pool of engineers well versed in SQL, and it wanted to make use of their SQL knowledge rather than make every engineer learn MapReduce. So they built an engine that converts SQL queries into MapReduce code, and that is how Hive came into the picture. As a definition: Hive is a data warehouse infrastructure built on top of HDFS. Everything submitted through the Hive shell is converted into MapReduce code and runs on the Hadoop EcoSystem.

The shell is the primary way that we will interact with Hive, by issuing commands in HiveQL. HiveQL is Hive’s query language, a dialect of SQL. It is heavily influenced by MySQL, so if you are familiar with MySQL, you should feel at home using Hive.

Limitations of Hive

Hive is not designed for online transaction processing (OLTP); it is used only for online analytical processing (OLAP). Hive supports overwriting or appending data, but not updates and deletes. Subquery support in Hive is limited.

Installing Apache Hive

Download Apache Hive 1.2.0 from Apache Hive Link

Go to the location where you want Hive installed (here it is /usr/HadoopEnv/).
[hadoopadmin@kingsolomon ~]$ tar -xzf hive-1.2.0.tar.gz


Hive Configuration
Setup Environment Variable
Set up environment variables by editing the ~/.bashrc file (/home/hadoopadmin/.bashrc) and appending the following values at the end of the file.

export HIVE_HOME=/usr/HadoopEnv/hive-1.2.0
export PATH=$PATH:$HIVE_HOME/bin

[hadoopadmin@kingsolomon ~]$ source ~/.bashrc
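To confirm Hive is on the PATH, you can optionally check the version (this assumes the exports above have been sourced):

[hadoopadmin@kingsolomon ~]$ hive --version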

Edit Hive Configuration Files

[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/conf


Rename these two files as follows and make the changes accordingly.

1. Rename hive-env.sh.template to hive-env.sh

export HADOOP_HEAPSIZE=2048
export HADOOP_HOME=/usr/HadoopEnv/hadoop-2.6.0
export HIVE_CONF_DIR=/usr/HadoopEnv/hive-1.2.0/conf
export HIVE_AUX_JARS_PATH=/usr/HadoopEnv/hive-1.2.0/lib

2. Rename hive-default.xml.template to hive-site.xml

Create /tmp and /usr/HadoopEnv/hive-1.2.0/warehouse in HDFS and set them to chmod g+w before creating a table in Hive. Use the following commands.

$ cd /usr/HadoopEnv/hive-1.2.0
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /usr/HadoopEnv/hive-1.2.0/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /usr/HadoopEnv/hive-1.2.0/warehouse
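Optionally, you can verify the directories and their permissions with a listing:

$HADOOP_HOME/bin/hadoop fs -ls /
$HADOOP_HOME/bin/hadoop fs -ls /usr/HadoopEnv/hive-1.2.0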

Then make the following change in hive-site.xml:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/HadoopEnv/hive-1.2.0/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
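If you also want to pin down Hive's scratch space explicitly, the corresponding property looks like this (optional; /tmp/hive is already the default scratch location in Hive 1.2, so this is shown only for illustration):

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS scratch directory used by Hive jobs</description>
</property>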

Start Apache Hive

Execute hive command from Unix shell

[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/hive-1.2.0/bin
[hadoopadmin@kingsolomon ~]$ hive

Test Apache Hive Shell

hive> show databases;

It will list all the databases in Hive. By default, only the default database exists.

hive> create database hadoopdb;
hive> use hadoopdb;

Create Table from Stock Exchange Data Set File

First of all, copy the stock exchange file into HDFS using the commands below.

Suppose the stock exchange file small-stocks-dataset is in the stock folder; then:

[hadoopadmin@kingsolomon ~]$ hadoop fs -mkdir -p /usr/HadoopEnv/Project_Data/StockEx
[hadoopadmin@kingsolomon ~]$ cd /usr/HadoopEnv/Project_Data/stock 

Copy small-stocks-dataset file into the HDFS

[hadoopadmin@kingsolomon stock]$ hadoop fs -copyFromLocal small-stocks-dataset /usr/HadoopEnv/Project_Data/StockEx
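You can verify the copy with a listing of the HDFS directory:

[hadoopadmin@kingsolomon stock]$ hadoop fs -ls /usr/HadoopEnv/Project_Data/StockEx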

Then execute the following commands from the Hive shell.

### CREATE EXTERNAL TABLE ###

hive> CREATE EXTERNAL TABLE IF NOT EXISTS STOCK_EXCHANGE (
EXCNGE STRING,
SYMBOL STRING,
YEAR_MON_DAY STRING,
PRICE_OPEN FLOAT,
PRICE_HIGH FLOAT,
PRICE_LOW FLOAT,
PRICE_CLOSE FLOAT,
VOLUME INT,
PRICE_ADJ_CLOSE FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/usr/HadoopEnv/Project_Data/StockEx';
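A quick way to confirm that the table maps onto the data file correctly is to look at a few rows:

hive> SELECT * FROM STOCK_EXCHANGE LIMIT 5;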

### CALCULATE MAX CLOSING PRICE ###
hive> SELECT SYMBOL , MAX(PRICE_CLOSE) MAX_CLOSE FROM STOCK_EXCHANGE
GROUP BY SYMBOL ;
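Along the same lines, a similar aggregate (not part of the original exercise, just an illustration) computes the total traded volume per symbol:

hive> SELECT SYMBOL, SUM(VOLUME) TOTAL_VOLUME FROM STOCK_EXCHANGE
GROUP BY SYMBOL;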

Author: Iqubal Mustafa Kaki, Technical Specialist.

Want to connect with me
If you want to connect with me, please connect through my email - 
iqubal.kaki@gmail.com