Apache Oozie - Big Data Workflow Engine
As you know Hadoop is used for handling huge data set. Sometime one job
may not be sufficient for doing all required task. Better you go more number of
jobs and the get output of one job as input for other job. Finally when N number
of job completed you will get the appropriate solution. Also there should be a
co-ordination between these N numbers of jobs. Thus Oozie takes care the co-ordination
between these job. Its co-ordinate these job and take output from job and
passing this as input to another job. Thus you can say Oozie is a workflow scheduler engine which co-ordinate between N number of Job as defined in its configuration XML file. For running Oozie the BOOTSTRAP service should be available.
Here in the above diagram, there are three jobs A, B & C. The output of Job A is the input of Job B. Similarly the output of Job B is the input of Job C.
Thus, Anyone can define Apache Oozie as following.
Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs (DAG) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availabilty.
Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
The backbone of Oozie configuration is workflow XML file. The workflow.xml file will
have all configuration of N number of the jobs. Basically it is having two node. First one is control node and second one is action node. Control node has the job list and the sequence of the execution. Action nodes will have Job's action details. Thus one job will have one action tag for the same. There will be one control tag but N number of action tags that depends on the number of jobs you are performing.
Installation and Configuration
Oozie Installation on CentOS 6
We use official CDH
repository from cloudera’s site to install CDH4. Go to official CDH download section
and download CDH4 (i.e. 4.6) version or you can also use following wget command to download the
repository and install it.
For OS 32 Bit
$ wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
$ yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm
For OS 64 Bit
$ wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
$ yum --nogpgcheck localinstall
cloudera-cdh-4-0.x86_64.rpm
After adding CDH repository,
execute below commands to install Oozie onto the machine
[root@kingsolomon
~]# yum install oozie
However above command should
cover Oozie client installation. If not the execute below command to install Oozie
client.
[root@kingsolomon
~]# yum install oozie-client
Oozie has been installed onto the machine. Now we will configure the Oozie onto the machine.
Oozie Configuration on CentOS 6
As Oozie does not directly interact with Hadoop, we need to
do the configuration accordingly.
Note : Please configure all the settings when Oozie is not up
and running.
Oozie has ‘Derby ‘ as default built in DB however, I would recommend to use MySQL.
[root@kingsolomon ~]$ mysql -u root -p
Enter password:
Welcome to the
MySQL monitor. Commands end with ; or
\g.
Your MySQL
connection id is 3
Server version:
5.5.38 MySQL Community Server (GPL)
Copyright (c)
2000, 2014, Oracle and/or its affiliates. All rights reserved.
Oracle is a
registered trademark of Oracle Corporation and/or its
affiliates.
Other names may be trademarks of their respective
owners.
Type 'help;' or
'\h' for help. Type '\c' to clear the current input statement.
mysql> create database oozie;
Query OK, 1 row
affected (0.00 sec)
mysql> grant all privileges on oozie.* to
'oozie'@'localhost' identified by 'oozie';
Query OK, 0
rows affected (0.00 sec)
mysql> grant all privileges on oozie.* to
'oozie'@'%' identified by 'oozie';
Query OK, 0
rows affected (0.00 sec)
mysql> exit
Bye
|
Now Configure MySQL related configuration into oozie-site.xml file
[root@kingsolomon ~]# cd /etc/oozie/conf
[root@kingsolomon ~]# vi oozie-site.xml
Add the following properties.
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://master:3306/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>root</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>root</value>
</property>
|
Download and add the MySQL
JDBC connectivity driver JAR to
Oozie lib directory.
[root@kingsolomon Downloads]# cp mysql-connector-java-5.1.31-bin.jar
/var/lib/oozie/
Create Oozie database schema by executing below commands
[root@kingsolomon ~]# sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh
create -run
Sample Output will be
setting
OOZIE_CONFIG=/etc/oozie/conf
setting
OOZIE_DATA=/var/lib/oozie
setting
OOZIE_LOG=/var/log/oozie
setting
OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
setting
CATALINA_TMPDIR=/var/lib/oozie
setting
CATALINA_PID=/var/run/oozie/oozie.pid
setting
CATALINA_BASE=/usr/lib/oozie/oozie-server-0.20
setting
CATALINA_OPTS=-Xmx1024m
setting
OOZIE_HTTPS_PORT=11443
setting
OOZIE_HTTPS_KEYSTORE_PASS=password
......
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Set MySQL MEDIUMTEXT flag
DONE
Oozie DB has been created for Oozie version
'3.3.2-cdh4.7.1'
The SQL commands have been written to:
/tmp/ooziedb-3060871175627729254.sql
Download ExtJS lib then extract
the contents of the file to /var/lib/oozie/ on the same host as the Oozie
Server.
[root@kingsolomon Downloads]# unzip ext-2.2.zip
[root@kingsolomon Downloads]# mv ext-2.2 /var/lib/oozie/
Make sure MySQL service is up and
running before executing Oozie start command.
[root@kingsolomon
Desktop]# service mysqld start
Starting mysqld:
[ OK ]
Start Oozie server by executing
below commands.
[root@kingsolomon
Desktop]# service oozie start
Sample Output will be
Setting OOZIE_HOME: /usr/lib/oozie
Sourcing:
/usr/lib/oozie/bin/oozie-env.sh
setting
OOZIE_CONFIG=/etc/oozie/conf
setting
OOZIE_DATA=/var/lib/oozie
setting
OOZIE_LOG=/var/log/oozie
setting
OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
setting
CATALINA_TMPDIR=/var/lib/oozie
setting
CATALINA_PID=/var/run/oozie/oozie.pid
setting
CATALINA_BASE=/usr/lib/oozie/oozie-server-0.20
setting
CATALINA_OPTS=-Xmx1024m
setting
OOZIE_HTTPS_PORT=11443
setting
OOZIE_HTTPS_KEYSTORE_PASS=password
....
Using CATALINA_BASE:
/usr/lib/oozie/oozie-server-0.20
Using CATALINA_HOME:
/usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/lib/oozie
Using JRE_HOME:
/usr/lib/jvm/java-openjdk
Using CLASSPATH:
/usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:
/var/run/oozie/oozie.pid
Verify the Oozie Server Status
[root@kingsolomon Desktop]# service oozie status
running
[root@kingsolomon
Desktop]# oozie admin -oozie
http://kingsolomon:11000/oozie -status
System
mode: NORMAL
Hope
you have enjoyed the article.
Author : Iqubal
Mustafa Kaki, Technical
Specialist
If you want to connect with me, please connect through my email - iqubal.kaki@gmail.com
Very informative ..i suggest this blog to my friends..Thank you for sharing
ReplyDeleteHadoop Admin Online Training Bangalore
It is nice blog Thank you porovide importent information and i am searching for same information to save my time.Big data hadoop online course India
ReplyDeleteVery Impressive Big Data Hadoop tutorial. The content seems to be pretty exhaustive and excellent and will definitely help in learning Big Data Hadoop course. I'm also a learner taken up Big Data Hadoop Tutorial and I think your content has cleared some concepts of mine. While browsing for Hadoop tutorials on YouTube i found this fantastic video on Big Data Hadoop Tutorial.Do check it out if you are interested to know more.https://www.youtube.com/watch?v=nuPp-TiEeeQ&
ReplyDeletevery informative blog and useful article thank you for sharing with us , keep posting learn more Big Data Hadoop Online Training
ReplyDeleteAt this time, it seems like Word Press is the preferred blogging platform available right now. (from what I’ve read) Is that what you’re using on your blog? Great post, however, I was wondering if you could write a little more on this subject?
ReplyDeleteAuthorized macbook pro service center in Chennai | Macbook pro service center in chennai | Authorized iMac service center in Chennai | iMac service center in chennai | Mac service center in chennai | iphone unlocking service in chennai
thanks for sharing this information
ReplyDeletebest python training institute in bangalore
python training in jayanagar bangalore
Artificial Intelligence training in Bangalore
data science with python training in Bangalore
Machine Learning training in bangalore
Qlik Sense Training in Bangalore
Qlikview Training in Bangalore
Excellent information with unique content and it is very useful to know about the datascience with python.datascience with python training in bangalore
ReplyDeleteAs the growth of Big data platform managed service , it is essential to spread knowledge in people. This meetup will work as a burst of awareness.
ReplyDeleteBest blogger.
ReplyDeleteBig Data and Hadoop Online Training
Very nice post,keep sharing more posts with us.
ReplyDeletethank you...
big data and hadoop online training
hadoop administration training
nice post.
ReplyDeleteDynamic CRM Training
Ethical hacking training
Informatica data Quality Training
The content of this website was really informative. 50 High Quality for just 50 INR
ReplyDelete2000 Backlink at cheapest
5000 Backlink at cheapest
Boost DA upto 15+ at cheapest
Boost DA upto 25+ at cheapest
Boost DA upto 35+ at cheapest
Boost DA upto 45+ at cheapest
Thank you ever so for you article. Really Cool.
ReplyDeleteMuleSoft training
python training
Angular js training
selenium trainings
sql server dba training