Friday, February 19, 2016

Java API for Invoking Oozie Workflows

Java API for Invoking Oozie Workflows

Java Client API for the integration of Apache Oozie and Java applications. Here I will explain about Java API for interacting/invoking Apache Oozie Jobs.  
















There are two approaches for writing Java code for invoking Oozie jobs.

- Oozie Client Java API
- Local Oozie Client Java API

You can download below explained project and related configuration files from the link ExecuteOozieJobUsingJavaAPI  or from GitHub download link


Oozie Client Java API

This approach will explain how to submit an Oozie job using the Java Client API.

Below code will elaborate the step for writing Apache Oozie Client.

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;
.
import java.util.Properties;
.
    ...
.
    // get a OozieClient for the Oozie
    OozieClient wc = new OozieClient("http://kingsolomon:8080/oozie");
.
    // create a workflow job configuration and set the workflow application path
    Properties conf = wc.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://kingsolomon:9000/myapp");
.
    // setting workflow parameters
    conf.setProperty("jobTracker", "kingsolomon:9001");
    conf.setProperty("inputDir", "/myapp/inputdir");
    conf.setProperty("outputDir", "/myapp/outputdir");
    ...
.
    // submit and start the workflow job
    String jobId = wc.run(conf);
    System.out.println("Workflow job submitted");
.
    // wait until the workflow job finishes printing the status every 10 seconds
    while (wc.getJobInfo(jobId).getStatus() == Workflow.Status.RUNNING) {
        System.out.println("Workflow job running ...");
        Thread.sleep(10 * 1000);
    }
.
    // print the final status of the workflow job
    System.out.println("Workflow job completed ...");
    System.out.println(wf.getJobInfo(jobId));
    ...


Sample Java program to call workflow


Suppose your project ExecuteOozieJobUsingJavaAPI path  is  /user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI. Also workflow.xml and other files are in the project folder. You can get the these files and Java code from above source code project folder link. Put all these file & folder inside HDFS, so these are available for Map Reduce jobs internally using by Apache Hive. 

For reference please go through the workflow.xml file which is in the project folder.

Here we need to set all properties which we have used in the workflow.xml file

Note : Ensure to replace configuration highlighted properties with your environment specific configuration.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class ExecuteOozieScheduler {

public static void main(String[] args) {
    OozieClient wc = new OozieClient("http://kingsolomon:11000/oozie");

    Properties conf = wc.createConfiguration();

   conf.setProperty(OozieClient.APP_PATH, 
 "hdfs://kingsolomon:9000/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI/workflow.xml");
    conf.setProperty("jobTracker", "kingsolomon:9001");
    conf.setProperty("nameNode", "hdfs://kingsolomon:9000");
    conf.setProperty("queueName", "default");
    conf.setProperty("dbScripts", 
    "hdfs://kingsolomon:9000/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI");
    conf.setProperty("rootFolder", 
    "hdfs://kingsolomon:9000/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI");
    
   // HDFS directories (multiple directories can be separated by a comma) that contain JARs
    conf.setProperty("oozie.libpath", "hdfs://kingsolomon:9000/user/oozie/share/lib");
    conf.setProperty("oozie.use.system.libpath", "true");
    conf.setProperty("oozie.wf.rerun.failnodes", "true");

    try {
        String jobId = wc.run(conf);
        System.out.println("Workflow job, " + jobId + " submitted");

        while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            System.out.println("Workflow job running ...");
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow job completed ...");
        System.out.println(wc.getJobInfo(jobId));
    } catch (Exception r) {
        System.out.println("Errors " + r.getLocalizedMessage());
    }
}
}


Local Oozie Client Java API

Oozie provides an embedded Oozie implementation, LocalOozie , which is useful for development, debugging and testing of workflow applications within the convenience of an IDE.

The code snipped below shows the usage of the LocalOozie class. All the interaction with Oozie is done using Oozie OozieClient Java API, as shown in the previous section.

The examples bundled with Oozie include the complete and running class, LocalOozie Example from where this snipped was taken.

Below code will elaborate the step for writing Local Oozie Client.

import org.apache.oozie.local.LocalOozie;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;
.
import java.util.Properties;
.
    ...
    // start local Oozie
    LocalOozie.start();
.
    // get a OozieClient for local Oozie
    OozieClient wc = LocalOozie.getClient();
.
    // create a workflow job configuration and set the workflow application path
    Properties conf = wc.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, 
"kingsolomon:9000/myapp");
.
    // setting workflow parameters
    conf.setProperty("jobTracker", "kingsolomon:9001");
    conf.setProperty("inputDir", "/myapp/inputdir");
    conf.setProperty("outputDir", "/myapp/outputdir");
    ...
.
    // submit and start the workflow job
    String jobId = wc.run(conf);
    System.out.println("Workflow job submitted");
.
    // wait until the workflow job finishes printing the status every 10 seconds
    while (wc.getJobInfo(jobId).getStatus() == Workflow.Status.RUNNING) {
        System.out.println("Workflow job running ...");
        Thread.sleep(10 * 1000);
    }
.
    // print the final status o the workflow job
    System.out.println("Workflow job completed ...");
    System.out.println(wf.getJobInfo(jobId));
.
    // stop local Oozie
    LocalOozie.stop();
    ...


Sample Java program to call LocalOozie workflow


Here we need to set all properties which we have used in the workflow.xml file

Note : Ensure to replace configuration highlighted properties with your environment specific configuration.

import java.util.Properties;

import org.apache.oozie.local.LocalOozie;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class ExecuteLocalOozieScheduler {
public static void main(String[] args) throws Exception {
    LocalOozie.start();
// get a OozieClient for local Oozie
   OozieClient wc = LocalOozie.getClient();

   Properties conf = wc.createConfiguration();

   conf.setProperty(OozieClient.APP_PATH, 
 "hdfs://kingsolomon:9000/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI/workflow.xml");
   conf.setProperty("jobTracker", "kingsolomon:9001");
   conf.setProperty("nameNode", "hdfs://kingsolomon:9000");
   conf.setProperty("queueName", "default");
   conf.setProperty("dbScripts", "/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI");
   conf.setProperty("rootFolder", "/user/hadoopadmin/Desktop/ExecuteOozieJobUsingJavaAPI");
   // HDFS directories (multiple directories can be separated by a comma) that contain JARs
   conf.setProperty("oozie.libpath", "hdfs://kingsolomon:9000/user/oozie/share/lib");
   conf.setProperty("oozie.use.system.libpath", "true");
   conf.setProperty("oozie.wf.rerun.failnodes", "true");

   try {
       String jobId = wc.run(conf);
       System.out.println("Workflow job, " + jobId + " submitted");

       while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
           System.out.println("Workflow job running ...");
           Thread.sleep(10 * 1000);
       }
       System.out.println("Workflow job completed ...");
       System.out.println(wc.getJobInfo(jobId));
   } catch (Exception r) {
       System.out.println("Errors " + r.getLocalizedMessage());
   }
}

}


Sample Output
.....
Workflow job running ...
Workflow job running ...
Workflow job running ...
Workflow job running ...
Workflow job running ...
Workflow job running ...
Workflow job running ...
Workflow job completed ...

Workflow id[......] status[SUCCEEDED]


Hope you have enjoyed the article.
Author : Iqubal Mustafa Kaki, Technical Specialist

Want to connect with me
If you want to connect with me, please connect through my email  iqubal.kaki@gmail.com


3 comments:

  1. Thanks so much for sharing this. Would be helpful for someone preparing for Core JAVA certification. I recently enrolled at e-learnify. They are one of the best training providers for Online Java Learning Courses.

    ReplyDelete
  2. Nice Blog! Thanks for sharing valuable information with us. Keep sharing..
    Big Data Hadoop Online Training

    ReplyDelete