Condor-Web Services Plugin for GridSAM

 

 

We at the Department of Computer Science at University College London have been working in collaboration with the Condor team at the University of Wisconsin and the Department of Earth Sciences at the University of Cambridge on a project funded by DTI, JISC and Microsoft, aimed at developing Web Service interfaces to Condor.

 

Web Service interfaces have now been incorporated into all Condor daemons. The Schedd and Collector interfaces provide the core functionality that is available through the command line tools. To demonstrate the functionality available through these interfaces, we developed a plugin for GridSAM, a Web Service based job submission and monitoring tool that interacts with a number of resource managers, including Condor. Our plugin interacts with Condor through the Web Service interfaces, as an alternative to the existing plugin, which accesses Condor through the command line tools. We were able to incorporate some interesting functionality, such as the ability to submit to multiple Schedds, which allows a simple form of load balancing as well as the possibility of resource federation. The plugin also includes the ability to submit to Condor-G. The full details of our work on this project are described in the Condor BirdBath All Hands paper.

 

 

Installation

 

Condor

 

The Web Services enabled Condor daemons are available in Condor version 6.7.5 and onwards. Details of these daemons can be found on the Condor BirdBath site. In some versions, the Schedd and Collector WSDL interface definitions are not included in the installation and must be downloaded separately from the Condor BirdBath site. They must then be copied into a directory under your Condor release directory, such as $(RELEASE_DIR)/web.

 

In all versions of Condor, the Web Service interfaces must be enabled by adding the following attributes to the condor_config file:

 

WEB_ROOT_DIR = $(RELEASE_DIR)/web

ENABLE_SOAP = TRUE

ENABLE_WEB_SERVER = TRUE

 

GridSAM

 

The latest version of the plugin is compatible with the release of GridSAM bundled with OMII_2.0.0, which uses Apache Axis-1.2 RC3. Install the OMII server stack, which is compatible with Red Hat Enterprise Linux 3.0 ES and SuSE 9.0. Then install the Managed Programme components, which will install GridSAM on the OMII server. The latest builds of GridSAM can be downloaded from the GridSAM homepage; however, these may be incompatible with the plugin.

 

Plugin

 

Once you have installed the GridSAM server, copy the plugin JAR into the ${TOMCAT_HOME}/webapps/gridsam/WEB-INF/lib/ directory. Then copy an appropriately configured version of the jobmanager.xml file into the ${TOMCAT_HOME}/webapps/gridsam/WEB-INF/classes/ directory. The plugin should now be ready for use.

 

 

JSDL

 

We have attempted to reproduce the functionality of the standard Condor plugin so that both plugins can be used interchangeably without having to modify JSDL. The JSDL supported by the standard plugin is described here. In addition to the functionality available in the standard plugin, we have implemented an interpretation of the jsdl-posix:WorkingDirectory element. The standard plugin creates a temporary working directory for each job, into which files can be staged. In our experience, the ability to specify a working directory is particularly useful when submitting from the same host as the GridSAM server, i.e. when client and server share a common file system and data staging is not necessary. Users can simply specify a working directory, and all the input files can then be specified relative to this directory. If jsdl:Source is not specified in a jsdl:DataStaging element, the file is assumed to be available on the local file system, and data staging occurs solely between the GridSAM server and the Schedd. In any case, jsdl:DataStaging elements must be present to describe all input files, including the executable, so that the plugin is aware of the files that need to be staged onto the remote Schedd.
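
As a rough illustration of the working directory behaviour described above, a job description for a client that shares a file system with the GridSAM server might look like the following sketch. The namespace URIs are those of the JSDL 1.0 specification and may need to be changed to match the JSDL draft understood by your GridSAM release. The directory /home/user/run1 and the file names simulate, input.dat and stdout.txt are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: namespaces assume JSDL 1.0; all paths and file names are hypothetical. -->
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <JobDescription>
        <Application>
            <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
                <!-- file names below are given relative to WorkingDirectory -->
                <Executable>simulate</Executable>
                <Argument>input.dat</Argument>
                <Output>stdout.txt</Output>
                <WorkingDirectory>/home/user/run1</WorkingDirectory>
            </POSIXApplication>
        </Application>
        <!-- No Source element: the files are taken from the local file system
             and are staged only between the GridSAM server and the Schedd. -->
        <DataStaging>
            <FileName>simulate</FileName>
            <CreationFlag>overwrite</CreationFlag>
        </DataStaging>
        <DataStaging>
            <FileName>input.dat</FileName>
            <CreationFlag>overwrite</CreationFlag>
        </DataStaging>
    </JobDescription>
</JobDefinition>
```

Note that both the executable and the input file have their own DataStaging entry, since the plugin relies on these entries to know which files to transfer to the remote Schedd.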

 

 

Jobmanager Configuration

 

The plugin can be configured through the jobmanager.xml file. An example is shown below to illustrate the various configuration points available.

 

<?xml version="1.0" encoding="UTF-8"?>

<module id="jobmanager.ssh" version="1.0.0">

 

    <!-- dependent modules -->

    <sub-module descriptor="org/icenigrid/gridsam/resource/config/common.xml"/>

    <sub-module descriptor="org/icenigrid/gridsam/resource/config/shell.xml"/>

    <sub-module descriptor="org/icenigrid/gridsam/resource/config/condor-wsbased.xml"/>

    <sub-module descriptor="org/icenigrid/gridsam/resource/config/embedded.xml"/>

    <sub-module descriptor="database.xml"/>

 

    <!-- override the factory defaults here -->

    <contribution configuration-id="hivemind.ApplicationDefaults">

        <default symbol="condor.AttachmentSize" value="500"/>

        <default symbol="condor.PollingInterval" value="10"/>

    </contribution>

 

    <!-- specify Schedd details -->

    <contribution configuration-id="condor.ScheddConfig">

        <Schedd hostname="fried.cs.ucl.ac.uk" port="3408" globusResource="lake.esc.cam.ac.uk/jobmanager-pbs">

                <Attributes name="x509userproxysubject" type="STRING-ATTR" value="/C=UK/O=eScience/OU=UCL/L=EISD/CN=some body"/>

                <Attributes name="x509userproxy" type="STRING-ATTR" value="/tmp/x509up_u500"/>

                <Attributes name="GlobusRSL" type="STRING-ATTR" value="(job_type=single)"/>

        </Schedd>

        <Schedd hostname="kotturoti.cs.ucl.ac.uk" collectorHostname="medoc.geol.ucl.ac.uk">

                <Requirements requirements='(OpSys=="WINNT51")' forceRequirements="true"/>

                <Attributes name="NTDomain" type="STRING-ATTR" value="CS"/>

        </Schedd>

    </contribution>

</module>

 

 

condor.AttachmentSize

The SOAP attachment size (in KB) to be used by Condor to transfer files to and from the remote Schedd.

condor.PollingInterval

The interval (in seconds) between polling requests. The plugin relies on regular polling to query the status of GridSAM jobs on remote Schedds.

 

 

 

The elements and attributes of the Schedd configuration are listed below. Each entry gives the name, the multiplicity in square brackets, the type, and a description.

·        Schedd [0..*]

    -  Hostname [1], String: The hostname of the remote Schedd.

    -  Port [0..1], Integer: The port on which the Schedd is running. If this is not specified, for example if the Schedd port is dynamically allocated, CollectorHostname must be specified so that the Schedd port can be discovered through the Collector.

    -  CollectorHostname [0..1], String: The hostname of the Collector daemon of the pool that this Schedd belongs to. This is used in case a Schedd port number is not provided or if the provided port number is invalid. The Collector can be queried to obtain the Schedd port number.

    -  GlobusResource [0..1], String: A URL to a Globus jobmanager so that this Schedd can act as a Condor-G client.

    -  Requirements [0..1]

        -  Requirements [1], String: A Condor requirements string.

        -  ForceRequirements [0..1], Boolean [true|false]: If set to TRUE, Schedd specific requirements and the job requirements from JSDL will be merged with the AND operator which provides strict requirements enforcement. We envisage a scenario where an administrator may wish to restrict the type of jobs that are allowed to run on a particular pool based on the job requirements. If this attribute is set to FALSE, Schedd specific requirements and the job requirements from JSDL will be merged with the OR operator which would serve as a guidance to submitters who may not be aware of the nature of the resources they are submitting to. Hence if the job requirements can not be satisfied by the pool, an attempt will still be made to schedule and execute the job on available resources. Default is FALSE.

    -  Attributes [0..*]

        -  Name [1], String: Attribute Name.

        -  Type [1], one of [INTEGER-ATTR| FLOAT-ATTR| STRING-ATTR| EXPRESSION-ATTR| BOOLEAN-ATTR| UNDEFINED-ATTR| ERROR-ATTR]: Attribute Type.

        -  Value [1], String: Attribute Value.
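
To make the effect of ForceRequirements more concrete, the sketch below configures two Schedds with the same Schedd specific requirement but different enforcement. The hostnames and the job requirement (Memory >= 512) are hypothetical, and the merged expressions in the comments only illustrate the AND/OR behaviour described above; the exact text that the plugin writes into the job ClassAd may differ.

```xml
<contribution configuration-id="condor.ScheddConfig">

    <!-- Strict enforcement: for a job whose JSDL yields (Memory >= 512),
         the merged requirements would be along the lines of
         (OpSys=="WINNT51") && (Memory >= 512) -->
    <Schedd hostname="schedd-a.example.org" port="3408">
        <Requirements requirements='(OpSys=="WINNT51")' forceRequirements="true"/>
    </Schedd>

    <!-- Guidance only (the default): the same job would be submitted with
         something like (OpSys=="WINNT51") || (Memory >= 512) -->
    <Schedd hostname="schedd-b.example.org" collectorHostname="collector.example.org">
        <Requirements requirements='(OpSys=="WINNT51")' forceRequirements="false"/>
    </Schedd>

</contribution>
```

The requirements attribute is delimited with single quotes here so that the double quotes around the ClassAd string literal do not need to be escaped.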

 

 

Condor-G Submission

 

The plugin allows users to submit jobs to Globus via Condor-G. In the jobmanager configuration, a specific Schedd can be designated as a Condor-G client by setting its globusResource attribute. This configures the job ClassAd with the appropriate Condor-G attributes. However, a few additional attributes have to be set manually, as they are specific to each user or job. These are:

 

·        x509userproxysubject

·        x509userproxy

·        GlobusRSL

 

They can be set through the jobmanager as described above. An example is shown below:

 

<Attributes name="x509userproxysubject" type="STRING-ATTR" value="/C=UK/O=eScience/OU=UCL/L=EISD/CN=some body"/>

<Attributes name="x509userproxy" type="STRING-ATTR" value="/tmp/x509up_u500"/>

<Attributes name="GlobusRSL" type="STRING-ATTR" value="(job_type=single)"/>

 

 

Downloads

 

Plugin Jar

 

jobmanager.xml

 

Birdbath Paper