FedStage DRMAA Guide

1. Introduction

This is a short tutorial on writing applications that use the DRMAA library. It also contains notes specific to the FedStage DRMAA implementations. Please refer to the specification or the DRMAA web site if you need more information.

2. Installation

2.1. FedStage DRMAA for LSF

The library can be compiled using the make utility. Before compiling, please make sure that all the variables in the Makefile are set according to your LSF paths.

2.2. FedStage DRMAA for PBS Pro

To compile the library, go to the main source directory and type:

  $ ./configure [--prefix=/installation/directory] && make

If PBS is installed in a non-standard directory, pass that directory in the --with-pbs configure parameter. An ANSI C compiler and a standard make should suffice, unless you have taken the sources directly from the SVN repository, wish to run the test suite, or want to change the sources.

2.2.1. Configuration

During DRMAA session initialization (drmaa_init), the library tries to read its configuration parameters from the following locations: /etc/pbs_drmaa.conf, ~/.pbs_drmaa.conf, and the file named by the PBS_DRMAA_CONF environment variable (if set to a non-empty string). If multiple configuration sources are present, all configurations are merged, with values from user-defined files taking precedence (from highest to lowest: $PBS_DRMAA_CONF, ~/.pbs_drmaa.conf, /etc/pbs_drmaa.conf).
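The lookup order can be illustrated with a small sketch. This is not the library's code: conf_merge_order and CONF_PATH_MAX are hypothetical names, "~" is assumed to mean the value of $HOME, and the merge is assumed to happen by reading files from lowest to highest precedence so that later files override earlier ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CONF_PATH_MAX 512

/* Hypothetical helper, not part of FedStage DRMAA. Fills `paths` with the
 * configuration files ordered from lowest to highest precedence and returns
 * the number of entries. `home` and `env_conf` stand for the values of
 * $HOME and $PBS_DRMAA_CONF (either may be NULL). */
int conf_merge_order(const char *home, const char *env_conf,
                     char paths[3][CONF_PATH_MAX])
{
  int n = 0;

  assert(paths != NULL);

  /* System-wide file: lowest precedence. */
  snprintf(paths[n++], CONF_PATH_MAX, "%s", "/etc/pbs_drmaa.conf");

  /* Per-user file overrides the system-wide one. */
  if (home != NULL && strlen(home) > 0)
    snprintf(paths[n++], CONF_PATH_MAX, "%s/.pbs_drmaa.conf", home);

  /* $PBS_DRMAA_CONF counts only when set to a non-empty string;
   * it has the highest precedence. */
  if (env_conf != NULL && strlen(env_conf) > 0)
    snprintf(paths[n++], CONF_PATH_MAX, "%s", env_conf);

  return n;
}
```

A caller would typically pass getenv("HOME") and getenv("PBS_DRMAA_CONF") and then read the returned files in order.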

Currently recognized configuration parameters are:

2.2.2. Configuration file syntax

The configuration file has the form of a dictionary. A dictionary is a set of zero or more key-value pairs. A key is a string, while a value can be a string, an integer or another dictionary.

  configuration: dictionary | dictionary_body
  dictionary: '{' dictionary_body '}'
  dictionary_body: (string ':' value ',')*
  value: integer | string | dictionary
  string: unquoted-string | single-quoted-string | double-quoted-string
  unquoted-string: [^ \t\n\r:,0-9][^ \t\n\r:,]*
  single-quoted-string: '[^']*'
  double-quoted-string: "[^"]*"
  integer: [0-9]+
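One practical consequence of the grammar is that many values must be quoted. The sketch below checks whether a whole token matches the integer or unquoted-string rule exactly as written above. The names token_kind, is_excluded and classify_token are hypothetical, and the library's real parser may behave differently.

```c
#include <stddef.h>

enum token_kind { TOKEN_INVALID, TOKEN_INTEGER, TOKEN_UNQUOTED_STRING };

/* Nonzero if c may not appear anywhere in an unquoted string. */
int is_excluded(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
         c == ':' || c == ',';
}

/* Classifies a complete token against the two unquoted rules:
 *   integer:         [0-9]+
 *   unquoted-string: [^ \t\n\r:,0-9][^ \t\n\r:,]*
 * Anything else would have to be written as a quoted string. */
enum token_kind classify_token(const char *s)
{
  size_t i;

  if (s == NULL || s[0] == '\0')
    return TOKEN_INVALID;

  /* A leading digit means the token can only be an integer. */
  if (s[0] >= '0' && s[0] <= '9') {
    for (i = 1; s[i] != '\0'; i++)
      if (s[i] < '0' || s[i] > '9')
        return TOKEN_INVALID;
    return TOKEN_INTEGER;
  }

  /* Otherwise every character must be outside the excluded set. */
  for (i = 0; s[i] != '\0'; i++)
    if (is_excluded(s[i]))
      return TOKEN_INVALID;
  return TOKEN_UNQUOTED_STRING;
}
```

Under these rules a value such as -l pmem=100mb contains a space and therefore needs quotes, while a key such as default can be written bare.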

3. Compiling DRMAA applications

To write and run your DRMAA-enabled application you need two files: the DRMAA library (on Unix-like systems this is commonly the libdrmaa.so shared library) and the header file drmaa.h. The header file is needed for compilation, while the library is needed for linking. It is recommended to install these files in appropriate system paths, e.g. /usr/local/include for drmaa.h and /usr/local/lib for libdrmaa.so. You may also put them into the directory of your application, but you must then remember to set the appropriate flags for gcc when compiling (i.e. -I. -L.) and the LD_LIBRARY_PATH environment variable when running the application.

For example, let us assume you are developing a simple application using DRMAA and your source file is named main.c. You should then include the header file at the beginning using #include <drmaa.h>. Assuming you have installed the DRMAA files in system locations, the compilation may look like this:

    gcc -I/usr/local/include main.c -o main -L/usr/local/lib -ldrmaa

3.1. Example 1

This is an example of a fully working DRMAA application. You may think of it as a general skeleton for your future DRMAA applications. It performs the following steps, which are marked in the source code comments: INIT SESSION, SET JOB TEMPLATE, RUN JOB, CHECK JOB STATUS, CONTROL JOB, WAIT JOB and AFTER WAIT.

  #include <stdio.h>
  #include <drmaa.h>

  /* We are supposed to always close the session, also if something goes wrong.
   * This macro ends the drmaa session and terminates this application. It
   * relies on the local variables errnum and error declared in main(). */
  #define RAISE_ERROR_2() \
  do { \
    errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER); \
    if (errnum != DRMAA_ERRNO_SUCCESS) \
      fprintf(stderr, "Could not shut down the DRMAA library: %s\n", error); \
    return 1; \
  } while (0)

  /* In some cases we must also free the job template (local variable jt), so
   * this helper macro deletes it before terminating the application neatly. */
  #define RAISE_ERROR_1() \
  do { \
    errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER); \
    if (errnum != DRMAA_ERRNO_SUCCESS) { \
      fprintf(stderr, "Could not delete job template: %s\n", error); \
    } \
    RAISE_ERROR_2(); \
  } while (0)

  int
  main(void)
  {
    /* All drmaa routines accept an error buffer as one of the function
     * arguments for storing error information. All drmaa routines also return
     * errnum (error code); DRMAA_ERRNO_SUCCESS means the call succeeded. */
    char error[DRMAA_ERROR_STRING_BUFFER];
    int errnum = 0;

    /* The job template is needed only for job submission. You set different
     * attributes on it, for example: command (application to run),
     * arguments the job takes, etc. */
    drmaa_job_template_t *jt = NULL;

    /* INIT SESSION - you always need to start a session before interacting
     * with DRMAA. The session stores some valuable information. You are able,
     * for example, to submit MANY jobs in one session and wait for ANY job to
     * finish (from the current session) without explicitly saying which one.
     * You may also synchronize with ALL jobs (from the current session),
     * again without explicitly listing the jobs. */
    errnum = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) {
      fprintf(stderr, "Could not initialize the DRMAA library: %s\n", error);
      return 1;
    }

    /* SET JOB TEMPLATE */
    {
      /* Hope everybody has /bin/sleep, let's sleep for 5 seconds */
      const char *command = "/bin/sleep";
      const char *args[2] = { "5", NULL };

      /* First we allocate the job template in memory */
      errnum = drmaa_allocate_job_template(&jt, error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not create job template: %s\n", error);
        RAISE_ERROR_2();
      }

      /* Command - the only attribute that MUST be set,
       * all of the others are optional */
      errnum =
          drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, command,
                              error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not set attribute \"%s\": %s\n",
                DRMAA_REMOTE_COMMAND, error);
        RAISE_ERROR_1();
      }

      /* Arguments passed to the job. It is a vector attribute, which means you
       * may pass many different strings. /bin/sleep however takes only one
       * argument, so let's stick to that. The command and arguments we set
       * result in: "/bin/sleep 5" - this will be sent to the DRM system. */
      errnum = drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args, error,
                                          DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not set attribute \"%s\": %s\n", DRMAA_V_ARGV,
                error);
        RAISE_ERROR_1();
      }

      /* This means the job will be in HOLD state after submission (so it will
       * not run until we release the job). This is very useful in two-phase
       * submit scenarios. */
      errnum =
          drmaa_set_attribute(jt, DRMAA_JS_STATE, DRMAA_SUBMISSION_STATE_HOLD,
                              error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not set attribute \"%s\": %s\n", DRMAA_JS_STATE, error);
        RAISE_ERROR_1();
      }
    }

    {
      /* Filled by drmaa_run_job with the new jobid of the submitted job */
      char jobid[DRMAA_JOBNAME_BUFFER];

      /* Filled by drmaa_wait with the jobid of the job we have just waited for */
      char jobid_out[DRMAA_JOBNAME_BUFFER];

      /* Filled by drmaa_job_ps with the current status of a job */
      int status = 0;

      /* Filled by drmaa_wait with information about used resources */
      drmaa_attr_values_t *rusage = NULL;

      /* RUN JOB */
      errnum = drmaa_run_job(jobid, DRMAA_JOBNAME_BUFFER, jt,
                             error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not submit job: %s\n", error);
        RAISE_ERROR_1();
      }

      /* Release job template - it is no longer needed. From here on only
       * RAISE_ERROR_2 is used, because the template is already deleted. */
      errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not delete job template: %s\n", error);
      }

      /* CHECK JOB STATUS */
      errnum = drmaa_job_ps(jobid, &status, error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not get job's status: %s\n", error);
        RAISE_ERROR_2();
      }
      if (status != DRMAA_PS_USER_ON_HOLD) {
        fprintf(stderr, "Expected DRMAA_PS_USER_ON_HOLD job state\n");
        RAISE_ERROR_2();
      }

      printf("Your job has been submitted ON HOLD with id %s\n", jobid);

      /* CONTROL JOB */
      errnum = drmaa_control(jobid, DRMAA_CONTROL_RELEASE,
                             error, DRMAA_ERROR_STRING_BUFFER);
      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not release job: %s\n", error);
        RAISE_ERROR_2();
      }

      /* One may wish to check the status once again -
       * most often it will be pending or running */
      printf("Your job %s has been released\n", jobid);

      /* WAIT JOB - instead of jobid you may use DRMAA_JOB_IDS_SESSION_ANY,
       * which waits for any job submitted during this session */
      errnum =
          drmaa_wait(jobid, jobid_out, DRMAA_JOBNAME_BUFFER, &status,
                     DRMAA_TIMEOUT_WAIT_FOREVER, &rusage, error,
                     DRMAA_ERROR_STRING_BUFFER);

      if (errnum != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not wait for job: %s\n", error);
        RAISE_ERROR_2();
      }

      /* AFTER WAIT - check exit status and resources used */
      {
        char usage[DRMAA_ERROR_STRING_BUFFER];
        int aborted = 0;

        drmaa_wifaborted(&aborted, status, NULL, 0);
        if (aborted == 1) {
          printf("Job %s never ran\n", jobid);
        } else {
          int exited = 0;

          drmaa_wifexited(&exited, status, NULL, 0);
          if (exited == 1) {
            int exit_status = 0;

            drmaa_wexitstatus(&exit_status, status, NULL, 0);
            printf("Job %s finished regularly with exit status %d\n",
                   jobid, exit_status);
          } else {
            int signaled = 0;

            drmaa_wifsignaled(&signaled, status, NULL, 0);
            if (signaled == 1) {
              char termsig[DRMAA_SIGNAL_BUFFER];

              drmaa_wtermsig(termsig, DRMAA_SIGNAL_BUFFER, status, NULL, 0);
              printf("Job %s finished due to signal %s\n", jobid, termsig);
            } else {
              printf("Job %s finished with unclear conditions\n", jobid);
            }
          }
        }

        printf("Resources used by job %s\n", jobid_out);
        while (drmaa_get_next_attr_value(rusage, usage, DRMAA_ERROR_STRING_BUFFER)
               == DRMAA_ERRNO_SUCCESS) {
          printf("  %s\n", usage);
        }
        drmaa_release_attr_values(rusage);
      }
    }

    errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) {
      fprintf(stderr, "Could not shut down the DRMAA library: %s\n", error);
      return 1;
    }

    return 0;
  }

3.2. Example 2

DRMAA also supports bulk jobs and synchronization with all or chosen jobs from the current session. Feel free to experiment with the code snippet below, which reuses the variables and macros of Example 1 and performs the following steps: RUN BULK JOB and SYNCHRONIZE JOBS.

  {
    /* RUN BULK JOB */
    drmaa_job_ids_t *ids = NULL;

    /* submit bulk with task ids as follows: 1, 3, 5, 7, 9, 11, ..., 27, 29 */
    errnum = drmaa_run_bulk_jobs(&ids, jt, 1, 30, 2, error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) {
      fprintf(stderr, "Could not submit job: %s\n", error);
      RAISE_ERROR_1();
    }

    /* release job template - it is no longer needed */
    errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS)
      fprintf(stderr, "Could not delete job template: %s\n", error);

    /* let's see what job and task ids we have */
    {
      char jobid[DRMAA_JOBNAME_BUFFER];

      while (drmaa_get_next_job_id(ids, jobid, DRMAA_JOBNAME_BUFFER)
             == DRMAA_ERRNO_SUCCESS) {
        printf("A job task has been submitted with id %s\n", jobid);
      }

      drmaa_release_job_ids(ids);
    }

    /* SYNCHRONIZE JOBS */
    errnum = drmaa_synchronize(DRMAA_JOB_IDS_SESSION_ALL, DRMAA_TIMEOUT_WAIT_FOREVER, 1,
                               error, DRMAA_ERROR_STRING_BUFFER);
    if (errnum != DRMAA_ERRNO_SUCCESS) {
      fprintf(stderr, "Could not wait for jobs: %s\n", error);
      RAISE_ERROR_2();
    }

    printf("All job tasks have finished.\n");
  }
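The start, end and increment arguments of drmaa_run_bulk_jobs (1, 30 and 2 in the snippet) determine the task ids. The following sketch only reproduces that arithmetic and does not call DRMAA. bulk_task_ids is a hypothetical helper, and it assumes an inclusive range with a positive increment.

```c
#include <stddef.h>

/* Hypothetical helper: enumerates the task indices of a bulk job submitted
 * with the given start, end and increment (range assumed inclusive).
 * Stores at most `max` indices in `out` (which may be NULL) and returns the
 * total number of indices in the range. */
int bulk_task_ids(int start, int end, int incr, int *out, int max)
{
  int count = 0;
  int i;

  /* A non-positive increment would never reach `end`. */
  if (incr <= 0)
    return 0;

  for (i = start; i <= end; i += incr) {
    if (out != NULL && count < max)
      out[count] = i;
    count++;
  }
  return count;
}
```

With the values from the snippet, bulk_task_ids(1, 30, 2, ids, 15) fills ids with 1, 3, ..., 29, so 30 itself is never used as a task id.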

4. DRMAA quick guide

This is a list of the most commonly used DRMAA functions.

4.1. drmaa_init

4.2. drmaa_exit

4.3. drmaa_run_job

4.4. drmaa_run_bulk_jobs

4.5. drmaa_wait

4.6. drmaa_synchronize

4.7. drmaa_job_ps

4.8. drmaa_control

4.9. Job template handling functions

  drmaa_job_template_t *jt;

  drmaa_allocate_job_template(&jt, ...);
  drmaa_set_attribute(jt, ...);
  drmaa_set_vector_attribute(jt, ...);
  drmaa_delete_job_template(jt, ...);

5. FedStage DRMAA for LSF specific notes

6. FedStage DRMAA for PBS Pro specific notes

The library covers nearly all of the DRMAA 1.0 specification, with the exceptions listed below. It passes the official DRMAA test suite except for tests which require job termination status. All mandatory and some optional job attributes (namely: transfer files, wall clock time limit, job run duration hard limit) are implemented.

For more information, please see the README file enclosed with the distribution.

6.1. Native specification

The DRMAA interface allows passing DRM-dependent job submission options. These options may be specified by setting the drmaa_native_specification or drmaa_job_category job attribute. drmaa_native_specification accepts space-delimited qsub options, while drmaa_job_category is the name of a job category defined in the configuration file. qsub options that do not set job attributes (-b, -z, -C), that are meant for submission of interactive jobs (-I, -X), or that specify directories (-d, -D) are not supported. Also, instead of the -W option, the following long options are accepted within the native specification: --depend, --group-list, --stagein and --stageout. For a detailed description of each option, see the PBS documentation.

Attributes set in the native specification override the corresponding DRMAA job attributes, which in turn override those set by the job category. The following vector attributes, which may come from several sources, are not overridden but merged together: resource list (-l), mail list (-M), environment variables (-v), job dependency (--depend), stagein (--stagein) and stageout (--stageout). Other vector attributes are hard to merge correctly, so their values are overridden in the usual way. Those options are: list of shell interpreters (-S), user list (-u), group list (--group-list), input path (-i), output path (-o) and error path (-e).
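The two rules can be sketched as follows. This only illustrates the described behaviour and is not the library's implementation: resolve_scalar and merge_vector are hypothetical names, NULL stands for "not set by this source", and the comma-joined order (job category, then DRMAA attribute, then native specification) is inferred from the qsub invocation shown in section 6.1.1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overridden attribute: the highest-precedence source that is set wins
 * (native specification, then DRMAA attribute, then job category). */
const char *resolve_scalar(const char *category, const char *drmaa_attr,
                           const char *native)
{
  if (native != NULL)
    return native;
  if (drmaa_attr != NULL)
    return drmaa_attr;
  return category;
}

/* Merged attribute: all sources that are set are joined with commas.
 * Returns 0 on success and -1 if `out` is too small. */
int merge_vector(char *out, size_t size, const char *category,
                 const char *drmaa_attr, const char *native)
{
  const char *parts[3];
  size_t used = 0;
  int i;

  assert(out != NULL);
  if (size == 0)
    return -1;

  parts[0] = category;
  parts[1] = drmaa_attr;
  parts[2] = native;
  out[0] = '\0';

  for (i = 0; i < 3; i++) {
    size_t len;

    if (parts[i] == NULL || parts[i][0] == '\0')
      continue;
    len = strlen(parts[i]);
    /* Room is needed for an optional comma, the part and the final NUL. */
    if (used + (used > 0 ? 1 : 0) + len + 1 > size)
      return -1;
    if (used > 0)
      out[used++] = ',';
    memcpy(out + used, parts[i], len);
    used += len;
    out[used] = '\0';
  }
  return 0;
}
```

For instance, a priority of -10 from a job category and 0 from the native specification resolves to 0, while resource lists from all three sources end up in a single -l argument.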

The following table presents native specification strings and the corresponding DRMAA attributes:

  DRMAA attribute        PBS attribute    PBS resource  native specification
  ---------------------  ---------------  ------------  ----------------------

  Attributes which get overridden
  drmaa_job_name         Job_Name                       -N job name
  drmaa_output_path      Output_Path                    -o output path
  drmaa_error_path       Error_Path                     -e error path
  drmaa_join_files       Join_Path                      -j join options
  drmaa_block_email      Mail_Points                    -m mail options
  drmaa_start_time       Execution_Time                 -a start time
  drmaa_js_state         Hold_Types                     -h
                         Account_Name                   -A account string
                         Checkpoint                     -c interval
                         Keep_Files                     -k keep
                         Priority                       -p priority
                         destination                    -q queue
                         Rerunable                      -r y/n
                         Shell_Path_List                -S path list
                         User_List                      -u user list
                         group_list                     --group-list=groups

  Attributes whose values are merged
  drmaa_v_env            Variable_List                  -v variable list
                         Variable_List                  -V
  drmaa_v_email          Mail_Users                     -M user list
  drmaa_duration_hlimit  Resource_List    cput          -l cput=limit
  drmaa_wct_hlimit       Resource_List    walltime      -l walltime=limit
                         Resource_List                  -l resources
                         depend                         --depend=dependency
                         stagein                        --stagein=stagein
                         stageout                       --stageout=stageout

6.1.1. Example

Source code (test.c):

  #include <stdio.h>
  #include <drmaa.h>

  int main( int argc, const char *argv[] )
  {
    int rc = DRMAA_ERRNO_SUCCESS;
    drmaa_job_template_t *jt = NULL;
    char err[ DRMAA_ERROR_STRING_BUFFER ];
    char jobid[ DRMAA_JOBNAME_BUFFER ];
    const char *env[] = {
      "CLASSPATH=./lib",
      NULL
    };

    rc = drmaa_init( NULL, err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_allocate_job_template( &jt, err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_attribute( jt, DRMAA_REMOTE_COMMAND, "java",
          err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_vector_attribute( jt, DRMAA_V_ARGV, argv+1,
          err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_vector_attribute( jt, DRMAA_V_ENV, env,
          err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_attribute( jt, DRMAA_DURATION_HLIMIT,
          "10:00", err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_attribute( jt, DRMAA_NATIVE_SPECIFICATION,
          "-p 0 -V -l nodes=1:ppn=4", err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_set_attribute( jt, DRMAA_JOB_CATEGORY, "java",
          err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      rc = drmaa_run_job( jobid, sizeof(jobid), jt, err, sizeof(err) );
    if( rc == DRMAA_ERRNO_SUCCESS )
      printf( "submitted job id: %s\n", jobid );
    if( jt != NULL )
      drmaa_delete_job_template( jt, NULL, 0 );
    drmaa_exit( NULL, 0 );
    if( rc != DRMAA_ERRNO_SUCCESS )
      fprintf( stderr, "DRMAA error: %s\n", err );
    return rc;
  }

Configuration file (/etc/pbs_drmaa.conf or ~/.pbs_drmaa.conf):

  job_categories: {
    default: "-l pmem=100mb",
    java: "-l software=java,pmem=300mb -p -10 -v PATH=/opt/sun-jdk-1.6/bin:/usr/bin:/bin",
  }

After compilation (gcc -o test test.c -ldrmaa), running the above code as ./test foo.jar is equivalent to the following invocation of qsub:

  $ echo "java foo.jar" | \
  > qsub -v PATH=/opt/sun-jdk-1.6/bin:/usr/bin:/bin,CLASSPATH=./lib -V \
  > -l software=java,pmem=300mb,cput=10:00,nodes=1:ppn=4 -p 0

FedStage DRMAA Guide (last edited 2008-01-16 04:14:22 by KrzysztofKurowski)