Bug 696080

Summary: qmf submit performance issue
Product: Red Hat Enterprise MRG
Reporter: Martin Kudlej <mkudlej>
Component: condor-qmf
Assignee: Matthew Farrellee <matt>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium
Priority: medium
Version: 2.0
CC: iboverma, matt, tross, tstclair
Target Milestone: 2.1
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-08-30 17:41:14 UTC
Attachments:
qpidd and condor configuration files (flags: none)

Description Martin Kudlej 2011-04-13 08:26:11 UTC
Created attachment 491682
qpidd and condor configuration files

Description of problem:
Submitting from the command line: 100 jobs took 8.236s -> ~12.1 jobs/s
Submitting over QMF: 100 jobs took 11.469s -> ~8.7 jobs/s

Are these results OK?
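
For reference, the rates quoted above follow directly from the elapsed times. A minimal Python sketch of that arithmetic is given here; the helper name `rate` is only for illustration and is not part of any tool mentioned in this report.

```python
# Elapsed wall-clock time for 100 submissions, taken from the figures above.
JOBS = 100
elapsed = {"command line": 8.236, "QMF": 11.469}

def rate(jobs, seconds):
    """Return throughput in jobs per second."""
    return jobs / float(seconds)

for method, seconds in elapsed.items():
    per_job_ms = 1000.0 * seconds / JOBS
    print("%s: %.1f jobs/s, %.1f ms per job" % (method, rate(JOBS, seconds), per_job_ms))
```

Expressed per job, the gap between the two methods is roughly 32 ms per submission.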

command line:
$ time for i in `seq 100`; do runuser -s /bin/bash -l condor -c "condor_submit /tmp/tmp7fOHaV.sub"; done
$ cat /tmp/tmp7fOHaV.sub
executable=/bin/sleep
iwd=/tmp
universe=vanilla
arguments=10000
queue

QMF:
from qmf.console import Session
import time
ad = {"cmd":"/bin/sleep",
      "args":"10000",
      "requirements":"TRUE",
      "iwd":"/tmp",
      "owner":"condor",
      "universe":"vanilla",
      "!!descriptors":  {"requirements":"com.redhat.grid.Expression"}
}                                                                          

session = Session();
broker = session.addBroker("cumin/cumin@localhost:5672", 60, 'PLAIN')
scheduler = session.getObjects(_class="scheduler", _package="com.redhat.grid")[0]
                                                                                 
for i in range(100):
  time1 = time.time()
  result = scheduler.SubmitJob(ad)
  time2 = time.time()
  print str(time2-time1)

session.delBroker(broker)
session.close()

condor cleaning before test run:
service condor stop
rm -f /var/lib/condor/spool/*
rm -f /var/log/condor/*
service condor start
Then wait for the startd to start; by then the schedd should already be up.

Version-Release number of selected component (if applicable):
qpid-cpp-client-0.10-3.el5
qpid-tools-0.10-2.el5
python-qpid-qmf-0.10-5.el5
python-condorutils-1.5-2.el5
condor-wallaby-client-4.0-5.el5
ruby-qpid-qmf-0.10-5.el5
condor-wallaby-tools-4.0-5.el5
python-qpid-0.10-1.el5
condor-aviary-7.6.0-0.6.el5
qpid-cpp-server-0.10-3.el5
qpid-qmf-0.10-5.el5
condor-qmf-7.6.0-0.6.el5
condor-7.6.0-0.6.el5

Steps to Reproduce:
1. Install and configure condor, QMF and qpidd.
2. Run the tests above after cleaning condor and restarting qpidd.

Comment 1 Ted Ross 2011-04-13 13:09:08 UTC
A couple of comments on this issue:

1) QMF does not have a "submit" feature.  If QMF method-call throughput is in question, then it should be benchmarked using asynchronous method calls.

2) The measurements made here are based on synchronous method calls (caller is blocked until the method completes and returns) where the remote processing takes some time.  It is unlikely that you will see very high submit throughput unless you use asynchronous method calls or use many separate submitter processes.
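
The effect described in point 2 can be illustrated without a broker. The following Python 3 sketch is only an illustration: `fake_submit` is a hypothetical stand-in that sleeps to imitate a blocking remote call, it does not use the QMF API, and the 10 ms latency and the worker count are arbitrary assumptions. Threads are used here in place of separate submitter processes for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CALL_LATENCY = 0.01  # assumed per-call latency in seconds (made up)

def fake_submit(job_id):
    """Hypothetical stand-in for a blocking submit call."""
    time.sleep(CALL_LATENCY)
    return job_id

def run_serial(n):
    """Issue n blocking calls one after another; return elapsed seconds."""
    start = time.time()
    for i in range(n):
        fake_submit(i)
    return time.time() - start

def run_pooled(n, workers):
    """Issue n blocking calls from `workers` concurrent submitters; return elapsed seconds."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_submit, range(n)))  # wait for every call to finish
    return time.time() - start

n = 50
print("serial: %.1f calls/s" % (n / run_serial(n)))
print("pooled: %.1f calls/s" % (n / run_pooled(n, workers=10)))
```

Each call still takes the same time; only the overlap changes. Whether the schedd itself would scale this way is not something the sketch can show.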

Comment 2 Martin Kudlej 2011-04-14 10:12:55 UTC
1) This bug was submitted for the Grid QMF object com.redhat.grid.scheduler, which has a SubmitJob method.

2) I call this method synchronously because submitting via the command line is also synchronous.

Sorry for the confusion, but my question was whether the difference between command line submitting and QMF submitting is OK or not.
Am I doing anything wrong? Are these condor settings OK for the best Condor<->QMF performance?

Comment 3 Matthew Farrellee 2011-08-30 17:06:29 UTC
Summary - take with a grain of salt -

CLI @ 21/s
QMF @ 8/s
Aviary @ 43/s

--

CLI -

T510, 12 runs - avg 4.73s ~= 21/s (single, time ./sub.sh 100)

$ cat sub.sh 
#!/bin/sh

TMP=$(mktemp XXXXXX.sub)
cat > $TMP <<EOF
executable=/bin/sleep
iwd=/tmp
universe=vanilla
arguments=10000
queue
EOF

TIMES=$1
for i in $(seq $TIMES); do
   condor_submit $TMP
done

rm $TMP

--

QMF -

T510, 12 runs - avg 11.44s ~= 8/s (single, ./sub.py)

NOTE: There is a 3-4 second startup delay for each run, thus the script calculates its own runtime.

$ cat sub.py 
#!/usr/bin/python

from qmf.console import Session
import time
ad = {"cmd":"/bin/sleep",
      "args":"10000",
      "requirements":"TRUE",
      "iwd":"/tmp",
      "owner":"matt",
      "universe":"vanilla",
      "!!descriptors":  {"requirements":"com.redhat.grid.Expression"}
}

session = Session();
broker = session.addBroker("cumin/cumin@localhost:5672", 60, 'PLAIN')
scheduler = session.getObjects(_class="scheduler",
_package="com.redhat.grid")[0]

start = time.time()
print "START", start
for i in range(100):
  time1 = time.time()
  result = scheduler.SubmitJob(ad)
  time2 = time.time()
  print str(time2-time1)
stop = time.time()
print "STOP", stop
print "RUN", stop-start

session.delBroker(broker)
session.close()
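
The script prints one per-call duration per line, plus START, STOP and RUN marker lines. A helper along the following lines could summarize that output. This is a Python 3 sketch; the function name `summarize` is hypothetical, and the sample durations below are invented for illustration, not measured.

```python
import statistics

def summarize(lines):
    """Summarize per-call durations from sub.py-style output.

    Lines starting with START, STOP or RUN are skipped; every other
    non-empty line is parsed as a duration in seconds.
    """
    durations = []
    for line in lines:
        text = line.strip()
        if not text or text.split()[0] in ("START", "STOP", "RUN"):
            continue
        durations.append(float(text))
    return {
        "calls": len(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        # Rate over the summed call times only; loop overhead is excluded,
        # so this can differ slightly from 100 / RUN.
        "rate": len(durations) / sum(durations),
    }

# Invented sample output, for illustration only.
sample = ["START 1.0", "0.10", "0.12", "0.11", "STOP 1.33", "RUN 0.33"]
print(summarize(sample))
```

Looking at the median and maximum alongside the mean would show whether the low QMF rate comes from uniformly slow calls or from a few outliers.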

--

Aviary -

T510, 12 runs - avg 2.29s ~= 43/s (single, ./submit.py)

NOTE: The overhead (diff w/ time) is ~0.1s, but the script still calculates its own runtime to correspond to QMF submission.

$ cat submit.py 
#!/usr/bin/python

from suds import *
from suds.client import Client
from sys import exit, argv
import time, pwd, os
import logging
import argparse

# change these for other default locations and ports
wsdl = 'file:/var/lib/condor/aviary/services/job/aviary-job.wsdl'
url = 'http://localhost:9090/services/job/submitJob'
client = Client(wsdl)
client.set_options(location=url)
start = time.time()
for i in range(100):
   time1 = time.time()
   result = client.service.submitJob(
               '/bin/sleep',
               '120',
               "matt",
               '/tmp',
               'perf sub')
   if result.status.code != "OK":
      print result.status.code,"; ", result.status.text
      exit(1)
   time2 = time.time()
   print time2-time1
stop = time.time()
print "RUN", stop-start

Comment 4 Matthew Farrellee 2011-08-30 17:41:14 UTC
Martin, thank you for bringing attention to this. If you wish to produce a performance regression test for submission methods, please focus on the CLI and Aviary methods. 40Hz CLI and 80Hz Aviary make a reasonable baseline -- from grid0.

The QMF submission performance is disappointing, but does not warrant attention at this time. It will be replaced with Aviary submission, which performs better than both CLI and QMF in the simple tests found in comment 3.
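
A regression check against the baseline suggested in this comment might look like the sketch below. The baselines are the 40Hz and 80Hz figures quoted above; everything else is an assumption made for illustration: the function name `check_rate`, the 20% tolerance, and the elapsed time in the usage line. Since the comment attributes the baselines to grid0, they presumably apply only to comparable hardware.

```python
# Baselines from the comment above, in submissions per second.
BASELINE_HZ = {"cli": 40.0, "aviary": 80.0}
TOLERANCE = 0.20  # assumed: allow a 20% drop before flagging (arbitrary)

def check_rate(method, jobs, seconds, tolerance=TOLERANCE):
    """Return (ok, rate); ok is False when rate < baseline * (1 - tolerance)."""
    rate = jobs / float(seconds)
    floor = BASELINE_HZ[method] * (1.0 - tolerance)
    return rate >= floor, rate

# Invented elapsed time, for illustration only.
ok, rate = check_rate("cli", 100, 2.6)
print("cli: %.1f/s %s" % (rate, "OK" if ok else "REGRESSION"))
```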