Bug 845545

Summary: aviary jobserver timeouts for ssl connections
Product: Red Hat Enterprise MRG
Reporter: Martin Kudlej <mkudlej>
Component: condor-aviary
Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED NOTABUG
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: unspecified
Priority: unspecified
Version: 2.2
CC: iboverma, matt, pmackinn
Target Milestone: 2.3
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-10-04 11:43:50 UTC
Type: Bug

Description Martin Kudlej 2012-08-03 12:01:51 UTC
Description of problem:
I've tried to write an aviary client with SSL for our testing API. I took the code from the aviary examples and refactored it. I use the same SSL code, with the same valid certificates, both for submitting jobs and for querying information about jobs. Submitting via aviary with SSL works, but a query to the jobserver ends with a timeout.

I've checked my code many times and stepped through it with the Python debugger, but I haven't found anything wrong, nor any difference from your code except for the different classes. The strange thing is that the same code works without SSL, and the same SSL code works for submitting. I've also tried both a short (10 s) and a long (1 minute) timeout, and neither worked.

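My code for SSL:
import socket
import urllib2

import M2Crypto.SSL
import M2Crypto.httpslib
import suds.transport.http

import mrg_utils  # our own helper module (not shown here)
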
class HTTPSTransport(suds.transport.http.HttpTransport):
  def __init__(self, timeout=300, *args, **kwargs):
    suds.transport.http.HttpTransport.__init__(self, *args, **kwargs)
    ca_dir = mrg_utils.MRGEnv.get_env_var('MRG_GRID_CERTIFICATES_DIR', '/etc/condor/certs/') # this function just gets the variable from the bash environment
    self.key = mrg_utils.MRGEnv.get_env_var('MRG_OPENSSL_CLIENT_KEY', ca_dir + 'client.key')
    self.cert = mrg_utils.MRGEnv.get_env_var('MRG_OPENSSL_CLIENT_CRT', ca_dir + 'client.crt')
    self.ca_cert = mrg_utils.MRGEnv.get_env_var('MRG_OPENSSL_CA_CRT', ca_dir + 'ca.crt')
    self.timeout = timeout

  def u2open(self, u2request):
    url = urllib2.build_opener(HTTPSAuthHandler(self.key, self.cert, self.ca_cert, self.timeout))
    if self.u2ver() < 2.6:
      socket.setdefaulttimeout(self.timeout)
      return url.open(u2request)
    else:
      return url.open(u2request, timeout=self.timeout)

class HTTPSAuthHandler(urllib2.HTTPSHandler):
  def __init__(self, key, cert, ca_cert, timeout=300):
    urllib2.HTTPSHandler.__init__(self)
    self.key = key
    self.cert = cert
    self.ca_cert = ca_cert
    self.timeout = timeout

  def https_open(self, req):
    #print req.get_full_url()
    return self.do_open(self._get_connection, req)

  def _get_connection(self, host, timeout=300):
    return HTTPSConnection(host, key=self.key, cert=self.cert, ca_cert=self.ca_cert, timeout=self.timeout)

class HTTPSConnection(M2Crypto.httpslib.HTTPSConnection):
  def __init__(self, host, port=None, key=None, cert=None,
             ca_cert=None, strict=None, timeout=None):
    self.my_timeout = timeout
    ctx = M2Crypto.SSL.Context()
    ctx.load_cert(cert, key)
    ctx.load_verify_locations(cafile=ca_cert)
    ctx.load_client_CA(cafile=ca_cert)
    ctx.set_verify(M2Crypto.SSL.verify_peer | M2Crypto.SSL.verify_fail_if_no_peer_cert, depth=9)
    M2Crypto.httpslib.HTTPSConnection.__init__(self, host, port, strict, key_file=key, cert_file=cert, ssl_context=ctx)

  def connect(self):
    self.set_debuglevel(10)
    M2Crypto.httpslib.HTTPSConnection.connect(self)
    if self.my_timeout is not None:
      self.sock.settimeout(self.my_timeout)

Message sent by the suds client, and the reply:
send: u'POST /services/query/getJobDetails HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-length: 395\r\nSoapaction: "http://grid.redhat.com/aviary-query/job/details"\r\nHost: mrg-qe-04.lab.eng.brq.redhat.com:9091\r\nUser-agent: Python-urllib/2.4\r\nConnection: close\r\nContent-type: text/xml; charset=utf-8\r\n\r\n'
send: '<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:ns0="http://query.aviary.grid.redhat.com" xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Header/><ns1:Body><ns0:GetJobDetails><ids><job>3.0</job></ids></ns0:GetJobDetails></ns1:Body></SOAP-ENV:Envelope>'
reply: ''

Submit works ok:
send: 'POST /services/job/submitJob HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-length: 1136\r\nSoapaction: "http://grid.redhat.com/aviary-job/submit"\r\nHost: mrg-qe-04.lab.eng.brq.redhat.com:9090\r\nUser-agent: Python-urllib/2.4\r\nConnection: close\r\nContent-type: text/xml; charset=utf-8\r\n\r\n'
send: '<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://job.aviary.grid.redhat.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Header/><ns0:Body><ns1:SubmitJob><cmd>/bin/sleep</cmd><args>71</args><owner>condor</owner><iwd>/tmp</iwd><submission_name>/bin/sleep 71</submission_name><extra><name>Requirements</name><type>EXPRESSION</type><value>(FileSystemDomain =!= UNDEFINED &amp;&amp; Arch =!= UNDEFINED)</value></extra><extra><name>JobUniverse</name><type>INTEGER</type><value>5</value></extra><extra><name>Err</name><type>STRING</type><value>/tmp/mrg_1.1.err43xJL</value></extra><extra><name>Out</name><type>STRING</type><value>/tmp/mrg_1.1.outDqcVv</value></extra><extra><name>UserLog</name><type>STRING</type><value>/tmp/mrg_1.1.logUK7wV</value></extra><extra><name>WhenToTransferOutput</name><type>STRING</type><value>ON_EXIT</value></extra><extra><name>ShouldTransferFiles</name><type>STRING</type><value>IF_NEEDED</value></extra></ns1:SubmitJob></ns0:Body></SOAP-ENV:Envelope>'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Thu Aug  2 06:08:35 2012 GMT^M
header: Server: Axis2C/1.6.0 (Simple Axis2 HTTP Server)^M
header: Content-Type: text/xml;charset=utf-8^M
header: Connection: close^M
header: Content-Length: 456^M
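
One visible difference between the two traces above is that the header block of the failing getJobDetails request is printed as a unicode literal (u'POST ...'), while the header block of the working submitJob request is a plain byte string ('POST ...'). Whether this is related to the timeout is not established in this report. The following is only a minimal sketch (written for Python 3, with the hypothetical helper name classify_send_lines) for flagging such lines in httplib debug output:

```python
import re

# Matches the "send: " lines that httplib prints when set_debuglevel() is on.
# A leading "u" before the opening quote means the payload was a unicode object.
SEND_LINE = re.compile(r"^send: (u?)(['\"])")


def classify_send_lines(trace):
    """Return (line_number, kind) for every 'send:' line in a debug trace.

    kind is 'unicode' when the printed repr starts with u'...' and
    'byte string' otherwise. Line numbers are 1-based.
    """
    result = []
    for number, line in enumerate(trace.splitlines(), start=1):
        match = SEND_LINE.match(line)
        if match:
            kind = "unicode" if match.group(1) else "byte string"
            result.append((number, kind))
    return result


if __name__ == "__main__":
    # Shortened copies of the trace lines from this report.
    sample = "\n".join([
        "send: u'POST /services/query/getJobDetails HTTP/1.1\\r\\n'",
        "send: '<?xml version=\"1.0\" encoding=\"UTF-8\"?>'",
        "reply: ''",
    ])
    for number, kind in classify_send_lines(sample):
        print(number, kind)
```

Running it over a saved copy of both traces makes it easy to compare which requests were sent as unicode and which as byte strings.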

If I try the same operation with the client from the aviary examples, it works and I get a proper response.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.19.el5
condor-aviary-7.6.5-0.19.el5
condor-classads-7.6.5-0.19.el5
condor-qmf-7.6.5-0.19.el5
condor-wallaby-base-db-1.22-5.el5
condor-wallaby-client-4.1.2-1.el5
condor-wallaby-tools-4.1.2-1.el5
python-condorutils-1.5-4.el5
python-wallabyclient-4.1.2-1.el5
ruby-wallaby-0.12.5-10.el5
wallaby-0.12.5-10.el5
wallaby-utils-0.12.5-10.el5
wso2-axis2-2.1.0-8.el5
wso2-wsf-cpp-2.1.0-8.el5
wso2-wsf-cpp-devel-2.1.0-8.el5


How reproducible:
100%

  
Actual results:
I get a timeout from the jobserver, but no timeout from the scheduler, with the same SSL code and the same way of creating the suds client.

Comment 2 Pete MacKinnon 2012-08-03 13:59:53 UTC
[Thu Aug  2 06:14:52 2012] [info]  [ssl] Client verified OK
[Thu Aug  2 06:15:52 2012] [error] http_request_line.c(117) Invalid status line or invalid request line
[Thu Aug  2 06:15:52 2012] [error] simple_http_svr_conn.c(161) Invalid status line or invalid request line
[Thu Aug  2 06:15:52 2012] [error] /builddir/build/BUILD/condor-7.6.4/src/condor_contrib/aviary/src/Axis2SoapProvider.cpp(273) Could not create request

Hmmm, getSubmissionSummary is OK so there's something about the getJobDetails call.
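
The timestamps in the log above put the "Invalid status line or invalid request line" error 60 seconds after "Client verified OK". That would be consistent with the server waiting for a usable request line until a timeout of its own, although the server's configured timeout is not stated in this report, so that reading is an assumption. A minimal sketch (Python 3, hypothetical helper names) for computing the gaps between such log lines:

```python
import re
from datetime import datetime

# The log lines in this report start with a ctime-style stamp in brackets,
# e.g. "[Thu Aug  2 06:14:52 2012] [info]  [ssl] Client verified OK".
STAMP = re.compile(r"^\[([^\]]+)\]")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), STAMP_FORMAT)


def gaps_in_seconds(lines):
    """Return the time differences between consecutive stamped lines."""
    stamps = [s for s in (parse_stamp(line) for line in lines) if s is not None]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    log = [
        "[Thu Aug  2 06:14:52 2012] [info]  [ssl] Client verified OK",
        "[Thu Aug  2 06:15:52 2012] [error] http_request_line.c(117) "
        "Invalid status line or invalid request line",
    ]
    print(gaps_in_seconds(log))
```

A large gap directly after the SSL handshake, as here, points at the request itself rather than at certificate verification.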

Comment 4 Martin Kudlej 2012-10-04 11:43:50 UTC
I've found that there is bug 863070 in Python. --> NOTABUG