Version-Release number of selected component (if applicable):
condor-7.6.1-0.6.el6.i686
condor-aviary-7.6.1-0.6.el6.i686
condor-classads-7.6.1-0.6.el6.i686
condor-debuginfo-7.6.1-0.6.el6.i686
condor-qmf-7.6.1-0.6.el6.i686
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
python-condorutils-1.5-3.el6.noarch
python-qpid-qmf-0.10-7.el6.i686
qpid-qmf-0.10-7.el6.i686
ruby-qpid-qmf-0.10-7.el6.i686
wso2-axis2-2.1.0-3.el6.i686
wso2-rampart-2.1.0-3.el6.i686
wso2-wsf-cpp-2.1.0-3.el6.i686
wso2-wsf-cpp-debuginfo-2.1.0-3.el6.i686
Red Hat Enterprise Linux Server release 6.1 (Santiago)

How reproducible:
100%

Steps to Reproduce:
1. Install aviary.
2. Submit 10 jobs via aviary.
3. Once those jobs have finished, call getData on each of them in this order of data types: ['ERR', 'LOG', 'OUT'].
4. The client hangs on the first call of getData. After manually interrupting the suds-based client I see this:

...
    result = client.service.getJobDetails(ids_avia)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 539, in __call__
    return client.invoke(args, kwargs)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 598, in invoke
    result = self.send(msg)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 623, in send
    reply = transport.send(request)
  File "/usr/lib/python2.4/site-packages/suds/transport/https.py", line 64, in send
    return HttpTransport.send(self, request)
  File "/usr/lib/python2.4/site-packages/suds/transport/http.py", line 77, in send
    fp = self.u2open(u2request)
  File "/usr/lib/python2.4/site-packages/suds/transport/http.py", line 116, in u2open
    return url.open(u2request)
  File "/usr/lib/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
  File "/usr/lib/python2.4/urllib2.py", line 376, in _open
    '_open', req)
  File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.4/urllib2.py", line 1118, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.4/urllib2.py", line 1090, in do_open
    r = h.getresponse()
KeyboardInterrupt

Actual results:
Calling getData doesn't work.

Expected results:
Calling getData via aviary works and no core dump is produced.
After the valgrind cleanup looks like I'm zigging while Axis2/C is zagging...

#16 <signal handler called>
#17 0x005de16a in malloc_consolidate () from /lib/libc.so.6
#18 0x005e0c85 in _int_malloc () from /lib/libc.so.6
#19 0x005e1efe in malloc () from /lib/libc.so.6
#20 0x0027e522 in xmlBufferCreate () from /usr/lib/libxml2.so.2
#21 0x0076d35c in axiom_xml_writer_create_for_memory () from /usr/lib/libaxis2_parser.so.0
#22 0x0087d179 in axis2_http_transport_sender_invoke () from /usr/lib/libaxis2_http_sender.so.0

Possibly mismatched malloc/delete.
The hang appears to be due to the fact that the stack has gotten catastrophically whacked.

#0 0x00946424 in __kernel_vsyscall ()
#1 0x0065d1a3 in __lll_lock_wait_private () from /lib/libc.so.6
#2 0x005e4131 in _L_lock_9450 () from /lib/libc.so.6
#3 0x005e1ef4 in malloc () from /lib/libc.so.6
*** Bug 707543 has been marked as a duplicate of this bug. ***
Created attachment 500938 [details]
Patch to create separate JobDataType ptr for return

Diffed from upstream 7.6 branch to up-to-date FH master
Memory was corrupted, leaving the runtime stuck in a low-level libc lock inside malloc. This gave the appearance of a hang when in fact a SEGV had occurred.
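For anyone reading the two backtraces together, here is a rough simulation of why a crash inside malloc can look like a hang. It is only a sketch under assumptions: `arena_lock`, `fake_malloc`, `crash_handler`, and `allocate_with_fault` are hypothetical names, the "signal handler" is an ordinary function call on the same thread, and a timeout stands in for what would be an indefinite wait in libc.

```python
import threading

# Stand-in for libc's internal, non-reentrant malloc lock (an assumption for
# illustration; the real lock lives inside glibc).
arena_lock = threading.Lock()


def fake_malloc(timeout=0.2):
    """Try to take the arena lock; report whether the 'allocation' proceeded."""
    if not arena_lock.acquire(timeout=timeout):
        # The real malloc() has no timeout and would wait here forever.
        return False
    try:
        return True
    finally:
        arena_lock.release()


def crash_handler():
    """Hypothetical fault handler that itself needs to allocate memory."""
    return fake_malloc()


def allocate_with_fault():
    """Simulate a fault raised while malloc already holds its lock."""
    with arena_lock:
        # Heap corruption would be hit at this point (SEGV), and the handler
        # then runs on the same thread that still owns the lock.
        return crash_handler()


if __name__ == "__main__":
    print("normal allocation ok:", fake_malloc())
    print("allocation from handler ok:", allocate_with_fault())
```

In the simulation the second call fails only because of the timeout. Without one, the thread would wait on a lock it already owns, which matches frames #0 to #3 sitting above `<signal handler called>` in the traces.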
Tested on RHEL 5.6/6.1 x x86_64/i386 with condor-7.6.1-0.8 and it works. -->VERIFIED