| Summary: | aviary doesn't return answer to client from calling getData function + coredump of condor_preen | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Martin Kudlej <mkudlej> | ||||
| Component: | condor-aviary | Assignee: | Pete MacKinnon <pmackinn> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Kudlej <mkudlej> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | Development | CC: | iboverma, jneedle, matt, pmackinn | ||||
| Target Milestone: | 2.0 | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | condor-7.6.1-0.8 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-06-27 14:20:10 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
After the valgrind cleanup, it looks like I'm zigging while Axis2/C is zagging...

#16 <signal handler called>
#17 0x005de16a in malloc_consolidate () from /lib/libc.so.6
#18 0x005e0c85 in _int_malloc () from /lib/libc.so.6
#19 0x005e1efe in malloc () from /lib/libc.so.6
#20 0x0027e522 in xmlBufferCreate () from /usr/lib/libxml2.so.2
#21 0x0076d35c in axiom_xml_writer_create_for_memory () from /usr/lib/libaxis2_parser.so.0
#22 0x0087d179 in axis2_http_transport_sender_invoke () from /usr/lib/libaxis2_http_sender.so.0

Possibly mismatched malloc/delete.

The hang appears to be due to the fact that the stack has gotten catastrophically whacked.

#0 0x00946424 in __kernel_vsyscall ()
#1 0x0065d1a3 in __lll_lock_wait_private () from /lib/libc.so.6
#2 0x005e4131 in _L_lock_9450 () from /lib/libc.so.6
#3 0x005e1ef4 in malloc () from /lib/libc.so.6

*** Bug 707543 has been marked as a duplicate of this bug. ***

Created attachment 500938 [details]
Patch to create separate JobDataType ptr for return
Diffed from upstream 7.6 branch to up-to-date FH master
Memory was corrupted so that the runtime was stuck in a low-level libc lock on malloc, causing the appearance of a hang when in fact a SEGV had occurred.

Tested on RHEL 5.6/6.1, x86_64/i386, with condor-7.6.1-0.8 and it works.

-->VERIFIED |
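The "hang that is really a SEGV" can be hard to picture, so here is a small Python analogy rather than the actual glibc behaviour. It assumes the sequence suggested by the two backtraces: the fault happens while malloc holds its lock, and the code that runs from the signal handler then calls malloc again. The names `arena_lock`, `allocate`, and `crash_handler` are hypothetical stand-ins, and the inner acquire uses a timeout so the sketch reports the block instead of hanging forever.

```python
import threading

# Hypothetical stand-in for the non-reentrant lock that malloc takes in libc.
arena_lock = threading.Lock()


def allocate(on_fault=None, timeout_s=0.2):
    """Pretend allocation: take the lock, optionally 'fault' while holding it."""
    if not arena_lock.acquire(timeout=timeout_s):
        # The real process would wait here indefinitely (the apparent hang).
        return "blocked"
    try:
        if on_fault is not None:
            # Simulates a signal handler running while the lock is still held.
            return on_fault()
        return "ok"
    finally:
        arena_lock.release()


def crash_handler():
    """Simulated SEGV handler that itself needs to allocate memory."""
    return allocate()
```

Calling `allocate()` returns `"ok"`, while `allocate(on_fault=crash_handler)` returns `"blocked"`, because the handler waits on a lock that its own interrupted caller still holds.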
Version-Release number of selected component (if applicable):
condor-7.6.1-0.6.el6.i686
condor-aviary-7.6.1-0.6.el6.i686
condor-classads-7.6.1-0.6.el6.i686
condor-debuginfo-7.6.1-0.6.el6.i686
condor-qmf-7.6.1-0.6.el6.i686
condor-wallaby-base-db-1.12-1.el6.noarch
condor-wallaby-client-4.0-6.el6.noarch
condor-wallaby-tools-4.0-6.el6.noarch
python-condorutils-1.5-3.el6.noarch
python-qpid-qmf-0.10-7.el6.i686
qpid-qmf-0.10-7.el6.i686
ruby-qpid-qmf-0.10-7.el6.i686
wso2-axis2-2.1.0-3.el6.i686
wso2-rampart-2.1.0-3.el6.i686
wso2-wsf-cpp-2.1.0-3.el6.i686
wso2-wsf-cpp-debuginfo-2.1.0-3.el6.i686

Red Hat Enterprise Linux Server release 6.1 (Santiago)

How reproducible:
100%

Steps to Reproduce:
1. Install aviary.
2. Submit 10 jobs via aviary.
3. Once those jobs finish, call getData on each of them in this order of data types: ['ERR', 'LOG', 'OUT'].
4. The suds-based client hangs on the first call of getData; after interrupting it manually I see this:

...
    result = client.service.getJobDetails(ids_avia)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 539, in __call__
    return client.invoke(args, kwargs)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 598, in invoke
    result = self.send(msg)
  File "/usr/lib/python2.4/site-packages/suds/client.py", line 623, in send
    reply = transport.send(request)
  File "/usr/lib/python2.4/site-packages/suds/transport/https.py", line 64, in send
    return HttpTransport.send(self, request)
  File "/usr/lib/python2.4/site-packages/suds/transport/http.py", line 77, in send
    fp = self.u2open(u2request)
  File "/usr/lib/python2.4/site-packages/suds/transport/http.py", line 116, in u2open
    return url.open(u2request)
  File "/usr/lib/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
  File "/usr/lib/python2.4/urllib2.py", line 376, in _open
    '_open', req)
  File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.4/urllib2.py", line 1118, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.4/urllib2.py", line 1090, in do_open
    r = h.getresponse()
KeyboardInterrupt

Actual results:
Calling getData doesn't work.

Expected results:
Calling getData via aviary works and there is no coredump.
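Because the client in the traceback blocks in `h.getresponse()` until it is interrupted by hand, a reproduction script is easier to run unattended if each call has a deadline. The following is a minimal sketch in modern Python, not the reporter's script: `fetch` is a hypothetical callable standing in for whatever wraps the real suds call, and only the ERR, LOG, OUT order is taken from the steps above.

```python
import threading

# Order of data types taken from the reproduction steps.
DATA_TYPES = ["ERR", "LOG", "OUT"]


def call_with_timeout(func, args, timeout_s):
    """Run func(*args) in a daemon thread; return (status, payload)."""
    box = {}

    def runner():
        try:
            box["value"] = func(*args)
        except Exception as exc:  # report remote faults instead of crashing
            box["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return ("timeout", None)
    if "error" in box:
        return ("error", box["error"])
    return ("ok", box["value"])


def probe_jobs(fetch, job_ids, timeout_s=30.0):
    """Call fetch(job_id, data_type) per job and type; stop at the first hang."""
    report = []
    for job_id in job_ids:
        for data_type in DATA_TYPES:
            status, _payload = call_with_timeout(
                fetch, (job_id, data_type), timeout_s
            )
            report.append((job_id, data_type, status))
            if status == "timeout":
                return report
    return report
```

With the bug present, one would expect the report to end in a `"timeout"` entry on the first call; with the fixed package every entry should be `"ok"`. The daemon thread is a deliberate choice so that a stuck call cannot keep the script alive after the report is returned.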