Description of problem:
When trying to send a QMF method out argument with strings that grow to over 64K, the agent will throw an exception and crash.

Version-Release number of selected component (if applicable):
qmf-0.5.752600-5.fc10.i386
qpidc-0.5.752600-5.fc10.i386

How reproducible:
100%

Steps to Reproduce:
1. Start condor_job_server with a queue containing jobs with long attribute values, e.g. a big environment.
2. Use qpid-tool to call the server's "GetJob" method.
3. Watch the server crash.

Actual results:
terminate called after throwing an instance of 'qpid::framing::OutOfBounds'
  what():  Out of Bounds

Stack dump for process 22429 at timestamp 1247238103 (21 frames)
./condor_job_server(dprintf_dump_stack+0xd0)[0x80ff64c]
./condor_job_server[0x80ff80a]
[0x276400]
/lib/libc.so.6(abort+0x188)[0x71fe28]
/usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x158)[0x7eb3c48]
/usr/lib/libstdc++.so.6[0x7eb1b35]
/usr/lib/libstdc++.so.6[0x7eb1b72]
/usr/lib/libstdc++.so.6[0x7eb1caa]
/usr/lib/libqpidcommon.so.0(_ZN4qpid7framing6Buffer10getRawDataERSsj+0xbb)[0xc8f83b]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl16ConnectionThread10sendBufferERNS_7framing6BufferEjRKSsS7_+0xf6)[0xeeca36]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl19invokeMethodRequestERNS_7framing6BufferEjSs+0x2c3)[0xef0fa3]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl13pollCallbacksEj+0x10c)[0xef16ec]
./condor_job_server(_Z16HandleMgmtSocketP7ServiceP6Stream+0x1c)[0x80c2353]
./condor_job_server(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x177)[0x80ec889]
./condor_job_server(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x34)[0x80ecb88]
./condor_job_server(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8141f35]
./condor_job_server(_ZN10DaemonCore17CallSocketHandlerERib+0x1b4)[0x80df028]
./condor_job_server(_ZN10DaemonCore6DriverEv+0x1959)[0x80e0a0b]
./condor_job_server(main+0x17d3)[0x80f6dc6]
/lib/libc.so.6(__libc_start_main+0xe5)[0x7096e5]
./condor_job_server[0x80bb2f1]
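The throw originates in qpid::framing::Buffer while the agent serializes the oversized out argument. A minimal Python sketch of the likely failure mode, assuming the wire format stores such strings with a 16-bit length prefix (the function name and the ValueError stand in for the actual qpid code and its OutOfBounds exception):

```python
import struct

def encode_short_string(buf: bytearray, s: bytes) -> None:
    """Append a string with a 16-bit length prefix to buf.

    Raises ValueError (analogous to qpid::framing::OutOfBounds) when the
    payload cannot be described by the 2-byte length field.
    """
    if len(s) > 0xFFFF:  # 65535 bytes is the most a uint16 can express
        raise ValueError("string exceeds 64K: out of bounds")
    buf += struct.pack("!H", len(s)) + s

buf = bytearray()
encode_short_string(buf, b"x" * 100)        # fits, encodes fine
try:
    encode_short_string(buf, b"y" * 70000)  # >64K, rejected
except ValueError as e:
    print("error:", e)
```

Any method whose out arguments exceed that 2-byte limit hits the same wall, which is consistent with the 64K threshold reported above.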
*** Bug 508145 has been marked as a duplicate of this bug. ***
Fix committed upstream at revision 929716.
May I ask you for more info, please? An example would be much appreciated. Raising NEEDINFO.
I ran into the bug when submitting jobs to a schedd and then querying for all of them. However, the broker has an echo method, so you may be able to reproduce it simply by sending >64K of data to that method.
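A sketch of that reproduction idea using the python-qmf console API of this era (the qmf.console module, addBroker/getObjects/echo names are assumed from memory of the 0.5–0.7 packages; treat this as illustrative, not a tested script):

```python
# Reproducer sketch: invoke the broker's "echo" method with a payload
# just past the 64K (65535-byte) limit, which is what crashed the agent.

def build_payload(size: int = 70000) -> str:
    """Build a body slightly larger than 64K."""
    return "x" * size

def reproduce(broker_url: str = "localhost") -> None:
    # Import inside the function so the payload helper works without
    # python-qmf installed; the import path is an assumption.
    from qmf.console import Session
    sess = Session()
    broker = sess.addBroker(broker_url)
    obj = sess.getObjects(_class="broker")[0]
    obj.echo(1, build_payload())  # out argument >64K triggers the crash
    sess.delBroker(broker)

if __name__ == "__main__":
    print(len(build_payload()))  # 70000, comfortably past 65535
```

Running reproduce() against an unpatched broker should make the agent terminate with the OutOfBounds abort shown in the original report.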
There are some uncertainties:
1. There is no condor_job_server in the 1.1.1 Grid release.
2. When I try to run the 1.3RC Grid against a 1.1.1 broker, I cannot access grid objects via qpid-tool (either 1.3RC or 1.1.1), and I think this may be caused by a QMF version mismatch.
How should I verify it?
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Previously,the Qpid Management Framework (QMF) method would exit with a segmentation fault when the result was larger than 10 MB.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1 @@ -Previously,the Qpid Management Framework (QMF) method would exit with a segmentation fault when the result was larger than 10 MB.+Previously,the Qpid Management Framework (QMF) method would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1 @@ -Previously,the Qpid Management Framework (QMF) method would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.+Previously, a QMF method would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.
Created attachment 452111 [details]
Verification scripts

During a phone meeting I was told this bug does not have to be reproduced in order to verify the fix.

[user@host bz601828]$ ./runtest.sh
x86_64 redhat-release-5Server-5.5.0.2 condor-qmf-7.4.4-0.16.el5 python-qpid-0.7.946106-14.el5 python-qmf-0.7.946106-13.el5
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: ...SUCCESS

I will have to finish it for RHEL4 and RHEL5 i386 tomorrow.
Cleaning NEEDINFO.
x86_64 redhat-release-4AS-9 condor-qmf-7.4.4-0.16.el4 python-qpid-0.7.946106-14.el4 python-qmf-0.7.946106-13.el4
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: SUCCESS
i686 redhat-release-4AS-9 condor-qmf-7.4.4-0.16.el4 python-qpid-0.7.946106-14.el4 python-qmf-0.7.946106-13.el4 qpid-cpp-server-0.7.946106-17.el4
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: SUCCESS
It is vital to set auth=no on the broker for the Condor QMF agents to appear with all the latest qpid-cpp-server-0.7.946106-17 builds. The test should run as an unprivileged user with sudo NOPASSWD rights; see the sudoers(5) manual page for more info.
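For reference, the auth=no setting above goes in the broker's configuration file (the /etc/qpidd.conf path is the usual default for these builds, but verify it on your system):

```ini
# /etc/qpidd.conf -- disable SASL authentication so the Condor QMF
# agents appear in qpid-tool (only appropriate on a test machine)
auth=no
```

Restart the qpidd service after changing the file so the option takes effect.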
Created attachment 452296 [details]
Updated verification scripts

$ ./runtest.sh 100
i686 redhat-release-5Server-5.5.0.2 qpid-cpp-server-0.7.946106-17.el5 condor-qmf-7.4.4-0.16.el5 python-qpid-0.7.946106-14.el5 python-qmf-0.7.946106-13.el5
Clean: .
Submit: ..Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 1.
Verify: SUCCESS
Verified on all supported architectures and RHEL versions.
Created attachment 452353 [details]
Fixed verification scripts

Sorry, the test was deleting /var/lib/qpidd, which contains the SASL database. Here are the updated scripts. No need for auth=no anymore.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0773.html