Bug 510747 - Out of Bounds exception when sending large QMF response
Summary: Out of Bounds exception when sending large QMF response
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-qmf
Version: 1.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Assignee: Ted Ross
QA Contact: Jan Sarenik
URL:
Whiteboard:
Duplicates: 508145
Depends On:
Blocks:
Reported: 2009-07-10 15:03 UTC by Matthew Farrellee
Modified: 2011-08-12 16:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a QMF method would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:01:58 UTC
Target Upstream Version:
Embargoed:


Attachments
Verification scripts (24.61 KB, application/x-gzip)
2010-10-07 14:18 UTC, Jan Sarenik
Updated verification scripts (24.72 KB, application/x-gzip)
2010-10-08 08:28 UTC, Jan Sarenik
Fixed verification scripts (24.76 KB, application/x-gzip)
2010-10-08 14:26 UTC, Jan Sarenik


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 658936 | 0 | high | CLOSED | QMF engine-based agents segfault when returning more than 64k of argument data | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2010:0773 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 | 2010-10-14 15:56:44 UTC

Internal Links: 658936

Description Matthew Farrellee 2009-07-10 15:03:50 UTC
Description of problem:

When trying to send a QMF method out argument with strings that grow to over 64K, the agent will throw an exception and crash.


Version-Release number of selected component (if applicable):

qmf-0.5.752600-5.fc10.i386
qpidc-0.5.752600-5.fc10.i386


How reproducible:

100%


Steps to Reproduce:
1. start condor_job_server with a queue containing jobs with long attribute values, e.g. a big environment
2. use qpid-tool to call the server's "GetJob" method
3. watch server crash

  
Actual results:

terminate called after throwing an instance of 'qpid::framing::OutOfBounds'
  what():  Out of Bounds
Stack dump for process 22429 at timestamp 1247238103 (21 frames)
./condor_job_server(dprintf_dump_stack+0xd0)[0x80ff64c]
./condor_job_server[0x80ff80a]
[0x276400]
/lib/libc.so.6(abort+0x188)[0x71fe28]
/usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x158)[0x7eb3c48]
/usr/lib/libstdc++.so.6[0x7eb1b35]
/usr/lib/libstdc++.so.6[0x7eb1b72]
/usr/lib/libstdc++.so.6[0x7eb1caa]
/usr/lib/libqpidcommon.so.0(_ZN4qpid7framing6Buffer10getRawDataERSsj+0xbb)[0xc8f83b]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl16ConnectionThread10sendBufferERNS_7framing6BufferEjRKSsS7_+0xf6)[0xeeca36]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl19invokeMethodRequestERNS_7framing6BufferEjSs+0x2c3)[0xef0fa3]
/usr/lib/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl13pollCallbacksEj+0x10c)[0xef16ec]
./condor_job_server(_Z16HandleMgmtSocketP7ServiceP6Stream+0x1c)[0x80c2353]
./condor_job_server(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x177)[0x80ec889]
./condor_job_server(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x34)[0x80ecb88]
./condor_job_server(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x29)[0x8141f35]
./condor_job_server(_ZN10DaemonCore17CallSocketHandlerERib+0x1b4)[0x80df028]
./condor_job_server(_ZN10DaemonCore6DriverEv+0x1959)[0x80e0a0b]
./condor_job_server(main+0x17d3)[0x80f6dc6]
/lib/libc.so.6(__libc_start_main+0xe5)[0x7096e5]
./condor_job_server[0x80bb2f1]
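
The trace shows the exception leaving qpid::framing::Buffer::getRawData while sendBuffer was running, with nothing catching it before the terminate handler called abort. The following is only a rough model of that failure mode, not Qpid's code: the class and function names are invented for illustration, and the 64 KiB capacity is an assumption taken from the "over 64K" wording above.

```python
ASSUMED_CAPACITY = 64 * 1024  # assumption, from the "over 64K" wording in this report


class OutOfBounds(Exception):
    """Illustrative stand-in for qpid::framing::OutOfBounds."""


class FixedBuffer:
    """Toy fixed-capacity buffer whose reads are bounds-checked."""

    def __init__(self, capacity=ASSUMED_CAPACITY):
        self._data = bytearray(capacity)
        self._pos = 0

    def get_raw_data(self, size):
        # Refuse to read past the end of the backing storage.
        if self._pos + size > len(self._data):
            raise OutOfBounds("Out of Bounds")
        chunk = bytes(self._data[self._pos:self._pos + size])
        self._pos += size
        return chunk


def send_response(response_size):
    """Return True if a response of this size fits, False if it would overrun."""
    try:
        FixedBuffer().get_raw_data(response_size)
        return True
    except OutOfBounds:
        return False
```

In this model a response of exactly the assumed capacity still fits and one byte more does not. The sketch catches the exception only to make the boundary visible; in the trace above it was never caught, which is why the process aborted.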

Comment 1 Ted Ross 2009-09-08 14:50:19 UTC
*** Bug 508145 has been marked as a duplicate of this bug. ***

Comment 6 Ted Ross 2010-03-31 21:19:14 UTC
Fix committed upstream at revision 929716.

Comment 7 Frantisek Reznicek 2010-06-04 08:36:55 UTC
May I ask you for more info, please? An example would be much appreciated.
Raising NEEDINFO.

Comment 8 Matthew Farrellee 2010-06-04 10:39:41 UTC
I ran into the bug when submitting jobs to a schedd and then querying for all of them. However, the broker has an echo method. You may be able to simply send >64K of data to that method to reproduce.
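
A reproduction along those lines might look like the sketch below. It is untested against a real broker: `echo` is a hypothetical callable standing in for however the broker's echo method is invoked, and its (sequence, body) signature and the expectation that it returns the body unchanged are assumptions.

```python
LIMIT = 64 * 1024  # threshold mentioned in this report


def oversized_body(extra=1):
    """Build a string that is `extra` characters longer than LIMIT."""
    if extra < 1:
        raise ValueError("extra must be at least 1")
    return "x" * (LIMIT + extra)


def try_echo(echo, sequence=1):
    """Send an oversized body through `echo` and check it comes back intact.

    `echo` is a hypothetical stand-in for the broker's echo method; it is
    assumed to take (sequence, body) and return the body it received.
    """
    body = oversized_body()
    reply = echo(sequence, body)
    return reply == body
```

With a fixed agent or broker, `try_echo` would be expected to return True; with an affected build the call would presumably fail before returning anything.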

Comment 10 Jan Sarenik 2010-10-04 13:28:11 UTC
There are some uncertainties:

 1. There is no condor_job_server in 1.1.1 Grid release.
 2. When I try to run 1.3RC Grid against 1.1.1 broker,
    I cannot access grid objects via qpid-tool (either
    1.3RC or 1.1.1) and I think it may be caused by
    QMF versions.

How should I verify it?

Comment 11 Florian Nadge 2010-10-07 11:26:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously,the Qpid Management Framework (QMF) method  would exit with a segmentation fault when the result was larger than 10 MB.

Comment 12 Florian Nadge 2010-10-07 11:52:37 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously,the Qpid Management Framework (QMF) method  would exit with a segmentation fault when the result was larger than 10 MB.
+Previously,the Qpid Management Framework (QMF) method  would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.

Comment 13 Martin Prpič 2010-10-07 14:16:11 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously,the Qpid Management Framework (QMF) method  would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.
+Previously, a QMF method would exit with a segmentation fault when the result was larger than 64kB. With this update, this method works as expected, even for larger results.

Comment 14 Jan Sarenik 2010-10-07 14:18:20 UTC
Created attachment 452111 [details]
Verification scripts

During a phone meeting I was told this bug does not have to
be reproduced to verify that the fix is working.

[user@host bz601828]$ ./runtest.sh 
x86_64
redhat-release-5Server-5.5.0.2
condor-qmf-7.4.4-0.16.el5
python-qpid-0.7.946106-14.el5
python-qmf-0.7.946106-13.el5
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: ...SUCCESS

I will have to finish it for RHEL4 and RHEL5 i386 tomorrow.

Comment 15 Jan Sarenik 2010-10-07 14:18:48 UTC
Cleaning NEEDINFO.

Comment 16 Jan Sarenik 2010-10-08 08:04:20 UTC
x86_64
redhat-release-4AS-9
condor-qmf-7.4.4-0.16.el4
python-qpid-0.7.946106-14.el4
python-qmf-0.7.946106-13.el4
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: SUCCESS

Comment 17 Jan Sarenik 2010-10-08 08:07:40 UTC
i686
redhat-release-4AS-9
condor-qmf-7.4.4-0.16.el4
python-qpid-0.7.946106-14.el4
python-qmf-0.7.946106-13.el4
qpid-cpp-server-0.7.946106-17.el4
Clean: .
Submit: ..Submitting job(s)............
12 job(s) submitted to cluster 1.
Verify: SUCCESS

Comment 18 Jan Sarenik 2010-10-08 08:14:03 UTC
It is vital to set auth=no on the broker for
Condor QMF agents to appear on all the latest
qpid-cpp-server-0.7.946106-17 builds.

The test should run under an unprivileged user
that has passwordless sudo rights (NOPASSWD); see the
sudoers(5) manual page for more info.

Comment 19 Jan Sarenik 2010-10-08 08:28:28 UTC
Created attachment 452296 [details]
Updated verification scripts

$ ./runtest.sh 100
i686
redhat-release-5Server-5.5.0.2
qpid-cpp-server-0.7.946106-17.el5
condor-qmf-7.4.4-0.16.el5
python-qpid-0.7.946106-14.el5
python-qmf-0.7.946106-13.el5
Clean: .
Submit: ..Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 1.
Verify: SUCCESS

Comment 20 Jan Sarenik 2010-10-08 08:28:59 UTC
Verified on all supported architectures and RHEL versions.

Comment 21 Jan Sarenik 2010-10-08 14:26:44 UTC
Created attachment 452353 [details]
Fixed verification scripts

Sorry, the test was deleting /var/lib/qpidd, which contains the SASL database.
Here are the updated scripts. No need for auth=no anymore.

Comment 23 errata-xmlrpc 2010-10-14 16:01:58 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

