Bug 903279 - plumage collector plugin crashes view server if it can't contact negotiator
Summary: plumage collector plugin crashes view server if it can't contact negotiator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-plumage
Version: 2.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 2.5
: ---
Assignee: Pete MacKinnon
QA Contact: Stanislav Graf
URL:
Whiteboard:
Depends On:
Blocks: 1089287
 
Reported: 2013-01-23 16:03 UTC by Pete MacKinnon
Modified: 2014-04-28 16:46 UTC (History)

Fixed In Version: condor-7.8.9-0.7
Doc Type: Bug Fix
Doc Text:
Cause: The Plumage view-server plugin did not detect the null socket that results from a failure to contact the negotiator.
Consequence: The attempt to dereference the null socket crashed the plugin.
Fix: The plugin logic now detects null sockets and handles them gracefully.
Result: Plumage plugins no longer crash when the negotiator cannot be contacted.
Clone Of:
: 1089287 (view as bug list)
Environment:
Last Closed: 2014-04-28 16:46:41 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1004222 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Product Errata | RHSA-2014:0440 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Enterprise MRG Grid 2.5 security, bug fix, and enhancement update | 2014-04-28 20:43:37 UTC

Internal Links: 1004222

Description Pete MacKinnon 2013-01-23 16:03:28 UTC
If the Plumage view server plugin cannot connect to the negotiator to obtain userprio data, it crashes with a stack trace like the following:

01/23/13 15:58:06 Accumulating data: Time=1358953086
01/23/13 15:58:07 Can't find address for negotiator 
01/23/13 15:58:07 ODSAccountant: Can't connect negotiator for Accountant ad!
01/23/13 15:58:07 ODSAccountant: failed to send GET_PRIORITY command to negotiator!
Stack dump for process 18775 at timestamp 1358953087 (10 frames)
/usr/lib64/libcondor_utils_7_8_7.so(dprintf_dump_stack+0x131)[0x389633eca1]
/usr/lib64/libcondor_utils_7_8_7.so[0x3896385a22]
/lib64/libpthread.so.0[0x3632c0f500]
/usr/lib64/condor/plugins/PlumageCollectorPlugin-plugin.so(_ZN7plumage3etl13ODSAccountant7fetchAdEv+0x64)[0x7f0d78074934]
/usr/lib64/condor/plugins/PlumageCollectorPlugin-plugin.so(_ZN22PlumageCollectorPlugin18recordAccountantAdEv+0x50)[0x7f0d780747e0]
/usr/lib64/libcondor_utils_7_8_7.so(_ZN12TimerManager7TimeoutEPiPd+0x1a1)[0x38964478c1]
/usr/lib64/libcondor_utils_7_8_7.so(_ZN10DaemonCore6DriverEv+0x763)[0x3896455083]
/usr/lib64/libcondor_utils_7_8_7.so(_Z7dc_mainiPPc+0xf50)[0x38964445d0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x363241ecdd]
condor_collector[0x40d9f9]
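
The mangled symbols in the trace are easier to read once demangled. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle from <cxxabi.h>; the helper name demangle is hypothetical and is not part of Condor or Plumage.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the caller owns the malloc-allocated buffer
    return result;
}
```

Applied to the two plugin frames above, this gives plumage::etl::ODSAccountant::fetchAd() and PlumageCollectorPlugin::recordAccountantAd(), so the crash occurs in fetchAd when it is called from the collector's timer-driven recordAccountantAd.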

The code needs to return from the fetchAd method instead of trying to go ahead and decode from the NULL socket.
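
A minimal sketch of that early return, under stated assumptions: the Sock type, connectToNegotiator, and this fetchAd signature are hypothetical stand-ins for illustration and are not the actual Condor or Plumage code. The point is only that the null check comes before any attempt to decode from the socket.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the socket returned by the negotiator connection.
struct Sock {
    std::string payload;  // pretend wire data carrying the accountant ad
};

// Hypothetical: yields a null pointer when the negotiator cannot be
// contacted, mirroring the failure logged in the description.
std::unique_ptr<Sock> connectToNegotiator(bool reachable) {
    if (!reachable) {
        return nullptr;
    }
    return std::unique_ptr<Sock>(new Sock{"userprio"});
}

// Sketch of a fetchAd that returns early instead of decoding from a
// null socket. Returns true only if an ad was read into adOut.
bool fetchAd(bool negotiatorReachable, std::string& adOut) {
    std::unique_ptr<Sock> sock = connectToNegotiator(negotiatorReachable);
    if (!sock) {
        // No connection: report failure and leave adOut untouched.
        return false;
    }
    adOut = sock->payload;  // safe: sock is known to be non-null here
    return true;
}
```

With this shape, a caller such as recordAccountantAd could skip the current sample when fetchAd returns false and try again on the next timer tick, rather than taking down the collector.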

Comment 3 Martin Kudlej 2013-06-20 09:17:06 UTC
I found this bug in our long-running Condor instance. When the problem occurs, the collector cannot be contacted with condor_status.

Comment 9 Stanislav Graf 2014-03-10 14:15:25 UTC
I have checked that the patch is part of condor-7.8.9-0.7.el6 source rpm.

Our test suite using condor-plumage and mongodb works as before with this change. No issues found.

--> VERIFIED

Comment 11 errata-xmlrpc 2014-04-28 16:46:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0440.html

