Bug 469388

Summary: master core dumps after plugin reconfiguration
Product: Red Hat Enterprise MRG Reporter: Robert Rati <rrati>
Component: grid Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA QA Contact: Kim van der Riet <kim.vdriet>
Severity: urgent Docs Contact:
Priority: high    
Version: 1.0 CC: freznice, matt
Target Milestone: 1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-02-04 16:04:33 UTC Type: ---
Bug Depends On: 470167    
Bug Blocks:    

Description Robert Rati 2008-10-31 15:54:37 UTC
Description of problem:
Condor was running on RHEL5 with plugins enabled via PLUGIN_DIR, then was reconfigured to use <subsys>.PLUGINS and restarted.  The master seems to core dump on shutdown.  This was seen on multiple nodes with different configurations (HA CM only, HA Schedd only, execute node).

Version-Release number of selected component (if applicable):
7.1.4-0.1

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
10/31 09:49:22 DaemonCore: Command Socket at <10.16.32.110:49238>
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtScheddPlugin-plugin.so reason: /usr/libexec/condor/MgmtScheddPlugin-plugin.so: undefined symbol: _ZTI16ClassAdLogPlugin
10/31 09:49:22 MasterPlugin registration succeeded
10/31 09:49:22 Successfully loaded plugin: /usr/libexec/condor/MgmtMasterPlugin-plugin.so
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtNegotiatorPlugin-plugin.so reason: /usr/libexec/condor/MgmtNegotiatorPlugin-plugin.so: undefined symbol: _ZTI16NegotiatorPlugin
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtCollectorPlugin-plugin.so reason: /usr/libexec/condor/MgmtCollectorPlugin-plugin.so: undefined symbol: _ZTI15CollectorPlugin
10/31 09:49:22 MgmtMasterPlugin initializing...
10/31 09:49:22 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 12940
10/31 10:10:02 Got SIGTERM. Performing graceful shutdown.
10/31 10:10:02 Sent SIGTERM to STARTD (pid 12940)
10/31 10:10:02 The STARTD (pid 12940) exited with status 0
10/31 10:10:02 All daemons are gone.  Exiting.
10/31 10:10:02 **** condor_master (condor_MASTER) EXITING WITH STATUS 0
Stack dump for process 12934 at timestamp 1225465802 (12 frames)
condor_master(dprintf_dump_stack+0xc0)[0x4c13cd]
condor_master[0x4c16a2]
/lib64/libpthread.so.0[0x375c40de70]
/lib64/libc.so.6(gsignal+0x35)[0x3fc0030155]
/lib64/libc.so.6(abort+0x110)[0x3fc0031bf0]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3fc00295d6]
/usr/libexec/condor/MgmtMasterPlugin-plugin.so(_ZN4qpid3sys5Mutex4lockEv+0x54)[0x2b33021a8552]
/usr/lib64/libqpidclient.so.0(_ZN4qpid6client10Dispatcher3runEv+0x9d0)[0x2b330285dc30]
/usr/lib64/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl16ConnectionThread3runEv+0x43d)[0x2b3302ac903d]
/usr/lib64/libqpidcommon.so.0[0x2b3302525eea]
/lib64/libpthread.so.0[0x375c4062f7]
/lib64/libc.so.6(clone+0x6d)[0x3fc00d1b6d]


Expected results:


Additional info:

Comment 1 Matthew Farrellee 2008-11-06 19:14:13 UTC
This is fixed in condor-7.1.4-0.4

The testing procedure is the same as for BZ470167, except run the condor_master in place of the qmf-agent example.

Plugins currently live in /usr/libexec/condor and can be configured with PLUGIN_DIR = /usr/libexec/condor

Plugins must be enabled and loaded for the test to be meaningful. Verify plugins are loaded by looking at log output. Successful loading is reported as part of daemon startup.
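A minimal sketch of that log check, assuming the daemon logs use the same "Successfully loaded plugin:" and "Failed to load plugin:" wording as the output quoted in the description. The function name summarize_plugin_loading is hypothetical and not part of Condor.

```python
import re

# Patterns taken from the log lines quoted in this report; other Condor
# versions may word these messages differently (assumption).
LOADED = re.compile(r"Successfully loaded plugin: (\S+)")
FAILED = re.compile(r"Failed to load plugin: (\S+) reason: (.*)")


def summarize_plugin_loading(lines):
    """Return (loaded_paths, failed) where failed maps plugin path -> reason."""
    loaded = []
    failed = {}
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] = match.group(2).strip()
            continue
        match = LOADED.search(line)
        if match:
            loaded.append(match.group(1))
    return loaded, failed
```

Feeding it the lines of a daemon log gives a quick view of which plugins actually loaded; an empty loaded list means the test would not be meaningful.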

Running the master will also start other daemons that previously failed with an error. To see if any daemons are crashing:

grep -i stack `condor_config_val LOG`/*

Alternatively, you can check the MasterLog for any non-zero exit values.
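Both checks can be combined in one pass over the log lines. This is a rough sketch, assuming the "Stack dump for process" and "exited with status" / "EXITING WITH STATUS" wording seen in the log quoted in the description; find_crash_signs is a hypothetical helper name.

```python
import re

# Message formats copied from the log excerpt in this report (assumption:
# other daemons and versions use the same wording).
STACK = re.compile(r"Stack dump for process (\d+)")
EXIT = re.compile(r"exit(?:ed|ing) with status (-?\d+)", re.IGNORECASE)


def find_crash_signs(lines):
    """Return (pids_with_stack_dumps, lines_reporting_nonzero_exit)."""
    stack_pids = []
    bad_exits = []
    for line in lines:
        match = STACK.search(line)
        if match:
            stack_pids.append(int(match.group(1)))
        match = EXIT.search(line)
        if match and int(match.group(1)) != 0:
            bad_exits.append(line.rstrip("\n"))
    return stack_pids, bad_exits
```

Note that in the log from the description the master reported exit status 0 and only then dumped a stack, so the stack dump check is the one that catches this particular failure.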

Comment 3 Frantisek Reznicek 2008-11-20 08:48:15 UTC
RHTS test qpid_test_qmf_agent_bz470167 validates that this issue has been fixed.
->VERIFIED

Comment 5 errata-xmlrpc 2009-02-04 16:04:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html