469388 – master core dumps after plugin reconfiguration

Summary: master core dumps after plugin reconfiguration

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	grid
Sub Component:
Version:	1.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	1.1
Target Release:	---
Assignee:	Matthew Farrellee
QA Contact:	Kim van der Riet
Docs Contact:
URL:
Whiteboard:
Depends On:	470167
Blocks:

Reported:	2008-10-31 15:54 UTC by Robert Rati
Modified:	2009-02-04 16:04 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-02-04 16:04:33 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:0036	0	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Grid 1.1 Release	2009-02-04 16:03:49 UTC

Description Robert Rati 2008-10-31 15:54:37 UTC

Description of problem:
Condor was running on RHEL5 with plugins enabled via PLUGIN_DIR, then was reconfigured to use <subsys>.PLUGINS and restarted.  The master appears to dump core on shutdown.  This was seen on multiple nodes of different configurations (HA CM only, HA Schedd only, execute node).

Version-Release number of selected component (if applicable):
7.1.4-0.1

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
10/31 09:49:22 DaemonCore: Command Socket at <10.16.32.110:49238>
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtScheddPlugin-plugin.so reason: /usr/libexec/condor/MgmtScheddPlugin-plugin.so: undefined symbol: _ZTI16ClassAdLogPlugin
10/31 09:49:22 MasterPlugin registration succeeded
10/31 09:49:22 Successfully loaded plugin: /usr/libexec/condor/MgmtMasterPlugin-plugin.so
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtNegotiatorPlugin-plugin.so reason: /usr/libexec/condor/MgmtNegotiatorPlugin-plugin.so: undefined symbol: _ZTI16NegotiatorPlugin
10/31 09:49:22 Failed to load plugin: /usr/libexec/condor/MgmtCollectorPlugin-plugin.so reason: /usr/libexec/condor/MgmtCollectorPlugin-plugin.so: undefined symbol: _ZTI15CollectorPlugin
10/31 09:49:22 MgmtMasterPlugin initializing...
10/31 09:49:22 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 12940
10/31 10:10:02 Got SIGTERM. Performing graceful shutdown.
10/31 10:10:02 Sent SIGTERM to STARTD (pid 12940)
10/31 10:10:02 The STARTD (pid 12940) exited with status 0
10/31 10:10:02 All daemons are gone.  Exiting.
10/31 10:10:02 **** condor_master (condor_MASTER) EXITING WITH STATUS 0
Stack dump for process 12934 at timestamp 1225465802 (12 frames)
condor_master(dprintf_dump_stack+0xc0)[0x4c13cd]
condor_master[0x4c16a2]
/lib64/libpthread.so.0[0x375c40de70]
/lib64/libc.so.6(gsignal+0x35)[0x3fc0030155]
/lib64/libc.so.6(abort+0x110)[0x3fc0031bf0]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3fc00295d6]
/usr/libexec/condor/MgmtMasterPlugin-plugin.so(_ZN4qpid3sys5Mutex4lockEv+0x54)[0x2b33021a8552]
/usr/lib64/libqpidclient.so.0(_ZN4qpid6client10Dispatcher3runEv+0x9d0)[0x2b330285dc30]
/usr/lib64/libqmfagent.so.0(_ZN4qpid10management19ManagementAgentImpl16ConnectionThread3runEv+0x43d)[0x2b3302ac903d]
/usr/lib64/libqpidcommon.so.0[0x2b3302525eea]
/lib64/libpthread.so.0[0x375c4062f7]
/lib64/libc.so.6(clone+0x6d)[0x3fc00d1b6d]


Expected results:


Additional info:

Comment 1 Matthew Farrellee 2008-11-06 19:14:13 UTC

This is fixed in condor-7.1.4-0.4

The testing procedure is the same as for BZ470167, except run the condor_master in place of the qmf-agent example.

Plugins currently live in /usr/libexec/condor and can be configured with PLUGIN_DIR = /usr/libexec/condor

Plugins must be enabled and loaded for the test to be meaningful. Verify plugins are loaded by looking at log output. Successful loading is reported as part of daemon startup.
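As a rough aid for the log check described above, the following Python sketch tallies plugin load results. It assumes the "Successfully loaded plugin:" and "Failed to load plugin: ... reason: ..." line formats shown in the log excerpt in this report; the function name summarize_plugin_loads is hypothetical and is not part of Condor.

```python
import re

# Patterns are assumptions based on the log lines quoted in this report.
LOADED = re.compile(r"Successfully loaded plugin: (\S+)")
FAILED = re.compile(r"Failed to load plugin: (\S+) reason: (.*)")


def summarize_plugin_loads(log_lines):
    """Return (loaded_paths, failed) where failed maps plugin path -> reason.

    Hypothetical helper for illustration only; not a Condor API.
    """
    loaded = []
    failed = {}
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] = match.group(2).strip()
            continue
        match = LOADED.search(line)
        if match:
            loaded.append(match.group(1))
    return loaded, failed
```

Feeding it the lines of a daemon log (for example, the open file object for the MasterLog) gives a quick view of which plugins loaded and which failed with an undefined-symbol error.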

Running the master also starts other daemons, which previously failed with an error as well. To see whether any daemons are crashing, run:

grep -i stack `condor_config_val LOG`/*

Alternatively, check the MasterLog for any nonzero exit values.
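The two checks above (stack dumps and nonzero exit values) can be approximated in one pass over the log lines. This is a minimal sketch that assumes the "exited with status N" and "Stack dump for process" formats seen in the log excerpt in this report; the function name find_crash_signs is hypothetical, and the master's own "EXITING WITH STATUS" line is not examined.

```python
import re

# Case-insensitive match mirrors "grep -i stack".
STACK = re.compile(r"stack", re.IGNORECASE)
# Assumed format, taken from the MasterLog excerpt in this report.
EXIT = re.compile(r"The (\S+) \(pid (\d+)\) exited with status (\d+)")


def find_crash_signs(log_lines):
    """Return (stack_lines, bad_exits) found in the given log lines.

    bad_exits holds (daemon, pid, status) tuples with a nonzero status.
    Hypothetical helper for illustration only; not a Condor tool.
    """
    stack_lines = []
    bad_exits = []
    for line in log_lines:
        if STACK.search(line):
            stack_lines.append(line.rstrip("\n"))
        match = EXIT.search(line)
        if match and int(match.group(3)) != 0:
            bad_exits.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return stack_lines, bad_exits
```

An empty result for both lists suggests no daemon crashed, under the stated assumptions about the log format.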

Comment 3 Frantisek Reznicek 2008-11-20 08:48:15 UTC

RHTS test qpid_test_qmf_agent_bz470167 validates that this issue has been fixed.
->VERIFIED

Comment 5 errata-xmlrpc 2009-02-04 16:04:33 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
