Bug 528015 - Crash in condor_schedd/QMF Agent on handling SIGQUIT
Summary: Crash in condor_schedd/QMF Agent on handling SIGQUIT
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-qmf
Version: 1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.2
Target Release: ---
Assignee: Ken Giusti
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks: 527551
Reported: 2009-10-08 15:42 UTC by Gordon Sim
Modified: 2015-11-16 01:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Management bug fix
C: The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate.
C: The SIGQUIT signal would cause the QMF Agent thread to crash.
F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.

The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate. This caused the QMF Agent thread to crash. The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT, and the thread will now shut down cleanly.
Clone Of: 489557
Environment:
Last Closed: 2009-12-03 09:15:49 UTC
Target Upstream Version:
Embargoed:


Attachments
core file (540.53 KB, application/x-bzip2), 2009-10-09 06:41 UTC, Gordon Sim
logs (7.70 KB, application/x-compressed-tar), 2009-10-09 06:45 UTC, Gordon Sim


Links
System: Red Hat Product Errata
ID: RHEA-2009:1633
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.2
Last Updated: 2009-12-03 09:15:33 UTC

Comment 1 Gordon Sim 2009-10-09 06:41:50 UTC
Created attachment 364206
core file

Comment 2 Gordon Sim 2009-10-09 06:45:23 UTC
Created attachment 364207
logs

Comment 3 Ken Giusti 2009-10-20 13:52:40 UTC
Matt - the fix for bz489557 should let the QMF agent shut down cleanly, so Condor should no longer need a SIGQUIT to make the agent exit. Please retest the shutdown implementation against that fix.

https://bugzilla.redhat.com/show_bug.cgi?id=489557

Thanks,

-K
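
For readers unfamiliar with the pattern, shutting a worker thread down cleanly without a signal usually means setting a stop flag that the thread checks, then joining the thread. The sketch below is illustrative only: it is Python rather than the agent's own code, it is not the actual fix from bz489557, and `AgentThread`, `poll_interval`, and `shutdown` are hypothetical names.

```python
import threading


class AgentThread:
    """Hypothetical stand-in for a management agent's worker thread."""

    def __init__(self, poll_interval=0.05):
        # Flag the worker checks between units of work (assumed design).
        self._stop_requested = threading.Event()
        self._poll_interval = poll_interval
        self.iterations = 0
        self._thread = threading.Thread(target=self._run, name="agent")

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True as soon as shutdown() sets the flag,
        # so the loop ends promptly instead of sleeping out the interval.
        while not self._stop_requested.wait(self._poll_interval):
            self.iterations += 1  # placeholder for real periodic work

    def shutdown(self, timeout=2.0):
        """Ask the worker to stop and wait for it; True if it exited."""
        self._stop_requested.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


def demo():
    """Start the worker, then stop it without sending any signal."""
    agent = AgentThread()
    agent.start()
    return agent.shutdown()
```

If `shutdown()` returns True, the thread left its loop on its own and the parent process has no reason to force termination with a signal.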

Comment 6 Irina Boverman 2009-10-29 14:13:40 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 7 Frantisek Reznicek 2009-10-30 11:00:57 UTC
I just filed bug 532002, which shows that the QMF agent shutdown is still not clean.
(This explains why SIGQUIT is/was used in Condor.)

Adding dependency on bug 532002, because it has to be resolved first.

Feel free to raise objections...

Comment 8 Frantisek Reznicek 2009-11-04 14:14:12 UTC
Removing dependency on bug 532002. Under test atm.

Comment 9 Frantisek Reznicek 2009-11-09 09:20:10 UTC
No crash of any component (including condor_schedd) was observed during a long-term restart test on RHEL 4.8 / 5.4 i386 / x86_64 with these packages:

[root@hp-dl360-06 bz]# rpm -qa | egrep '(condor|qmf|qpid|rhm)' | sort
condor-7.4.1-0.4.el4
condor-qmf-plugins-7.4.1-0.4.el4
condor-remote-configuration-1.0-23.el4
python-qpid-0.5.760500-6.el4
qmf-0.5.752581-32.el4
qmf-devel-0.5.752581-32.el4
qpidc-0.5.752581-32.el4
qpidc-debuginfo-0.5.752581-32.el4
qpidc-devel-0.5.752581-32.el4
qpidc-ssl-0.5.752581-32.el4
qpidd-0.5.752581-32.el4
qpidd-acl-0.5.752581-32.el4
qpidd-devel-0.5.752581-32.el4
qpid-dotnet-0.4.738274-2.el4
qpidd-ssl-0.5.752581-32.el4
qpidd-xml-0.5.752581-32.el4
qpid-java-client-0.5.751061-9.el4
qpid-java-common-0.5.751061-9.el4
rhm-0.5.3206-25.el4
rhm-docs-0.5.756148-1.el4

As I understand from a discussion with Matt, Condor still sends SIGQUIT to QMF agents, so it might be good to adjust the release notes for this bug (remove the SIGQUIT part).

-> VERIFIED

Comment 10 Frantisek Reznicek 2009-11-11 13:04:42 UTC
An update:
This is currently an issue common to both RHEL 4.8 and RHEL 5.4.
RHEL 4.8 i386 / x86_64 and RHEL 5.4 i386 are showing the defect.

Comment 11 Frantisek Reznicek 2009-11-11 13:05:49 UTC
Please ignore the last post (posted here by mistake instead of on the similar bug 534073).

Comment 12 Lana Brindley 2009-11-19 04:42:39 UTC
Noting comment #9, can someone please review the relnote and provide advice?

Cheers.

LKB

Comment 13 Lana Brindley 2009-11-19 04:42:39 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
+Management bug fix
+
+C: The QMF agent would send a SIGQUIT signal
+C: condor_schedd would crash
+F:
+R:
+
+
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 14 Ken Giusti 2009-11-19 14:29:36 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,9 @@
 Management bug fix
 
-C: The QMF agent would send a SIGQUIT signal
-C: condor_schedd would crash
-F:
-R:
+C: The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate.
+C: The SIGQUIT signal would cause the QMF Agent thread to crash.
+F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
+R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
 
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 15 Lana Brindley 2009-11-20 00:47:46 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -5,5 +5,4 @@
 F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
 R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
-
+The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate. This caused the QMF Agent thread to crash. The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT, and the thread will now shut down cleanly.
-The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 16 errata-xmlrpc 2009-12-03 09:15:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

