Bug 528015

Summary: Crash in condor_schedd/QMF Agent on handling SIGQUIT
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-qmf
Assignee: Ken Giusti <kgiusti>
Status: CLOSED ERRATA
QA Contact: Frantisek Reznicek <freznice>
Severity: high
Docs Contact:
Priority: urgent
Version: 1.1
CC: esammons, gsim, iboverma, jneedle, jsarenik, kgiusti, lans.carstensen, lbrindle, matt, mcressma, tao, tross
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Management bug fix

C: The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate.
C: The SIGQUIT signal would cause the QMF Agent thread to crash.
F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.

The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate. This caused the QMF Agent thread to crash. The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT, and the thread will now shut down cleanly.
Story Points: ---
Clone Of: 489557
Environment:
Last Closed: 2009-12-03 09:15:49 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 527551    
Attachments:
Description    Flags
core file      none
logs           none

Comment 1 Gordon Sim 2009-10-09 06:41:50 UTC
Created attachment 364206 [details]
core file

Comment 2 Gordon Sim 2009-10-09 06:45:23 UTC
Created attachment 364207 [details]
logs

Comment 3 Ken Giusti 2009-10-20 13:52:40 UTC
Matt - the fix for bz489557 should cause the qmf agent to cleanly shutdown.  Condor should no longer require a SIGQUIT to exit the agent.  Please retest the shutdown implementation against the fix for bz489557 - you should no longer require a SIGQUIT.

https://bugzilla.redhat.com/show_bug.cgi?id=489557

Thanks,

-K
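
Illustrative note: the clean shutdown described in this comment, where the agent thread exits on request rather than being forced out with SIGQUIT, can be sketched as a cooperative stop flag plus a join. This is a minimal sketch and not the actual qpid/QMF or Condor code; the class `AgentThread`, its `shutdown()` method, and the polling interval are hypothetical names chosen for the example.

```python
import threading
import time


class AgentThread:
    """Hypothetical stand-in for a management agent's worker thread."""

    def __init__(self, poll_interval=0.05):
        self._stop = threading.Event()
        self._poll_interval = poll_interval
        self.iterations = 0
        self._thread = threading.Thread(target=self._run, name="agent")

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True as soon as the stop flag is set,
        # so the loop wakes promptly instead of sleeping a full interval.
        while not self._stop.wait(self._poll_interval):
            self.iterations += 1  # placeholder for periodic agent work

    def shutdown(self, timeout=2.0):
        """Ask the thread to stop and wait for it; True if it exited."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    agent = AgentThread()
    agent.start()
    time.sleep(0.2)
    print("clean exit:", agent.shutdown())
```

With this pattern the owning process only needs to call `shutdown()` during its own shutdown sequence; a signal is needed only as a fallback if the join times out.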

Comment 6 Irina Boverman 2009-10-29 14:13:40 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 7 Frantisek Reznicek 2009-10-30 11:00:57 UTC
I just filed bug 532002, which shows that the QMF agent shutdown is still not clean.
(This explains why SIGQUIT is/was used in condor)

Adding dependency on bug 532002, because it has to be resolved first.

Feel free to raise objections...

Comment 8 Frantisek Reznicek 2009-11-04 14:14:12 UTC
Removing dependency on bug 532002. Under test atm.

Comment 9 Frantisek Reznicek 2009-11-09 09:20:10 UTC
No crash of any component (including condor_schedd) observed during a long-term restart test on RHEL 4.8 / 5.4 i386 / x86_64 with packages:

[root@hp-dl360-06 bz]# rpm -qa | egrep '(condor|qmf|qpid|rhm)' | sort
condor-7.4.1-0.4.el4
condor-qmf-plugins-7.4.1-0.4.el4
condor-remote-configuration-1.0-23.el4
python-qpid-0.5.760500-6.el4
qmf-0.5.752581-32.el4
qmf-devel-0.5.752581-32.el4
qpidc-0.5.752581-32.el4
qpidc-debuginfo-0.5.752581-32.el4
qpidc-devel-0.5.752581-32.el4
qpidc-ssl-0.5.752581-32.el4
qpidd-0.5.752581-32.el4
qpidd-acl-0.5.752581-32.el4
qpidd-devel-0.5.752581-32.el4
qpid-dotnet-0.4.738274-2.el4
qpidd-ssl-0.5.752581-32.el4
qpidd-xml-0.5.752581-32.el4
qpid-java-client-0.5.751061-9.el4
qpid-java-common-0.5.751061-9.el4
rhm-0.5.3206-25.el4
rhm-docs-0.5.756148-1.el4

As understood from discussion with Matt, condor keeps sending SIGQUIT to qmf agents, so it might be good to tune the release notes for this bug (remove the SIGQUIT part).

-> VERIFIED

Comment 10 Frantisek Reznicek 2009-11-11 13:04:42 UTC
An update:
This is currently an issue common to both RHEL 4.8 and RHEL 5.4.
RHEL 4.8 i386 / x86_64 and RHEL 5.4 i386 are showing the defect.

Comment 11 Frantisek Reznicek 2009-11-11 13:05:49 UTC
Please ignore the last post (posted here by mistake instead of in the similar bug 534073).

Comment 12 Lana Brindley 2009-11-19 04:42:39 UTC
Noting comment #9, can someone please review the relnote and provide advice?

Cheers.

LKB

Comment 13 Lana Brindley 2009-11-19 04:42:39 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
+Management bug fix
+
+C: The QMF agent would send a SIGQUIT signal
+C: condor_schedd would crash
+F:
+R:
+
+
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 14 Ken Giusti 2009-11-19 14:29:36 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,9 @@
 Management bug fix
 
-C: The QMF agent would send a SIGQUIT signal
-C: condor_schedd would crash
-F:
-R:
+C: The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate.
+C: The SIGQUIT signal would cause the QMF Agent thread to crash.
+F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
+R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
 
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 15 Lana Brindley 2009-11-20 00:47:46 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -5,5 +5,4 @@
 F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
 R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
-
+The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate. This caused the QMF Agent thread to crash. The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT, and the thread will now shut down cleanly.
-The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)

Comment 16 errata-xmlrpc 2009-12-03 09:15:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html