Created attachment 364206 [details] core file
Created attachment 364207 [details] logs
Matt - the fix for bz489557 should cause the qmf agent to shut down cleanly. Condor should no longer require a SIGQUIT to exit the agent. Please retest the shutdown implementation against the fix for bz489557. https://bugzilla.redhat.com/show_bug.cgi?id=489557 Thanks, -K
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)
I just filed bug 532002, which shows that QMF agent shutdown is still not clean. (This explains why SIGQUIT is/was used in condor.) Adding a dependency on bug 532002, because it has to be resolved first. Feel free to raise objections...
Removing dependency on bug 532002. Under test atm.
No crash of any component (including condor_schedd) was observed during a long-term restart test on RHEL 4.8 / 5.4 i386 / x86_64 with these packages:

[root@hp-dl360-06 bz]# rpm -qa | egrep '(condor|qmf|qpid|rhm)' | sort
condor-7.4.1-0.4.el4
condor-qmf-plugins-7.4.1-0.4.el4
condor-remote-configuration-1.0-23.el4
python-qpid-0.5.760500-6.el4
qmf-0.5.752581-32.el4
qmf-devel-0.5.752581-32.el4
qpidc-0.5.752581-32.el4
qpidc-debuginfo-0.5.752581-32.el4
qpidc-devel-0.5.752581-32.el4
qpidc-ssl-0.5.752581-32.el4
qpidd-0.5.752581-32.el4
qpidd-acl-0.5.752581-32.el4
qpidd-devel-0.5.752581-32.el4
qpid-dotnet-0.4.738274-2.el4
qpidd-ssl-0.5.752581-32.el4
qpidd-xml-0.5.752581-32.el4
qpid-java-client-0.5.751061-9.el4
qpid-java-common-0.5.751061-9.el4
rhm-0.5.3206-25.el4
rhm-docs-0.5.756148-1.el4

As understood from discussion with Matt, condor still sends SIGQUIT to the qmf agents, so it might be good to tune the release notes for this bug (remove the SIGQUIT part).

-> VERIFIED
An update: this issue is common to both RHEL 4.8 and RHEL 5.4. Currently RHEL 4.8 i386 / x86_64 and RHEL 5.4 i386 are showing the defect.
Please ignore the last post (posted here by mistake instead of in the similar bug 534073).
Noting comment #9, can someone please review the relnote and provide advice? Cheers. LKB
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
+Management bug fix
+
+C: The QMF agent would send a SIGQUIT signal
+C: condor_schedd would crash
+F:
+R:
+
+
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,9 @@
 Management bug fix
 
-C: The QMF agent would send a SIGQUIT signal
-C: condor_schedd would crash
-F:
-R:
+C: The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate.
+C: The SIGQUIT signal would cause the QMF Agent thread to crash.
+F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
+R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
 
 The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -5,5 +5,4 @@
 F: The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT.
 R: The QMF Agent thread will shutdown cleanly when Condor begins its shutdown sequence. A SIGQUIT of the QMF Agent thread is no longer used.
 
-
+The shutdown code for the QMF Agent thread of the Condor daemon would not exit properly, requiring the Condor shutdown sequence to issue a SIGQUIT signal to force the QMF Agent to terminate. This caused the QMF Agent thread to crash. The QMF Agent thread shutdown code was fixed to cleanly shutdown without the need for a SIGQUIT, and the thread will now shut down cleanly.
-The fix for bz 489557 should cause the qmf agent to cleanly shutdown. Condor should no longer require a SIGQUIT to exit the agent (528015)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-1633.html