Bug 486480

Summary: [RFE] Master should send obituary from .old logs if necessary
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: condor
Assignee: Timothy St. Clair <tstclair>
Status: CLOSED ERRATA
QA Contact: Lubos Trilety <ltrilety>
Severity: medium
Priority: low
Version: 1.1
CC: ltoscano, ltrilety, matt, mkudlej, tstclair
Target Milestone: 2.3
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: FutureFeature
Fixed In Version: condor-7.8.2-0.1
Doc Type: Enhancement
Doc Text:
C: The condor master daemon tries to send an obituary for a daemon that failed during a log rollover. C: The obituary is not sent. F: Update the logic to check for the rolled-over log and send the obituary from it. R: The master sends an obituary email when a daemon fails during a log rollover.
Last Closed: 2013-03-06 18:38:17 UTC

Description Matthew Farrellee 2009-02-19 22:47:54 UTC
Seen with condor-7.2.2-0.1.el5; related to BZ486462.

From /var/log/condor/MasterLog:

2/19 11:00:08 Sending obituary for "/usr/sbin/condor_schedd"
2/19 11:00:08 Forking Mailer process...
2/19 11:00:08 Failed to email /var/log/condor/SchedLog: cannot open file

The Schedd failed in the middle of a log rotation. The Master was not able to email an obituary because the new SchedLog had not been created. In such a case the Master should attempt to mail part of the SchedLog.old instead.
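
The requested fallback can be sketched in shell as follows. This is only an illustration of the idea, not the Master's actual implementation; the function name pick_obituary_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): print the log file an obituary
# should be mailed from. Prefers the current log and falls back to the
# rotated ".old" copy when the current log is missing or unreadable,
# for example in the middle of a rotation.
pick_obituary_log() {
    log="$1"
    if [ -r "$log" ]; then
        printf '%s\n' "$log"
    elif [ -r "$log.old" ]; then
        printf '%s\n' "$log.old"
    else
        # Neither file is available; the caller has nothing to mail.
        return 1
    fi
}

# Example use (not executed here):
#   tail -n 200 "$(pick_obituary_log /var/log/condor/SchedLog)"
```

The line count passed to tail in the example is arbitrary; the report only asks that part of SchedLog.old be mailed.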

Comment 1 Timothy St. Clair 2011-05-18 18:47:46 UTC
Is there an easy repro condition?

Comment 2 Matthew Farrellee 2011-05-18 18:58:38 UTC
I would expect setting MAX_SCHEDD_LOG to a small number then sending SIGKILL to the condor_schedd would assist in reproducing.
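
A rough sketch of that suggestion, under some assumptions: the helper name set_small_schedd_log, the value 4096 (assumed to be interpreted as bytes), and the local config path shown in the comments are all illustrative and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper: append a small MAX_SCHEDD_LOG to a Condor config file
# so that SchedLog rotates frequently.
# $1: config file to append to (caller's choice)
# $2: size limit; 4096 is an arbitrary "small number", not a tested value
set_small_schedd_log() {
    config_file="$1"
    max_size="${2:-4096}"
    printf 'MAX_SCHEDD_LOG = %s\n' "$max_size" >> "$config_file"
}

# Typical use (not executed here; needs root and a running pool, and the
# config path is an assumption about the local setup):
#   set_small_schedd_log /etc/condor/condor_config.local
#   condor_reconfig
#   pkill -KILL -x condor_schedd
```

Because the rotation window is short, the SIGKILL may have to be repeated several times before it lands mid-rotation.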

Comment 3 Luigi Toscano 2012-03-07 19:51:46 UTC
(In reply to comment #2)
> I would expect setting MAX_SCHEDD_LOG to a small number then sending SIGKILL to
> the condor_schedd would assist in reproducing.

Is this still the suggested way to reproduce it?

Comment 4 Timothy St. Clair 2012-03-14 16:48:29 UTC
-------------------------------------------------------
To repro: 

1.) Navigate to your LOG location and run `rm -f SchedLog*`.

2.) Set SCHEDD = /some/path/to/a/script/like/the/old/below in your config.

3.) Create a script at that path similar to the one below:

#!/bin/sh 
echo "MASSIVE FAIL OBIT TEST" >> /your/LOG/loc/SchedLog.old
exit 1

4.) Start condor.

Before the fix, you'll see a failure to open the log, like the one in the description.

After the fix, you'll see it fork the mailer and send the email.
-------------------------------------------------------
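
Steps 1 to 3 above could be scripted roughly as follows. This is a sketch only: the function name make_fake_schedd and both of its arguments are hypothetical, and the paths are assumed to contain no quotes or dollar signs.

```shell
#!/bin/sh
# Sketch of steps 1-3: clear existing schedd logs and install a stand-in
# "schedd" that writes one line to SchedLog.old and exits non-zero.
# $1: Condor LOG directory (caller's choice)
# $2: path to write the stand-in script to (caller's choice)
make_fake_schedd() {
    log_dir="$1"
    script="$2"
    # Step 1: remove SchedLog and any rotated copies.
    rm -f "$log_dir"/SchedLog*
    # Step 3: write the stand-in script; $log_dir is expanded now, so the
    # generated script contains the literal log path.
    cat > "$script" <<EOF
#!/bin/sh
echo "MASSIVE FAIL OBIT TEST" >> "$log_dir/SchedLog.old"
exit 1
EOF
    chmod +x "$script"
}
```

Step 2 (pointing SCHEDD at the generated script in the config) and step 4 (starting condor) are left manual, since they depend on the local installation.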

Tracking changes upstream.

Comment 6 Timothy St. Clair 2012-03-19 16:23:48 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: The condor master daemon tries to send an obituary for a daemon that failed during a log rollover.
C: The obituary is not sent.
F: Update the logic to check for the rolled-over log and send the obituary from it.
R: The master sends an obituary email when a daemon fails during a log rollover.

Comment 10 Martin Kudlej 2013-02-06 09:00:40 UTC
Tested on RHEL 5.9/6.4 x i386/x86_64 with condor-7.8.8-0.4.1 and it works. -->VERIFIED

Comment 12 errata-xmlrpc 2013-03-06 18:38:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html