Bug 921612

Summary: SAR log file corruption in odd Feb 28th edge-case
Product: Red Hat Enterprise Linux 6
Reporter: Ted Rule <ejtr>
Component: sysstat
Assignee: Peter Schiffer <pschiffe>
Status: CLOSED ERRATA
QA Contact: Branislav Náter <bnater>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.3
CC: cdavis, fkrska, iain.morrison, jhunt, msaxena, ovasik, pportant, pschiffe, thatsafunnyname, yozone
Target Milestone: rc
Keywords: Patch, Regression, Upstream
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: sysstat-9.0.4-25.el6
Doc Type: Bug Fix
Doc Text:
Cause: sysstat didn't check whether sa daily data files from the previous month existed when appending new statistics to them.
Consequence: In some edge cases (for example, a short month), sysstat appended new statistics to the old sa daily data files.
Fix: sysstat was modified to check whether the old sa daily data file exists and, if it does, to remove it before appending new data.
Result: sysstat no longer appends new statistics to the old sa daily data files.
Story Points: ---
Clone Of:
: 1100365 (view as bug list) Environment:
Last Closed: 2014-10-14 06:40:45 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1056252, 1100365    
Attachments:
Description                         Flags
sysstat-9.0.4-history-25.patch      none
sysstat-9.0.4-overwrite-sa.patch    none

Description Ted Rule 2013-03-14 14:23:06 UTC
Description of problem:

In order to retain some additional SAR logging, we locally chose to raise HISTORY in /etc/sysconfig/sysstat from the RPM default of 7 to 28. By an odd quirk, we believe that the SAR logs can become corrupted if HISTORY > 27 and the system is reconfigured with additional devices during February of a non-leap year.

Version-Release number of selected component (if applicable):

sysstat-9.0.4-20.el6.i686

Additional info:

The SAR scripts /usr/lib/sa/sa1 and sa2 both normally use logs in /var/log/sa itself, but if HISTORY > 28, the scripts use a tree of log directories under /var/log/sa. There may be a problem with this log file layout as well, but I haven't been able to check for that case yet.
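
As a rough illustration of the two layouts described above, the sketch below prints where the data file for a given day would live. The function name sa_data_path and its arguments are hypothetical and not part of sysstat; only the /var/log/sa base directory, the saDD naming, the YYYYMM subdirectory and the HISTORY threshold of 28 are taken from the sa1 script quoted later in this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysstat: print the path sa1 would write to
# for a given HISTORY value, year-month (YYYYMM) and day of month (DD).
sa_data_path() {
	history=$1
	yyyymm=$2
	dd=$3
	if [ "$history" -gt 28 ]; then
		# Long retention: one subdirectory per month
		echo "/var/log/sa/${yyyymm}/sa${dd}"
	else
		# Short retention: flat layout, so the same name recurs every month
		echo "/var/log/sa/sa${dd}"
	fi
}

sa_data_path 28 201303 01
sa_data_path 31 201303 01
```

With HISTORY=28 the flat name /var/log/sa/sa01 is produced for every month, which is why an old file of the same name matters in that case.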

When HISTORY < 28, /usr/lib/sa/sa2 expunges old logs which are more than HISTORY days old, so "tomorrow's" saXX file never exists before /usr/lib/sa/sa1 creates it on the first pass of the day.

However, when HISTORY == 28, then on March 1st in a non-leap year, the log file sa01 will already exist from Feb 1st, not having been expunged by the sa2 script. The same applies to March 2nd through 28th.
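
The date arithmetic behind this can be sketched as follows. This is a simplified model of the reasoning above, not of sa2's actual pruning logic: it assumes that a file survives as long as its age in days does not exceed HISTORY, and that the previous file named saDD is exactly one month-length old. The function names are hypothetical, and months are passed as plain integers (3, not 03).

```shell
#!/bin/sh
# Hypothetical helpers, not part of sysstat.

# Print the number of days in month $2 (1-12) of year $1.
days_in_month() {
	year=$1
	month=$2
	case "$month" in
		4|6|9|11) echo 30 ;;
		2)
			if [ $((year % 4)) -eq 0 ] && { [ $((year % 100)) -ne 0 ] || [ $((year % 400)) -eq 0 ]; }; then
				echo 29
			else
				echo 28
			fi
			;;
		*) echo 31 ;;
	esac
}

# Succeed if last month's saDD would still be inside the HISTORY window
# when the same day number comes round again in month $3 of year $2.
stale_file_possible() {
	history=$1
	year=$2
	month=$3
	prev_month=$((month - 1))
	prev_year=$year
	if [ "$prev_month" -eq 0 ]; then
		prev_month=12
		prev_year=$((year - 1))
	fi
	[ "$history" -ge "$(days_in_month "$prev_year" "$prev_month")" ]
}

stale_file_possible 28 2013 3 && echo "2013-03: old sa01 may still exist"
stale_file_possible 28 2012 3 || echo "2012-03: old sa01 already expired"
stale_file_possible 7 2013 3 || echo "HISTORY=7: old sa01 already expired"
```

Under this model only a 28-day February collides with HISTORY=28. Comment 17 below reports the problem after 30-day months as well, so the real pruning window appears to be somewhat wider than this sketch assumes.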

It so happened that we reconfigured a CentOS 6 system with an additional disk during February, and seemingly as a consequence the exact layout of the SAR logs changed sufficiently for sadc to regard the old file as invalid when sa1 attempted to reuse the log of the same name 28 days later.

Therefore, we believe it would be prudent to bullet-proof the sa1 script by pre-expunging a SAR log if it is more than 1 day old, which will prevent a log file from a month ago being re-used "today".



$ diff -u /usr/lib/sa/sa1 sa1.script.tweak  
--- /usr/lib/sa/sa1	2012-06-22 11:11:48.000000000 +0100
+++ sa1.script.tweak	2013-03-14 14:16:38.211424137 +0000
@@ -9,13 +9,15 @@
 SADC_OPTIONS="-S DISK"
 SYSCONFIG_DIR=/etc/sysconfig
 [ -r ${SYSCONFIG_DIR}/sysstat ] && . ${SYSCONFIG_DIR}/sysstat
+
+CURRENTDIR=`date +%Y%m`
+DATE=`date +%d`
+CURRENTFILE=sa${DATE}
+DDIR=/var/log/sa
+cd ${DDIR} || exit 1
+
 if [ ${HISTORY} -gt 28 ]
 then
-	CURRENTDIR=`date +%Y%m`
-	DATE=`date +%d`
-	CURRENTFILE=sa${DATE}
-	DDIR=/var/log/sa
-	cd ${DDIR} || exit 1
 	[ -d ${CURRENTDIR} ] || mkdir -p ${CURRENTDIR}
 	# If ${CURRENTFILE} exists and is a regular file, then make sure
        	# the file was modified this day (and not e.g. month ago)
@@ -24,11 +26,30 @@
 		[ -f ${CURRENTFILE} ] &&
 		[ "`date +%Y%m%d -r ${CURRENTFILE}`" = "${CURRENTDIR}${DATE}" ] &&
 		mv -f ${CURRENTFILE} ${CURRENTDIR}/${CURRENTFILE}
+	# If ${CURRENTFILE} exists and is a regular file, then make sure
+       	# the file was modified this day (and not e.g. month ago).
+	# If it is old, remove it so that it is recreated by sadc afresh
+	if [ -f ${CURRENTFILE} ]; then
+		find ${CURRENTDIR} -type f -name ${CURRENTFILE} -mtime -1  -print | grep -q ${CURRENTFILE}
+		if [ $? -ne 0 ]; then
+			rm -f ${CURRENTDIR}/${CURRENTFILE}
+		fi
+	fi
 	touch ${CURRENTDIR}/${CURRENTFILE}
 	# Remove the "compatibility" link and recreate it to point to
 	# the (new) current file
 	rm -f ${CURRENTFILE}
 	ln -s ${CURRENTDIR}/${CURRENTFILE} ${CURRENTFILE}
+else
+	# If ${CURRENTFILE} exists and is a regular file, then make sure
+       	# the file was modified this day (and not e.g. month ago).
+	# If it is old, remove it so that it is recreated by sadc afresh
+	if [ ! -L ${CURRENTFILE} -a -f ${CURRENTFILE} ]; then
+		find . -type f -name ${CURRENTFILE} -mtime -1  -print | grep -q ${CURRENTFILE}
+		if [ $? -ne 0 ]; then
+			rm -f ${CURRENTFILE}
+		fi
+	fi
 fi
 umask 0022
 ENDIR=/usr/lib/sa
$ 



$ cat sa1.script.tweak 
#!/bin/sh
# /usr/lib/sa/sa1
# (C) 1999-2009 Sebastien Godard (sysstat <at> orange.fr)
#
#@(#) sysstat-9.0.4
#@(#) sa1: Collect and store binary data in system activity data file.
#
HISTORY=0
SADC_OPTIONS="-S DISK"
SYSCONFIG_DIR=/etc/sysconfig
[ -r ${SYSCONFIG_DIR}/sysstat ] && . ${SYSCONFIG_DIR}/sysstat

CURRENTDIR=`date +%Y%m`
DATE=`date +%d`
CURRENTFILE=sa${DATE}
DDIR=/var/log/sa
cd ${DDIR} || exit 1

if [ ${HISTORY} -gt 28 ]
then
	[ -d ${CURRENTDIR} ] || mkdir -p ${CURRENTDIR}
	# If ${CURRENTFILE} exists and is a regular file, then make sure
       	# the file was modified this day (and not e.g. month ago)
	# and move it to ${CURRENTDIR}
	[ ! -L ${CURRENTFILE} ] &&
		[ -f ${CURRENTFILE} ] &&
		[ "`date +%Y%m%d -r ${CURRENTFILE}`" = "${CURRENTDIR}${DATE}" ] &&
		mv -f ${CURRENTFILE} ${CURRENTDIR}/${CURRENTFILE}
	# If ${CURRENTFILE} exists and is a regular file, then make sure
       	# the file was modified this day (and not e.g. month ago).
	# If it is old, remove it so that it is recreated by sadc afresh
	if [ -f ${CURRENTFILE} ]; then
		find ${CURRENTDIR} -type f -name ${CURRENTFILE} -mtime -1  -print | grep -q ${CURRENTFILE}
		if [ $? -ne 0 ]; then
			rm -f ${CURRENTDIR}/${CURRENTFILE}
		fi
	fi
	touch ${CURRENTDIR}/${CURRENTFILE}
	# Remove the "compatibility" link and recreate it to point to
	# the (new) current file
	rm -f ${CURRENTFILE}
	ln -s ${CURRENTDIR}/${CURRENTFILE} ${CURRENTFILE}
else
	# If ${CURRENTFILE} exists and is a regular file, then make sure
       	# the file was modified this day (and not e.g. month ago).
	# If it is old, remove it so that it is recreated by sadc afresh
	if [ ! -L ${CURRENTFILE} -a -f ${CURRENTFILE} ]; then
		find . -type f -name ${CURRENTFILE} -mtime -1  -print | grep -q ${CURRENTFILE}
		if [ $? -ne 0 ]; then
			rm -f ${CURRENTFILE}
		fi
	fi
fi
umask 0022
ENDIR=/usr/lib/sa
cd ${ENDIR}
[ "$1" = "--boot" ] && shift && BOOT=y || BOOT=n
if [ $# = 0 ] && [ "${BOOT}" = "n" ]
then
# Note: Stats are written at the end of previous file *and* at the
# beginning of the new one (when there is a file rotation) only if
# outfile has been specified as '-' on the command line...
	exec ${ENDIR}/sadc -F -L ${SADC_OPTIONS} 1 1 -
else
	exec ${ENDIR}/sadc -F -L ${SADC_OPTIONS} $* -
fi

$
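
For comparison, the same staleness check can be written by comparing the file's modification date with today's date, using the `date -r` idiom that the original sa1 already applies before moving a file into ${CURRENTDIR}. This is only a minimal sketch, assuming GNU date (which accepts -r FILE); the function name remove_if_not_from_today is hypothetical, and this is not the patch that was eventually shipped.

```shell
#!/bin/sh
# Hypothetical helper, not part of sysstat: delete a regular, non-symlink file
# unless it was last modified today. Assumes GNU date, which accepts -r FILE.
remove_if_not_from_today() {
	file=$1
	# Nothing to do for symlinks or missing files
	[ ! -L "$file" ] && [ -f "$file" ] || return 0
	if [ "$(date +%Y%m%d -r "$file")" != "$(date +%Y%m%d)" ]; then
		rm -f "$file"
	fi
}

# Demonstration in a scratch directory rather than /var/log/sa
tmpdir=$(mktemp -d)
touch "$tmpdir/sa_today"
touch -t 201302010000 "$tmpdir/sa_old"   # pretend it was written on 2013-02-01
remove_if_not_from_today "$tmpdir/sa_today"
remove_if_not_from_today "$tmpdir/sa_old"
ls "$tmpdir"
rm -rf "$tmpdir"
```

Only sa_today is left in the scratch directory. Unlike `find -mtime -1`, which keeps anything modified in the last 24 hours, this variant keys on the calendar date, so which one is preferable depends on how the surrounding script defines "today".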

Comment 10 Peter Schiffer 2014-05-15 16:11:25 UTC
*** Bug 1072565 has been marked as a duplicate of this bug. ***

Comment 11 Peter Schiffer 2014-05-22 15:11:27 UTC
Created attachment 898410
sysstat-9.0.4-history-25.patch

This problem is fixed by this patch and lowering the default history value to 25 days.

Comment 14 Peter Schiffer 2014-05-23 11:49:54 UTC
Created attachment 898661
sysstat-9.0.4-overwrite-sa.patch

Comment 15 Peter Schiffer 2014-05-23 11:53:00 UTC
The patch was updated to revert both the change to the boundary at which the sa data files are stored in directories and the change to the default history value, restoring 28 in place of 25, as these changes could break backward compatibility in some cases.

Comment 17 Peter Portante 2014-06-04 13:56:49 UTC
I am seeing this happen not only in the February / March case, but in April / May as well, and probably after all other 30-day months too.

I have collected sosreports that show this.

Comment 19 Cynthia Davis 2014-10-10 17:16:18 UTC
Hi - I am the SRM for Raymond James - do you have an idea of when the Errata will be ready for the customer?  Thanks!

Cynthia Baldwin, RHCE
Sr. SRM

Comment 21 errata-xmlrpc 2014-10-14 06:40:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1468.html