Bug 1072565
Summary: | sar files appended rather than replaced | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | redhat <iain.morrison> |
Component: | sysstat | Assignee: | Peter Schiffer <pschiffe> |
Status: | CLOSED DUPLICATE | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.5 | CC: | cfairchild, hnoefer, klepikho, pportant |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-05-15 16:11:25 UTC | Type: | Bug |
Description
redhat@mrc-epid.cam.ac.uk
2014-03-04 19:20:16 UTC
Hello, we got the same problem, all sa and sar files are twice as big as normal.

wc -l /var/log/sa/sar01
8111
wc -l /var/log/sa/sar28
4073

Cron restarted, no success. Started on March 1 2014. Installed version is sysstat-9.0.4-22.el6.x86_64

-rw-r--r-- 1 root root 734292 Mar 1 23:50 /var/log/sa/sa01
-rw-r--r-- 1 root root 734292 Mar 2 23:50 /var/log/sa/sa02
-rw-r--r-- 1 root root 734292 Mar 3 23:50 /var/log/sa/sa03
-rw-r--r-- 1 root root 734292 Mar 4 23:50 /var/log/sa/sa04
-rw-r--r-- 1 root root 734292 Mar 5 23:50 /var/log/sa/sa05
-rw-r--r-- 1 root root 734292 Mar 6 23:50 /var/log/sa/sa06
-rw-r--r-- 1 root root 734292 Mar 7 23:50 /var/log/sa/sa07
-rw-r--r-- 1 root root 734292 Mar 8 23:50 /var/log/sa/sa08
-rw-r--r-- 1 root root 734292 Mar 9 23:50 /var/log/sa/sa09
-rw-r--r-- 1 root root 734292 Mar 10 23:50 /var/log/sa/sa10
-rw-r--r-- 1 root root 734292 Mar 11 23:50 /var/log/sa/sa11
-rw-r--r-- 1 root root 734292 Mar 12 23:50 /var/log/sa/sa12
-rw-r--r-- 1 root root 734292 Mar 13 23:50 /var/log/sa/sa13
-rw-r--r-- 1 root root 581412 Mar 14 13:50 /var/log/sa/sa14
-rw-r--r-- 1 root root 367380 Feb 15 23:50 /var/log/sa/sa15
-rw-r--r-- 1 root root 367380 Feb 16 23:50 /var/log/sa/sa16
-rw-r--r-- 1 root root 367380 Feb 17 23:50 /var/log/sa/sa17
-rw-r--r-- 1 root root 367380 Feb 18 23:50 /var/log/sa/sa18
-rw-r--r-- 1 root root 367380 Feb 19 23:50 /var/log/sa/sa19
-rw-r--r-- 1 root root 367380 Feb 20 23:50 /var/log/sa/sa20
-rw-r--r-- 1 root root 367380 Feb 21 23:50 /var/log/sa/sa21
-rw-r--r-- 1 root root 367380 Feb 22 23:50 /var/log/sa/sa22
-rw-r--r-- 1 root root 367380 Feb 23 23:50 /var/log/sa/sa23
-rw-r--r-- 1 root root 367380 Feb 24 23:50 /var/log/sa/sa24
-rw-r--r-- 1 root root 367380 Feb 25 23:50 /var/log/sa/sa25
-rw-r--r-- 1 root root 367380 Feb 26 23:50 /var/log/sa/sa26
-rw-r--r-- 1 root root 367380 Feb 27 23:50 /var/log/sa/sa27
-rw-r--r-- 1 root root 367380 Feb 28 23:50 /var/log/sa/sa28
-rw-r--r-- 1 root root 700753 Mar 1 23:53 /var/log/sa/sar01
-rw-r--r-- 1 root root 700753 Mar 2 23:53 /var/log/sa/sar02
-rw-r--r-- 1 root root 700753 Mar 3 23:53 /var/log/sa/sar03
-rw-r--r-- 1 root root 700756 Mar 4 23:53 /var/log/sa/sar04
-rw-r--r-- 1 root root 700753 Mar 5 23:53 /var/log/sa/sar05
-rw-r--r-- 1 root root 700753 Mar 6 23:53 /var/log/sa/sar06
-rw-r--r-- 1 root root 700753 Mar 7 23:53 /var/log/sa/sar07
-rw-r--r-- 1 root root 700753 Mar 8 23:53 /var/log/sa/sar08
-rw-r--r-- 1 root root 700753 Mar 9 23:53 /var/log/sa/sar09
-rw-r--r-- 1 root root 700753 Mar 10 23:53 /var/log/sa/sar10
-rw-r--r-- 1 root root 700753 Mar 11 23:53 /var/log/sa/sar11
-rw-r--r-- 1 root root 700753 Mar 12 23:53 /var/log/sa/sar12
-rw-r--r-- 1 root root 700753 Mar 13 23:53 /var/log/sa/sar13
-rw-r--r-- 1 root root 351076 Feb 14 23:53 /var/log/sa/sar14
-rw-r--r-- 1 root root 351076 Feb 15 23:53 /var/log/sa/sar15
-rw-r--r-- 1 root root 351076 Feb 16 23:53 /var/log/sa/sar16
-rw-r--r-- 1 root root 351076 Feb 17 23:53 /var/log/sa/sar17
-rw-r--r-- 1 root root 351076 Feb 18 23:53 /var/log/sa/sar18
-rw-r--r-- 1 root root 351076 Feb 19 23:53 /var/log/sa/sar19
-rw-r--r-- 1 root root 351076 Feb 20 23:53 /var/log/sa/sar20
-rw-r--r-- 1 root root 351076 Feb 21 23:53 /var/log/sa/sar21
-rw-r--r-- 1 root root 351076 Feb 22 23:53 /var/log/sa/sar22
-rw-r--r-- 1 root root 351076 Feb 23 23:53 /var/log/sa/sar23
-rw-r--r-- 1 root root 351076 Feb 24 23:53 /var/log/sa/sar24
-rw-r--r-- 1 root root 351076 Feb 25 23:53 /var/log/sa/sar25
-rw-r--r-- 1 root root 351076 Feb 26 23:53 /var/log/sa/sar26
-rw-r--r-- 1 root root 351076 Feb 27 23:53 /var/log/sa/sar27
-rw-r--r-- 1 root root 351076 Feb 28 23:53 /var/log/sa/sar28

Hi, I think I found the problem. In /usr/lib64/sa/sa2 are the following lines.

find ${DDIR} \( -name 'sar??' -o -name 'sa??' -o -name 'sar??.gz' -o -name 'sa??.gz' -o -name 'sar??.bz2' -o -name 'sa??.bz2' \) \
    -mtime +"${HISTORY}" -exec rm -f {} \;

HISTORY is 28 by default. Because February has 28 days the old sa files are not deleted.
So I think with a HISTORY of 26 everything should be fine for a February with 28 days. All other months should have a HISTORY of 28. Can someone please recheck this?

Regards,
Holger

As mentioned above, between releases sysstat-9.0.4-20.el6 and sysstat-9.0.4-22.el6 the value of HISTORY in the file /etc/sysconfig/sysstat was changed from 7 to 28. I also adjusted mine to 26 but am still seeing the same behaviour. The sa18 file from February 18th is still on the system (it is now over 27 days old), and my log files since changing HISTORY to 26 still contain two months' worth of data. So even with a value of 26 for the HISTORY setting, the files seem to wrap during March (sar still reports the entries from February and March).

I think the need to allow extra days results from the inaccuracy of -mtime. From the man page: "When find figures out how many 24-hour periods ago the file was last accessed, any fractional part is ignored, so to match -atime +1, a file has to have been accessed at least two days ago."

Hi, only the current day's file will be overwritten. At the time of writing, was it March 18? /usr/lib64/sa/sa2 deletes all sa files that were modified more than HISTORY days ago, and the script does so at 11:53 pm. The next /usr/lib64/sa/sa1 run at 0:00 am then creates a new saDD file. If the old file has not been deleted, the output is appended to it and you get the same behavior as now.

I changed the HISTORY option on March 14, and since then every new day's file has been created without any problems.
-rw-r--r-- 1 root root 734292 Mar 13 23:50 /var/log/sa/sa13
-rw-r--r-- 1 root root 734292 Mar 14 23:50 /var/log/sa/sa14
-rw-r--r-- 1 root root 367380 Mar 15 23:50 /var/log/sa/sa15
-rw-r--r-- 1 root root 367380 Mar 16 23:50 /var/log/sa/sa16
-rw-r--r-- 1 root root 367380 Mar 17 23:50 /var/log/sa/sa17
-rw-r--r-- 1 root root 138060 Mar 18 08:50 /var/log/sa/sa18
-rw-r--r-- 1 root root 367380 Feb 19 23:50 /var/log/sa/sa19
-rw-r--r-- 1 root root 367380 Feb 20 23:50 /var/log/sa/sa20
-rw-r--r-- 1 root root 367380 Feb 21 23:50 /var/log/sa/sa21

Today is March 18 and everything is fine.

Regards,
Holger

I can confirm that all my systems which have HISTORY set to 26 in /etc/sysconfig/sysstat are still appending to February's data, so running sar lists two days' worth of sysstat statistics (February's followed by March's) and the averages are corrupted. The system on which I set HISTORY to 25 has only today's statistics.

The logic in /usr/lib64/sa/sa1 which prevents appending data to an sa?? file from the previous month is only run if the value of HISTORY is greater than 28. The logic in /usr/lib64/sa/sa2 relies on the find command to delete old log files (more than HISTORY days old). Running find /var/log/sa/ -name 'sa??' -mtime +26 at 23:53 should have found last month's file and deleted it, but it does not seem to work in practice.

I think that I have found the issue which would explain why my systems are still appending while others with the same settings may not be. A new twist was added when we went to daylight saving time on March 9th. Up until then the value of 26 for HISTORY was working fine. With the loss of an hour due to daylight saving time, however, the files time-stamped 23:50 before the time change are no longer a full day older at 23:53 in daylight saving time (find uses 24-hour periods rather than hh:mm:ss).
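The arithmetic behind the last two comments can be sketched as follows. This is only an illustration, assuming that find -mtime +N compares the number of whole 24-hour periods since the last modification against N, as the quoted man page text describes. The function names are hypothetical, the dates are the sa18 example from this thread, and the one-hour subtraction models the spring-forward time change.

```shell
#!/bin/bash
DAY=86400

# Whole 24-hour periods in an age given in seconds (fractional part dropped).
full_periods() {
    echo $(( $1 / DAY ))
}

# Succeeds if a file of this age (seconds) would match "find -mtime +N".
matches_mtime_plus() {
    [ "$(full_periods "$1")" -gt "$2" ]
}

# Prints whether the sa2 cleanup would delete or keep a file of the given age.
# Arguments: HISTORY value, age in seconds, label for the scenario.
report() {
    if matches_mtime_plus "$2" "$1"; then
        echo "HISTORY=$1 $3: deleted"
    else
        echo "HISTORY=$1 $3: kept, so sa1 appends to it"
    fi
}

# sa18 written Feb 18 at 23:50, sa2 running Mar 17 at 23:53 (non-leap year):
# 27 calendar days plus 3 minutes of wall-clock time.
age_no_dst=$(( 27 * DAY + 180 ))
# The same interval when a time change in between removed one real hour.
age_with_dst=$(( age_no_dst - 3600 ))

for history in 25 26 28; do
    report "$history" "$age_no_dst" "without DST change"
    report "$history" "$age_with_dst" "with DST change"
done
```

Under these assumptions, HISTORY=26 deletes the file only when no time change falls in the interval, HISTORY=25 deletes it in both cases, and HISTORY=28 keeps it in both, which is consistent with the behaviour reported above.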
So for us, there will be a period in March (from the time shift to the 28th) in which a value of 26 for HISTORY will still append to the February stats in any year which is not a leap year.

I don't think this is about how the sa data files are deleted. The sadc program itself should be able to detect an old data file and rewrite it. The upstream code in sadc.c in at least 10.1.5+ does this, but that code does not exist in the RHEL 5 or RHEL 6 code bases.

See sadc.c, lines 752 - 763, from 10.1.5:

/*
 * If we are using the standard daily data file (file specified
 * as "-" on the command line) and it is from a past month,
 * then overwrite (truncate) it.
 */
get_time(&rectime, 0);

if (((file_hdr.sa_month != rectime.tm_mon) ||
     (file_hdr.sa_year != rectime.tm_year)) &&
    WANT_SA_ROTAT(flags)) {
	close(*ofd);
	create_sa_file(ofd, ofile);
	return;
}

Here is a simple script that can be used to simulate the problem. Run it as root, and be aware that it forcibly changes the time on your system:

#!/bin/bash
for m in 02 03 ; do
    for d in {1..28} ; do
        for h in 00 23 ; do
            for i in 00 50 ; do
                # Set the clock (MMDDhhmm) and collect one sample with sa1.
                dt=$(printf "%s%02d%s%s" $m $d $h $i)
                date $dt
                /usr/lib64/sa/sa1 1 2
            done
        done
        # At 23:53, run the daily sa2 report and cleanup.
        dt=$(printf "%s%02d2353" $m $d)
        date $dt
        /usr/lib64/sa/sa2 -A
    done
done

Hi guys,

thanks for reporting this bug. The problem has been fixed upstream [1], and I'm going to backport that patch, lower the default HISTORY value to 25, and backport patch [2] mentioned in comment 8.

Also, I'm closing this bug as a duplicate of a similar, older bug.

Thanks,
peter

[1] https://github.com/sysstat/sysstat/commit/b07b592990fe8f140044930676dafe92377f6eae
[2] https://github.com/sysstat/sysstat/commit/0aced09880c724952a02b86fdaf142b01e1bc988

*** This bug has been marked as a duplicate of bug 921612 ***

Do we need to change the default HISTORY value to 25? Is that required to fix this?

Hi Peter,

the default history value will be lowered to 25 days, even though it's not really required.
If the history is greater than 25, the sa data files are stored in separate subfolders by month, and we don't want this as a default configuration.

Anyway, to fix this issue, besides lowering the default history, I've backported the following patches to stay as close as possible to upstream and future versions of sysstat:

https://github.com/sysstat/sysstat/commit/4b06da42237b0c864c1c802c90163c3c1b5db50b
https://github.com/sysstat/sysstat/commit/0aced09880c724952a02b86fdaf142b01e1bc988
https://github.com/sysstat/sysstat/commit/b07b592990fe8f140044930676dafe92377f6eae
https://github.com/sysstat/sysstat/commit/f41438cb6c4e1fe9e265bcf39797a027b3493b16

You can find the final patch attached to bug #921612. Hopefully, this will solve this problem for good.

peter

So, after Peter pointed out in an email thread with sysstat upstream that changing the boundary at which sysstat stores sa data files in directories could potentially break backward compatibility in some cases, I dropped that change from the RHEL-6 version of sysstat. In the end, only these two upstream commits were backported to RHEL-6:

https://github.com/sysstat/sysstat/commit/4b06da42237b0c864c1c802c90163c3c1b5db50b
https://github.com/sysstat/sysstat/commit/0aced09880c724952a02b86fdaf142b01e1bc988

The default history value was also reverted back to 28 days.

peter
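The month check in the sadc.c excerpt quoted earlier in this thread can be approximated outside of sadc. The following is only a rough sketch and not the backported fix: it assumes the file's modification time is an acceptable stand-in for the header fields (file_hdr.sa_month, file_hdr.sa_year) that sadc compares, the name remove_if_stale is hypothetical, and it relies on the -r and -d options of GNU date.

```shell
#!/bin/bash
# Hypothetical helper, not part of sysstat: remove a daily data file whose
# last modification falls in a different month (or year) than the reference
# time, so that the next sa1 run starts a fresh file instead of appending.
# Arguments: path to the data file, optional reference time (default: now).
remove_if_stale() {
    # Nothing to do if the file does not exist.
    [ -f "$1" ] || return 0
    # Year and month of the file's modification time and of the reference time.
    file_month=$(date -r "$1" +%Y%m)
    current_month=$(date -d "${2:-now}" +%Y%m)
    if [ "$file_month" != "$current_month" ]; then
        rm -f -- "$1"
    fi
}

# Example call for today's data file (path as used in this thread):
#   remove_if_stale "/var/log/sa/sa$(date +%d)"
```

Unlike the find -mtime cleanup in sa2, a check of this kind does not depend on counting 24-hour periods, so it is not affected by the length of February or by a time change.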