Description of problem: Accounting file /var/account/pacct does not appear to be growing (i.e. no accounting records are being written) after boot. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. Boot system, see good status of psacct.service (systemctl status psacct) 2. execute lastcomm, no records are listed 3. Verify that pacct is not growing (ls -l /var/account/pacct twice) Actual results: lastcomm fails to produce any results from the current boot Expected results: lastcomm shows processes that have ended since accounting was turned on at boot time Additional info: I am able to 'kickstart' accounting by doing 'systemctl restart psacct', but accton /var/account/pacct doesn't seem to do it. I have substituted 'systemctl restart psacct' for the 'accton' command in '/etc/logrotate.d/psacct' as a workaround. FWIW restarting the service is the behavior done in the cron.daily/acct job on Debian and it works well.
Hmmm - I just went through the reproduce steps and didn't have the problem. I think it might be OK if pacct exists, but the problem may only manifest itself if pacct is created in the service 'ExecStartPre' step. Also, I just upgraded to Fedora 21 and the machine hadn't been on for 11 days so there may have been some logrotate funny business happening. I will fiddle around with it for another day to let it go through the normal logrotate cycles and see what happens.
Just had the problem again, no records being written to pacct. A side effect of this problem is that no process exit codes are available in 'atop' (no procacct) because no accounting records are available. Now not even restarting psacct starts things up. I briefly had success specifying account file pacct-20150101 in the 'accton' command, but now that won't go either. Next thing I will try is delete all of the files in the '/var/account' directory and reboot.
Started with empty '/var/account', psacct got created but sat empty (0 bytes) after ssh login and some ls commands. systemctl restart psacct got things going and records are being written. It almost seems like the file has to primed with at least one record, maybe like a dummy header or something.
Figured it out. The problem was two-fold: 1) psacct.service was disabled (don't know why/how it was being run at startup? Maybe something to do with it being Type=oneshot and item two below?) 2) There was a conflict with the 'atopacctd' daemon from atop version 2.1, which was installed to fix a SIGFPE in atop 2.0.2. Turns out the daemon likes to take control of turning on process accounting and moves records from pacct to a shadow file in the name of preserving disk space. From 'man atopacctd': "After process accounting is actived, the atopacctd daemon transfers every process accounting record that is available in the source file to a shadow file. Client processes (like atop processes) will read the shadow files instead of the process accounting source file. This approach has the following advantages: o The atopacctd daemon takes care that the source file is kept to a limited size. As soon as its maximum size is reached, it is truncated to a size of zero again (this is not noticed by the atop processes)." <* snip *> I'm pretty sure psacct client processes (i.e. lastcomm) are unaware of atop V2.1 shadow files, and the atop/atopacctd turning process accounting on/off was seriously not cool. After disabling atopacct service (chkconfig atopacct off or whatever), and enabling the psacct service, all is well again; I have psacct writing to pacct like it should (OK, it's growing fast, but better too much data than none at all, right? It'll get rotated at midnight anyway). I also have process exit status in atop. Seems like there should be a conflict declared between atop 2.1 and psacct. I don't know how this is handled with rpm packages/yum/DNF, but in Debian the control file contains information such as 'Conflicts: atop >= 2.1'.
Spoke too soon. It's still broken, exact same problem. It seems that when logrotate scripts, or whenever pacct is set to 0, accounting is broken. When I copy 'pacct.1' to 'pacct' and restart psacct, records are written.
I think the problem is the logrotate script ('/etc/logrotate.d/psacct') does not stop accounting records before it rotates the logs. After the logrotate script ran last night I at 03:06 I was left pacct-20150118.gz (109934 bytes), pacct.1 (can't remember the number of bytes) and pacct (0 bytes), all with c/mtime of 03:06. The system stayed up all night and today and I just checked them just now. Also, the whole idea of turning the accounting off and on outside of the purview of the systemd psacct.service seems like it could be causing problems. I'm going to add a 'prerotate' clause to the logrotate config file to stop accounting with 'systemctl stop psacct.service' and change the 'postrotate' script to start it with 'systemctl start psacct.service'.
I think I see how the logrotate script is messing up the account files, and this may trigger further problems. Testing with 'logrotate -vf /etc/logrotate.d/psacct' and 'ls -lrt /var/account/*' showed kernel was writing to the newly rotated pacct file ('pacct.1') while pacct stayed 0. This is caused by logrotate renaming the current accounting file before the new one is created, while the kernel still has an open file handle to the (now rotated) accounting file ('pacct.1'). I was able to get things going again by manually turning accounting off and then on a few times. Adding a small loop to the logrotate config ('psacct') seems to solve this particular problem: # Logrotate file for psacct RPM /var/account/pacct { compress delaycompress notifempty daily rotate 31 #create 0600 root root create 0640 root adm postrotate # # Logrotate renames 'psacct' to 'psacct.1' before # the new file is created, causing kernel to keep # writing to 'pacct.1'. Keep telling kernel to # turn off accounting and turn it on to write # to the the new file until it starts doing it. # (See BZ #1777949) # if systemctl --quiet is-active psacct.service >/dev/null 2>&1; then ACCTSIZE1=0 ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') while [ $ACCTSIZE1 -eq $ACCTSIZE2 ];do /usr/sbin/accton off sleep 3 /usr/sbin/accton /var/account/pacct ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') done fi endscript } Now to get back to testing the original problem from the title, which the logrotate issue may have directly or indirectly caused to some extent.
Eliminated unnecessary assignment from new 'psacct' logrotate config: # Logrotate file for psacct RPM /var/account/pacct { compress delaycompress notifempty daily rotate 31 #create 0600 root root create 0640 root adm postrotate # # Logrotate renames 'psacct' to 'psacct.1' before # the new file is created, causing kernel to keep # writing to 'pacct.1'. Keep telling kernel to # turn off accounting and turn it on to write # to the the new file until it starts doing it. # (See BZ #1777949) # if systemctl --quiet is-active psacct.service >/dev/null 2>&1; then ACCTSIZE=$(ls -l /var/account/pacct | awk '{print $5}') while [ $ACCTSIZE -eq 0 ];do /usr/sbin/accton off sleep 3 /usr/sbin/accton /var/account/pacct ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') done fi endscript } The log rotation went fine this morning after a suspend/resume (~6:34am) cycle: ls -lrt total 5156 <* snip *> -rw-r----- 1 root adm 329605 Jan 20 03:06 pacct-20150120.gz -rw-r----- 1 root adm 1871296 Jan 21 06:34 pacct-20150121 -rw-r----- 1 root adm 1040640 Jan 21 08:50 pacct
sed 's/ACCTFILE2/ACCTFILE/'
Hello Dominique. Thanks for the report. For checking the file size it's recommended to use the following command: stat --printf="%s" myfile.txt Anyway. It would be wiser to have a discussion with the kernel guys first as I consider the above a dirty workaround only. We simply cannot turn the accounting off, even when it is just for 3 seconds. During that time we could drop important accounting data and that's worse than broken log rotation. This needs a reliable solution. Maybe it needs a new syscall to be added for releasing the old handle.
Btw. I'm unable to reproduce your issue ... The logrotate daemon renames the pacct file and kernel still uses the old handle, but the accton call then releases the handle and kernel starts writing to the newly created empty file. To me it looks ok.
s|daemon|job|
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
OK, thanks for the explanation Jaromir. Just a note that I am working on slow machines (Pentium 4, Atom N270), so the YMMV rule applies in the ability to recreate the problem. I am going to leave the wait in place until transition to Release 23, will revisit then.
Installed Fedora 23 on 3 machines, latest was on a USB stick. Testing on the USB stick shows the same problem, except now no records are written no matter how many times I stop and restart via psacct.service or accton / accton off. Simple test using 'sudo accton /tmp/pacct' worked first time. I don't know if it is a problem with write permissions, but I don't see why that would be the case. The '/var/account' directory is owned by root:root and has permissions 755. The pacct file gets created with mode 600. I'll strace both cases and see what I can see.
Disabled psacct service again. It seems that whenever systemd gets involved with this process accounting things don't work. If I did a manual 'accton /var/account/pacct' accounting records started being written there. If I subsequently started the psacct.service, records _stopped_ being written. Once I stopped psaact.service and manually issued two accton commands the accounting records writes resumed. I have changed the logrotate loop clause to not turn accounting off, but instead just keep turning it on until I get non-zero pacct size: ACCTSIZE=$(stat --printf="%s" /var/account/pacct) while [ $ACCTSIZE -eq 0 ];do sleep 1 /usr/sbin/accton /var/account/pacct ACCTSIZE=$(stat --printf="%s" /var/account/pacct) done
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
Hello Dominique. I'm really unable to reproduce the described behaviour here. If you're able to reproduce on a regular basis, maybe you could share step by step instructions including guest OS installation in virt-manager. Or even better share a qemu/kvm image where you're able to reproduce the issue. Maybe it's just some kind of misunderstanding.
It's bad now. I can't get accounting to write to '/var/account' or any other directory in the root partition. It will work to write to '/tmp' but those files can't be rotated. This may have something to do with the root filesystem being almost full, I'm not sure. df -h <* snip *> /dev/mapper/fedora-root 9.8G 8.9G 369M 97% / I have two other Fedora machines that are working (with the loop I put in), so this is baffling as well as annoying, because this latest problem is happening on my production machine. I am pretty sure something is happening at the kernel level, I will see if I can find out more by looking at the 'acct' call (man 2 acct).
It was indeed a matter of acct refusing to write records to the logical volume. After extending it another 5 GB, sudo lvextend -L+5G /dev/fedora/root Size of logical volume fedora/root changed from 10.00 GiB (2560 extents) to 15.00 GiB (3840 extents). Logical volume root successfully resized. then extending the underlying filesystem, sudo resize2fs /dev/fedora/root resize2fs 1.42.13 (17-May-2015) Filesystem at /dev/fedora/root is mounted on /; on-line resizing required old_desc_blocks = 1, new_desc_blocks = 1 The filesystem on /dev/fedora/root is now 3932160 (4k) blocks long. df -h <* snip *> /dev/mapper/fedora-root 15G 8.9G 5.1G 64% / processing accounting starts writing records. /var/account$ ls -l total 8 4 drwxr-xr-x. 2 root root 4096 Feb 20 09:50 . 4 drwxr-xr-x. 22 root root 4096 Feb 20 12:46 .. 0 -rw-r----- 1 root adm 0 Feb 20 09:50 pacct /var/account$ sudo accton /var/account/pacct Turning on process accounting, file set to '/var/account/pacct'. /var/account$ ls -l total 12 4 drwxr-xr-x. 2 root root 4096 Feb 20 09:50 . 4 drwxr-xr-x. 22 root root 4096 Feb 20 12:46 .. 4 -rw-r----- 1 root adm 320 Feb 20 14:27 pacct
From 'man 5 proc': /proc/sys/kernel/acct This file contains three numbers: highwater, lowwater, and frequency. If BSD-style process accounting is enabled, these values control its behavior. If free space on filesystem where the log lives goes below lowwater percent, accounting sus‐ pends. If free space gets above highwater percent, accounting resumes. frequency determines how often the kernel checks the amount of free space (value is in seconds). Default values are 4, 2 and 30. That is, suspend accounting if 2% or less space is free; resume it if 4% or more space is free; consider information about amount of free space valid for 30 seconds. I think this might explain this problem, as at one point in the past I did a LVM resize of root, shrinking it to make more room for the home LV. The root resize wasn't completed correctly so the root LV was always operating within a thin margin of getting to the low volume percentage. Once the log rotation was done and old files were deleted, the disk space would go back over the high water mark and accounting records would resume being written.