Bug 1177949 - No accounting records being written after boot
Summary: No accounting records being written after boot
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: psacct
Version: 21
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jaromír Cápík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-12-31 14:47 UTC by Dominique Brazziel
Modified: 2016-02-20 19:51 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
: 1240179 (view as bug list)
Environment:
Last Closed: 2015-12-02 06:41:41 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Dominique Brazziel 2014-12-31 14:47:51 UTC
Description of problem:
Accounting file /var/account/pacct does not appear to be growing (i.e. no accounting records are being written) after boot.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Boot system, see good status of psacct.service (systemctl status psacct)
2. execute lastcomm, no records are listed
3. Verify that pacct is not growing (ls -l /var/account/pacct twice)

Actual results:
lastcomm fails to produce any results from the current boot 

Expected results:
lastcomm shows processes that have ended since accounting was turned on at boot time

Additional info:
I am able to 'kickstart' accounting by doing 'systemctl restart psacct', but
accton /var/account/pacct doesn't seem to do it.
I have substituted 'systemctl restart psacct' for the 'accton' command in
'/etc/logrotate.d/psacct' as a workaround.  FWIW restarting the service is
the behavior done in the cron.daily/acct job on Debian and it works well.

Comment 1 Dominique Brazziel 2014-12-31 15:31:51 UTC
Hmmm - I just went through the reproduce steps and didn't have the problem.  I think it might be OK if pacct exists, but the problem may only manifest itself
if pacct is created in the service 'ExecStartPre' step.   Also, I just upgraded 
to Fedora 21 and the machine hadn't been on for 11 days so there may have been some logrotate funny business happening.  

I will fiddle around with it for another day to let it go through the normal
logrotate cycles and see what happens.

Comment 2 Dominique Brazziel 2015-01-02 04:04:38 UTC
Just had the problem again, no records being written to pacct.  A side effect of this problem is that no process exit codes are available in 'atop' (no procacct) because no accounting records are available.  Now not even restarting psacct starts things up.  I briefly had success specifying account file pacct-20150101
in the 'accton' command, but now that won't go either.  

Next thing I will try is delete all of the files in the '/var/account' directory
and reboot.

Comment 3 Dominique Brazziel 2015-01-02 05:06:46 UTC
Started with empty '/var/account', psacct got created but sat empty (0 bytes) after ssh login and some ls commands.  systemctl restart psacct got things going 
and records are being written.  It almost seems like the file has to primed with
at least one record, maybe like a dummy header or something.

Comment 4 Dominique Brazziel 2015-01-03 20:07:37 UTC
Figured it out.  The problem was two-fold:  

1) psacct.service was disabled (don't know why/how it was being run at startup?  Maybe something to do with it being Type=oneshot and item two below?)

2) There was a conflict with the 'atopacctd' daemon from atop version 2.1, which 
was installed to fix a SIGFPE in atop 2.0.2.  Turns out the daemon likes to 
take control of turning on process accounting and moves records from pacct to a shadow file in the name of preserving disk space.  From 'man atopacctd':

"After process accounting is actived, the atopacctd daemon transfers every process accounting record that is available in the source file to a shadow file. Client processes (like atop processes) will read the shadow files instead of the process accounting source file.

 This approach has the following advantages:

o
    The atopacctd daemon takes care that the source file is kept to a limited size. As soon as its maximum size is reached, it is truncated to a size of zero again (this is not noticed by the atop processes)."

<* snip *>

I'm pretty sure psacct client processes (i.e. lastcomm) are unaware of atop V2.1 shadow files, and the atop/atopacctd turning process accounting on/off
was seriously not cool.  After disabling atopacct service (chkconfig atopacct off or whatever), and enabling the psacct service, all is well again;  I have psacct writing to pacct like it should (OK, it's growing fast, but better too much data than none at all, right?  It'll get rotated at midnight anyway).  I also have process exit status in atop.

Seems like there should be a conflict declared between atop 2.1 and psacct.  I don't know how this is handled with rpm packages/yum/DNF, but in Debian the control file contains information such as 'Conflicts:  atop >= 2.1'.

Comment 5 Dominique Brazziel 2015-01-06 20:07:12 UTC
Spoke too soon.  It's still broken, exact same problem.  It seems that when 
logrotate scripts, or whenever pacct is set to 0, accounting is broken.  When I copy 'pacct.1' to 'pacct' and restart psacct, records are written.

Comment 6 Dominique Brazziel 2015-01-19 01:06:09 UTC
I think the problem is the logrotate script ('/etc/logrotate.d/psacct') does not
stop accounting records before it rotates the logs.  After the logrotate script
ran last night I at 03:06 I was left pacct-20150118.gz (109934 bytes), pacct.1 (can't remember the number of bytes) and pacct (0 bytes), all with c/mtime of
03:06.  The system stayed up all night and today and I just checked them
just now.

Also, the whole idea of turning the accounting off and on outside of the purview
of the systemd psacct.service seems like it could be causing problems.  I'm 
going to add a 'prerotate' clause to the logrotate config file to stop
accounting with 'systemctl stop psacct.service' and change the 'postrotate' script to start it with 'systemctl start psacct.service'.

Comment 7 Dominique Brazziel 2015-01-20 13:03:19 UTC
I think I see how the logrotate script is messing up the account files, and this
may trigger further problems.  Testing with 'logrotate -vf /etc/logrotate.d/psacct' and 'ls -lrt /var/account/*' showed kernel was writing to the newly rotated pacct file ('pacct.1') while pacct stayed 0.  This is caused by logrotate
renaming the current accounting file before the new one is created, while the kernel still has an open file handle to the (now rotated) accounting file ('pacct.1').

I was able to get things going again by manually turning accounting off and then on a few times.  Adding a small loop to the logrotate config ('psacct') seems
to solve this particular problem:

# Logrotate file for psacct RPM

/var/account/pacct {
    compress
    delaycompress
    notifempty
    daily
    rotate 31
    #create 0600 root root
    create 0640 root adm
    postrotate
    #
    # Logrotate renames 'psacct' to 'psacct.1' before
    # the new file is created, causing kernel to keep
    # writing to 'pacct.1'. Keep telling kernel to 
    # turn off accounting and turn it on to write
    # to the the new file until it starts doing it.
    # (See BZ #1777949)
    #
    	if systemctl --quiet is-active psacct.service >/dev/null 2>&1; then
            ACCTSIZE1=0
            ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') 
            while [ $ACCTSIZE1 -eq $ACCTSIZE2 ];do
                /usr/sbin/accton off
                sleep 3
                /usr/sbin/accton /var/account/pacct
                ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') 
            done
       fi	
    endscript
}

Now to get back to testing the original problem from the title, which the logrotate issue may have directly or indirectly caused to some extent.

Comment 8 Dominique Brazziel 2015-01-21 13:52:27 UTC
Eliminated unnecessary assignment from new 'psacct' logrotate config:

# Logrotate file for psacct RPM

/var/account/pacct {
    compress
    delaycompress
    notifempty
    daily
    rotate 31
    #create 0600 root root
    create 0640 root adm
    postrotate
    #
    # Logrotate renames 'psacct' to 'psacct.1' before
    # the new file is created, causing kernel to keep
    # writing to 'pacct.1'. Keep telling kernel to 
    # turn off accounting and turn it on to write
    # to the the new file until it starts doing it.
    # (See BZ #1777949)
    #
    	if systemctl --quiet is-active psacct.service >/dev/null 2>&1; then
            ACCTSIZE=$(ls -l /var/account/pacct | awk '{print $5}') 
            while [ $ACCTSIZE -eq 0  ];do
                /usr/sbin/accton off
                sleep 3
                /usr/sbin/accton /var/account/pacct
                ACCTSIZE2=$(ls -l /var/account/pacct | awk '{print $5}') 
            done
       fi	
    endscript
}

The log rotation went fine this morning after a suspend/resume (~6:34am) cycle:

ls -lrt
total 5156
<* snip *> 
-rw-r----- 1 root adm  329605 Jan 20 03:06 pacct-20150120.gz
-rw-r----- 1 root adm 1871296 Jan 21 06:34 pacct-20150121
-rw-r----- 1 root adm 1040640 Jan 21 08:50 pacct

Comment 9 Dominique Brazziel 2015-01-25 01:13:15 UTC
sed 's/ACCTFILE2/ACCTFILE/'

Comment 10 Jaromír Cápík 2015-01-26 17:34:06 UTC
Hello Dominique.

Thanks for the report. For checking the file size it's recommended to use the following command:

stat --printf="%s" myfile.txt

Anyway. It would be wiser to have a discussion with the kernel guys first as I consider the above a dirty workaround only. We simply cannot turn the accounting off, even when it is just for 3 seconds. During that time we could drop important accounting data and that's worse than broken log rotation. This needs a reliable solution. Maybe it needs a new syscall to be added for releasing the old handle.

Comment 11 Jaromír Cápík 2015-01-26 17:53:18 UTC
Btw. I'm unable to reproduce your issue ... 

The logrotate daemon renames the pacct file and kernel still uses the old handle, but the accton call then releases the handle and kernel starts writing to the newly created empty file. To me it looks ok.

Comment 12 Jaromír Cápík 2015-01-26 17:56:33 UTC
s|daemon|job|

Comment 13 Fedora End Of Life 2015-11-04 13:27:55 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Dominique Brazziel 2015-11-05 22:15:35 UTC
OK, thanks for the explanation Jaromir.  Just a note that I am working on
slow machines (Pentium 4, Atom N270), so the YMMV rule applies in the ability to 
recreate the problem.

I am going to leave the wait in place until transition to Release 23, will revisit then.

Comment 15 Dominique Brazziel 2015-11-26 19:40:05 UTC
Installed Fedora 23 on 3 machines, latest was on a USB stick.  Testing on the USB stick shows the same problem, except now no records are written no matter how many times I stop and restart via psacct.service or accton / accton off.

Simple test using 'sudo accton /tmp/pacct' worked first time.  I don't know if it is a problem with write permissions, but I don't see why that would be the case.  The '/var/account' directory is owned by root:root and has permissions 755.  The pacct file gets created with mode 600.

I'll strace both cases and see what I can see.

Comment 16 Dominique Brazziel 2015-11-26 20:21:55 UTC
Disabled psacct service again.  It seems that whenever systemd gets involved with this process accounting things don't work.  

If I did a manual 'accton /var/account/pacct' accounting records started being written there.  If I subsequently started the psacct.service, records _stopped_ being written.  Once I stopped psaact.service and manually issued two accton commands the accounting records writes resumed.  

I have changed the logrotate loop clause to not turn accounting off, but instead
just keep turning it on until I get non-zero pacct size:

ACCTSIZE=$(stat --printf="%s" /var/account/pacct) 
while [ $ACCTSIZE -eq 0  ];do
   sleep 1
   /usr/sbin/accton /var/account/pacct
   ACCTSIZE=$(stat --printf="%s" /var/account/pacct) 
done

Comment 17 Fedora End Of Life 2015-12-02 06:41:45 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 18 Jaromír Cápík 2015-12-04 14:13:32 UTC
Hello Dominique.

I'm really unable to reproduce the described behaviour here. If you're able to reproduce on a regular basis, maybe you could share step by step instructions including guest OS installation in virt-manager. Or even better share a qemu/kvm image where you're able to reproduce the issue. Maybe it's just some kind of misunderstanding.

Comment 19 Dominique Brazziel 2016-02-20 15:23:53 UTC
It's bad now.  I can't get accounting to write to '/var/account' or any other
directory in the root partition.  It will work to write to '/tmp' but those files
can't be rotated.  

This may have something to do with the root filesystem being almost full, I'm not sure.

df -h
<* snip *>
/dev/mapper/fedora-root  9.8G  8.9G  369M  97% /

I have two other Fedora machines that are working (with the loop I put in), so this is baffling as well as annoying, because this latest problem is happening on my production machine.  

I am pretty sure something is happening at the kernel level, I will see if I can find out more by looking at the 'acct' call (man 2 acct).

Comment 20 Dominique Brazziel 2016-02-20 19:37:49 UTC
It was indeed a matter of acct refusing to write records to the logical volume.
After extending it another 5 GB,

sudo lvextend -L+5G /dev/fedora/root
  Size of logical volume fedora/root changed from 10.00 GiB (2560 extents) to 15.00 GiB (3840 extents).
  Logical volume root successfully resized.

then extending the underlying filesystem,

sudo resize2fs /dev/fedora/root
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/fedora/root is mounted on /; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/fedora/root is now 3932160 (4k) blocks long.

df -h
<* snip *>
/dev/mapper/fedora-root   15G  8.9G  5.1G  64% /

processing accounting starts writing records.

/var/account$ ls -l
total 8
4 drwxr-xr-x.  2 root root 4096 Feb 20 09:50 .
4 drwxr-xr-x. 22 root root 4096 Feb 20 12:46 ..
0 -rw-r-----   1 root adm     0 Feb 20 09:50 pacct
  
/var/account$ sudo accton /var/account/pacct
Turning on process accounting, file set to '/var/account/pacct'.

/var/account$ ls -l
total 12
4 drwxr-xr-x.  2 root root 4096 Feb 20 09:50 .
4 drwxr-xr-x. 22 root root 4096 Feb 20 12:46 ..
4 -rw-r-----   1 root adm   320 Feb 20 14:27 pacct

Comment 21 Dominique Brazziel 2016-02-20 19:51:54 UTC
From 'man 5 proc':

/proc/sys/kernel/acct
              This  file  contains  three  numbers: highwater, lowwater, and frequency.  If BSD-style process accounting is enabled, these
              values control its behavior.  If free space on filesystem where the log lives goes below lowwater percent,  accounting  sus‐
              pends.   If  free  space gets above highwater percent, accounting resumes.  frequency determines how often the kernel checks
              the amount of free space (value is in seconds).  Default values are 4, 2 and 30.  That is, suspend accounting if 2% or  less
              space is free; resume it if 4% or more space is free; consider information about amount of free space valid for 30 seconds.

I think this might explain this problem, as at one point in the past I did a LVM resize of root, shrinking it to make more room for the home LV.  The root resize wasn't completed correctly so the root LV was always operating within a thin margin of getting to the low volume percentage.  Once the log rotation was done and old files were deleted, the disk space would go back over the high water mark and accounting records would resume being written.


Note You need to log in before you can comment on or make changes to this bug.