Bug 999756 - audit log queue too small causing messages to be lost before auditd starts
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified Linux
Priority: unspecified Severity: unspecified
Assigned To: Richard Guy Briggs
QA Contact: Fedora Extras Quality Assurance
Keywords: FutureFeature, Triaged
Reported: 2013-08-22 00:05 EDT by Richard Guy Briggs
Modified: 2014-01-23 11:17 EST
Doc Type: Enhancement
Last Closed: 2014-01-23 11:17:43 EST
Type: Bug


Description Richard Guy Briggs 2013-08-22 00:05:27 EDT
Description of problem:
The default audit_backlog_limit is 64.  This was a reasonable limit at one time.

systemd causes so much audit queue activity on startup that the hold queue
overflows before auditd starts.  On a system without audit= set on the kernel
command line, this isn't an issue, since that history isn't kept for auditd
to read once it becomes available.  On a system with audit=1 set on the kernel
command line, kaudit tries to keep that history until auditd is able to drain
the queue.

On a stock install of Fedora 18 it was observed that with the defaults, about
180 messages were dropped.


Version-Release number of selected component (if applicable):
Fedora 18 with kernel vmlinuz-3.7.2-201.fc18.x86_64


How reproducible:
The queue overflows by 180 +/-5 messages.


Steps to Reproduce:
1. Edit the GRUB kernel boot line to add "audit=1"
2. Boot
3. Watch for "backlog limit exceeded" in /var/log/messages
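
Step 3 can be automated. The following is only a minimal sketch, assuming the log is plain text and that every overflow report contains the phrase quoted in step 3; the function names are hypothetical and not part of any audit tooling.

```python
from typing import Iterable

# Phrase quoted in step 3 of the reproduction steps.
NEEDLE = "backlog limit exceeded"


def count_backlog_overflows(lines: Iterable[str]) -> int:
    """Count the log lines that report an audit backlog overflow."""
    return sum(1 for line in lines if NEEDLE in line)


def count_in_file(path: str = "/var/log/messages") -> int:
    """Count overflow reports in a log file (reading it usually needs root)."""
    with open(path, errors="replace") as log:
        return count_backlog_overflows(log)
```

Calling count_in_file() after a boot with audit=1 gives the number of overflow reports in the log. That number is not necessarily the number of lost messages, since the kernel may rate-limit its own reports.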

Actual results:
Audit log messages are lost before auditd is able to start and drain the queue.


Expected results:
Auditd starts up early enough that it is able to drain the queue before it overflows.


Additional info:

One way to solve this would be to bump the default hold queue size to 512 to avoid losing any messages before auditd is able to consume them.  This would not be helpful to the embedded community and might not be sufficient in some situations.

Another way to solve it might be to add a kconfig option to set the default based on the system type.  An embedded system would get the current (or smaller) default, while workstations might get 320 and servers might get more.
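
A rough check of the candidate sizes against the observed overflow, as a sketch only. It assumes each dropped message would have needed exactly one extra queue slot and that auditd starts no earlier or later than in the observed boots; the helper name is hypothetical.

```python
def min_backlog_limit(current_limit: int, overflow: int, spread: int = 0) -> int:
    """Smallest limit that would have held every message in the observed boots."""
    return current_limit + overflow + spread


DEFAULT_LIMIT = 64  # current default audit_backlog_limit
needed = min_backlog_limit(DEFAULT_LIMIT, overflow=180, spread=5)

# Compare the current default and the two proposed sizes with the estimate.
for candidate in (64, 320, 512):
    verdict = "enough" if candidate >= needed else "too small"
    print(f"{candidate}: {verdict} (estimate: {needed})")
```

Under those assumptions the estimate is 249, so both 320 and 512 would have covered the Fedora 18 boots measured here, with 320 leaving far less headroom.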

This default can still be changed by the "-b" option in audit.rules once the system has booted, but that won't help with messages lost during boot.

None of these solutions helps if a system's compiled-in default is too small: the lost messages still cannot be seen without compiling a new kernel.

At a minimum, I recommend adding a kernel boot parameter (audit already has one to enable/disable it) such as "audit_queue_len=<n>" that overrides the default, so the system administrator can set the queue length.
Comment 1 Josh Boyer 2013-08-22 09:13:25 EDT
3.7.2-201 is a rather old kernel.  F18 is on 3.10.7 now, with 3.10.9 submitted for updates-testing.  While it might not matter, you should probably focus on the latest kernel in each release, and I'd recommend fixing this upstream first (and getting it into rawhide that way).
Comment 2 Richard Guy Briggs 2013-08-22 09:36:01 EDT
(In reply to Josh Boyer from comment #1)
> 3.7.2-201 is a rather old kernel.  F18 is on 3.10.7 now, with 3.10.9
> submitted for updates-testing.  While it might not matter, you should
> probably focus on the latest kernel in each release, and I'd recommend
> fixing this upstream first (and getting it into rawhide that way).

Agreed.  I'm patching it upstream.
Comment 3 Richard Guy Briggs 2013-09-18 18:31:57 EDT
Patch posted upstream as part of a patchset to address bz990806
    https://lkml.org/lkml/2013/9/18/477
Comment 4 Richard Guy Briggs 2013-09-18 18:34:39 EDT
oops, forgot to add other list link:
    https://www.redhat.com/archives/linux-audit/2013-September/msg00030.html
Comment 5 Justin M. Forbes 2013-10-18 17:03:53 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
Comment 6 Richard Guy Briggs 2013-10-21 11:30:37 EDT
Issue is still present in f18 and f19 (and upstream).
Comment 7 Josh Boyer 2013-10-21 11:35:41 EDT
Moving to rawhide and setting keywords so we don't auto-needinfo this.
Comment 8 Richard Guy Briggs 2014-01-23 11:17:43 EST
The following upstream patches mostly address this issue:

40c0775 audit: allow unlimited backlog queue
51cc83f audit: add audit_backlog_wait_time configuration option
f910fde audit: add kernel set-up parameter to override default backlog limit
7ecf69b audit: efficiency fix 2: request exclusive wait since all need same resource
db89731 audit: efficiency fix 1: only wake up if queue shorter than backlog limit
ae887e0 audit: make use of remaining sleep time from wait_for_auditd
e789e56 audit: reset audit backlog wait time after error recovery

One obvious remaining optimization is to start auditd earlier, but this is outside of the scope of the kernel.
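
For anyone checking how a machine was booted with respect to the patches listed above, here is a minimal sketch. It assumes the set-up parameter added by f910fde is spelled audit_backlog_limit=, as in the kernel's parameter documentation; the function name and the sample command line are hypothetical.

```python
from typing import Dict, Optional


def audit_boot_settings(cmdline: str) -> Dict[str, Optional[str]]:
    """Extract audit= and audit_backlog_limit= from a kernel command line."""
    settings: Dict[str, Optional[str]] = {
        "audit": None,
        "audit_backlog_limit": None,  # assumed parameter name, see lead-in
    }
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in settings:
            settings[key] = value
    return settings


# Hypothetical command line; on a running system read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz ro quiet audit=1 audit_backlog_limit=512"
print(audit_boot_settings(sample))
```

A value of None means the parameter was not given at boot, in which case the compiled-in default applies.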
