Description of problem:
The default audit_backlog_limit is 64. This was a reasonable limit at one time, but systemd now causes so much audit queue activity on startup that the hold queue has already overflowed before auditd starts. On a system without audit= on the kernel command line this isn't an issue, since the kernel does not keep that early history for auditd to read once it is available. On a system with audit=1 on the kernel command line, kaudit tries to keep that history until auditd is able to drain the queue. On a stock install of Fedora 18, about 180 messages were observed to be dropped with the defaults.

Version-Release number of selected component (if applicable):
Fedora 18 with kernel vmlinuz-3.7.2-201.fc18.x86_64

How reproducible:
It overflows by 180 +/-5 messages.

Steps to Reproduce:
1. Edit the grub kernel boot line to add "audit=1"
2. Boot
3. Watch for "backlog limit exceeded" in /var/log/messages

Actual results:
Audit log messages are lost before auditd is able to start and drain the queue.

Expected results:
auditd starts early enough to drain the queue before it overflows.

Additional info:
One way to solve this would be to bump the default hold queue size to 512 to avoid losing any messages before auditd is able to consume them. This would not be helpful to the embedded community and might not be sufficient in some situations.

Another way might be to add a kconfig option that sets the default based on the system type: an embedded system would get the current (or a smaller) default, while workstations might get 320 and servers might get more.

The default can still be changed with the "-b" option in audit.rules once the system has booted, but that won't help with messages lost during boot. None of these solutions helps if a system's compiled-in default is too small: avoiding the lost messages would still require compiling a new kernel.
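The overflow arithmetic can be sketched with a toy model. This is not kernel code: it assumes, purely for illustration, that every record generated before auditd starts is held in a FIFO capped at the backlog limit, that nothing is drained until auditd starts, and that any record arriving at a full queue is dropped. Under those assumptions, about 180 drops with the default limit of 64 correspond to roughly 244 records generated before auditd starts. The helper `simulate_boot_backlog()` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Toy model, not kernel code: outcome of holding boot-time audit
 * records in a queue capped at `limit` until auditd starts. */
struct backlog_result {
    unsigned queued;   /* records still held when auditd starts */
    unsigned dropped;  /* records lost ("backlog limit exceeded") */
};

/* Hypothetical helper: `records_before_auditd` records arrive while no
 * consumer is running; every arrival at a full queue is dropped. */
static struct backlog_result simulate_boot_backlog(unsigned records_before_auditd,
                                                   unsigned limit)
{
    struct backlog_result r = {0, 0};

    for (unsigned i = 0; i < records_before_auditd; i++) {
        if (r.queued < limit)
            r.queued++;   /* room left: hold the record for auditd */
        else
            r.dropped++;  /* queue full: the record is lost */
    }
    /* Every generated record is either held or dropped. */
    assert(r.queued + r.dropped == records_before_auditd);
    return r;
}
```

With 244 records, `simulate_boot_backlog(244, 64)` holds 64 and drops 180, while the larger defaults suggested above (320 or 512) hold all 244 in this model. The real number of boot-time records depends on the system, which is why a fixed compiled-in default can be too small.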
At a minimum, I recommend adding a kernel boot parameter (audit already has one to enable/disable it) such as "audit_queue_len=<n>" that overrides the default, so the system administrator can set the queue length.
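The proposed override amounts to "use the value from the boot line if present, otherwise fall back to the compiled-in default". Below is a user-space sketch of that parsing logic only. It assumes the `audit_queue_len=<n>` name from the proposal above (the name eventually adopted upstream may differ), a space-separated command line, and that the first matching token wins. `backlog_limit_from_cmdline()` and `DEFAULT_BACKLOG_LIMIT` are hypothetical names, and this is not how the kernel actually registers boot parameters.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_BACKLOG_LIMIT 64u /* compiled-in default discussed above */

/* Sketch only: value of a hypothetical "audit_queue_len=<n>" option in a
 * space-separated command line, or the compiled-in default if the option
 * is absent or malformed. */
static unsigned long backlog_limit_from_cmdline(const char *cmdline)
{
    static const char key[] = "audit_queue_len=";
    const size_t keylen = sizeof(key) - 1;
    const char *p;

    assert(cmdline != NULL);
    for (p = strstr(cmdline, key); p != NULL; p = strstr(p + keylen, key)) {
        /* Accept the key only at the start of a space-separated token,
         * so "xaudit_queue_len=5" is ignored. */
        if (p != cmdline && p[-1] != ' ')
            continue;

        const char *val = p + keylen;
        char *end;
        if (!isdigit((unsigned char)*val))
            return DEFAULT_BACKLOG_LIMIT;   /* empty, signed or non-numeric */

        errno = 0;
        unsigned long n = strtoul(val, &end, 10);
        if (errno != 0 || (*end != ' ' && *end != '\0'))
            return DEFAULT_BACKLOG_LIMIT;   /* overflow or trailing junk */
        return n;
    }
    return DEFAULT_BACKLOG_LIMIT;           /* option not present */
}
```

For example, "ro audit=1 audit_queue_len=512 quiet" yields 512, while "ro quiet audit=1" keeps the default of 64. Falling back to the default on malformed input keeps a typo on the boot line from silently shrinking the queue.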
3.7.2-201 is a rather old kernel. F18 is on 3.10.7 now, with 3.10.9 submitted for updates-testing. While it might not matter, you should probably focus on the latest kernel in each release, and I'd recommend fixing this upstream first (and getting it into rawhide that way).
(In reply to Josh Boyer from comment #1)
> 3.7.2-201 is a rather old kernel. F18 is on 3.10.7 now, with 3.10.9
> submitted for updates-testing. While it might not matter, you should
> probably focus on the latest kernel in each release, and I'd recommend
> fixing this upstream first (and getting it into rawhide that way).

Agreed. I'm patching it upstream.
Patch posted upstream as part of a patchset to address bz990806: https://lkml.org/lkml/2013/9/18/477
Oops, forgot to add the other list link: https://www.redhat.com/archives/linux-audit/2013-September/msg00030.html
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs. Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19. If you experience different issues, please open a new bug report for those.
Issue is still present in f18 and f19 (and upstream).
Moving to rawhide and setting keywords so we don't auto-needinfo this.
The following upstream patches mostly address this issue:

40c0775 audit: allow unlimited backlog queue
51cc83f audit: add audit_backlog_wait_time configuration option
f910fde audit: add kernel set-up parameter to override default backlog limit
7ecf69b audit: efficiency fix 2: request exclusive wait since all need same resource
db89731 audit: efficiency fix 1: only wake up if queue shorter than backlog limit
ae887e0 audit: make use of remaining sleep time from wait_for_auditd
e789e56 audit: reset audit backlog wait time after error recovery

One obvious remaining optimization is to start auditd earlier, but this is outside of the scope of the kernel.
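A toy model can show how an unlimited backlog and a configurable wait time interact. It is not the kernel implementation. It assumes that a limit of 0 means "no limit" (the intent suggested by the title of 40c0775), and it models the wait time as a budget of ticks during which a producer facing a full queue waits while auditd drains `drain_per_tick` records per tick. `backlog_admit()` and all of its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code. Tries to enqueue one record into a queue
 * currently holding *queue_len records. Returns true if the record is
 * queued, false if it is dropped. */
static bool backlog_admit(unsigned *queue_len, unsigned limit,
                          unsigned wait_budget, unsigned drain_per_tick)
{
    assert(queue_len != NULL);

    if (limit == 0) {            /* assumed: 0 means no backlog limit */
        (*queue_len)++;
        return true;
    }
    /* Queue full: wait one tick at a time for auditd to drain it. */
    while (*queue_len >= limit && wait_budget > 0) {
        *queue_len = (*queue_len > drain_per_tick) ? *queue_len - drain_per_tick : 0;
        wait_budget--;
    }
    if (*queue_len >= limit)
        return false;            /* still full after waiting: drop */
    (*queue_len)++;
    return true;
}
```

With `drain_per_tick` set to 0 (auditd not yet running), no wait budget prevents a drop once the queue is full; only a larger or unlimited backlog does. This is consistent with the remark that starting auditd earlier remains the obvious optimization.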