Red Hat Bugzilla – Bug 121970
Performance severely impacted when LAuS audit enabled
Last modified: 2015-01-07 19:07:46 EST
Description of problem:
We recently applied the updated kernel and laus RPMs suggested as a
response to our bug 121459. Now that the system is stable, we've
begun to do some testing.
When we run a regression test for some of our software with auditing
turned off the test completes in approximately 4 minutes. When we run
the same test with auditing enabled, the test takes well over an hour
to complete. The system is spending a lot of time in iowait. The top
command shows that the CPU state for auditd is "D" - uninterruptible
sleep (usually I/O wait). Is this a known problem? We plan to update
our systems with RHEL 3 Update 2 shortly after the official release
of the product because our NASA security policies require the audit
function, but the current performance impact of LAuS is not
My audit configuration was included in bug 121459. I will attach the
same group of files to this bug report.
The regression test that we run creates and deletes a large number of
files. It is a real test of software, not intentionally designed to
stress auditing. We can probably provide the regression test, or
perhaps a similar test if needed.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
We can provide a test if required. It currently runs under a specific
Created attachment 99772
gzipped tar of audit configuration files
The relevant files in this attachment are the audit configuration files
(audit.conf, filter.conf, filesets.conf)
Created attachment 99774
Sample testcase and output of test with and without audit
As a simpler test than our original test case, I ran the script "testit" with
auditing enabled and again with auditing disabled. The attachment shows the
script, as well as the resulting run time with audit enabled, then disabled.
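The attached "testit" script is not reproduced in this report. As a rough
stand-in, the sketch below times a workload that creates and deletes many
small files, which is the pattern described above. The function name, file
names, and file count are hypothetical, not taken from the attachment.

```python
import os
import tempfile
import time


def churn_files(directory, count):
    """Create and then delete `count` small files; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"churn-{i}.tmp")
        with open(path, "w") as handle:
            handle.write("x")  # one byte is enough to force a real create
        os.remove(path)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Run once with auditing enabled and once with it disabled, then compare.
    with tempfile.TemporaryDirectory() as scratch:
        elapsed = churn_files(scratch, 1000)
        print(f"created and deleted 1000 files in {elapsed:.3f}s")
```

Whether this generates audit records depends on the filter configuration, so
the timings are only comparable on the same system with the same filters.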
As an example, for a tenfold increase you would issue the following
command from the bash prompt:
echo 10240 > /proc/sys/dev/audit/max-messages
After you issue the command you can cat the contents, i.e., cat
/proc/sys/dev/audit/max-messages, to verify your change before you run
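
The same change-then-verify step can be sketched in Python. This is only an
illustration: the helper name is hypothetical, the starting value of 1024 is
an assumption implied by the "tenfold" figure of 10240, and the demo runs
against a temporary file rather than the real /proc entry.

```python
import os
import tempfile


def scale_limit(path, factor):
    """Multiply the integer stored in `path` by `factor`, then read it back."""
    with open(path) as handle:
        current = int(handle.read().strip())
    with open(path, "w") as handle:
        handle.write(f"{current * factor}\n")
    # Re-read the file so the caller sees what was actually stored.
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    # Stand-in file; on a real system the path would be
    # /proc/sys/dev/audit/max-messages and root privileges would be needed.
    with tempfile.TemporaryDirectory() as scratch:
        fake = os.path.join(scratch, "max-messages")
        with open(fake, "w") as handle:
            handle.write("1024\n")  # assumed starting value
        print(scale_limit(fake, 10))
```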
*** Bug 123372 has been marked as a duplicate of this bug. ***
I've run the testit script in a number of scenarios.
There is some improvement using larger kernel message buffers but that
clearly won't scale on a busy, busy system.
One thing that I've noticed is that auditd does an fsync() after every
single audit record is written. This is very safe and conservative.
It's also possible to turn this behavior off in /etc/audit/audit.conf.
When I do so I get dramatically better real-time results on the test
         fsync = yes    fsync = no    audit disabled
real:    3m12.907s      0m10.132s     0m09.420s
user:    0m00.870s      0m01.810s     0m00.250s
sys:     0m03.020s      0m06.080s     0m01.640s
There is a risk of losing records left in the buffer cache on
a system crash. Is this a risk you're willing to take?
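
To make the cost of a per-record fsync() concrete, here is a minimal sketch
that appends records to a file with and without an fsync after each write.
It is not auditd's code; the function name, record contents, and record
count are made up for illustration, and absolute timings will vary with the
disk and filesystem.

```python
import os
import tempfile
import time


def write_records(path, records, fsync_each):
    """Append records to `path`; optionally fsync after every record."""
    start = time.perf_counter()
    with open(path, "ab") as log:
        for record in records:
            log.write(record)
            if fsync_each:
                log.flush()             # move Python's buffer into the kernel
                os.fsync(log.fileno())  # ask the kernel to write it to disk
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [b"hypothetical audit record\n"] * 2000
    with tempfile.TemporaryDirectory() as scratch:
        synced = write_records(os.path.join(scratch, "synced.log"), records, True)
        lazy = write_records(os.path.join(scratch, "lazy.log"), records, False)
        print(f"fsync per record: {synced:.3f}s, no fsync: {lazy:.3f}s")
```

Without the fsync, records sit in the buffer cache until the kernel flushes
them, which is where the crash risk mentioned above comes from.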
I've retested one of our regression tests with sync set to no. The
performance is significantly better for our test case as well.
If the audit daemon does not do an fsync after each record is
written, roughly how many records might be left in the buffer cache
if a system crashes? I'd like to better understand the risk before we
decide whether to change the sync setting.
Could we talk about the sync option and its impacts during the
conference call tomorrow?
One thing it might help to review is the section on bdflush parameters
in section 2.4 of /usr/src/linux-2.4/Documentation/filesystems/proc.txt.
These control the execution of the bdflush and kupdated kernel
threads, allowing you to specify how much of the buffer cache can be
dirty before flushing buffers out, how old buffers have to be to be
automatic candidates for flushing, etc.
The long and short of it is that a system crash will likely cause the
loss of audit records. When the sync parameter is ON, you can lose
as many as max-messages records waiting for auditd to copy them to
user space and push them back into the kernel for write. When the
sync parameter is OFF they could either be in the kernel audit record
buffer or in the filesystem buffer cache.
Perhaps we'll get suitable performance by turning off sync and tuning
the buffer cache so that the lossage is bounded by some predetermined
Could you provide some documentation about the /etc/pam.d file
changes required to enable audit?
Also, have you received approval to modify the sync parameter to
support intermittent syncs to disk (based on record count)?
Let me answer the second of these first. I have approval to modify
the sync parameter *and* approval to set the default to "no". It is
not necessary to have synchronous writes to the audit log to pass
EAL3/CAPP certification. That being said, intermittent sync is in our
EAL3 certification copy of laus. The Evaluated Configuration Guide
states that synchronous writes are available at a substantial penalty
to performance. I'm running with "sync-after = 20" and it's great.
I've opened a bugzilla (123955) to cover the documentation inadequacies.
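
For readers wondering what a count-based intermittent sync looks like, the
sketch below calls fsync once every N appended records. It assumes that
"sync-after = 20" means "fsync after every 20 records"; the class and its
attributes are hypothetical and this is not the laus implementation.

```python
import os
import tempfile


class IntervalSyncLog:
    """Append-only log that calls fsync once every `sync_after` records."""

    def __init__(self, path, sync_after):
        self._file = open(path, "ab")
        self.sync_after = sync_after
        self.unsynced = 0    # records written since the last fsync
        self.sync_calls = 0  # number of fsync calls issued so far

    def append(self, record):
        self._file.write(record)
        self.unsynced += 1
        if self.unsynced >= self.sync_after:
            self.sync()

    def sync(self):
        self._file.flush()
        os.fsync(self._file.fileno())
        self.unsynced = 0
        self.sync_calls += 1

    def close(self):
        if self.unsynced:
            self.sync()  # flush the tail so a clean shutdown loses nothing
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        log = IntervalSyncLog(os.path.join(scratch, "audit.log"), sync_after=20)
        for _ in range(45):
            log.append(b"hypothetical audit record\n")
        print(log.sync_calls, log.unsynced)
        log.close()
```

In this sketch, at most sync_after - 1 records written through the log are
waiting for an fsync once append() returns. Records still queued in the
kernel audit buffer are a separate source of loss and are not modeled here.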