Bug 121970

Summary: Performance severely impacted when LAuS audit enabled
Product: Red Hat Enterprise Linux 3 Reporter: Peggy Proffitt <peggy.proffitt>
Component: laus Assignee: Charlie Bennett <ccb>
Status: CLOSED CURRENTRELEASE QA Contact: Jay Turner <jturner>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: fenlason, k.georgiou, laroche, peterm, srevivo, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL3U3 Doc Type: Bug Fix
Last Closed: 2004-12-06 17:31:00 UTC Type: ---
Bug Blocks: 123574    
Attachments:
Description                                                  Flags
gzipped tar of audit configuration files                     none
Sample testcase and output of test with and without audit    none

Description Peggy Proffitt 2004-04-29 13:05:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)

Description of problem:
We recently applied the updated kernel and laus RPMs suggested as a 
response to our bug 121459. Now that the system is stable, we've 
begun to do some testing. 

When we run a regression test for some of our software with auditing 
turned off the test completes in approximately 4 minutes. When we run 
the same test with auditing enabled, the test takes well over an hour 
to complete. The system is spending a lot of time in iowait. The top 
command shows that the process state for auditd is "D" - uninterruptible 
sleep (usually I/O wait). Is this a known problem? We plan to update 
our systems with RHEL 3 Update 2 shortly after the official release 
of the product because our NASA security policies require the audit 
function, but the current performance impact of LAuS is not 
acceptable.

My audit configuration was included in bug 121459. I will attach the 
same group of files to this bug report.

The regression test that we run creates and deletes a large number of 
files. It is a real test of software, not intentionally designed to 
stress auditing. We can probably provide the regression test, or 
perhaps a similar test if needed.

Version-Release number of selected component (if applicable):
laus-0.1-54RHEL3, kernel-smp-2.4.21-14.EL

How reproducible:
Always

Steps to Reproduce:
We can provide a test if required. It currently runs under a specific 
user path.
    

Additional info:

Comment 1 Peggy Proffitt 2004-04-29 13:08:38 UTC
Created attachment 99772 [details]
gzipped tar of audit configuration files

The relevant files in this attachment are the audit configuration files
(audit.conf, filter.conf, filesets.conf)

Comment 2 Peggy Proffitt 2004-04-29 13:48:48 UTC
Created attachment 99774 [details]
Sample testcase and output of test with and without audit

As a simpler test than our original test case, I ran the script "testit" with
auditing enabled and again with auditing disabled. The attachment shows the
script, as well as the resulting run time with audit enabled, then disabled.

Comment 5 Peter Martuccelli 2004-05-19 16:33:15 UTC
As an example, for a tenfold increase you would issue the following
command from the bash prompt:

echo 10240 > /proc/sys/dev/audit/max-messages

After you issue the command you can cat the contents, i.e., cat
/proc/sys/dev/audit/max-messages, to verify your change before you run
your test(s).
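
The write-then-verify steps above could be wrapped in a small helper. This is only a sketch: the function name set_tunable is hypothetical, and the file path is passed as an argument so the helper can be tried against an ordinary file first. Writing to the real /proc entry requires root.

```shell
#!/bin/sh
# Hypothetical helper: write a new value to a tunable file and read it back.
# On the system discussed here the file would be
# /proc/sys/dev/audit/max-messages.
set_tunable() {
    file=$1
    value=$2
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$value" > "$file" || return 1
    # Read the value back to confirm the change took effect.
    actual=$(cat "$file")
    if [ "$actual" != "$value" ]; then
        echo "expected $value but $file contains $actual" >&2
        return 1
    fi
    echo "$file = $actual"
}

# Example (as root), matching the command in this comment:
# set_tunable /proc/sys/dev/audit/max-messages 10240
```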

Comment 6 Chris Runge 2004-05-19 17:42:38 UTC
*** Bug 123372 has been marked as a duplicate of this bug. ***

Comment 7 Charlie Bennett 2004-05-19 18:00:46 UTC
I've run the testit script in a number of scenarios.

There is some improvement using larger kernel message buffers but that
clearly won't scale on a busy, busy system.

One thing that I've noticed is that auditd does an fsync() after every
single audit record is written.  This is very safe and conservative.
It's also possible to turn this behavior off in /etc/audit/audit.conf.
When I do so I get dramatically better real-time results on the test
load:

         fsync = yes        fsync = no             audit disabled
real:    3m12.907s          0m10.132s              0m09.420s
user:    0m00.870s          0m01.810s              0m00.250s
sys:     0m03.020s          0m06.080s              0m01.640s

There is a risk of losing records left in the buffer cache on
a system crash.  Is this a risk you're willing to take?

ccb
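
For reference, the "real" timings above can be turned into slowdown factors relative to the run with auditing disabled. The sketch below uses awk; the helper names to_seconds and slowdown are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Convert a time(1)-style value such as 3m12.907s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times slower run $1 is than baseline $2.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

slowdown 3m12.907s 0m09.420s   # fsync = yes vs. audit disabled
slowdown 0m10.132s 0m09.420s   # fsync = no  vs. audit disabled
```

On these numbers the run with an fsync after every record is roughly 20 times slower than the unaudited run, while the run without fsync is about 8% slower.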

Comment 8 Peggy Proffitt 2004-05-19 21:32:58 UTC
I've retested one of our regression tests with sync set to no. The 
performance is significantly better for our test case as well.

If the audit daemon does not do an fsync after each record is 
written, roughly how many records might be left in the buffer cache 
if a system crashes? I'd like to better understand the risk before we 
decide whether to change the sync setting.

Could we talk about the sync option and its impacts during the 
conference call tomorrow?

Comment 9 Charlie Bennett 2004-05-20 15:25:36 UTC
One thing it might help to review is the section on bdflush parameters
in section 2.4 of /usr/src/linux-2.4/Documentation/filesystems/proc.txt.

These control the execution of the bdflush and kupdated kernel
threads, allowing you to specify how much of the buffer cache can be
dirty before flushing buffers out, how old buffers have to be to be
automatic candidates for flushing, etc.

The long and short of it is that a system crash will likely cause the
loss of audit records.  When the sync parameter is ON, you can lose
as many as max-messages records waiting for auditd to copy them to
user space and push them back into the kernel for write.  When the
sync parameter is OFF they could either be in the kernel audit record
buffer or in the filesystem buffer cache.

Perhaps we'll get suitable performance by turning off sync and tuning
the buffer cache so that the lossage is bounded by some predetermined
upper limit.


Comment 10 Peggy Proffitt 2004-05-21 16:44:37 UTC
Could you provide some documentation about the /etc/pam.d file 
changes required to enable audit?

Also, have you received approval to modify the sync parameter to 
support intermittent syncs to disk (based on record count)?

Comment 11 Charlie Bennett 2004-05-21 23:31:29 UTC
Let me answer the second of these first.  I have approval to modify
the sync parameter *and* approval to set the default to "no".  It is
not necessary to have synchronous writes to the audit log to pass
EAL3/CAPP certification.  That being said, intermittent sync is in our
EAL3 certification copy of laus.  The Evaluated Configuration Guide
states that synchronous writes are available at a substantial penalty
to performance.  I'm running with "sync-after = 20" and it's great.
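
The idea behind an intermittent sync can be sketched as a loop that flushes only after every N records. This is an illustration, not the laus implementation: the function write_records and its arguments are hypothetical, and it calls sync(1), which flushes all dirty buffers system-wide, where a daemon would more likely call fsync() on its own log file.

```shell
#!/bin/sh
# Hypothetical sketch of "sync every N records".
# Reads records from stdin, appends them to the log file $1,
# and flushes to disk after every $2 records.
write_records() {
    log=$1
    sync_after=$2
    pending=0   # records written since the last flush
    syncs=0     # number of flushes performed
    while IFS= read -r record; do
        printf '%s\n' "$record" >> "$log"
        pending=$((pending + 1))
        if [ "$pending" -ge "$sync_after" ]; then
            sync
            syncs=$((syncs + 1))
            pending=0
        fi
    done
    echo "syncs=$syncs unsynced=$pending"
}

# Example: 45 records with a flush every 20 records.
# seq 1 45 | write_records /tmp/audit-sketch.log 20
```

In this sketch no more than sync_after records written by the loop are waiting for a flush at any time. Records still held in the kernel audit buffer are outside the loop and are not covered by that bound.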

I've opened a bugzilla (123955) to cover the documentation inadequacies.