121970 – Performance severely impacted when LAuS audit enabled

Summary: Performance severely impacted when LAuS audit enabled

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	laus
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Charlie Bennett
QA Contact:	Jay Turner
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	123372
Depends On:
Blocks:	123574

Reported:	2004-04-29 13:05 UTC by Peggy Proffitt
Modified:	2015-01-08 00:07 UTC
CC List:	6 users
Fixed In Version:	RHEL3U3
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-06 17:31:00 UTC
Target Upstream Version:
Embargoed:

Attachments
gzipped tar of audit configuration files (33.29 KB, application/x-gzip-compressed) 2004-04-29 13:08 UTC, Peggy Proffitt
Sample testcase and output of test with and without audit (2.74 KB, text/plain) 2004-04-29 13:48 UTC, Peggy Proffitt

Description Peggy Proffitt 2004-04-29 13:05:26 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)

Description of problem:
We recently applied the updated kernel and laus RPMs suggested as a 
response to our bug 121459. Now that the system is stable, we've 
begun to do some testing. 

When we run a regression test for some of our software with auditing 
turned off the test completes in approximately 4 minutes. When we run 
the same test with auditing enabled, the test takes well over an hour 
to complete. The system is spending a lot of time in iowait. The top 
command shows that the process state for auditd is "D" - uninterruptible 
sleep (usually I/O wait). Is this a known problem? We plan to update 
our systems with RHEL 3 Update 2 shortly after the official release 
of the product because our NASA security policies require the audit 
function, but the current performance impact of LAuS is not 
acceptable.
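
One way to confirm which processes are sitting in the "D" state is to filter ps output. This is only a sketch: the function name list_dstate is made up for illustration, and it assumes a procps-style ps that accepts -eo pid,stat,comm.

```shell
#!/bin/bash
# list_dstate: hypothetical helper, not part of LAuS or procps.
# Reads "PID STAT COMMAND" rows (with one header line) on stdin and
# prints the PID and command of every process whose state starts with "D".
list_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example call on a live system (left commented out here):
# ps -eo pid,stat,comm | list_dstate
```

If auditd shows up in that list for most samples taken during the test, that is consistent with the iowait seen in top.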

My audit configuration was included in bug 121459. I will attach the 
same group of files to this bug report.

The regression test that we run creates and deletes a large number of 
files. It is a real test of software, not intentionally designed to 
stress auditing. We can probably provide the regression test, or 
perhaps a similar test if needed.

Version-Release number of selected component (if applicable):
laus-0.1-54RHEL3, kernel-smp-2.4.21-14.EL

How reproducible:
Always

Steps to Reproduce:
We can provide a test if required. It currently runs under a specific 
user path.
    

Additional info:

Comment 1 Peggy Proffitt 2004-04-29 13:08:38 UTC

Created attachment 99772
gzipped tar of audit configuration files

The relevant files in this attachment are the audit configuration files
(audit.conf, filter.conf, filesets.conf).

Comment 2 Peggy Proffitt 2004-04-29 13:48:48 UTC

Created attachment 99774
Sample testcase and output of test with and without audit

As a simpler test than our original test case, I ran the script "testit" with
auditing enabled and again with auditing disabled. The attachment shows the
script, as well as the resulting run time with audit enabled, then disabled.

Comment 5 Peter Martuccelli 2004-05-19 16:33:15 UTC

As an example, for a tenfold increase you would issue the following
command from the bash prompt:

echo 10240 > /proc/sys/dev/audit/max-messages

After you issue the command you can cat the contents, i.e., cat
/proc/sys/dev/audit/max-messages, to verify your change before you run
your test(s).
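
The same read, scale, write, and verify sequence can be wrapped in a small function. This is a sketch only: scale_max_messages is a made-up name, the default path is the one quoted above, and the factor of 10 simply mirrors the tenfold example. Writing under /proc/sys needs root, and the value does not survive a reboot.

```shell
#!/bin/bash
# scale_max_messages: hypothetical helper, not shipped with laus.
# Usage: scale_max_messages [FILE] [FACTOR]
# Multiplies the integer stored in FILE by FACTOR, writes it back,
# and prints the value read back from FILE for verification.
scale_max_messages() {
    local file=${1:-/proc/sys/dev/audit/max-messages}
    local factor=${2:-10}
    local current new
    current=$(cat "$file") || return 1
    new=$((current * factor))
    echo "$new" > "$file" || return 1
    cat "$file"
}

# Example call as root (left commented out here):
# scale_max_messages /proc/sys/dev/audit/max-messages 10
```

Passing the file as an argument keeps the function usable against a scratch file when trying it out.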

Comment 6 Chris Runge 2004-05-19 17:42:38 UTC

*** Bug 123372 has been marked as a duplicate of this bug. ***

Comment 7 Charlie Bennett 2004-05-19 18:00:46 UTC

I've run the testit script in a number of scenarios.

There is some improvement using larger kernel message buffers but that
clearly won't scale on a busy, busy system.

One thing that I've noticed is that auditd does an fsync() after every
single audit record is written.  This is very safe and conservative. 
It's also possible to turn this behavior off in /etc/audit/audit.conf.
When I do so I get dramatically better real-time results on the test
load:

         fsync = yes    fsync = no    audit disabled
real:    3m12.907s      0m10.132s     0m09.420s
user:    0m00.870s      0m01.810s     0m00.250s
sys:     0m03.020s      0m06.080s     0m01.640s

There is a risk of losing records left in the buffer cache on
a system crash.  Is this a risk you're willing to take?

ccb
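
To put the real-time figures above in proportion, the ratios can be worked out with a few lines of shell. The helper names to_seconds and ratio are invented for this sketch, and the three timings are copied from the table in this comment.

```shell
#!/bin/bash
# to_seconds: convert a `time`-style value such as 3m12.907s to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN { split(t, p, "m"); sub(/s$/, "", p[2]); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# ratio: print A / B to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

fsync_yes=$(to_seconds 3m12.907s)
fsync_no=$(to_seconds 0m10.132s)
audit_off=$(to_seconds 0m09.420s)

echo "fsync=yes vs fsync=no:       $(ratio "$fsync_yes" "$fsync_no")x"
echo "fsync=no  vs audit disabled: $(ratio "$fsync_no" "$audit_off")x"
```

On these numbers the per-record fsync costs roughly a factor of 19 in wall-clock time, while auditing without it adds about 8 percent over no auditing at all.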

Comment 8 Peggy Proffitt 2004-05-19 21:32:58 UTC

I've retested one of our regression tests with sync set to no. The 
performance is significantly better for our test case as well.

If the audit daemon does not do an fsync after each record is 
written, roughly how many records might be left in the buffer cache 
if a system crashes? I'd like to better understand the risk before we 
decide whether to change the sync setting.

Could we talk about the sync option and its impacts during the 
conference call tomorrow?

Comment 9 Charlie Bennett 2004-05-20 15:25:36 UTC

One thing it might help to review is the section on bdflush parameters
in section 2.4 of /usr/src/linux-2.4/Documentation/filesystems/proc.txt.

These control the execution of the bdflush and kupdated kernel
threads, allowing you to specify how much of the buffer cache can be
dirty before flushing buffers out, how old buffers have to be to be
automatic candidates for flushing, etc.

The long and short of it is that a system crash will likely cause the
loss of audit records.  When the sync parameter is ON, you can lose
as many as max-messages records waiting for auditd to copy them to
user space and push them back into the kernel for write.  When the
sync parameter is OFF they could either be in the kernel audit record
buffer or in the filesystem buffer cache.

Perhaps we'll get suitable performance by turning off sync and tuning
the buffer cache so that the lossage is bounded by some predetermined
upper limit.

Comment 10 Peggy Proffitt 2004-05-21 16:44:37 UTC

Could you provide some documentation about the /etc/pam.d file 
changes required to enable audit?

Also, have you received approval to modify the sync parameter to 
support intermittent syncs to disk (based on record count)?

Comment 11 Charlie Bennett 2004-05-21 23:31:29 UTC

Let me answer the second of these first.  I have approval to modify
the sync parameter *and* approval to set the default to "no".  It is
not necessary to have synchronous writes to the audit log to pass
EAL3/CAPP certification.  That being said, intermittent sync is in our
EAL3 certification copy of laus.  The Evaluated Configuration Guide
states that synchronous writes are available at a substantial penalty
to performance.  I'm running with "sync-after = 20" and it's great.

I've opened a bugzilla (123955) to cover the documentation inadequacies.
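
The idea behind a count-based sync can be sketched in shell. This is an illustration of the general technique, not auditd's actual code: write_records is a made-up name, the records are dummy lines, and the reading of "sync-after = 20" as "flush after every 20 records" is an assumption based on the record-count wording in comment 10. With recent GNU coreutils, sync FILE flushes just that file; older versions ignore the argument and flush everything.

```shell
#!/bin/bash
# write_records: hypothetical illustration of intermittent syncing.
# Usage: write_records FILE COUNT SYNC_AFTER
# Appends COUNT dummy records to FILE and flushes to disk after every
# SYNC_AFTER records (0 means never flush explicitly).
# Prints the number of explicit flushes performed.
write_records() {
    local file=$1 count=$2 sync_after=$3
    local i flushes=0
    for ((i = 1; i <= count; i++)); do
        printf 'record %d\n' "$i" >> "$file"
        if ((sync_after > 0)) && ((i % sync_after == 0)); then
            sync "$file"
            flushes=$((flushes + 1))
        fi
    done
    echo "$flushes"
}

# Example (left commented out here): 100 records, flush every 20.
# write_records /tmp/audit-demo.log 100 20
```

Under this scheme at most SYNC_AFTER - 1 written records are waiting for the next explicit flush at any moment, in addition to whatever is still queued in the kernel audit buffer. A value of 1 corresponds to the flush-per-record behavior measured in comment 7.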
