Bug 121970
Summary: Performance severely impacted when LAuS audit enabled
Product: Red Hat Enterprise Linux 3
Reporter: Peggy Proffitt <peggy.proffitt>
Component: laus
Assignee: Charlie Bennett <ccb>
Status: CLOSED CURRENTRELEASE
QA Contact: Jay Turner <jturner>
Severity: high
Priority: medium
Version: 3.0
CC: fenlason, k.georgiou, laroche, peterm, srevivo, tao
Hardware: i686
OS: Linux
Fixed In Version: RHEL3U3
Doc Type: Bug Fix
Last Closed: 2004-12-06 17:31:00 UTC
Bug Blocks: 123574

Description
Peggy Proffitt
2004-04-29 13:05:26 UTC
Created attachment 99772
gzipped tar of audit configuration files
The relevant files in this attachment are the audit configuration files
(audit.conf, filter.conf, filesets.conf)
Created attachment 99774
Sample testcase and output of test with and without audit
As a simpler test than our original test case, I ran the script "testit" with
auditing enabled and again with auditing disabled. The attachment shows the
script, as well as the resulting run time with audit enabled, then disabled.

As an example, for a tenfold increase you would issue the following command from the bash prompt:

    echo 10240 > /proc/sys/dev/audit/max-messages

After you issue the command you can cat the contents, i.e.

    cat /proc/sys/dev/audit/max-messages

to verify your change before you run your test(s).

*** Bug 123372 has been marked as a duplicate of this bug. ***

I've run the testit script in a number of scenarios. There is some improvement using larger kernel message buffers, but that clearly won't scale on a busy, busy system. One thing that I've noticed is that auditd does an fsync() after every single audit record is written. This is very safe and conservative. It's also possible to turn this behavior off in /etc/audit/audit.conf. When I do so I get dramatically better real-time results on the test load:

           fsync = yes    fsync = no    audit disabled
    real:  3m12.907s      0m10.132s     0m09.420s
    user:  0m00.870s      0m01.810s     0m00.250s
    sys:   0m03.020s      0m06.080s     0m01.640s

There is a risk of losing records left in the buffer cache on a system crash. Is this a risk you're willing to take?
ccb

I've retested one of our regression tests with sync set to no. The performance is significantly better for our test case as well. If the audit daemon does not do an fsync after each record is written, roughly how many records might be left in the buffer cache if a system crashes? I'd like to better understand the risk before we decide whether to change the sync setting.

Could we talk about the sync option and its impacts during the conference call tomorrow?

One thing it might help to review is the section on bdflush parameters in section 2.4 of /usr/src/linux-2.4/Documentation/filesystems/proc.txt. These control the execution of the bdflush and kupdated kernel threads, allowing you to specify how much of the buffer cache can be dirty before flushing buffers out, how old buffers have to be to be automatic candidates for flushing, etc.
The long and short of it is that a system crash will likely cause the loss of audit records. When the sync parameter is ON, you can lose as many as max-messages records waiting for auditd to copy them to user space and push them back into the kernel for write. When the sync parameter is OFF, they could be either in the kernel audit record buffer or in the filesystem buffer cache. Perhaps we'll get suitable performance by turning off sync and tuning the buffer cache so that the lossage is bounded by some predetermined upper limit.

Could you provide some documentation about the /etc/pam.d file changes required to enable audit? Also, have you received approval to modify the sync parameter to support intermittent syncs to disk (based on record count)?

Let me answer the second of these first. I have approval to modify the sync parameter *and* approval to set the default to "no". It is not necessary to have synchronous writes to the audit log to pass EAL3/CAPP certification. That being said, intermittent sync is in our EAL3 certification copy of laus. The Evaluated Configuration Guide states that synchronous writes are available at a substantial penalty to performance. I'm running with "sync-after = 20" and it's great. I've opened a bugzilla (123955) to cover the documentation inadequacies.
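
For readers unfamiliar with the intermittent sync discussed in this thread, here is a minimal Python sketch of the idea: write every record, but only call fsync() once per batch of records. This is not laus or auditd code. The names IntermittentSyncWriter, sync_after, sync_count and demo are hypothetical, chosen only to mirror the "sync-after = 20" setting quoted in the thread.

```python
import os
import tempfile


class IntermittentSyncWriter:
    """Append records to a log file and fsync once every `sync_after` records.

    Hypothetical illustration only; this is not how auditd is implemented.
    """

    def __init__(self, path, sync_after=20):
        if sync_after < 1:
            raise ValueError("sync_after must be at least 1")
        self._file = open(path, "ab")
        self._sync_after = sync_after
        self._unsynced = 0    # records written since the last fsync
        self.sync_count = 0   # number of fsync calls made so far

    def write(self, record: bytes):
        self._file.write(record + b"\n")
        self._unsynced += 1
        if self._unsynced >= self._sync_after:
            self._sync()

    def _sync(self):
        self._file.flush()             # move Python's buffer into the kernel
        os.fsync(self._file.fileno())  # ask the kernel to write it to disk
        self._unsynced = 0
        self.sync_count += 1

    def close(self):
        if self._unsynced:             # do not leave a partial batch unsynced
            self._sync()
        self._file.close()


def demo(num_records=100, sync_after=20):
    """Write num_records records to a temporary file; return the fsync count."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = IntermittentSyncWriter(os.path.join(tmp, "audit.log"), sync_after)
        for i in range(num_records):
            writer.write(b"record %d" % i)
        writer.close()
        return writer.sync_count
```

With sync_after=1 the sketch behaves like the fully synchronous case, one fsync per record. With a larger value, at most sync_after - 1 records written by this process are waiting for an fsync at any moment. Records still queued in the kernel audit buffer are a separate exposure, as noted above.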