Bug 193643

Summary: Audit system blocks, preventing associated services from working
Product: Red Hat Enterprise Linux 3 Reporter: Frode Nordahl <frode>
Component: laus Assignee: Jason Vas Dias <jvdias>
Status: CLOSED NOTABUG QA Contact: Jay Turner <jturner>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-05-31 15:23:07 UTC Type: ---

Description Frode Nordahl 2006-05-31 11:13:36 UTC
Description of problem:
Recently one of our servers has started to show erratic behaviour. After 24-48 hrs, services connected
to the audit subsystem stop working. We cannot log in via ssh, crond stops performing its tasks, etc.

I was lucky to have an already logged-in root shell on the console and checked what sshd and crond
were up to: they hang waiting for I/O on /dev/audit.

If I stop auditd everything starts working again.

I have started auditd with strace, and it just hangs forever on read from /dev/audit.

I have turned on audit debugging (dev.audit.debug=1) and it says the following when I try to log in via
SSH (there are probably some other processes involved in this output):
Audit daemon registered (process 18620)
auditf_ioctl: done, result=0
auditf_read: called.
auditf_open: opened by pid 18627
auditf_ioctl: ctx=e2298bc0, cmd=0x801c406f
auditf_ioctl: ctx=c4808480, cmd=0x4065
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_release: called.
auditf_release: Audit daemon closed audit file; auditing disabled
audit_resume: process 18620 resumes auditing
auditf_ioctl: done, result=-19
auditf_release: called.
auditf_ioctl: done, result=-19
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x4066
audit_detach: detaching process 18719
auditf_ioctl: done, result=-49
auditf_ioctl: ctx=c4808480, cmd=0x4065
auditf_ioctl: done, result=-19
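
The negative result values in the debug output look like negated Linux errno codes. That is an assumption based on the usual kernel convention; the log itself does not confirm it. Under that assumption, a minimal sketch for decoding the two values seen above (errno_name is a hypothetical helper, not part of laus):

```shell
# Map a (possibly negative) result value from the debug log to a Linux
# errno name. Only the values seen in the log above are covered.
errno_name() {
    code=${1#-}    # strip a leading minus sign, if any
    case "$code" in
        0)  echo "OK" ;;
        19) echo "ENODEV (No such device)" ;;
        49) echo "EUNATCH (Protocol driver not attached)" ;;
        *)  echo "unknown ($code)" ;;
    esac
}

errno_name -19
errno_name -49
```

If the assumption holds, the repeated -19 results would be consistent with ioctl calls failing after "auditing disabled" was logged.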



It seems to me that stopping auditd also stops the audit system in the kernel, so I think the bug is in 
the kernel part of the audit system.

Version-Release number of selected component (if applicable):
kernel-2.4.21-40.EL
laus-0.1-70RHEL3

How reproducible:
Unknown

Comment 1 Jason Vas Dias 2006-05-31 15:23:07 UTC
The problem could be occurring because auditd is finding that the amount of
free space on the filesystem containing /var/log/audit.d/ is falling below
the threshold specified in /etc/audit/audit.conf:
   notify          = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20%";
and it is hence unable to rotate the /var/log/audit.d/bin* audit log files.
When audit finds that free space has fallen below the -T threshold, it puts the
system into 'suspend mode' until the free space is equal to or greater than
the threshold. Entering suspend mode is the default action to take when free
disk space is below the -T threshold, as configured in /etc/audit.conf:
         error {
                action {
                        type = suspend;
                };
See the man-pages for auditd(8), audbin(1), and audit.conf(5).

Do you see messages in /var/log/messages saying audit is entering suspend mode?:
# egrep 'audbin|suspend' /var/log/messages
If so, then free space on the filesystem containing /var/log/audit.d/ falling
below the threshold is the problem.
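
As a complementary check, the free space on that filesystem can be compared against the 20% threshold directly. This is only a rough sketch: how audbin itself computes free space for -T is not stated here, so the figure derived from df may differ slightly, and the function names are hypothetical.

```shell
# Print the percentage of free space, given `df -P` output on stdin.
# Column 5 of the POSIX output format is "Capacity" (percent used), e.g. "85%".
pct_free() {
    awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Succeed (exit status 0) if free space on the filesystem holding
# directory $1 is below $2 percent.
audit_space_low() {
    dir=$1
    threshold=$2
    free=$(df -P "$dir" | pct_free)
    [ "$free" -lt "$threshold" ]
}

# Run the check only where the audit log directory exists.
if [ -d /var/log/audit.d ]; then
    if audit_space_low /var/log/audit.d 20; then
        echo "free space is below 20%: audit may suspend"
    else
        echo "free space is at or above 20%"
    fi
fi
```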

Unless you require auditing, turn it off:
# chkconfig --level=123456 audit off ; reboot
Nothing else depends on audit being enabled, and this is the default for
a clean RHEL-3 install post-U5.

If you want to retain auditing, then you need to set up a mechanism to purge
old rotated log files (see the '-T' and '-N' options in man audbin(1)).
For example, to remove the oldest log files, set this in /etc/audit.conf:
   notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'";
or to move them to a different partition:
   notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'mv -f %f /another_partition/'";
or to process them with a script that then removes them:
   notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N '/bin/my_audit_log_rotation_script %f'";
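
A rotation script of that kind might look roughly like the sketch below. It assumes, as the 'rm -f %f' example suggests, that audbin passes the path of the saved log file in place of %f. The function name and the archive directory are hypothetical.

```shell
# Hypothetical handler for audbin's -N option: compress one saved audit log
# into an archive directory, then remove the original to free space.
archive_audit_log() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # Remove the original only if compression succeeded.
    gzip -c "$src" > "$dest_dir/$(basename "$src").gz" && rm -f "$src"
}
```

A script installed as /bin/my_audit_log_rotation_script would then call archive_audit_log with its first argument and a directory on another partition. Keeping the archive on a different filesystem matters, since otherwise the compressed copies still count against the -T threshold.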
 
If you do not see any 'audbin|suspend' messages in /var/log/messages, and
the machine is still suspending, or if putting a log rotation mechanism in
place does not fix the problem, then please re-open this bug and I'll 
investigate further - thanks.

Comment 2 Frode Nordahl 2006-06-02 09:20:27 UTC
Thank you for your thorough response!

I am a bit surprised though that the default configuration of RedHat Linux is to make sure the Operator 
cannot operate the system as soon as it needs Operator attention.