Description of problem:
Recently one of our servers has started to show erratic behaviour. After 24-48 hours, services connected to the audit subsystem stop working: we cannot log in via ssh, crond stops performing its tasks, etc. I was lucky to have an already logged-in root shell on the console, so I checked what sshd and crond were up to, and they hang waiting for I/O on /dev/audit. If I stop auditd, everything starts working again.

I have started auditd under strace, and it just hangs forever on a read from /dev/audit. I have turned on audit debugging (dev.audit.debug=1), and it says the following when I try to log in via SSH (some other processes are probably involved in this output as well):

Audit daemon registered (process 18620)
auditf_ioctl: done, result=0
auditf_read: called.
auditf_open: opened by pid 18627
auditf_ioctl: ctx=e2298bc0, cmd=0x801c406f
auditf_ioctl: ctx=c4808480, cmd=0x4065
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_release: called.
auditf_release: Audit daemon closed audit file; auditing disabled
audit_resume: process 18620 resumes auditing
auditf_ioctl: done, result=-19
auditf_release: called.
auditf_ioctl: done, result=-19
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x801c406f
auditf_ioctl: done, result=-19
auditf_ioctl: ctx=c4808480, cmd=0x4066
audit_detach: detaching process 18719
auditf_ioctl: done, result=-49
auditf_ioctl: ctx=c4808480, cmd=0x4065
auditf_ioctl: done, result=-19

It seems to me that stopping auditd also stops the audit system in the kernel, so I think the bug is in the kernel part of the audit system.

Version-Release number of selected component (if applicable):
kernel-2.4.21-40.EL
laus-0.1-70RHEL3

How reproducible:
Unknown
The problem could be occurring because auditd finds that the amount of free space on the filesystem containing /var/log/audit.d/ has fallen below the threshold specified in /etc/audit/audit.conf:

    notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20%";

and it is hence unable to rotate the /var/log/audit.d/bin* audit log files.

When audit finds that free space has fallen below the -T threshold, it puts the system into 'suspend mode' until the free space is equal to or greater than the threshold. Entering suspend mode is the default action when free disk space drops below the -T threshold, as configured in /etc/audit.conf:

    error { action { type = suspend; };

See the man pages for auditd(8), audbin(1), and audit.conf(5).

Do you see messages in /var/log/messages saying audit is entering suspend mode?

    # egrep 'audbin|suspend' /var/log/messages

If so, then the /var/log/audit.d/ disk space threshold being exceeded is the problem.

Unless you require auditing, turn it off:

    # chkconfig --level=123456 audit off ; reboot

Nothing else depends on audit being enabled, and having it off is the default for a clean RHEL-3 install post-U5.

If you want to retain auditing, then you need to set up a mechanism to purge old rotated log files. See the '-T' and '-N' options in man audbin(1). For example, to remove the oldest log files, set this in /etc/audit.conf:

    notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'rm -f %f'";

or to move them to a different partition:

    notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N 'mv -f %f /another_partition/'";

or to process them with a script that then removes them:

    notify = "/usr/sbin/audbin -S /var/log/audit.d/save.%u -C -T 20% -N '/bin/my_audit_log_rotation_script %f'";

If you do not see any 'audbin|suspend' messages in /var/log/messages and the machine is still suspending, or if putting a log rotation mechanism in place does not fix the problem, then please re-open this bug and I'll investigate further. Thanks.
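For the last option, a rotation script could look roughly like the sketch below. This is only an illustration, not something shipped with laus: the function name archive_audit_log, the ARCHIVE_DIR variable and its default path under /another_partition/ are all made up for the example, and it assumes audbin substitutes %f with the path of the saved log file, as in the 'rm -f %f' example. It compresses the file to another location and removes the original only if the compression succeeded, so that space is actually freed on the audit partition.

```shell
#!/bin/sh
# Hypothetical /bin/my_audit_log_rotation_script (illustrative sketch only).
# Assumption: audbin's -N command passes the saved log file path as "$1".

# Assumed archive location; it should live on a different partition than
# /var/log/audit.d/, otherwise no space is freed there.
ARCHIVE_DIR="${ARCHIVE_DIR:-/another_partition/audit-archive}"

archive_audit_log() {
    src="$1"

    # Refuse anything that is not an existing regular file.
    if [ ! -f "$src" ]; then
        echo "not a regular file: $src" >&2
        return 1
    fi

    mkdir -p "$ARCHIVE_DIR" || return 1

    # A timestamp suffix keeps successive rotations from overwriting each other.
    dest="$ARCHIVE_DIR/$(basename "$src").$(date +%Y%m%d%H%M%S).gz"

    # Compress into the archive first; delete the original only on success.
    if gzip -c "$src" > "$dest"; then
        rm -f "$src"
    else
        rm -f "$dest"
        return 1
    fi
}

# Act only when called with a file argument, as audbin would do via %f.
if [ "$#" -gt 0 ]; then
    archive_audit_log "$1"
fi
```

Since nothing prunes the archive directory here, it would still need its own clean-up policy if it can fill up.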
Thank you for your thorough response! I am a bit surprised, though, that the default configuration of Red Hat Linux makes sure the operator cannot operate the system as soon as it needs operator attention.