Red Hat Bugzilla – Bug 1345854
non-deterministic calls of check_space_left()
Last modified: 2016-11-04 02:13:32 EDT
Description of problem: With audit-2.5 we observed that space_left and admin_space_left actions are _sometimes_ performed only after auditd restart and not immediately when system is (nearly) running low of space. Behaviour was oddly non-deterministic, sometimes it worked just fine for hours and sometimes it was broken for hours. It was a regression from 2.4. Version-Release number of selected component (if applicable): audit-2.5.2-1.el7 How reproducible: Sometimes. Depends on size of audit events. Steps to Reproduce: 1. Configure admin_space to X and admin_space_action (syslog). 2. Fill partition containing /var/log/audit by audit events so that it has less than X free space. 3. Check /var/log/messages to see that action was triggered. Actual results: Action is sometimes not triggered. Expected results: Action is always triggered. Additional info: We tracked the issue to auditd-event.c write_to_log() function, its relevant part calling check_space_left() changed between 2.5 and 2.4 as follows: (2.5) 397 if (config->daemonize == D_BACKGROUND) { ... 403 log_size += rc; 404 check_log_file_size(); 405 // Keep loose tabs on the free space 406 if (rc%2 == 0) 407 check_space_left(); 408 } (2.4) 427 /* check log file size & space left on partition */ 428 if (config->daemonize == D_BACKGROUND) { ... 434 log_size += rc; 435 check_log_file_size(data); 436 check_space_left(data->log_fd, data); 437 } Where rc = fprintf(log_file, "%s\n", buf). I am not sure if it is the safe approach to condition check_space_left() calls based either on rc or log_size. It might happen that rc/log_size is odd number during a long sequence of write_to_log() calls. In that case a system might be already well below (admin_)space limit before (admin_)space_action is triggered.
Is (rc%3 > 2) more deterministic? The idea here is to try to avoid calling fstatfs all the time.
(In reply to Steve Grubb from comment #2) > Is (rc%3 > 2) more deterministic? The idea here is to try to avoid calling > fstatfs all the time. Well, conditioning check_space_left() calls by event length might go wrong when, for instance, a bunch of similar events are generated (with the same length). I understand your point of saving fstatfs calls. The best would be to condition by serial number of an event to make 100% sure that check_space_left() is called at least once during a sequence of events.
This is the proposed fix: https://fedorahosted.org/audit/changeset/1266
(In reply to Steve Grubb from comment #4) > This is the proposed fix: > https://fedorahosted.org/audit/changeset/1266 Thanks Steve, it looks better now. There is still a very rare case when the function might not be called but a probability it might be hit in the production environment is very close to zero (FTR: I noticed that a typo in the patch was already corrected upstream).
Verified by Common Criteria audit-test on all architectures except ppc64 where results are currently not available, results from x86_64 are as follow: NEW (audit-2.6.5-2.el7) ======================= Bucket: fail-safe Started: Thu Aug 4 18:19:57 CEST 2016 Kernel: 3.10.0-483.el7.x86_64 Architecture: x86_64 Mode: 64 Hostname: cc-v0b.lab.eng.brq.redhat.com Profile: capp selinux-policy version: selinux-policy-3.13.1-92.el7.noarch SELinux status: enabled SELinuxfs mount: /sys/fs/selinux SELinux root directory: /etc/selinux Loaded policy name: targeted Current mode: enforcing Mode from config file: enforcing Policy MLS status: enabled Policy deny_unknown status: allowed Max kernel policy version: 28 [0] pam_loginuid success PASS [1] pam_loginuid fail PASS [2] max_log_file keep_logs PASS [3] max_log_file rotate PASS [4] max_log_file suspend PASS [5] max_log_file syslog PASS [6] admin_space_left halt PASS [7] admin_space_left single PASS [8] admin_space_left suspend PASS [9] admin_space_left syslog PASS [10] space_left halt PASS [11] space_left single PASS [12] space_left suspend PASS [13] space_left syslog PASS [14] disk_full_action halt PASS [15] disk_full_action single PASS [16] disk_full_action suspend PASS [17] disk_full_action syslog PASS [18] admin_space_left email PASS [19] space_left email PASS 20 pass (100%) 0 fail (0%) 0 error (0%) ------------------ 20 total (in 26s)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2418.html