1345854 – non-deterministic calls of check_space_left()


Bug 1345854 - non-deterministic calls of check_space_left()

Summary: non-deterministic calls of check_space_left()

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	audit
Sub Component:
Version:	7.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	7.3
Assignee:	Steve Grubb
QA Contact:	Ondrej Moriš
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1256920

Reported:	2016-06-13 10:35 UTC by Ondrej Moriš
Modified:	2016-11-04 06:13 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-04 06:13:32 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:2418	0	normal	SHIPPED_LIVE	audit bug fix and enhancement update	2016-11-03 13:58:32 UTC

Description Ondrej Moriš 2016-06-13 10:35:11 UTC

Description of problem:

With audit-2.5 we observed that the space_left and admin_space_left actions are _sometimes_ performed only after an auditd restart, and not immediately when the system is running low on space. The behaviour was oddly non-deterministic: sometimes it worked just fine for hours and sometimes it was broken for hours. It is a regression from 2.4.

Version-Release number of selected component (if applicable):

audit-2.5.2-1.el7

How reproducible:

Sometimes. It depends on the size of the audit events.

Steps to Reproduce:

1. Configure admin_space_left to X and admin_space_left_action to syslog.

2. Fill the partition containing /var/log/audit with audit events so that it has less than X free space.

3. Check /var/log/messages to see that the action was triggered.

Actual results:

Action is sometimes not triggered.

Expected results:

Action is always triggered. 

Additional info:

We tracked the issue to the write_to_log() function in auditd-event.c. The relevant part, which calls check_space_left(), changed between 2.4 and 2.5 as follows:

(2.5)

        if (config->daemonize == D_BACKGROUND) {
...
                log_size += rc;
                check_log_file_size();
                // Keep loose tabs on the free space
                if (rc%2 == 0)
                        check_space_left();
        }

(2.4)

        /* check log file size & space left on partition */
        if (config->daemonize == D_BACKGROUND) {
...
                log_size += rc;
                check_log_file_size(data);
                check_space_left(data->log_fd, data);
        }

Where rc = fprintf(log_file, "%s\n", buf). I am not sure that it is safe to make check_space_left() calls conditional on either rc or log_size. It might happen that rc or log_size is an odd number throughout a long sequence of write_to_log() calls. In that case the system might already be well below the (admin_)space_left limit before the (admin_)space_left_action is triggered.
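
The concern can be illustrated with a small, self-contained sketch. It is not the auditd code: checks_with_parity_gate() is a hypothetical helper that only counts how often a free-space check would run if it were gated on the parity of each write's byte count, as in the 2.5 excerpt.

```c
#include <stddef.h>

/* Hypothetical helper (not part of audit): given the byte counts returned
 * by a sequence of log writes, count how many times a free-space check
 * would run if it were gated on "rc % 2 == 0". */
size_t checks_with_parity_gate(const int *rc_values, size_t n)
{
    size_t checks = 0;

    for (size_t i = 0; i < n; i++) {
        /* Same condition as in the 2.5 excerpt: only even lengths
         * lead to a free-space check. */
        if (rc_values[i] % 2 == 0)
            checks++;
    }
    return checks;
}
```

With a run of identically sized records of odd length, for example four writes of 301 bytes each, the function returns 0: the check never runs, no matter how long the run is.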

Comment 2 Steve Grubb 2016-06-13 14:03:10 UTC

Is (rc%3 >  2) more deterministic? The idea here is to try to avoid calling fstatfs all the time.
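
For context, a free-space check built on fstatfs() might look roughly like the sketch below. This is an assumption for illustration, not the actual check_space_left() implementation; free_mib_for_fd() and free_mib_for_path() are hypothetical names. Each call costs a system call, which is why one would want to avoid making it on every write.

```c
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Hypothetical helper (not the audit daemon's code): return the space
 * available to unprivileged users on the filesystem that holds the open
 * descriptor fd, in mebibytes, or -1 if fstatfs() fails. */
long long free_mib_for_fd(int fd)
{
    struct statfs buf;

    if (fstatfs(fd, &buf) != 0)
        return -1;
    /* f_bavail is a block count and f_bsize the block size in bytes. */
    return (long long)(((unsigned long long)buf.f_bavail *
                        (unsigned long long)buf.f_bsize) /
                       (1024ULL * 1024ULL));
}

/* Convenience wrapper: open path read-only, query it, close it again. */
long long free_mib_for_path(const char *path)
{
    long long result;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    result = free_mib_for_fd(fd);
    close(fd);
    return result;
}
```

A caller would compare the returned value with the configured limit and trigger the configured action when it falls below that limit.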

Comment 3 Ondrej Moriš 2016-06-15 11:57:11 UTC

(In reply to Steve Grubb from comment #2)
> Is (rc%3 >  2) more deterministic? The idea here is to try to avoid calling
> fstatfs all the time.

Well, conditioning check_space_left() calls on event length might go wrong when, for instance, a bunch of similar events (with the same length) is generated. I understand your point about saving fstatfs calls. The best approach would be to condition on the serial number of an event, to make 100% sure that check_space_left() is called at least once during a sequence of events.
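
A minimal sketch of that idea, under stated assumptions: the serial is modelled as a plain counter that increases by one per write, and CHECK_INTERVAL, should_check_space() and count_checks() are hypothetical names chosen for illustration. This is not the fix that went upstream.

```c
#include <stdbool.h>

/* Assumed interval: run the free-space check on every 4th serial. */
#define CHECK_INTERVAL 4UL

/* Gate the check on a monotonically increasing serial instead of on the
 * record length, so that record sizes cannot suppress it. */
bool should_check_space(unsigned long serial)
{
    return serial % CHECK_INTERVAL == 0;
}

/* Count how many checks would run for n_events consecutive serials
 * starting at first_serial. */
unsigned long count_checks(unsigned long first_serial, unsigned long n_events)
{
    unsigned long checks = 0;

    for (unsigned long i = 0; i < n_events; i++) {
        if (should_check_space(first_serial + i))
            checks++;
    }
    return checks;
}
```

Any window of CHECK_INTERVAL consecutive serials contains exactly one multiple of CHECK_INTERVAL, so the check runs once per window regardless of event size, while fstatfs is still called on only a fraction of the writes.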

Comment 4 Steve Grubb 2016-06-15 16:07:19 UTC

This is the proposed fix:
https://fedorahosted.org/audit/changeset/1266

Comment 5 Ondrej Moriš 2016-06-29 08:35:56 UTC

(In reply to Steve Grubb from comment #4)
> This is the proposed fix:
> https://fedorahosted.org/audit/changeset/1266

Thanks Steve, it looks better now. There is still a very rare case in which the function might not be called, but the probability of hitting it in a production environment is very close to zero (FTR: I noticed that a typo in the patch was already corrected upstream).

Comment 8 Ondrej Moriš 2016-08-10 10:10:37 UTC

Verified by the Common Criteria audit-test suite on all architectures except ppc64, where results are currently not available. Results from x86_64 are as follows:

NEW (audit-2.6.5-2.el7)
=======================
Bucket:                          fail-safe
Started:                         Thu Aug  4 18:19:57 CEST 2016
Kernel:                          3.10.0-483.el7.x86_64
Architecture:                    x86_64
Mode:                            64
Hostname:                        cc-v0b.lab.eng.brq.redhat.com
Profile:                         capp
selinux-policy version:          selinux-policy-3.13.1-92.el7.noarch

SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

[0] pam_loginuid success PASS
[1] pam_loginuid fail PASS
[2] max_log_file keep_logs PASS
[3] max_log_file rotate PASS
[4] max_log_file suspend PASS
[5] max_log_file syslog PASS
[6] admin_space_left halt PASS
[7] admin_space_left single PASS
[8] admin_space_left suspend PASS
[9] admin_space_left syslog PASS
[10] space_left halt PASS
[11] space_left single PASS
[12] space_left suspend PASS
[13] space_left syslog PASS
[14] disk_full_action halt PASS
[15] disk_full_action single PASS
[16] disk_full_action suspend PASS
[17] disk_full_action syslog PASS
[18] admin_space_left email PASS
[19] space_left email PASS

  20 pass (100%)
   0 fail (0%)
   0 error (0%)
------------------
  20 total (in 26s)

Comment 10 errata-xmlrpc 2016-11-04 06:13:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2418.html
