Bug 601738

Summary:	Readahead disables audit collection
Product:	Red Hat Enterprise Linux 6	Reporter:	Harald Hoyer <harald>
Component:	readahead	Assignee:	Harald Hoyer <harald>
Status:	CLOSED WONTFIX	QA Contact:	qe-baseos-daemons
Severity:	medium	Docs Contact:
Priority:	low
Version:	6.3	CC:	harald, jwest, pknirsch
Target Milestone:	rc	Keywords:	Reopened, RHELNAK
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	584643	Environment:
Last Closed:	2012-06-25 15:21:17 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	584643
Bug Blocks:

Description Harald Hoyer 2010-06-08 14:40:21 UTC

+++ This bug was initially created as a clone of Bug #584643 +++

Description of problem:
Due to a race condition, readahead can leave audits disabled when it completes. This happens when readhead is configured to run at system startup (as it is by default in F12) and the system transitions into a runlevel 2 through 5 before readahead has completed. Leaving audits disabled means that possibly critical SElinux denials will never be logged.

Version-Release number of selected component (if applicable):
readahead-1.5.4-1.fc12.x86_64

How reproducible:
always, if the timing constraint is met

Steps to Reproduce:
1. Boot the system.
  
Actual results:
The system comes up with auditd running, but audits turned off in the kernel.

Expected results:
Startup of auditd should be delayed until readahead has finished so that auditd can enable audits in the kernel.  Because readahead starts while the system is not yet in a runlevel where auditd is configured to run, the code in the event.d scripts for arranging this delay is bypassed.

Additional info:
The timestamps tell the story.

From /var/log/wtmp, showing the system boot and transition to runlevel 5:
BOOT        0 ~          ~~   reboot    4/21 12:55:51.967 CDT (-0500)
RUNLVL    ->5 ~          ~~   runlevel  4/21 12:55:52.130 CDT (-0500)

From /var/log/audit.log, showing auditd startup:
type=DAEMON_START msg=audit(1271872562.987:8773): auditd start, ver=2.0.4 format=raw kernel=2.6.32.11-99.fc12.x86_64 auid=4294967295 pid=1619 subj=system_u:system_r:auditd_t:s0 res=success
**Note: That timestamp is 12:56:02.987

From /var/log/messages, showing audits being disabled in the kernel:
Apr 21 12:56:26 omega-3a kernel: type=1305 audit(1271872586.681:17143): auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 op="remove rule" key=(null) list=4 res=0
Apr 21 12:56:26 omega-3a kernel: type=1305 audit(1271872586.681:17144): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1

Timestamp showing when readahead completed:
-rw-r--r--. 1 root root 10556 2010-04-21 12:56:27 /var/lib/readahead/early.sorted

--- Additional comment from harald on 2010-04-22 03:35:05 EDT ---

And the auditd daemon is not running afterwards?
And, yes, if you want readahead-collector to be run, it has to delay the audit daemon. But after it has run readahead-collector, auditd is started.

So, if you don't want to miss those messages, you have to disable the collector in /etc/sysconfig/readahead. This has the consequence, that /var/lib/readahead/early.sorted will never be updated and readahead might readahead too many or the wrong files.

If auditd is _not_ running after readahead-collector was running, then reopen the bug.

--- Additional comment from rnichols42 on 2010-04-22 08:25:02 EDT ---

The auditd process started before readahead completed.  Yes, auditd is still running, but it cannot receive any messages because audits are turned off in the kernel.  The timestamps show that clearly.  That auditd process should not have started when it did.  I can see by the weeks-old timestamp on /etc/rc.d/rc5.d/S11auditd that the code in readahead-disable-services.event and readahead-collector.event that is supposed to delay auditd startup was never executed.  That is because the system is not yet in runlevel 5 when readahead-disable-services.event is executed.

If you are not seeing the failure, that just means that on your system the readahead data collection has finished by the time the system transitions to runlevel 5 and starts auditd.  Do an "ls -l" on /etc/rc.d/rc5.d/S11auditd.  If the startup delay code is working, the timestamp on that link should match the time that readahead-collector last finished.  Does it?

--- Additional comment from harald on 2010-04-22 08:49:34 EDT ---

(In reply to comment #2)
> That is because the system is not yet in runlevel 5 when
> readahead-disable-services.event is executed.

Ah!!!! now I get it!

In readahead-disable-services.event we have "start on stopped rcS" and later on "runlevel=$(/sbin/runlevel| (read a b; echo $b))". So there might a window between stopped rcS and starting the new runlevel.

Does it work for you, if you replace in readahead-disable-services.event
start on stopped rcS
with
start on started rc[2345]
?

--- Additional comment from harald on 2010-04-22 08:53:37 EDT ---

And yes, if you go first in runlevel 2, and then move on to 5, while the collector is running, indead auditd could be left disabled for runlevel 2.

Will clean up the code for that!

--- Additional comment from harald on 2010-04-22 08:54:55 EDT ---

Btw, a big thank you for finding this bug!!!!

--- Additional comment from harald on 2010-04-22 08:58:20 EDT ---

(In reply to comment #4)
> And yes, if you go first in runlevel 2, and then move on to 5, while the
> collector is running, indead auditd could be left disabled for runlevel 2.
> 
> Will clean up the code for that!    

and also auditd will be started in runlevel 5 without the fix in comment #3

--- Additional comment from rnichols42 on 2010-04-22 10:24:42 EDT ---

When the problem occurs, it is invariably right after reboot as the system is transitioning runlevels  S->5.  I do sometimes change runlevels, often right after new RPMs have been installed, but the logs do not show the problem happening at those times.

In comment #3, how does "start on started rc[2345]" ensure that this script runs before the rc5 script that is going to start auditd?  Seems like just another race.

Looking further, I think I now see the real race.  The 'runlevel' comnand reads /var/run/utmp to find the current runlevel.  Under upstart, the only thing that stores the runlevel record in utmp is the "runlevel --set" command in each of the event.d/rc? scripts.  It is very likely that the rc5 script ("start on runlevel 5") has not executed prior to your "start on stopped rcS" script.  In fact, it would appear that the "start on started rc[2345]" race I just mentioned is one you are almost guaranteed to lose.  It seems one of two things will happen:

  A) Your script runs first -- runlevel has not yet been set.

  B) The rc5 script runs first, and auditd has already been started.

--- Additional comment from rnichols42 on 2010-04-22 11:09:28 EDT ---

Upon reflection, I don't see any simple tweak that can make this reliable.  The readahead-disable-services script is dependent on seeing the new runlevel in /var/run/utmp.  However, the script that sets the new runlevel also invokes the script that starts the services in that level.  For the disable-services script to work, it must sneak in between the runlevel setting and the startup of services.  It works most of the time because /etc/rc.d/rc does quite a bit of processing (all of the "KILL" scripts, among other things) before starting any services, but that timing can't be guaranteed.

--- Additional comment from rnichols42 on 2010-04-22 12:31:42 EDT ---

I've tried the fix in comment #3, and it does not work.  I also added a line in the script to record what is happening.  Here's the "diff -u" output:

--- ../readahead-disable-services.event.old	2009-10-13 05:24:06.000000000 -0500
+++ readahead-disable-services.event	2010-04-22 11:05:40.000000000 -0500
@@ -2,7 +2,8 @@
 #
 # disable all services, which interfere with readahead-collector
 
-start on stopped rcS
+#start on stopped rcS
+start on started rc[2345]
 
 script
 	. /etc/sysconfig/readahead
@@ -13,6 +14,7 @@
 	fi
 	/sbin/pidof readahead-collector >/dev/null 2>&1 || exit 0
 	runlevel=$(/sbin/runlevel| (read a b; echo $b))
+	echo "readahead-disable-services runlevel=$runlevel $(/sbin/runlevel)" >>/tmp/readahead.log
 	# delay startup of services, which do not play nicely with readahead
 	for service in auditd restorecond; do
 		if /sbin/chkconfig --level $runlevel $service ; then
[

The line that gets written fo /tmp/readahead.log is:

   readahead-disable-services runlevel= unknown

As I suspected, the runlevel has not yet been set when this script runs, and on this particular boot auditing did end up disabled in the kernel.  I see the "unknown" runlevel recorded for both "start on started rc[2345]" and "start on stopped rcS".

BTW, to cause a new readahead collection I've been setting the mtime on /var/lib/readahead/early.sorted to an old date.

Perhaps you need to resort to a "service $service condrestart" loop when readahead collection finishes, but I suspect there's a race hidden somewhere in that, too.

--- Additional comment from rnichols42 on 2010-04-22 17:16:59 EDT ---

Created an attachment (id=408450)
Updated readahead-disable-services.event script

I've found that the runlevel can be extracted from argument $1 passed to the script, independent of the utmp file.  Still a race there with the actual rc[2-5] script, but I think the odds look good.  Hope this fits with the rest of your code cleanup.

Comment 2 Harald Hoyer 2010-06-08 14:43:47 UTC


*** This bug has been marked as a duplicate of bug 561486 ***

Comment 3 RHEL Program Management 2011-10-21 13:09:05 UTC

Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.