Bug 584643

Summary: Readahead disables audit collection
Product: [Fedora] Fedora Reporter: Robert Nichols <rnichols42>
Component: readaheadAssignee: Harald Hoyer <harald>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 14CC: bugs.michael, harald, jburke
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 601738 (view as bug list) Environment:
Last Closed: 2012-08-16 21:54:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 601738    
Attachments:
Description Flags
Updated readahead-disable-services.event script none

Description Robert Nichols 2010-04-22 05:35:26 UTC
Description of problem:
Due to a race condition, readahead can leave audits disabled when it completes. This happens when readhead is configured to run at system startup (as it is by default in F12) and the system transitions into a runlevel 2 through 5 before readahead has completed. Leaving audits disabled means that possibly critical SElinux denials will never be logged.

Version-Release number of selected component (if applicable):
readahead-1.5.4-1.fc12.x86_64

How reproducible:
always, if the timing constraint is met

Steps to Reproduce:
1. Boot the system.
  
Actual results:
The system comes up with auditd running, but audits turned off in the kernel.

Expected results:
Startup of auditd should be delayed until readahead has finished so that auditd can enable audits in the kernel.  Because readahead starts while the system is not yet in a runlevel where auditd is configured to run, the code in the event.d scripts for arranging this delay is bypassed.

Additional info:
The timestamps tell the story.

From /var/log/wtmp, showing the system boot and transition to runlevel 5:
BOOT        0 ~          ~~   reboot    4/21 12:55:51.967 CDT (-0500)
RUNLVL    ->5 ~          ~~   runlevel  4/21 12:55:52.130 CDT (-0500)

From /var/log/audit.log, showing auditd startup:
type=DAEMON_START msg=audit(1271872562.987:8773): auditd start, ver=2.0.4 format=raw kernel=2.6.32.11-99.fc12.x86_64 auid=4294967295 pid=1619 subj=system_u:system_r:auditd_t:s0 res=success
**Note: That timestamp is 12:56:02.987

From /var/log/messages, showing audits being disabled in the kernel:
Apr 21 12:56:26 omega-3a kernel: type=1305 audit(1271872586.681:17143): auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 op="remove rule" key=(null) list=4 res=0
Apr 21 12:56:26 omega-3a kernel: type=1305 audit(1271872586.681:17144): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1

Timestamp showing when readahead completed:
-rw-r--r--. 1 root root 10556 2010-04-21 12:56:27 /var/lib/readahead/early.sorted

Comment 1 Harald Hoyer 2010-04-22 07:35:05 UTC
And the auditd daemon is not running afterwards?
And, yes, if you want readahead-collector to be run, it has to delay the audit daemon. But after it has run readahead-collector, auditd is started.

So, if you don't want to miss those messages, you have to disable the collector in /etc/sysconfig/readahead. This has the consequence, that /var/lib/readahead/early.sorted will never be updated and readahead might readahead too many or the wrong files.

If auditd is _not_ running after readahead-collector was running, then reopen the bug.

Comment 2 Robert Nichols 2010-04-22 12:25:02 UTC
The auditd process started before readahead completed.  Yes, auditd is still running, but it cannot receive any messages because audits are turned off in the kernel.  The timestamps show that clearly.  That auditd process should not have started when it did.  I can see by the weeks-old timestamp on /etc/rc.d/rc5.d/S11auditd that the code in readahead-disable-services.event and readahead-collector.event that is supposed to delay auditd startup was never executed.  That is because the system is not yet in runlevel 5 when readahead-disable-services.event is executed.

If you are not seeing the failure, that just means that on your system the readahead data collection has finished by the time the system transitions to runlevel 5 and starts auditd.  Do an "ls -l" on /etc/rc.d/rc5.d/S11auditd.  If the startup delay code is working, the timestamp on that link should match the time that readahead-collector last finished.  Does it?

Comment 3 Harald Hoyer 2010-04-22 12:49:34 UTC
(In reply to comment #2)
> That is because the system is not yet in runlevel 5 when
> readahead-disable-services.event is executed.

Ah!!!! now I get it!

In readahead-disable-services.event we have "start on stopped rcS" and later on "runlevel=$(/sbin/runlevel| (read a b; echo $b))". So there might a window between stopped rcS and starting the new runlevel.

Does it work for you, if you replace in readahead-disable-services.event
start on stopped rcS
with
start on started rc[2345]
?

Comment 4 Harald Hoyer 2010-04-22 12:53:37 UTC
And yes, if you go first in runlevel 2, and then move on to 5, while the collector is running, indead auditd could be left disabled for runlevel 2.

Will clean up the code for that!

Comment 5 Harald Hoyer 2010-04-22 12:54:55 UTC
Btw, a big thank you for finding this bug!!!!

Comment 6 Harald Hoyer 2010-04-22 12:58:20 UTC
(In reply to comment #4)
> And yes, if you go first in runlevel 2, and then move on to 5, while the
> collector is running, indead auditd could be left disabled for runlevel 2.
> 
> Will clean up the code for that!    

and also auditd will be started in runlevel 5 without the fix in comment #3

Comment 7 Robert Nichols 2010-04-22 14:24:42 UTC
When the problem occurs, it is invariably right after reboot as the system is transitioning runlevels  S->5.  I do sometimes change runlevels, often right after new RPMs have been installed, but the logs do not show the problem happening at those times.

In comment #3, how does "start on started rc[2345]" ensure that this script runs before the rc5 script that is going to start auditd?  Seems like just another race.

Looking further, I think I now see the real race.  The 'runlevel' comnand reads /var/run/utmp to find the current runlevel.  Under upstart, the only thing that stores the runlevel record in utmp is the "runlevel --set" command in each of the event.d/rc? scripts.  It is very likely that the rc5 script ("start on runlevel 5") has not executed prior to your "start on stopped rcS" script.  In fact, it would appear that the "start on started rc[2345]" race I just mentioned is one you are almost guaranteed to lose.  It seems one of two things will happen:

  A) Your script runs first -- runlevel has not yet been set.

  B) The rc5 script runs first, and auditd has already been started.

Comment 8 Robert Nichols 2010-04-22 15:09:28 UTC
Upon reflection, I don't see any simple tweak that can make this reliable.  The readahead-disable-services script is dependent on seeing the new runlevel in /var/run/utmp.  However, the script that sets the new runlevel also invokes the script that starts the services in that level.  For the disable-services script to work, it must sneak in between the runlevel setting and the startup of services.  It works most of the time because /etc/rc.d/rc does quite a bit of processing (all of the "KILL" scripts, among other things) before starting any services, but that timing can't be guaranteed.

Comment 9 Robert Nichols 2010-04-22 16:31:42 UTC
I've tried the fix in comment #3, and it does not work.  I also added a line in the script to record what is happening.  Here's the "diff -u" output:

--- ../readahead-disable-services.event.old	2009-10-13 05:24:06.000000000 -0500
+++ readahead-disable-services.event	2010-04-22 11:05:40.000000000 -0500
@@ -2,7 +2,8 @@
 #
 # disable all services, which interfere with readahead-collector
 
-start on stopped rcS
+#start on stopped rcS
+start on started rc[2345]
 
 script
 	. /etc/sysconfig/readahead
@@ -13,6 +14,7 @@
 	fi
 	/sbin/pidof readahead-collector >/dev/null 2>&1 || exit 0
 	runlevel=$(/sbin/runlevel| (read a b; echo $b))
+	echo "readahead-disable-services runlevel=$runlevel $(/sbin/runlevel)" >>/tmp/readahead.log
 	# delay startup of services, which do not play nicely with readahead
 	for service in auditd restorecond; do
 		if /sbin/chkconfig --level $runlevel $service ; then
[

The line that gets written fo /tmp/readahead.log is:

   readahead-disable-services runlevel= unknown

As I suspected, the runlevel has not yet been set when this script runs, and on this particular boot auditing did end up disabled in the kernel.  I see the "unknown" runlevel recorded for both "start on started rc[2345]" and "start on stopped rcS".

BTW, to cause a new readahead collection I've been setting the mtime on /var/lib/readahead/early.sorted to an old date.

Perhaps you need to resort to a "service $service condrestart" loop when readahead collection finishes, but I suspect there's a race hidden somewhere in that, too.

Comment 10 Robert Nichols 2010-04-22 21:16:59 UTC
Created attachment 408450 [details]
Updated readahead-disable-services.event script

I've found that the runlevel can be extracted from argument $1 passed to the script, independent of the utmp file.  Still a race there with the actual rc[2-5] script, but I think the odds look good.  Hope this fits with the rest of your code cleanup.

Comment 11 Robert Nichols 2010-06-16 13:21:38 UTC
For reasons unknown, I am no longer seeing the error.  The logging line that
I added in readahead-disable-services now consistently reports "runlevel=5",
whereas when I was seeing the error the reported runlevel was unknown.
Perhaps something in the initscripts-9.02.2-1 update affected this.  I just
don't know, and readahead simply isn't important enough to me to warrant
more effort.  My desktop machine gets rebooted only after significant
updates, so readahead never has a chance to provide any benefit.  My laptop
does boot 10 seconds faster with readahead, but I can't get terribly excited
about 57 seconds vs. 67 (the BIOS alone takes 18 seconds of that) for a cold
boot.

Comment 12 Michael Schwendt 2010-10-02 20:11:34 UTC
I've run into this with F-14 Beta:

$ rpm -q readahead
readahead-1.5.6-1.fc14.i686

[...]

$ ll /etc/rc5.d/*aud*
lrwxrwxrwx. 1 root root 16 Sep 29 10:56 /etc/rc5.d/K88auditd -> ../init.d/auditd

$ chkconfig --list auditd
auditd          0:off   1:off   2:on    3:on    4:on    5:off   6:off
$ service auditd status
auditd is stopped

This installation is from Sep 23. Package's default is to enable the
service for run-level 5. Service was not disabled manually.

On Sep 29 10:56 there was a reboot with the following log message

  kernel: readahead-disable-service: delaying service auditd

with readahead-disable-services running

  /sbin/chkconfig --level $runlevel $service off

for "auditd". Nothing has automatically restored the chkconfig setting.

Comment 13 Bug Zapper 2010-11-03 16:36:04 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 14 Bug Zapper 2010-12-03 15:35:52 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 15 Fedora End Of Life 2012-08-16 21:54:33 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping