Bug 584643
Summary: | Readahead disables audit collection | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Robert Nichols <rnichols42> | ||||
Component: | readahead | Assignee: | Harald Hoyer <harald> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 14 | CC: | bugs.michael, harald, jburke | ||||
Target Milestone: | --- | Keywords: | Reopened | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 601738 (view as bug list) | Environment: | |||||
Last Closed: | 2012-08-16 21:54:30 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 601738 | ||||||
Attachments: |
|
Description
Robert Nichols
2010-04-22 05:35:26 UTC
And the auditd daemon is not running afterwards? And, yes, if you want readahead-collector to be run, it has to delay the audit daemon. But after it has run readahead-collector, auditd is started. So, if you don't want to miss those messages, you have to disable the collector in /etc/sysconfig/readahead. This has the consequence, that /var/lib/readahead/early.sorted will never be updated and readahead might readahead too many or the wrong files. If auditd is _not_ running after readahead-collector was running, then reopen the bug. The auditd process started before readahead completed. Yes, auditd is still running, but it cannot receive any messages because audits are turned off in the kernel. The timestamps show that clearly. That auditd process should not have started when it did. I can see by the weeks-old timestamp on /etc/rc.d/rc5.d/S11auditd that the code in readahead-disable-services.event and readahead-collector.event that is supposed to delay auditd startup was never executed. That is because the system is not yet in runlevel 5 when readahead-disable-services.event is executed. If you are not seeing the failure, that just means that on your system the readahead data collection has finished by the time the system transitions to runlevel 5 and starts auditd. Do an "ls -l" on /etc/rc.d/rc5.d/S11auditd. If the startup delay code is working, the timestamp on that link should match the time that readahead-collector last finished. Does it? (In reply to comment #2) > That is because the system is not yet in runlevel 5 when > readahead-disable-services.event is executed. Ah!!!! now I get it! In readahead-disable-services.event we have "start on stopped rcS" and later on "runlevel=$(/sbin/runlevel| (read a b; echo $b))". So there might a window between stopped rcS and starting the new runlevel. Does it work for you, if you replace in readahead-disable-services.event start on stopped rcS with start on started rc[2345] ? And yes, if you go first in runlevel 2, and then move on to 5, while the collector is running, indead auditd could be left disabled for runlevel 2. Will clean up the code for that! Btw, a big thank you for finding this bug!!!! (In reply to comment #4) > And yes, if you go first in runlevel 2, and then move on to 5, while the > collector is running, indead auditd could be left disabled for runlevel 2. > > Will clean up the code for that! and also auditd will be started in runlevel 5 without the fix in comment #3 When the problem occurs, it is invariably right after reboot as the system is transitioning runlevels S->5. I do sometimes change runlevels, often right after new RPMs have been installed, but the logs do not show the problem happening at those times. In comment #3, how does "start on started rc[2345]" ensure that this script runs before the rc5 script that is going to start auditd? Seems like just another race. Looking further, I think I now see the real race. The 'runlevel' comnand reads /var/run/utmp to find the current runlevel. Under upstart, the only thing that stores the runlevel record in utmp is the "runlevel --set" command in each of the event.d/rc? scripts. It is very likely that the rc5 script ("start on runlevel 5") has not executed prior to your "start on stopped rcS" script. In fact, it would appear that the "start on started rc[2345]" race I just mentioned is one you are almost guaranteed to lose. It seems one of two things will happen: A) Your script runs first -- runlevel has not yet been set. B) The rc5 script runs first, and auditd has already been started. Upon reflection, I don't see any simple tweak that can make this reliable. The readahead-disable-services script is dependent on seeing the new runlevel in /var/run/utmp. However, the script that sets the new runlevel also invokes the script that starts the services in that level. For the disable-services script to work, it must sneak in between the runlevel setting and the startup of services. It works most of the time because /etc/rc.d/rc does quite a bit of processing (all of the "KILL" scripts, among other things) before starting any services, but that timing can't be guaranteed. I've tried the fix in comment #3, and it does not work. I also added a line in the script to record what is happening. Here's the "diff -u" output: --- ../readahead-disable-services.event.old 2009-10-13 05:24:06.000000000 -0500 +++ readahead-disable-services.event 2010-04-22 11:05:40.000000000 -0500 @@ -2,7 +2,8 @@ # # disable all services, which interfere with readahead-collector -start on stopped rcS +#start on stopped rcS +start on started rc[2345] script . /etc/sysconfig/readahead @@ -13,6 +14,7 @@ fi /sbin/pidof readahead-collector >/dev/null 2>&1 || exit 0 runlevel=$(/sbin/runlevel| (read a b; echo $b)) + echo "readahead-disable-services runlevel=$runlevel $(/sbin/runlevel)" >>/tmp/readahead.log # delay startup of services, which do not play nicely with readahead for service in auditd restorecond; do if /sbin/chkconfig --level $runlevel $service ; then [ The line that gets written fo /tmp/readahead.log is: readahead-disable-services runlevel= unknown As I suspected, the runlevel has not yet been set when this script runs, and on this particular boot auditing did end up disabled in the kernel. I see the "unknown" runlevel recorded for both "start on started rc[2345]" and "start on stopped rcS". BTW, to cause a new readahead collection I've been setting the mtime on /var/lib/readahead/early.sorted to an old date. Perhaps you need to resort to a "service $service condrestart" loop when readahead collection finishes, but I suspect there's a race hidden somewhere in that, too. Created attachment 408450 [details]
Updated readahead-disable-services.event script
I've found that the runlevel can be extracted from argument $1 passed to the script, independent of the utmp file. Still a race there with the actual rc[2-5] script, but I think the odds look good. Hope this fits with the rest of your code cleanup.
For reasons unknown, I am no longer seeing the error. The logging line that I added in readahead-disable-services now consistently reports "runlevel=5", whereas when I was seeing the error the reported runlevel was unknown. Perhaps something in the initscripts-9.02.2-1 update affected this. I just don't know, and readahead simply isn't important enough to me to warrant more effort. My desktop machine gets rebooted only after significant updates, so readahead never has a chance to provide any benefit. My laptop does boot 10 seconds faster with readahead, but I can't get terribly excited about 57 seconds vs. 67 (the BIOS alone takes 18 seconds of that) for a cold boot. I've run into this with F-14 Beta: $ rpm -q readahead readahead-1.5.6-1.fc14.i686 [...] $ ll /etc/rc5.d/*aud* lrwxrwxrwx. 1 root root 16 Sep 29 10:56 /etc/rc5.d/K88auditd -> ../init.d/auditd $ chkconfig --list auditd auditd 0:off 1:off 2:on 3:on 4:on 5:off 6:off $ service auditd status auditd is stopped This installation is from Sep 23. Package's default is to enable the service for run-level 5. Service was not disabled manually. On Sep 29 10:56 there was a reboot with the following log message kernel: readahead-disable-service: delaying service auditd with readahead-disable-services running /sbin/chkconfig --level $runlevel $service off for "auditd". Nothing has automatically restored the chkconfig setting. This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed. This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX. (Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.) Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version. Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping |