Bug 1044122
| Summary: | Batch processing vs. reboot: fix possible reboot loop and the clutter from when batch processing gets interrupted | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jan Pokorný [poki] <jpokorny> | ||||||
| Component: | ricci | Assignee: | Jan Pokorný [poki] <jpokorny> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||||
| Severity: | low | Docs Contact: | |||||||
| Priority: | low | ||||||||
| Version: | 6.5 | CC: | cluster-maint, fdinitto, joallen, rsteiger | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | ricci-0.16.2-74.el6 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Cause:
Time window allowing for remotely invoked reboot task not being
considered finished if the system reboot is performed faster than
ricci worker gets a chance to mark the successful completion.
Also, proceeded files left behind if ricci restart (caused, e.g.,
by reboot) occurred during processing respective batches of tasks.
Consequence:
Chance that ricci will cause an unintentional reboot loop
preventing from normal use of the machine.
Also, unnecessary clutter in the directory with batches to proceed.
Fix:
Now the task is refrained from any further scheduling before
the reboot process is initiated with possible ad-hoc failure
propagation. Also the problem with stale files left behind
is addressed with each such removed file reported to system log.
Result:
Ricci won't cause a reboot loop because of the said circumstances,
nor it will leave stale files from finished batch processing behind
needlessly.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2014-10-14 07:29:48 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Looking at 1st-gen luci code, RHEL 5 suffers from this too (one can also trigger the isolated RPC call in question from here, via "Reboot this node") from respective per-node drop-down menu (code search hint: reboot_url). On the positive side, ccs does not seem to support that RPC call, not even in the context of adding a node or creating a cluster. Still, I'm not sure why the "reboot nodes" option when creating a cluster or adding a node, which effectively includes the same RPC call to the batch to be executed, seems to be harmless so far. Perhaps being a single task in the batch and being one of many (and near the middle of the list) is what makes the difference? > now not sure, but nothing seems to show that the syscall/reboot command/
> oddjobd execution/DBus blocking message delivery chain can return
> asynchonously, hence the expectation:
>
> 9? ...any code after mentioned DBus blocking send call (waiting for the
> response) is not ever executed unless reboot comand fails for some
> esoteric reason
>
> 10? ... hence file being discussed in 4. - 6. is not ever updated,
> and in turn, when ricci is started again upon reboot (if chkconfig'd
> on as usual), it starts reboot cycle again
>
> 11? -> reboot loop until some intervention kicks in
Sorry, there was some noise due to subtle nuances in how
SysVinit's/upstart's reboot command (and respective syscall handling
in the kernel) works.
As per both a simple test (/sbin/reboot && echo OK || echo NOT) and code
comprehension, the truth is that reboot command with no extra switch
behaves asynchronously, and that's why reboot as triggered by ricci
should always work (but perhaps modulo rare case where reboot via
init process is too fast so that ricci-worker has no chance to
update the file, so there still may be a latent bug in this regard).
> perhaps modulo rare case where reboot via init process is too fast
> so that ricci-worker has no chance to update the file, so there still
> may be a latent bug in this regard
This is one of the possible ways to achieve the unintentional reboot
loop. One of others is that the node is required to reboot via luci,
which is interrupted by power fencing at the point the reboot task
is "scheduled" (hence the file in queue has the potential to be triggered
again as soon as ricci is started after reboot) but not "in progress"
yet (or "in progress" but /sbin/reboot hasn't been invoked yet).
Fix should be simple both in the business logic and code-wise:
in case of "reboot" module invoction via ricci's RPC, invert
the expectation from pesimistic (success only once proved)
to optimistic (success assumed and negated only once proved).
This way, it cannot happen that reboot succeeds without task
file modified to state this success and in turn, it cannot be
initiated more than once, effectively preventing the reboot
loop. Only in extremely rare occurrences, the reboot can be
supposed to have succeeded while in fact it did not.
Created attachment 910869 [details]
Proposed patch to make reboot loop as unlikely as possible
Created attachment 910878 [details]
Updated version of the patch
As the prerequisite condition to reproduce the original bug is rare to reproduce due to timing sensitivity, following test result mimics such a condition: [note we have to disable SELinux as our "fake reboot" function needs to operate outside of the reasonably set boundaries pertaining the SELinux contexts] # setenforce 0 [get some material to use and prepare for action] # mkdir ricci-bz1044122; pushd ricci-bz1044122 # TOOLSGIT=https://git.fedorahosted.org/cgit/conga.git # TOOLSPATH=plain/ricci/test/2-via-ssl # TOOLS=cacert.config,generate_certs.sh,send_to_ricci # TOOLS+=,ricci/authenticate.xml,reboot/reboot_now.xml # eval wget ${TOOLSGIT}/${TOOLSPATH}/{${TOOLS}} # chmod +x generate_certs.sh send_to_ricci # ./generate_certs.sh # service ricci stop # PASS=password # echo "${PASS}" | passwd --stdin -- ricci # sed -i "s|enter_password_here|${PASS}|" authenticate.xml [craft our own fake reboot command] # cat >/sbin/fakereboot.sh <<EOF #!/bin/sh logger -t ricci-bz1044122 'told to reboot, faking it' pkill -9 ricci killall ricci-worker EOF # sed -i.orig 's|/sbin/reboot|/sbin/fakereboot.sh|' \ /etc/oddjobd.conf.d/ricci.oddjob.conf # service oddjobd reload [now the core of the test] # rm -rf /var/lib/ricci/queue/[0-9]* # service ricci start # ./send_to_ricci localhost authenticate.xml > <?xml version="1.0"?> > <ricci authenticated="true" [...] # ./send_to_ricci localhost reboot_now.xml # tail -n1 /var/log/messages Jul 2 19:14:21 localhost ricci-bz1044122: told to reboot, faking it ######## IFDEF BEFORE-FIX ######## # ls /var/lib/ricci/queue/[0-9]* > /var/lib/ricci/queue/286634334 # cat /var/lib/ricci/queue/[0-9]* > <?xml version="1.0"?> > <batch batch_id="286634334" status="2"> > <module name="reboot" status="1"> > <request API_version="1.0" sequence="1254"> > <function_call name="reboot_now"/> > </request> > </module> > </batch> # pgrep ricci || echo nothing > nothing [so far, we have simulated superfast reboot, which caused the ricci queue of batches to be properly cleaned (generally expected) nor to have the per-module status flipped to something different that "to be proceeded" (subject of this bug to be fixed, as otherwise, when effectively rebooted and ricci chkconfig'd on, it will start processing the request to reboot again ...you can imagine the possible neverending loop DoSing the machine] # service ricci start # # tail -n 1 /var/log/messages Jul 2 19:22:12 localhost ricci-bz1044122: told to reboot, faking it [time for another round? ;-) one cannot guarantee superfast reboot to be always a case for the machine at hand] ######## ELSE ######## # ls /var/lib/ricci/queue/[0-9]* > /var/lib/ricci/queue/2079265775 # cat /var/lib/ricci/queue/[0-9]* > <?xml version="1.0"?> > <batch batch_id="2079265775" status="2"> > <module name="reboot" status="5"> > <request API_version="1.0" sequence="1254"> > <function_call name="reboot_now"/> > </request> > </module> > </batch> # pgrep ricci || echo nothing > nothing # service ricci start # tail -n 1 /var/log/messages > Jul 2 19:24:27 localhost ricci: startup succeeded [i.e., no reboot loop possible, ergo the main issue fixed; side negligible issue is with cluttering the queue with reboot tasks turned to out-of-scheduling-but-kept-around ones] # ls /var/lib/ricci/queue/[0-9]* > /var/lib/ricci/queue/2079265775 # cat /var/lib/ricci/queue/[0-9]* > <?xml version="1.0"?> > <batch batch_id="2079265775" status="0"> > <module name="reboot" status="5"> > <request API_version="1.0" sequence="1254"> > <function_call name="reboot_now"/> > </request> > </module> > </batch> ######## ENDIF ######## Conclusion: need to decide if fixing the remaining mini-issue is worth it. WRT. the additional aside issue pointed out at the end of [comment 11], arising from the original design flaw: this is nothing new, but it exposed itself in a clear light when I was fiddling with this part of code. It has nothing to do with unscheduling particular module processing (as now the code is instrumented to do in case of "reboot" one), the same used to happen unconditionally: once the batch processing finishes after the interruption (say reboot), the task file is left around, (with a slight negative impact on subsequent ricci restarts, etc.) Now the additional change was added and can be observed in two ways: 1. when following the previous test run, imagine this just before ######## ENDIF ######## mark in the [comment 11]: # service ricci restart # ls -1 /var/lib/ricci/queue/[0-9]* 2>/dev/null | wc -l > 1 (ricci-0.16.2-73.el6) > 0 (ricci-0.16.2-74.el6+) [i.e., now there is some kind of garbage collection in place preventing cluttering the batch queue in the form of filesystem entries with fully proceeded batches (see 'batch status="0"' above); this in turn lowers the clutter of sosreports and theoretically helps ricci get accepting the requests upon its start faster (slowdown might be remarkable in case of tons of such futile files + slow disk access)] --- 2. even when the batch was being processed and the processing interrupted (machine shutdown, ricci killed, etc.), if ricci is then restarted after more than an 70 minutes, the batch/file is forcibly removed w/o further resuming; imagine this just before ######## ENDIF ########: # service ricci stop [restore batch so it looks like unfinished as in "IFDEF BEFORE-FIX"] # sed -i -e 's|\(<batch.*status="\).\(">\)|\12\2|' \ -e 's|\(<module.*status="\).\(">\)|\11\2|' /var/lib/ricci/queue/[0-9]* # cat /var/lib/ricci/queue/[0-9]* > <?xml version="1.0"?> > <batch batch_id="1234" status="2"> > <module name="reboot" status="1"> > <request API_version="1.0" sequence="1254"> > <function_call name="reboot_now"/> > </request> > </module> > </batch> [now if ricci restarted within 70 minutes (!), "told to reboot, faking it" should appear in syslog (please verify) but primarily we want to demonstrate that after an hour, the batch is now no longer considered relevant and removed] ######## IF STALE (T > 70 minutes) ######## # LC=C date --set "+71 minutes" # service ricci start # tail -n1 /var/log/messages | grep ricci > Jul 4 21:53:44 localhost ricci[11654]: > Prunning stale unfinished batch '2079265775' # ls /var/lib/ricci/queue/[0-9]* > ls: cannot access /var/lib/ricci/queue/[0-9]*: No such file or directory ######## ELSE (T < 70 minutes) ######## # service ricci start # tail -n1 /var/log/messages | grep ricci > Jul 4 20:56:37 localhost ricci-bz1044122: told to reboot, faking it ######## ENDIF ######### --- Technical note: all file-exploring commands, like the "tail" ones are mainly for illustration, as exact reproducer would be even more difficult to prepare (appending " | grep ricci" when tail'ing /var/log/messages would be a good start). Created attachment 914749 [details]
(hopefully) a final patch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1539.html |
The scenario as per summary has been reported and, indeed, it's a bug in the design of reboot_now vs. persisting state of the task batch. Look: 1. message like following hits the ricci's port (11111 by default): <ricci version="1.0" function="process_batch"> <batch> <module name="reboot"> <request API_version="1.0"> <function_call name="reboot_now"/> </request> </module> </batch> </ricci> 2. the connection is accepted, the message is received, request handling initiated and turns out to be "process_batch" handling (auth. required) 3. "batch" stanza from the original message is unwrapped and pushed to Batch object constructor 4. ...which establishes the unique filename within /var/lib/ricci/queue/ that is then used to store original request enhanced by some progress tracking decoration (using "write to foo.tmp and rename to foo" technique) 5. ...on such a file, /usr/libexec/ricci/ricci-worker binary is applied to do the actual task batch in separation 6. ...which starts with opening and flocking the file from 4., reading the contained assignment and splitting the batch to particular tasks (processes), that are then sequentially run interleaved by updates denoting the progress/success to the file from 4. 7. ...for "reboot: reboot_now" task in particular, RebootModule (built-in to the ricci-worker) is invoked, leading to an empty DBus message being synchronously sent to com.redhat.ricci.reboot (interface.member) 8. ...where it is intercepted by oddjobd and as per /etc/oddjobd.conf.d/ricci.oddjob.conf (if permissions match), /sbin/reboot is executed, which leads to reboot(LINUX_REBOOT_CMD_RESTART) syscall now not sure, but nothing seems to show that the syscall/reboot command/ oddjobd execution/DBus blocking message delivery chain can return asynchonously, hence the expectation: 9? ...any code after mentioned DBus blocking send call (waiting for the response) is not ever executed unless reboot comand fails for some esoteric reason 10? ... hence file being discussed in 4. - 6. is not ever updated, and in turn, when ricci is started again upon reboot (if chkconfig'd on as usual), it starts reboot cycle again 11? -> reboot loop until some intervention kicks in