Bug 1044122 - Batch processing vs. reboot: fix possible reboot loop and the clutter from when batch processing gets interrupted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ricci
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Jan Pokorný [poki]
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-12-17 20:40 UTC by Jan Pokorný [poki]
Modified: 2018-12-04 16:42 UTC (History)
4 users

Fixed In Version: ricci-0.16.2-74.el6
Doc Type: Bug Fix
Doc Text:
Cause: The time window for a remotely invoked reboot task was not considered finished if the system rebooted faster than the ricci worker could mark the task's successful completion. In addition, processed files were left behind if a ricci restart (caused, for example, by a reboot) occurred while the respective batches of tasks were being processed. Consequence: ricci could cause an unintentional reboot loop, preventing normal use of the machine, and unnecessary clutter accumulated in the directory holding batches to be processed. Fix: The reboot task is now withdrawn from any further scheduling before the reboot process is initiated, with any immediate failure propagated. The problem with stale leftover files is also addressed, and each such removed file is reported to the system log. Result: ricci no longer causes a reboot loop under these circumstances, nor does it needlessly leave stale files behind after batch processing finishes.
Clone Of:
Environment:
Last Closed: 2014-10-14 07:29:48 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch to make reboot loop as unlikely as possible (3.24 KB, patch)
2014-06-20 18:35 UTC, Jan Pokorný [poki]
(hopefully) a final patch (5.95 KB, patch)
2014-07-04 18:20 UTC, Jan Pokorný [poki]


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1539 0 normal SHIPPED_LIVE ricci bug fix and enhancement update 2014-10-14 01:21:42 UTC

Description Jan Pokorný [poki] 2013-12-17 20:40:39 UTC
The scenario described in the summary has been reported and, indeed, it is
a bug in the design of reboot_now vs. the persisted state of the task batch.

Look:

1. a message like the following hits ricci's port (11111 by default):

<ricci version="1.0" function="process_batch">
  <batch>
    <module name="reboot">
      <request API_version="1.0">
        <function_call name="reboot_now"/>
      </request>
    </module>
  </batch>
</ricci>
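As an aside, a request like the one above can be assembled with the Python standard library alone (the element and attribute names are taken verbatim from the message above; the function name is made up for illustration):

```python
# Build the process_batch request shown above with xml.etree.
import xml.etree.ElementTree as ET

def build_reboot_request():
    ricci = ET.Element("ricci", version="1.0", function="process_batch")
    batch = ET.SubElement(ricci, "batch")
    module = ET.SubElement(batch, "module", name="reboot")
    request = ET.SubElement(module, "request", API_version="1.0")
    ET.SubElement(request, "function_call", name="reboot_now")
    return ET.tostring(ricci, encoding="unicode")

print(build_reboot_request())
```

Delivery (SSL on port 11111 plus authentication) is a separate matter, handled by the send_to_ricci helper used in the reproducer later in this report.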

2. the connection is accepted, the message is received, and request handling
   is initiated, turning out to be "process_batch" handling (auth. required)

3. "batch" stanza from the original message is unwrapped and pushed to
   Batch object constructor

4. ...which establishes the unique filename within /var/lib/ricci/queue/
   that is then used to store original request enhanced by some progress
   tracking decoration (using "write to foo.tmp and rename to foo"
   technique)

5. ...on such a file, /usr/libexec/ricci/ricci-worker binary is applied
   to do the actual task batch in separation

6. ...which starts with opening and flocking the file from 4., reading
   the contained assignment and splitting the batch to particular tasks
   (processes), that are then sequentially run interleaved by updates
   denoting the progress/success to the file from 4.

7. ...for "reboot: reboot_now" task in particular, RebootModule (built-in
   to the ricci-worker) is invoked, leading to an empty DBus message being
   synchronously sent to com.redhat.ricci.reboot (interface.member)

8. ...where it is intercepted by oddjobd and as per
   /etc/oddjobd.conf.d/ricci.oddjob.conf (if permissions match),
   /sbin/reboot is executed, which leads to
   reboot(LINUX_REBOOT_CMD_RESTART) syscall

now I am not sure, but nothing seems to indicate that the syscall/reboot
command/oddjobd execution/DBus blocking message delivery chain can return
asynchronously, hence the expectation:

9? ...any code after the mentioned blocking DBus send call (waiting for
   the response) is never executed unless the reboot command fails for
   some esoteric reason

10? ... hence the file discussed in 4. - 6. is never updated, and in
    turn, when ricci is started again upon reboot (if chkconfig'd on,
    as usual), it starts the reboot cycle again

11? -> reboot loop until some intervention kicks in
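Modulo the correction in comment 4 (the reboot command itself returns asynchronously), the latent race in steps 4. - 10. can be modeled with a toy simulation (illustrative Python, not ricci code; all names and the simplified status values are made up, loosely mirroring the queue-file dumps later in this report):

```python
# Toy model of the queue-file life cycle: the batch file is persisted as
# "scheduled" before the worker runs it, and the success mark is written
# only after the task returns.  If the reboot kills the worker first,
# the file survives the reboot still pending and is re-run on startup.
SCHEDULED, IN_PROGRESS, DONE = 2, 1, 0

def worker_pass(queue, reboot_is_instant):
    """One ricci-worker pass over the queue; returns reboots triggered."""
    reboots = 0
    for name in list(queue):
        if queue[name] != DONE:
            queue[name] = IN_PROGRESS    # progress written to the file
            reboots += 1                 # /sbin/reboot invoked here
            if reboot_is_instant:
                return reboots           # killed before the write below
            queue[name] = DONE           # success recorded in the file
    return reboots

def reboots_over_boots(reboot_is_instant, boots=3):
    queue = {"286634334": SCHEDULED}     # file under /var/lib/ricci/queue/
    return sum(worker_pass(queue, reboot_is_instant) for _ in range(boots))

print(reboots_over_boots(True), reboots_over_boots(False))   # → 3 1
```

When the reboot outruns the worker's status write, every simulated boot triggers another reboot (the loop); otherwise a single reboot suffices.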

Comment 1 Jan Pokorný [poki] 2013-12-17 20:58:38 UTC
Looking at the 1st-gen luci code, RHEL 5 suffers from this too (one can
also trigger the isolated RPC call in question from there, via
"Reboot this node" in the respective per-node drop-down menu;
code search hint: reboot_url).

On the positive side, ccs does not seem to support that RPC call,
not even in the context of adding a node or creating a cluster.

Comment 3 Jan Pokorný [poki] 2013-12-17 21:13:43 UTC
Still, I'm not sure why the "reboot nodes" option when creating a cluster
or adding a node, which effectively adds the same RPC call to the batch
to be executed, seems to be harmless so far.
Perhaps the difference is between the reboot being a single task in the
batch and being one of many (and near the middle of the list)?

Comment 4 Jan Pokorný [poki] 2013-12-19 18:30:37 UTC
> now I am not sure, but nothing seems to indicate that the syscall/reboot
> command/oddjobd execution/DBus blocking message delivery chain can return
> asynchronously, hence the expectation:
> 
> 9? ...any code after the mentioned blocking DBus send call (waiting for
>    the response) is never executed unless the reboot command fails for
>    some esoteric reason
> 
> 10? ... hence the file discussed in 4. - 6. is never updated, and in
>     turn, when ricci is started again upon reboot (if chkconfig'd on,
>     as usual), it starts the reboot cycle again
> 
> 11? -> reboot loop until some intervention kicks in

Sorry, there was some noise due to subtle nuances in how
SysVinit's/upstart's reboot command (and the respective syscall handling
in the kernel) works.

As per both a simple test (/sbin/reboot && echo OK || echo NOT) and code
comprehension, the truth is that the reboot command with no extra switch
behaves asynchronously, which is why a reboot triggered by ricci
should always work (except perhaps for the rare case where the reboot
via the init process is so fast that ricci-worker has no chance to
update the file, so there may still be a latent bug in this regard).

Comment 5 Jan Pokorný [poki] 2013-12-19 18:53:02 UTC
> except perhaps for the rare case where the reboot via the init process
> is so fast that ricci-worker has no chance to update the file, so there
> may still be a latent bug in this regard

This is one of the possible ways to achieve the unintentional reboot
loop.  Another is that the node is asked to reboot via luci, and the
reboot is interrupted by power fencing at the point where the reboot task
is "scheduled" (so the file in the queue has the potential to be triggered
again as soon as ricci is started after the reboot) but not yet
"in progress" (or "in progress" but /sbin/reboot has not been invoked yet).

The fix should be simple both in the business logic and code-wise:
in the case of a "reboot" module invocation via ricci's RPC, invert
the expectation from pessimistic (success recorded only once proved)
to optimistic (success assumed, and negated only once failure is proved).
This way, the reboot cannot succeed without the task file having been
modified to record that success, and in turn it cannot be initiated
more than once, effectively preventing the reboot loop.  Only in
extremely rare occurrences can the reboot be supposed to have succeeded
while in fact it did not.
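A minimal sketch of this inverted expectation (illustrative Python, not the actual patch; the function names are made up, and the 2/0/5 status values loosely mirror the queue-file dumps later in this report):

```python
# Optimistic variant: success is persisted *before* /sbin/reboot is
# invoked and flipped to "failed" only if the call demonstrably fails,
# so a reboot that outruns the worker leaves nothing to re-run.
SCHEDULED, DONE, FAILED = 2, 0, 5

def run_reboot_task(queue, name, invoke_reboot):
    queue[name] = DONE           # success assumed and persisted up front
    try:
        invoke_reboot()          # may never return if the reboot is instant
    except OSError:
        queue[name] = FAILED     # ad-hoc failure propagation

def instant_reboot():
    raise SystemExit             # models the worker dying mid-call

queue = {"286634334": SCHEDULED}
try:
    run_reboot_task(queue, "286634334", instant_reboot)
except SystemExit:
    pass                         # "the machine rebooted"
print(queue["286634334"])        # → 0: the task cannot be scheduled again
```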

Comment 7 Jan Pokorný [poki] 2014-06-20 18:35:08 UTC
Created attachment 910869 [details]
Proposed patch to make reboot loop as unlikely as possible

Comment 8 Jan Pokorný [poki] 2014-06-20 19:49:21 UTC
Created attachment 910878 [details]
Updated version of the patch

Comment 11 Jan Pokorný [poki] 2014-07-02 17:56:09 UTC
As the prerequisite condition for the original bug is hard to reproduce
due to timing sensitivity, the following test result mimics such
a condition:

[note we have to disable SELinux, as our "fake reboot" function needs
to operate outside the reasonably set boundaries pertaining to
the SELinux contexts]
# setenforce 0

[get some material to use and prepare for action]
# mkdir ricci-bz1044122; pushd ricci-bz1044122
# TOOLSGIT=https://git.fedorahosted.org/cgit/conga.git
# TOOLSPATH=plain/ricci/test/2-via-ssl
# TOOLS=cacert.config,generate_certs.sh,send_to_ricci
# TOOLS+=,ricci/authenticate.xml,reboot/reboot_now.xml
# eval wget ${TOOLSGIT}/${TOOLSPATH}/{${TOOLS}}
# chmod +x generate_certs.sh send_to_ricci
# ./generate_certs.sh
# service ricci stop
# PASS=password
# echo "${PASS}" | passwd --stdin -- ricci
# sed -i "s|enter_password_here|${PASS}|" authenticate.xml

[craft our own fake reboot command]
# cat >/sbin/fakereboot.sh <<EOF
#!/bin/sh
logger -t ricci-bz1044122 'told to reboot, faking it'
pkill -9 ricci
killall ricci-worker
EOF
# sed -i.orig 's|/sbin/reboot|/sbin/fakereboot.sh|' \
  /etc/oddjobd.conf.d/ricci.oddjob.conf
# service oddjobd reload

[now the core of the test]
# rm -rf /var/lib/ricci/queue/[0-9]*
# service ricci start
# ./send_to_ricci localhost authenticate.xml
> <?xml version="1.0"?>
> <ricci authenticated="true" [...]
# ./send_to_ricci localhost reboot_now.xml
# tail -n1 /var/log/messages
Jul  2 19:14:21 localhost ricci-bz1044122: told to reboot, faking it

######## IFDEF BEFORE-FIX ########

# ls /var/lib/ricci/queue/[0-9]*
> /var/lib/ricci/queue/286634334
# cat /var/lib/ricci/queue/[0-9]*
> <?xml version="1.0"?>
> <batch batch_id="286634334" status="2">
> 	<module name="reboot" status="1">
> 		<request API_version="1.0" sequence="1254">
> 			<function_call name="reboot_now"/>
> 		</request>
> 	</module>
> </batch>
# pgrep ricci || echo nothing
> nothing

[so far, we have simulated a superfast reboot, which caused the ricci queue
of batches neither to be properly cleaned (as generally expected) nor to
have the per-module status flipped to something other than "to be processed"
(the subject of this bug; otherwise, when the machine is effectively rebooted
and ricci is chkconfig'd on, it will start processing the request to reboot
again ...you can imagine the possible never-ending loop DoSing the machine]

# service ricci start
# tail -n 1 /var/log/messages
Jul  2 19:22:12 localhost ricci-bz1044122: told to reboot, faking it

[time for another round? ;-)  one cannot guarantee that a superfast reboot
will always be the case for the machine at hand]

######## ELSE ########

# ls /var/lib/ricci/queue/[0-9]*
> /var/lib/ricci/queue/2079265775
# cat /var/lib/ricci/queue/[0-9]*
> <?xml version="1.0"?>
> <batch batch_id="2079265775" status="2">
> 	<module name="reboot" status="5">
> 		<request API_version="1.0" sequence="1254">
> 			<function_call name="reboot_now"/>
> 		</request>
> 	</module>
> </batch>

# pgrep ricci || echo nothing
> nothing
# service ricci start
# tail -n 1 /var/log/messages
> Jul  2 19:24:27 localhost ricci: startup succeeded

[i.e., no reboot loop is possible, ergo the main issue is fixed; a
negligible side issue remains: cluttering the queue with reboot tasks
turned into out-of-scheduling-but-kept-around ones]

# ls /var/lib/ricci/queue/[0-9]*
> /var/lib/ricci/queue/2079265775
# cat /var/lib/ricci/queue/[0-9]*
> <?xml version="1.0"?>
> <batch batch_id="2079265775" status="0">
> 	<module name="reboot" status="5">
> 		<request API_version="1.0" sequence="1254">
> 			<function_call name="reboot_now"/>
> 		</request>
> 	</module>
> </batch>

######## ENDIF ########


Conclusion: we need to decide whether fixing the remaining mini-issue is worth it.

Comment 16 Jan Pokorný [poki] 2014-07-04 17:55:08 UTC
WRT. the additional aside issue pointed out at the end of [comment 11],
arising from the original design flaw: this is nothing new, but it exposed
itself in a clear light when I was fiddling with this part of code.
It has nothing to do with unscheduling particular module processing
(as now the code is instrumented to do in case of "reboot" one),
the same used to happen unconditionally:

  once the batch processing finishes after the interruption (say reboot),
  the task file is left around, (with a slight negative impact on
  subsequent ricci restarts, etc.)


Now an additional change has been added; it can be observed in two ways:

1. when following the previous test run, imagine this just before
   ######## ENDIF ######## mark in the [comment 11]:

# service ricci restart
# ls -1 /var/lib/ricci/queue/[0-9]* 2>/dev/null | wc -l
> 1 (ricci-0.16.2-73.el6)
> 0 (ricci-0.16.2-74.el6+)

[i.e., there is now some kind of garbage collection in place preventing
cluttering of the batch queue with filesystem entries for fully
processed batches (see 'batch status="0"' above);  this in turn reduces
the clutter in sosreports and theoretically helps ricci start accepting
requests faster upon startup (the slowdown might be considerable in the
case of tons of such futile files plus slow disk access)]

---

2. even when the batch was being processed and the processing was
   interrupted (machine shutdown, ricci killed, etc.), if ricci is then
   restarted after more than 70 minutes, the batch/file is forcibly
   removed without being resumed; imagine this just before
   ######## ENDIF ########:

# service ricci stop
[restore batch so it looks like unfinished as in "IFDEF BEFORE-FIX"]
# sed -i -e 's|\(<batch.*status="\).\(">\)|\12\2|' \
  -e 's|\(<module.*status="\).\(">\)|\11\2|' /var/lib/ricci/queue/[0-9]*
# cat /var/lib/ricci/queue/[0-9]*
> <?xml version="1.0"?>
> <batch batch_id="1234" status="2">
> 	<module name="reboot" status="1">
> 		<request API_version="1.0" sequence="1254">
> 			<function_call name="reboot_now"/>
> 		</request>
> 	</module>
> </batch>

[now, if ricci is restarted within 70 minutes (!), "told to reboot, faking it"
 should appear in syslog (please verify), but primarily we want to
 demonstrate that after more than 70 minutes the batch is no longer
 considered relevant and is removed]

######## IF STALE (T > 70 minutes) ########

# LC=C date --set "+71 minutes"
# service ricci start
# tail -n1 /var/log/messages | grep ricci
> Jul  4 21:53:44 localhost ricci[11654]:
>   Prunning stale unfinished batch '2079265775'
# ls /var/lib/ricci/queue/[0-9]*
> ls: cannot access /var/lib/ricci/queue/[0-9]*: No such file or directory

######## ELSE (T < 70 minutes) ########

# service ricci start
# tail -n1 /var/log/messages | grep ricci
> Jul  4 20:56:37 localhost ricci-bz1044122: told to reboot, faking it

######## ENDIF ########

---

Technical note: all the file-exploring commands, like the "tail" ones, are
mainly for illustration, as an exact reproducer would be even more difficult
to prepare (appending " | grep ricci" when tail'ing /var/log/messages
would be a good start).
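The stale-batch removal from point 2. can be sketched as follows (assumed mechanics; prune_stale and the mtime-based staleness check are guesses at the implementation, and only the 70-minute window and the log message come from this report):

```python
# On ricci startup, any unfinished queue file older than 70 minutes is
# removed instead of being resumed, and each removal is reported.
import os
import time

STALE_AFTER = 70 * 60   # seconds; hypothetical constant name

def prune_stale(queue_dir, now=None):
    now = time.time() if now is None else now
    removed = []
    for entry in os.listdir(queue_dir):
        path = os.path.join(queue_dir, entry)
        if now - os.path.getmtime(path) > STALE_AFTER:
            os.unlink(path)
            removed.append(entry)
            print("Prunning stale unfinished batch '%s'" % entry)  # sic
    return removed
```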

Comment 17 Jan Pokorný [poki] 2014-07-04 18:20:58 UTC
Created attachment 914749 [details]
(hopefully) a final patch

Comment 21 errata-xmlrpc 2014-10-14 07:29:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1539.html

