Red Hat Bugzilla – Bug 974319
clear_running_commands XML-RPC call fails with MemoryError
Last modified: 2018-02-05 19:41:31 EST
Description of problem:
beaker-provision was restarted with 51 Running operations in the command queue. It failed to restart because the "clear_running_commands" call back to the main server was failing with MemoryError.
Version-Release number of selected component (if applicable):
How reproducible:
Always (with those 51 commands listed as "Running" on the server)
Steps to Reproduce:
Actual results:
XML-RPC call reports MemoryError and beaker-provision fails to start
Expected results:
Running commands are marked as Failed (aborting the associated recipes), beaker-provision starts up and begins processing commands.
Investigation has suggested that this only arises if many stale commands (see bug 974352) have accumulated in the server's command queue, and then the provisioning daemon on the lab controller is restarted.
Due to the lack of a direct link between commands and recipes, it's also risky to execute the state update callbacks for such stale commands - if the system has since been reallocated, the stale callback may end up aborting an unrelated recipe.
Accordingly, the "clear_running_commands" operation will be updated to directly execute the following SQL before moving on to handling more recent commands:
JOIN activity ON activity.id=command_queue.id
WHERE command_queue.status = "Running"
AND activity.created < DATE_ADD(UTC_TIMESTAMP(), INTERVAL -1 DAY);
Any remaining commands (those less than 24 hours old) will then be processed in independent transactions.
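The two-phase approach described above can be sketched roughly as follows. This is only an illustration, not the actual Beaker patch: it uses an in-memory-friendly SQLite connection rather than MySQL, stores activity.created as Unix epoch seconds, and the function name, the STALE_STATUS value of "Aborted", and the simplified two-table schema are all assumptions made for the example. The "Failed" status for recent commands follows the expected results in the description.

```python
import sqlite3
import time

# Assumptions for this sketch (not taken from the Beaker source):
# - command_queue(id, status) and activity(id, created) are simplified tables
# - activity.created holds Unix epoch seconds
# - stale commands are marked "Aborted"; the real status may differ
STALE_STATUS = "Aborted"
RECENT_STATUS = "Failed"
ONE_DAY = 24 * 60 * 60


def clear_running_commands(conn, now=None):
    """Clear Running commands in two phases; returns (stale, recent) counts."""
    if now is None:
        now = time.time()
    cutoff = now - ONE_DAY

    # Phase 1: one bulk UPDATE for stale commands, with no per-command
    # objects loaded into memory and no state update callbacks executed.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE command_queue SET status = ? "
            "WHERE status = 'Running' AND id IN "
            "(SELECT id FROM activity WHERE created < ?)",
            (STALE_STATUS, cutoff),
        )
        stale = cur.rowcount

    # Phase 2: each remaining (recent) command gets its own transaction,
    # so a failure on one command does not undo the others.
    recent_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM command_queue WHERE status = 'Running' ORDER BY id"
        )
    ]
    for command_id in recent_ids:
        with conn:
            conn.execute(
                "UPDATE command_queue SET status = ? WHERE id = ?",
                (RECENT_STATUS, command_id),
            )
            # A real implementation would run the recipe state update
            # callback for this command here.
    return stale, len(recent_ids)
```

Keeping phase 1 as a single statement means the cost of clearing stale commands no longer grows with the number of rows held in memory, which is the condition the investigation linked to the MemoryError.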
On Gerrit: http://gerrit.beaker-project.org/#/c/2036/
I will verify this bug on July 2nd during acceptance testing.
Beaker 0.13.2 has been released (http://beaker-project.org/docs/whats-new/release-0.13.html#beaker-0-13-2).