Description of problem:
We frequently use Beaker to have a machine populated with an OS and an application suite, and perhaps have the application suite populated with some data. The last test is /distribution/reservesys. When I then want to release the system and cancel it from the web UI, the overall result is Warn, because that is what Beaker records for a cancellation. It would be nice if, from the scheduler's point of view, the reservesys step could be done not via a test but via some other mechanism, so that the overall result of all the tests (sans reservesys) could be Pass, the job would be finished, and the fact that the machine is reserved would be tracked by Beaker separately, optionally based on the results of the previous tests.

Version-Release number of selected component (if applicable):
Version - 0.5.58

How reproducible:
Deterministic.

Steps to Reproduce:
1. Run a job which has /distribution/reservesys as the last test.

Actual results:
The test is in state Running until the reservesys finishes. If you cancel it (release the machine) from the web UI, the status is Warn.

Expected results:
Make it possible (perhaps via an attribute on recipeSet) to reservesys the machines used for the test in such a way that the reservesys does not count towards the overall state of the job. It would just be a reservation done after the tests passed. It would be even better if a condition could be specified under which the system(s) should be reservesys'ed. Frequently, when running the daily or weekly regression tests, we do not want the machine reserved if the tests passed, but we do want it reserved if some of the tests failed. So the attributes could be reservesys-time="12h" and reservesys-cond="failed", with reservesys-time accepting some time-ish specification and reservesys-cond accepting the values "passed" or "failed"; if reservesys-cond is specified, reservesys-time would default to "1d" for one day.

Additional info:
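As a sketch, the proposed attributes might appear in the job XML like this. Both attributes are hypothetical, taken from the suggestion above; neither exists in Beaker as of this report:

```xml
<job>
  <!-- Hypothetical attributes from this proposal; not an implemented Beaker feature.
       reservesys-cond="failed": reserve only if some test failed.
       reservesys-time="12h": keep the reservation for 12 hours. -->
  <recipeSet reservesys-time="12h" reservesys-cond="failed">
    <recipe>
      <!-- distro install, application suite setup, regression tests ... -->
    </recipe>
  </recipeSet>
</job>
```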
*** Bug 601235 has been marked as a duplicate of this bug. ***
(In reply to comment #0)
> or weekly regression tests, we do not want the machine reserved if the tests
> passed, but we want it reserved if some of the tests failed.

I should have added: ... to investigate the cause of the failure, on the machine where it happened, without having to re-run the whole test set.
+1. It should also keep the machine "reserved" for a given amount of time [1] after the recipe was Aborted (e.g. by the external watchdog) and *after the notification* was sent to the job owner. This cannot be implemented by the harness, as the machine or harness may be corrupted at that point. This should allow extending the watchdog. As the watchdog currently requires a task, this could be implemented using an empty task, i.e. a task without an rpm element. The harness would handle that situation correctly and would not touch the task.

[1] Default configured on the server, optionally overridden by the recipe.
Jan, does the new RESERVE_IF_FAIL=1 parameter for /distribution/reservesys meet your needs here?
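For reference, a minimal sketch of how such a task parameter is typically passed to /distribution/reservesys in the job XML (the surrounding recipe is elided):

```xml
<task name="/distribution/reservesys" role="STANDALONE">
  <params>
    <!-- Reserve the system only if an earlier task in the recipe failed -->
    <param name="RESERVE_IF_FAIL" value="1"/>
  </params>
</task>
```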
(In reply to comment #4)
> Jan, does the new RESERVE_IF_FAIL=1 parameter for /distribution/reservesys
> meet your needs here?

It does, mostly. There is still the issue of distinguishing jobs in state Running (i.e. executing all the tests) from jobs that have Finished all the tests but are Reserved for a certain period of time, and of not counting the result of the reservesys towards the overall job result. My assumption is that no one (except maybe Beaker and BeakerLib developers) really puts /distribution/reservesys into their jobs as a test that should influence the job's result -- it's there to get access to the machine. If Beaker could account for that, it would be great.
Bulk reassignment of issues as Bill has moved to another team.
*** Bug 913339 has been marked as a duplicate of this bug. ***
*** Bug 903691 has been marked as a duplicate of this bug. ***
Some good suggestions from Nick on how to represent this in the job XML are in the duped bug 903691.
My suggestions are actually in bug 913339. Reproducing the suggestion here for convenient reference:

A more elegant alternative may be to define an actual "reservesys" element, which may be specified at the job level, the recipe set level, or the individual recipe level (applying to all recipes covered by the scope where it appears). (A simpler starting point would be to allow the new element only at the recipe level.)

A possible definition of such an element (ignoring things like making the attributes optional and giving them specific data types):

  <element name="reservesys">
    <attribute name="onabort"/>
    <attribute name="onfail"/>
    <attribute name="onwarn"/>
    <attribute name="onpass"/>
    <attribute name="duration"/>
  </element>

By default, the system would be reserved regardless of how the recipe finishes. The duration setting would control how long the automatic reservation lasts (defaulting to 24 hours). onabort/fail/warn/pass would control whether or not the system is actually reserved in the relevant situations:

- onabort = reserve system if a task aborts
- onfail = reserve system if at least one task reports a failed result
- onwarn = reserve system if at least one task reports a warning
- onpass = reserve system if none of the above occur

All four settings would default to "true", so the reservation is unconditional by default.
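A recipe using the proposed element might look like the following. This is a sketch of the suggestion above only, not final implemented syntax, and the task name and "12h" duration format are illustrative assumptions:

```xml
<recipe>
  <task name="/distribution/install" role="STANDALONE"/>
  <!-- Hypothetical per the proposal: reserve for 12 hours,
       but only if a task aborts or reports a failure. -->
  <reservesys onabort="true" onfail="true" onwarn="false" onpass="false" duration="12h"/>
</recipe>
```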
Since the reservation task currently only works at the recipe level, it would make sense for the <reservesys/> element to initially be allowed only at the recipe level. Supporting it at higher levels would then be a possible future enhancement.
Ack'ed for the version described in comments 10 & 11. The "onabort" handling should also cover the case of External Watchdog triggering, which means this is a substantial improvement over the existing reservesys task. It will also complement the new alternative harness support, as it will be usable even with harnesses that don't support execution of RHTS style tasks.
*** Bug 745960 has been marked as a duplicate of this bug. ***
One more detail is that when a recipe or job is explicitly cancelled through the web UI or CLI/XML-RPC API, the system should *not* be reserved.
*** Bug 979840 has been marked as a duplicate of this bug. ***
Added dependency on bug 651479, since it makes sense to build this mechanism on top of the changes planned to make that possible.
http://gerrit.beaker-project.org/#/c/3075/
Removed the dependency, since Amit came up with an approach that tied in more cleanly with the existing watchdog mechanism.
The initial implementation of this feature allows adding a new XML element ``<reservesys>`` to a recipe. This will reserve the system after the tests have completed. An optional attribute, ``duration``, can be used to specify the length of the reservation in seconds; the default is 86400 seconds (24 hours).
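A minimal recipe fragment using the new element as described above (the test task name is illustrative):

```xml
<recipe>
  <task name="/distribution/install" role="STANDALONE"/>
  <!-- "/my/regression/suite" is a placeholder task name -->
  <task name="/my/regression/suite" role="STANDALONE"/>
  <!-- Reserve the system for 1 hour (3600 seconds) after all tasks complete;
       omitting duration gives the 24-hour default. -->
  <reservesys duration="3600"/>
</recipe>
```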
Tips for verification:

As stated in comment 19 above, this feature adds a new element <reservesys> with an optional attribute 'duration'. If a recipe has <reservesys/>, the system will be reserved for 24 hours after all the tests have completed. If a duration is specified, such as <reservesys duration="3600"/>, the system will be reserved for 1 hour, and so on. While the system is reserved, the recipe's status shows "Reserved" along with the time remaining. The system can be returned early using the "Release System" button in the web UI; otherwise it will be reclaimed by Beaker once the reservation period is over.

Note that the existing /distribution/reservesys task will continue working the way it always has.
Note that one key step for verification will be to use some bad settings that cause the distro installation step to fail. The existing /distribution/reservesys task is skipped in that case, while the new mechanism will still reserve the system. Similarly, a test that emits a fake kernel panic message should still get the system reserved under the new mechanism. If a job or recipe set is explicitly *cancelled* by the user, then a recipe with <reservesys> should still be cancelled without being reserved; this can be tested, for example, by cancelling while the recipe is still waiting on installation.
Is there a way to release the system from the command line of the reserved system itself? Going to the web UI to release systems individually is overhead compared to releasing them from the command line right after finishing the work there.
I understand your point and we are aware of it. It is not possible in the initial release, but we hope to add it soon.
Bug #1102442 now covers adding the ability to release these systems via "bkr system-release". We're still considering whether or not we want to offer the ability to release reserved systems as part of the harness API (and add an appropriate command to the beah harness or RHTS).
(In reply to Nick Coghlan from comment #27)
> Bug #1102442 now covers adding the ability to release these systems via "bkr
> system-release". We're still considering whether or not we want to offer the
> ability to release reserved systems as part of the harness API (and add an
> appropriate command to the beah harness or RHTS).

Thanks for the info. I definitely see the added value of releasing from the reserved machine itself, as explained above. What are the reasons not to implement it?
Complexity, mainly - when the release command is run from the reserved machine, it needs to be supported in every test harness (some of which aren't maintained by the core Beaker team), as well as in the harness API on the lab controller. You also *can't* run the release command on the target system in cases where there's no test harness installed in the first place (as occurs with manual provisioning). In the case of Manual reservations and aborted installations, there may not even *be* a usable test harness on the target system. By contrast, once bug #1102442 is addressed, following up a "ssh <FQDN>" command with "bkr system-release <FQDN>" when you're done will work regardless of the original reservation mechanism (bkr system-reserve, taking the system in the web UI, /distribution/reservesys task, <reservesys/> element) and independently of the test harness (if any) installed on the target system.
(In reply to Nick Coghlan from comment #29)
> Complexity, mainly - when the release command is run from the reserved
> machine, it needs to be supported in every test harness (some of which
> aren't maintained by the core Beaker team), as well as in the harness API on
> the lab controller. You also *can't* run the release command on the target
> system in cases where there's no test harness installed in the first place
> (as occurs with manual provisioning). In the case of Manual reservations and
> aborted installations, there may not even *be* a usable test harness on the
> target system.

Thank you for the explanation. From my POV, having it supported in the harness API and in one major harness could be enough, provided bkr system-release is available at the same time. Other harnesses could then use the API as they choose.

> By contrast, once bug #1102442 is addressed, following up a "ssh <FQDN>"
> command with "bkr system-release <FQDN>" when you're done will work
> regardless of the original reservation mechanism (bkr system-reserve, taking
> the system in the web UI, /distribution/reservesys task, <reservesys/>
> element) and independently of the test harness (if any) installed on the
> target system.

In most cases I connect to several machines at once using cssh (usually all arches for a given product). Sometimes I release them one by one, sometimes all at once. Using an external command is possible but not as comfortable. At the very least, can we consider having bkr system-release handle multiple FQDNs at once? I will post this idea to the bug you mentioned.
I actually created a separate bug for releasing multiple hosts at once: bug 1103156.
Beaker 0.17.0 has been released.