Bug 1826298
| Summary: | even when I cancel ReX job, remediation still shows it as running | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Jan Hutař <jhutar> |
| Component: | RH Cloud - Cloud Connector | Assignee: | Adam Ruzicka <aruzicka> |
| Status: | CLOSED ERRATA | QA Contact: | Lukáš Hellebrandt <lhellebr> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.7.0 | CC: | aruzicka, ehelms, inecas, jyejare, sdunning |
| Target Milestone: | 6.8.0 | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | python3-receptor-satellite-1.0.2 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 13:01:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Hutař
2020-04-21 11:46:31 UTC
QUERY:
------
Check the working and non-working scenarios.

Working scenario:
=================
Steps:
------
1. Schedule a remediation run from RH Cloud.
2. While the job is still running, cancel the job on the Satellite side; it is canceled (state "failed").
3. Check the Remediations status on RH Cloud.

Observation:
------------
RH Cloud shows the remediation status as Failed and shows a status similar to the Satellite job: `Job has been canceled by user`.

Non-working scenario (the actual bug steps in the description):
===============================================================
Steps:
------
1. Schedule a remediation run while the previous one is still running (2 consecutive runs).
2. Cancel all the jobs on the Satellite side and make sure they are canceled (state "failed").
3. Check the Remediations status on RH Cloud.

Observation:
------------
The remediation status shows running, and also succeeded in the end. Not sure if that is because the playbook execution finished before the job actually got canceled. What is the correct way of verifying this bug (working / non-working)?

Talked about this with Jitendra on Friday. This whole thing is hard to time correctly. On Jitendra's test machines, all the jobs showed up in Satellite as canceled by the user, but at the same time exited with exit code 0. Anything that exits with 0 is considered successful and so it was reported to cloud. In this specific case, I'd say the behavior of cloud + receptor is correct. If the process exits with 0 on its own, it doesn't matter whether we tried to cancel it or not; it still managed to end successfully on its own. The key here is to find more things to remediate (maybe something involving a reboot?) which will take longer and give us more room to actually cancel the job in Satellite. An alternative would be to tweak the template that is being used in Satellite to include a sleep or something at the end, but that would be rather hacky.

Verified:
---------
1. For a single cancellation of a job, see the `comment 6 - Working Scenario` section.
2. For consecutive cancellation of jobs, here are the updates, this time with a job that takes more time to finish:

Steps:
------
1. Schedule remediation runs while the previous one is still running (consecutive runs).
2. Cancel all the jobs on the Satellite side and make sure they are canceled (state "failed").
3. Check the Remediations status on RH Cloud.

Observation:
------------
1. RH Cloud shows the remediation status of the job as Failed, and all job task statuses show as Failed in the Satellite job tasks.
2. Both job task logs show `Job has been canceled by user`.

Note:
-----
I tested this scenario with 2 consecutive jobs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.8 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366
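To illustrate the reasoning above about why the canceled jobs were still reported as successful: the reported remediation status effectively follows the playbook's exit code rather than whether a cancel was requested. The following is a minimal sketch of that logic only; the names (JobResult, report_remediation_status) are hypothetical and are not taken from the actual python3-receptor-satellite code.

```python
# Minimal sketch of the exit-code-based status reporting described above.
# JobResult and report_remediation_status are hypothetical names, not the
# real python3-receptor-satellite API.
from dataclasses import dataclass


@dataclass
class JobResult:
    exit_code: int          # exit status of the playbook run
    cancel_requested: bool  # whether a user asked Satellite to cancel the job


def report_remediation_status(result: JobResult) -> str:
    # If the playbook exits with 0 on its own, the run is reported to
    # cloud.redhat.com as successful, regardless of any cancel request that
    # arrived while it was still running.
    if result.exit_code == 0:
        return "success"
    # A non-zero exit (for example, one caused by an effective cancellation)
    # is reported as a failure.
    return "failure"


# Example: a job that was canceled in Satellite but still finished with
# exit code 0 is reported as successful, matching the behavior in this bug.
print(report_remediation_status(JobResult(exit_code=0, cancel_requested=True)))
```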