Description of problem:
Retirement requests that hit an error cannot be attempted again because of the way the retirement state is handled.

Version-Release number of selected component (if applicable):
5.10.2

How reproducible:
All the time

Steps to Reproduce:
1. Create a service bundle and use stock retirement.
2. Retire the bundle, causing an error on check_retirement (on purpose or otherwise).
3. Attempt to retire the bundle again.

Actual results:
The second attempt (and all following attempts) fails at "start_retirement" because of the state the service bundle was left in.

Expected results:
After an error happens during retirement, regardless of which step it happens in, it should be possible to re-run the retirement request.

Additional info:
The actual error is due to calling, on the service bundle,

POST /api/services/<<id>>
{ "action" : "retire" }

instead of

POST /api/services/<<id>>
{ "action" : "request_retire" }

See bz#1698116.
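To make the difference between the two calls concrete, here is a minimal Ruby sketch that builds (but does not send) both requests. The host name and service id are placeholders, and authentication is left out entirely; only the path and the two action names come from the report above.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build a POST request for a service action without sending it.
# base_url and service_id are hypothetical placeholders.
def build_service_action_request(base_url, service_id, action)
  uri = URI.join(base_url, "/api/services/#{service_id}")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate('action' => action)
  request
end

# Old-style call that triggered the problem described in this report.
old_style = build_service_action_request('https://cfme.example.com', 42, 'retire')
# New-style call that goes through the retirement request workflow.
new_style = build_service_action_request('https://cfme.example.com', 42, 'request_retire')

puts "#{old_style.method} #{old_style.path} #{old_style.body}"
puts "#{new_style.method} #{new_style.path} #{new_style.body}"
```

Sending the request would additionally need credentials and a Net::HTTP connection to the appliance, which are site-specific and not shown here.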
Hi Felix,

We queried the customer database loaded at moneta.usersys.redhat.com and found that these 2 Services have a retirement_state of retiring:

"c5cac7b29 Dynamic Testing Env-20190409-130040"
"Dynamic Testing Env-20190409-154406"

Assuming these Services are not currently being retired, the customer can use the UI on the My Services page to set a retirement date in the future, which will reset the retirement_state field so that the service can be retired.

This query can be used in the console to show services in this state:

Service.select { |s| s.retirement_state == 'retiring'}

Thanks,
Tina
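For readers without an appliance console at hand, the same filter can be tried on in-memory stand-ins. This is only an illustration: ServiceRow and the sample names are hypothetical, not the real Service model or customer data.

```ruby
# Hypothetical stand-in for a Service record; only the two fields
# used by the query are modelled.
ServiceRow = Struct.new(:name, :retirement_state)

services = [
  ServiceRow.new('example-service-a', 'retiring'),
  ServiceRow.new('example-service-b', nil),
  ServiceRow.new('example-service-c', 'error')
]

# Same block as the console query: keep only services stuck in 'retiring'.
def stuck_in_retiring(services)
  services.select { |s| s.retirement_state == 'retiring' }
end

stuck_in_retiring(services).each { |s| puts s.name }
```

On a real appliance, and assuming retirement_state is a plain database column, Service.where(:retirement_state => 'retiring') would push the filter into SQL instead of loading every service into memory.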
There were issues with the scheduler picking up retirement tasks that were set 2 minutes into the future.

The customer has shared his expectations of how retirement should work, especially when retirement fails:

The expected solution when service retirement fails would be:
- the service retirement_state will not stay in status "retiring" but will be changed to 'failed' (retirement_failed, error - choose as you want)
- it will be allowed to rerun service retirement (UI, API, or any other way, such as setting a retirement date) when retirement_state == 'failed'

The immediate situation has been addressed, but we need to look into how to prevent similar situations from happening again.
Hi Felix,

If retirement, when initiated by the new retirement API call, were to encounter an error, the service retirement_state will change from "retiring" to 'error' which would allow you to initiate retirement again.

The old retirement API call remains for customers using custom state machines.

Let me know if you have any questions.

Thanks,
Tina
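The behaviour described in this comment can be pictured with a small Ruby model. This is a sketch under stated assumptions, not ManageIQ code: SketchService and its method names are hypothetical, and only the 'retiring' and 'error' state values and the "retirement already in progress" message come from this thread.

```ruby
# Hypothetical stand-in for a service; not the ManageIQ Service model.
class SketchService
  attr_reader :retirement_state

  def initialize
    @retirement_state = nil
  end

  # Refuse a new request while a previous one is still marked in progress.
  def request_retirement
    raise 'retirement already in progress' if @retirement_state == 'retiring'

    @retirement_state = 'retiring'
  end

  # Called when a retirement step fails: leaving 'retiring' is what
  # makes a later retry possible.
  def retirement_failed
    @retirement_state = 'error'
  end
end

service = SketchService.new
service.request_retirement   # first attempt starts
service.retirement_failed    # a step fails, state becomes 'error'
service.request_retirement   # second attempt is accepted
puts service.retirement_state
```

If retirement_failed is never called, the second request_retirement raises, which mirrors the situation reported in this bug.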
(In reply to Tina Fitzgerald from comment #5)
> Hi Felix,
>
> If retirement, when initiated by the new retirement API call, were to
> encounter an error, the service retirement_state will change from "retiring"
> to 'error' which would allow you to initiate retirement again.
>
> The old retirement API call remains for customers using custom state
> machines.
>
> Let me know if you have any questions.
>
> Thanks,
> Tina

The customer I raised this bugzilla for was using a custom retirement state machine from a previous major release and ran into this issue. I worry that other customers in similar situations will run into the same problem as they keep using the old retirement API call.
CloudForms seems to be using the new workflow even with the old API calls.
The customer wishes for us to look into why, when a 5.10 automate-based retirement fails (with action=retire being used), it isn't possible to try to retire the service again. Effectively, every attempt after the first is met with a "retirement already in progress" error because the retirement state isn't being cleared. He wants us to look into preventing this.
Hi Felix,

We're going to modify the Service retirement state machine to abort in start_retirement if the service_retire_task is not found, which will catch the scenario where the old "retire" API is used.

We'll have Automate code for the customer to test shortly.

Thanks,
Tina
Hi Felix,

I have created a workaround for the customer. I am including a domain. Just download, import and enable.

If the customer uses the old API call, instead of setting the retirement state to 'retiring', this code will abort with the following messages:

ERROR -- : <AEMethod start_retirement> Service retire task not found
ERROR -- : <AEMethod start_retirement> The old style retirement is incompatible with the new retirement state machine.

Then, if the customer uses the right API call, "request_retire", the service can be retired.

Let me know if you have any questions.

Billy
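As a rough illustration of the guard described in this comment, the following Ruby sketch aborts before the state is touched when no retire task is present. It is not the code from the attached domain: FakeService, RetirementAborted and the method signature are hypothetical, and a plain Logger stands in for the Automate logging facility. Only the two error messages are taken from the comment above.

```ruby
require 'logger'
require 'stringio'

# Hypothetical error type used to signal the abort in this sketch.
class RetirementAborted < StandardError; end

# Hypothetical stand-in for the service being retired.
FakeService = Struct.new(:name, :retirement_state)

# Abort when no retire task exists (the old "retire" call path);
# only mark the service as 'retiring' when a task is present.
def start_retirement(service, service_retire_task, logger)
  if service_retire_task.nil?
    logger.error('Service retire task not found')
    logger.error('The old style retirement is incompatible with the new retirement state machine.')
    raise RetirementAborted, 'Service retire task not found'
  end
  service.retirement_state = 'retiring'
end

log = StringIO.new
service = FakeService.new('demo-bundle', nil)

begin
  start_retirement(service, nil, Logger.new(log))
rescue RetirementAborted => e
  puts "aborted: #{e.message}"
end

# The state was never set to 'retiring', so a later attempt is not blocked.
puts service.retirement_state.inspect
```

The point of checking for the task first is that an aborted run leaves nothing behind that a subsequent "request_retire" call would trip over.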
See Comments 11 & 12 above
https://github.com/ManageIQ/manageiq-content/pull/530
New commit detected on ManageIQ/manageiq-content/master:

https://github.com/ManageIQ/manageiq-content/commit/8b95631a8ac1d0613ef51040f56ce6712e06b472

commit 8b95631a8ac1d0613ef51040f56ce6712e06b472
Author:     william fitzgerald <wfitzger>
AuthorDate: Tue Apr 30 17:28:41 2019 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Tue Apr 30 17:28:41 2019 -0400

    Fix Service Retirement requests leaving the Service is a state of 'retiring'.

    This will fix the problem when someone uses the older api call and leaves the service in a 'retiring' state.
    If there is no task, the process is aborted and the state is 'initializing'.
    This will allow the retirement process will start again instead of being denied because the state is 'retiring'

    fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1698480

    Updated update_service_retirement_status_spec

    Moved check for service_retire_task before notification as requested.

    Removed tap and changed code as requested

    Changed 'fred' to 'active' as requested

 content/automate/ManageIQ/Service/Retirement/StateMachines/Methods.class/__methods__/start_retirement.rb | 6 +
 content/automate/ManageIQ/Service/Retirement/StateMachines/ServiceRetirement.class/__methods__/update_service_retirement_status.rb | 2 +-
 spec/content/automate/ManageIQ/Service/Retirement/StateMachines/ServiceRetirement.class/__methods__/update_service_retirement_status_spec.rb | 21 +-
 3 files changed, 25 insertions(+), 4 deletions(-)
Verified in Version 5.11.0.8.20190611155126_01e077e