1698480 – service bundle retirement requests that hit an error cannot be attempted again due to the way the state is handled

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Automate
Sub Component:
Version:	5.10.2
Hardware:	All
OS:	All
Priority:	urgent
Severity:	urgent
Target Milestone:	GA
Target Release:	5.11.0
Assignee:	Tina Fitzgerald
QA Contact:	Niyaz Akhtar Ansari
Docs Contact:	Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:	1704905 1713477

Reported:	2019-04-10 13:00 UTC by Felix Dewaleyne
Modified:	2019-12-13 14:57 UTC
CC List:	11 users
Fixed In Version:	5.11.0.6
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1713477
Environment:
Last Closed:	2019-12-13 14:57:47 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1698116	0	high	CLOSED	service bundle retirement initiated by API fails at check_service_retirement on an unknown method error	2021-02-22 00:41:40 UTC

Description Felix Dewaleyne 2019-04-10 13:00:36 UTC

Description of problem:
Retirement requests that hit an error cannot be attempted again due to the way the state is handled.

Version-Release number of selected component (if applicable):
5.10.2

How reproducible:
all the time

Steps to Reproduce:
1. Create a service bundle and use stock retirement.
2. Retire the bundle, causing an error on check_retirement (on purpose or otherwise).
3. Attempt to retire the bundle again.

Actual results:
The second attempt (and all following attempts) fails at "start_retirement" because of the state the service bundle was left in by the first, failed attempt.

Expected results:
After an error happens during retirement, regardless of which step it happens in, it should be possible to re-run the retirement request.

Additional info:
The actual error is due to calling the following on the service bundle:
POST /api/services/<<id>>
{
	"action" : "retire"
}
instead of

POST /api/services/<<id>>
{
	"action" : "request_retire"
}

see bz#1698116
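
For reference, the request-based call described above could be built roughly as follows. This is a minimal sketch using only the Ruby standard library: the host name `cfme.example.com`, the service id `42`, and the helper name `build_retire_request` are placeholders invented for illustration, and authentication is left out. The request is only constructed, not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) the POST that asks for a request-based retirement.
# base_url and service_id are supplied by the caller; both are placeholders here.
def build_retire_request(base_url, service_id, action: 'request_retire')
  uri = URI.join(base_url, "/api/services/#{service_id}")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate('action' => action)
  request
end

req = build_retire_request('https://cfme.example.com', 42)
puts req.path
puts req.body
```

Keeping the action name as a keyword argument makes it easy to see that the only difference between the two calls is the value of "action".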

Comment 3 Tina Fitzgerald 2019-04-10 18:25:01 UTC

Hi Felix,

We queried the customer database loaded at moneta.usersys.redhat.com, and found these 2 Services have a retirement_state of retiring.

"c5cac7b29 Dynamic Testing Env-20190409-130040"
"Dynamic Testing Env-20190409-154406"

Assuming these Services are not currently being retired, the customer can use the UI on the My Services page to set a retirement date in the future, which will reset the retirement_state field so that the service can be retired.

This query can be used in console to show services in this state:  
Service.select { |s| s.retirement_state == 'retiring'}

Thanks,
Tina

Comment 4 Felix Dewaleyne 2019-04-16 12:05:49 UTC

The scheduler had issues picking up the retirement tasks when they were set 2 minutes into the future.

The customer has shared his expectations of how retirement should work, especially when retirement fails:

The expected solution when service retirement fails would be:
- the service retirement_state will not stay in status "retiring" but will be changed to 'failed' (retirement_failed, error - choose as you want)
- it will be allowed to rerun service retirement (UI, API, or any other way, such as setting a retirement date) when retirement_state == 'failed'

The immediate situation has been addressed but we need to look into how to prevent similar situations from happening again.
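
The customer's expectation can be illustrated with a small toy model. This is plain Ruby and not ManageIQ code: the class `RetirableService`, the error class `RetirementNotAllowed`, and the state names other than 'retiring' and 'failed' are assumptions made for the sketch.

```ruby
# Toy model of the requested behaviour; not the ManageIQ implementation.
class RetirementNotAllowed < StandardError; end

class RetirableService
  # States from which a new retirement attempt may start (assumed for this sketch).
  RETRYABLE_STATES = [nil, 'failed'].freeze

  attr_reader :retirement_state

  def initialize
    @retirement_state = nil
  end

  # Runs the retirement steps passed as a block.
  def retire
    unless RETRYABLE_STATES.include?(@retirement_state)
      raise RetirementNotAllowed, "cannot start retirement from state #{@retirement_state.inspect}"
    end

    @retirement_state = 'retiring'
    begin
      yield
      @retirement_state = 'retired'
    rescue StandardError
      # Do not stay in 'retiring': record the failure so a retry is possible.
      @retirement_state = 'failed'
      raise
    end
  end
end

service = RetirableService.new
begin
  service.retire { raise 'error in check_retirement' }
rescue RuntimeError => e
  puts "first attempt: #{e.message} (state: #{service.retirement_state})"
end

service.retire { } # second attempt is accepted because the state is 'failed'
puts "second attempt: state is #{service.retirement_state}"
```

The point of the sketch is that the failure path always leaves the service in a state from which retirement can be started again.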

Comment 5 Tina Fitzgerald 2019-04-16 17:48:41 UTC

Hi Felix,

If retirement, when initiated by the new retirement API call, were to encounter an error, the service retirement_state will change from "retiring" to 'error' which would allow you to initiate retirement again.

The old retirement API call remains for customers using custom state machines. 

Let me know if you have any questions.

Thanks,
Tina

Comment 6 Felix Dewaleyne 2019-04-24 08:40:38 UTC

(In reply to Tina Fitzgerald from comment #5)
> Hi Felix,
> 
> If retirement, when initiated by the new retirement API call, were to
> encounter an error, the service retirement_state will change from "retiring"
> to 'error' which would allow you to initiate retirement again.
> 
> The old retirement API call remains for customers using custom state
> machines. 
> 
> Let me know if you have any questions.
> 
> Thanks,
> Tina

The customer I raised this bugzilla for was using a custom retirement state machine from a previous major release and ran into this issue. I worry that other customers in similar situations will run into the same problem as they keep using the old retirement API call.

Comment 7 Felix Dewaleyne 2019-04-24 08:41:23 UTC

CloudForms seems to be using the new workflow even with the old API calls.

Comment 9 Felix Dewaleyne 2019-04-29 12:54:16 UTC

The customer wishes for us to look into why, when a 5.10 automate-based retirement fails (with action=retire being used), it isn't possible to attempt the retirement again.

Effectively, every attempt after the first is met with a "retirement already in progress" error. The retirement state isn't being cleared.

He wants us to look into preventing this.

Comment 10 Tina Fitzgerald 2019-04-30 17:31:52 UTC

Hi Felix,

We're going to modify the Service retirement state machine to abort in start_retirement if the service_retire_task is not found, which will catch the scenario where the old "retire" API is used.  We'll have Automate code for the customer to test shortly.

Thanks,
Tina

Comment 11 William Fitzgerald 2019-04-30 19:38:21 UTC

Hi Felix,

I have created a workaround for the customer.  I am including a domain.  Just download, import and enable.
If the customer uses the old API call, instead of setting the retirement state to 'retiring', this code will abort with the following messages:

ERROR -- : <AEMethod start_retirement> Service retire task not found
ERROR -- : <AEMethod start_retirement> The old style retirement is incompatible with the new retirement state machine.

Then if the customer uses the right api call "request_retire", the service can be retired.

Let me know if you have any questions ....

Billy
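
The guard described in comments 10 and 11 can be sketched in plain Ruby as follows. This is a simulation only, not the code from the attached domain: `root` is a plain Hash standing in for the Automate workspace root, `service` is a plain Hash standing in for the service record, and `RetirementAborted` is a hypothetical error class. Only the key name service_retire_task and the two log messages come from this bug.

```ruby
# Simulation only: root and service are plain hashes, not ManageIQ objects.
class RetirementAborted < StandardError; end

def start_retirement(root, service, log)
  # Per comment 10, a missing retire task indicates the old "retire" API was used,
  # so stop before the service is marked as 'retiring'.
  if root['service_retire_task'].nil?
    log << 'Service retire task not found'
    log << 'The old style retirement is incompatible with the new retirement state machine.'
    raise RetirementAborted, 'start_retirement aborted'
  end

  service[:retirement_state] = 'retiring'
end

log = []
service = { :retirement_state => nil }

begin
  start_retirement({}, service, log) # old "retire" path: no task present
rescue RetirementAborted
  puts log
end

# "request_retire" path: a task exists, so retirement may start.
start_retirement({ 'service_retire_task' => :task }, service, log)
puts service[:retirement_state]
```

Because the check runs before the state is changed, an aborted attempt leaves nothing behind that would block a later request_retire call.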

Comment 13 William Fitzgerald 2019-04-30 19:40:36 UTC

See Comments 11 & 12 above

Comment 14 CFME Bot 2019-04-30 21:36:30 UTC

https://github.com/ManageIQ/manageiq-content/pull/530

Comment 17 CFME Bot 2019-05-23 18:26:25 UTC

New commit detected on ManageIQ/manageiq-content/master:

https://github.com/ManageIQ/manageiq-content/commit/8b95631a8ac1d0613ef51040f56ce6712e06b472
commit 8b95631a8ac1d0613ef51040f56ce6712e06b472
Author:     william fitzgerald <wfitzger>
AuthorDate: Tue Apr 30 17:28:41 2019 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Tue Apr 30 17:28:41 2019 -0400

    Fix Service Retirement requests leaving the Service is a state of 'retiring'.

    This will fix the problem when someone uses the older api call and leaves the service in a 'retiring' state.
    If there is no task, the process is aborted and the state is 'initializing'.

    This will allow the retirement process will start again instead of being denied because the state is 'retiring'

    fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1698480

    Updated update_service_retirement_status_spec
    Moved check for service_retire_task before notification as requested.
    Removed tap and changed code  as requested
    Changed 'fred' to 'active' as requested

 content/automate/ManageIQ/Service/Retirement/StateMachines/Methods.class/__methods__/start_retirement.rb | 6 +
 content/automate/ManageIQ/Service/Retirement/StateMachines/ServiceRetirement.class/__methods__/update_service_retirement_status.rb | 2 +-
 spec/content/automate/ManageIQ/Service/Retirement/StateMachines/ServiceRetirement.class/__methods__/update_service_retirement_status_spec.rb | 21 +-
 3 files changed, 25 insertions(+), 4 deletions(-)

Comment 19 Niyaz Akhtar Ansari 2019-06-14 12:14:07 UTC

Verified in Version 5.11.0.8.20190611155126_01e077e
