Bug 1529735

Summary: ae_max_retries does not show the correct value after one retry if called through multiple relations each limited by different max retries counts
Product: Red Hat CloudForms Management Engine Reporter: Felix Dewaleyne <fdewaley>
Component: AutomateAssignee: Tina Fitzgerald <tfitzger>
Status: CLOSED CURRENTRELEASE QA Contact: Dmitry Misharov <dmisharo>
Severity: high Docs Contact:
Priority: high    
Version: 5.8.0CC: cpelland, fdewaley, mkanoor, obarenbo, tfitzger, vaclav.miller
Target Milestone: GAKeywords: TestOnly
Target Release: 5.10.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 5.10.0.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1531303 (view as bug list) Environment:
Last Closed: 2018-06-21 20:50:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1531303    

Description Felix Dewaleyne 2017-12-29 17:07:15 UTC
Description of problem:
ae_max_retries does not show the correct value after one retry

Version-Release number of selected component (if applicable):
5.8.2

How reproducible:
customer environemnt

Steps to Reproduce:
1. create a state machine that references an instance calling a method
2. make the max_retry of the call to that instance an arbitrary number such as 3
3. in the method, make sure you write into the logs the value of ae_max_retries

Actual results:
ae_max_retries jumps to 200 at the second run

Expected results:
ae_max_retries remains the same

Additional info:

Comment 3 Felix Dewaleyne 2018-01-01 09:33:49 UTC
confirmed : the higher number of attempts seen on the second run comes from another automate that was calling this one.

it seems the bug is rather that the ae_max_retries gets overwritten by the top parent on the first failure.

Comment 4 Tina Fitzgerald 2018-01-02 19:37:12 UTC
Investigating issue.... 

I see the max_retry change from 3 to 200.

before retry:
[----] I, [2017-12-20T10:13:13.068479 #9345:102cdc0]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 0, max_retries: 3, ae_state_retries: 0, ae_state_max_retries: 3
[----] I, [2017-12-20T10:13:13.098374 #9345:102cdc0]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '0', retrying...
[----] I, [2017-12-20T10:13:13.121685 #9345:eb7134]  INFO -- : <AutomationEngine> <AEMethod [/v.b/Integration/VMware/VCenter/Methods/vcenter_service_assigned]> Ending

first retry:
----] I, [2017-12-20T10:14:09.369384 #10492:f6db28]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 2, max_retries: 200, ae_state_retries: 2, ae_state_max_retries: 200
[----] I, [2017-12-20T10:14:09.401014 #10492:f6db28]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '2', retrying...

Comment 5 Felix Dewaleyne 2018-01-03 10:24:39 UTC
Due to new data from the customer I wish to make an update on the steps to reproduce.

you need to call two layers of state machines, the topmost using a max_retry of 50 for example, calling a state machine that will use a max_retry of 2 calling the instance that will call the method. The result is that on the second retry the value will jump to the count of the topmost and not continue from 2.

Comment 8 Tina Fitzgerald 2018-01-03 16:51:33 UTC
Hi Felix,

Thanks for the additional information.
  
Although the $evm.root['ae_state_max_retries'] value is incorrect, the state machine is processed with the correct max retries value.

Customer log shows incorrect value of ae_state_max_retries = 200:
[----] I, [2017-12-20T10:14:38.009888 #9345:102c758]  INFO -- : <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 3, max_retries: 200, ae_state_retries: 3, ae_state_max_retries: 200
[----] I, [2017-12-20T10:14:38.017976 #9345:102c758]  INFO -- : <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '3', retrying...
[----] I, [2017-12-20T10:14:38.037778 #9345:eb7134]  INFO -- : <AEMethod [/v.b/Integration/VMware/VCenter/Methods/vcenter_service_assigned]> Ending
[----] I, [2017-12-20T10:14:38.037900 #9345:eb7134]  INFO -- : Method exited with rc=MIQ_OK

....

State machine correctly aborts resolution due to the max_retries = 3:
----] I, [2017-12-20T10:15:03.083971 #9345:eb7134]  INFO -- : Processing State=[state2]
[----] E, [2017-12-20T10:15:03.084187 #9345:eb7134] ERROR -- : State=<state2> running  raised exception: <number of retries <4> exceeded maximum of <3>>
[----] W, [2017-12-20T10:15:03.084310 #9345:eb7134]  WARN -- : Error in State=[state2]

I'm working on a fix for the incorrect root object value.

Let me know if you have any questions.

Thanks,
Tina

Comment 10 CFME Bot 2018-01-04 13:58:02 UTC
New commit detected on ManageIQ/manageiq-automation_engine/master:
https://github.com/ManageIQ/manageiq-automation_engine/commit/eb92959679fbb419d14e1625b6f201550e4189d0

commit eb92959679fbb419d14e1625b6f201550e4189d0
Author:     Tina Fitzgerald <tfitzger>
AuthorDate: Wed Jan 3 13:18:10 2018 -0500
Commit:     Tina Fitzgerald <tfitzger>
CommitDate: Wed Jan 3 17:37:54 2018 -0500

    Fix incorrect root['ae_state_max_retries'] value.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1529735

 .../engine/miq_ae_engine/miq_ae_state_info.rb                  |  2 +-
 .../engine/miq_ae_engine/miq_ae_state_machine.rb               |  2 +-
 spec/engine/miq_ae_state_machine_multi_spec.rb                 | 10 ++++++----
 3 files changed, 8 insertions(+), 6 deletions(-)