Bug 1529735 - ae_max_retries does not show the correct value after one retry if called through multiple relations each limited by different max retries counts
Summary: ae_max_retries does not show the correct value after one retry if called thro...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.8.0
Hardware: All
OS: All
high
high
Target Milestone: GA
: 5.10.0
Assignee: Tina Fitzgerald
QA Contact: Dmitry Misharov
URL:
Whiteboard:
Depends On:
Blocks: 1531303
TreeView+ depends on / blocked
 
Reported: 2017-12-29 17:07 UTC by Felix Dewaleyne
Modified: 2021-06-10 14:04 UTC (History)
6 users (show)

Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1531303 (view as bug list)
Environment:
Last Closed: 2018-06-21 20:50:25 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Felix Dewaleyne 2017-12-29 17:07:15 UTC
Description of problem:
ae_max_retries does not show the correct value after one retry

Version-Release number of selected component (if applicable):
5.8.2

How reproducible:
customer environemnt

Steps to Reproduce:
1. create a state machine that references an instance calling a method
2. make the max_retry of the call to that instance an arbitrary number such as 3
3. in the method, make sure you write into the logs the value of ae_max_retries

Actual results:
ae_max_retries jumps to 200 at the second run

Expected results:
ae_max_retries remains the same

Additional info:

Comment 3 Felix Dewaleyne 2018-01-01 09:33:49 UTC
confirmed : the higher number of attempts seen on the second run comes from another automate that was calling this one.

it seems the bug is rather that the ae_max_retries gets overwritten by the top parent on the first failure.

Comment 4 Tina Fitzgerald 2018-01-02 19:37:12 UTC
Investigating issue.... 

I see the max_retry change from 3 to 200.

before retry:
[----] I, [2017-12-20T10:13:13.068479 #9345:102cdc0]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 0, max_retries: 3, ae_state_retries: 0, ae_state_max_retries: 3
[----] I, [2017-12-20T10:13:13.098374 #9345:102cdc0]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '0', retrying...
[----] I, [2017-12-20T10:13:13.121685 #9345:eb7134]  INFO -- : <AutomationEngine> <AEMethod [/v.b/Integration/VMware/VCenter/Methods/vcenter_service_assigned]> Ending

first retry:
----] I, [2017-12-20T10:14:09.369384 #10492:f6db28]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 2, max_retries: 200, ae_state_retries: 2, ae_state_max_retries: 200
[----] I, [2017-12-20T10:14:09.401014 #10492:f6db28]  INFO -- : <AutomationEngine> <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '2', retrying...

Comment 5 Felix Dewaleyne 2018-01-03 10:24:39 UTC
Due to new data from the customer I wish to make an update on the steps to reproduce.

you need to call two layers of state machines, the topmost using a max_retry of 50 for example, calling a state machine that will use a max_retry of 2 calling the instance that will call the method. The result is that on the second retry the value will jump to the count of the topmost and not continue from 2.

Comment 8 Tina Fitzgerald 2018-01-03 16:51:33 UTC
Hi Felix,

Thanks for the additional information.
  
Although the $evm.root['ae_state_max_retries'] value is incorrect, the state machine is processed with the correct max retries value.

Customer log shows incorrect value of ae_state_max_retries = 200:
[----] I, [2017-12-20T10:14:38.009888 #9345:102c758]  INFO -- : <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - state_retries: 3, max_retries: 200, ae_state_retries: 3, ae_state_max_retries: 200
[----] I, [2017-12-20T10:14:38.017976 #9345:102c758]  INFO -- : <AEMethod vcenter_service_assigned> /v.b/Integration/VMware/VCenter/Methods/VcenterServiceAssigned - VM 'test_fqdn' is not still part of any service, count '3', retrying...
[----] I, [2017-12-20T10:14:38.037778 #9345:eb7134]  INFO -- : <AEMethod [/v.b/Integration/VMware/VCenter/Methods/vcenter_service_assigned]> Ending
[----] I, [2017-12-20T10:14:38.037900 #9345:eb7134]  INFO -- : Method exited with rc=MIQ_OK

....

State machine correctly aborts resolution due to the max_retries = 3:
----] I, [2017-12-20T10:15:03.083971 #9345:eb7134]  INFO -- : Processing State=[state2]
[----] E, [2017-12-20T10:15:03.084187 #9345:eb7134] ERROR -- : State=<state2> running  raised exception: <number of retries <4> exceeded maximum of <3>>
[----] W, [2017-12-20T10:15:03.084310 #9345:eb7134]  WARN -- : Error in State=[state2]

I'm working on a fix for the incorrect root object value.

Let me know if you have any questions.

Thanks,
Tina

Comment 10 CFME Bot 2018-01-04 13:58:02 UTC
New commit detected on ManageIQ/manageiq-automation_engine/master:
https://github.com/ManageIQ/manageiq-automation_engine/commit/eb92959679fbb419d14e1625b6f201550e4189d0

commit eb92959679fbb419d14e1625b6f201550e4189d0
Author:     Tina Fitzgerald <tfitzger>
AuthorDate: Wed Jan 3 13:18:10 2018 -0500
Commit:     Tina Fitzgerald <tfitzger>
CommitDate: Wed Jan 3 17:37:54 2018 -0500

    Fix incorrect root['ae_state_max_retries'] value.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1529735

 .../engine/miq_ae_engine/miq_ae_state_info.rb                  |  2 +-
 .../engine/miq_ae_engine/miq_ae_state_machine.rb               |  2 +-
 spec/engine/miq_ae_state_machine_multi_spec.rb                 | 10 ++++++----
 3 files changed, 8 insertions(+), 6 deletions(-)


Note You need to log in before you can comment on or make changes to this bug.