Bug 1947147

Summary: [FFU OSP13 to 16.1.4] After FFU upgrade, overcloud node introspection failed: "Can not transition from state 'uninitialized' on event 'sync'"
Product: Red Hat OpenStack Reporter: Paras Babbar <pbabbar>
Component: openstack-ironic-inspector Assignee: Julia Kreger <jkreger>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 16.1 (Train) CC: dtantsur, hjensas, jkreger, pweeks, sbaker
Target Milestone: z8 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-ironic-inspector-9.2.4-1.20211112163411.2c85396.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2010469 2010990 Environment:
Last Closed: 2022-03-24 10:59:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2010469    

Description Paras Babbar 2021-04-07 19:11:22 UTC
Description of problem:

After an FFU upgrade with ironic enabled in the overcloud (OC), introspection of the overcloud instances failed with the following error:


TASK [ironic-overcloud : Introspect the nodes] *********************************
changed: [titan101.lab.eng.tlv2.redhat.com] => (item=ironic-0) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "source /home/stack/overcloudrc\nopenstack baremetal introspection start --wait ironic-0\n",
    "delta": "0:00:04.092840",
    "end": "2021-04-07 15:11:32.313959",
    "item": "ironic-0",
    "rc": 0,
    "start": "2021-04-07 15:11:28.221119"
}

STDOUT:

+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| UUID     | Error                                                                                                                                                                            |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ironic-0 | The PXE filter driver DnsmasqFilter, state=uninitialized: my fsm encountered an exception: Can not transition from state 'uninitialized' on event 'sync' (no defined transition) |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


STDERR:

Waiting for introspection to finish...
changed: [titan101.lab.eng.tlv2.redhat.com] => (item=ironic-1) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "source /home/stack/overcloudrc\nopenstack baremetal introspection start --wait ironic-1\n",
    "delta": "0:00:03.396431",
    "end": "2021-04-07 15:11:36.302769",
    "item": "ironic-1",
    "rc": 0,
    "start": "2021-04-07 15:11:32.906338"
}

STDOUT:

+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| UUID     | Error                                                                                                                                                                            |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ironic-1 | The PXE filter driver DnsmasqFilter, state=uninitialized: my fsm encountered an exception: Can not transition from state 'uninitialized' on event 'sync' (no defined transition) |
+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

STDERR:

Waiting for introspection to finish...

Version-Release number of selected component (if applicable):


How reproducible:
Happens after an FFU upgrade

Steps to Reproduce:
1. Perform the FFU upgrade
2. Start introspection of the OC instances


Actual results:
Introspection failed

Expected results:
Introspection succeeds

Additional info:

Comment 1 Julia Kreger 2021-04-09 15:21:00 UTC
I've posted what I guess is part of a fix to upstream gerrit for this. In essence, I'm not sure there is a good way to handle the transitory database failure when it occurs, since there are so many different factors. But that failure can leave us in a state where we cannot retry. The patch I've posted upstream should allow a retry to be processed, so that a client doesn't interpret the "error" state retained from the previous attempt as a hard failure on the next introspection attempt.

CC'ing dtantsur as this issue may be more observable in his work, and as such he might have ideas on the database, but I think that may be a dangerous path to take, since the process does recover anyway when the cluster is healthy again.
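
To illustrate the failure mode described above, here is a minimal, self-contained sketch in Python. It is not the upstream patch and does not use ironic-inspector's code; `FilterFSM`, `NoTransition`, `sync_with_recovery`, the transition table, and `flaky_init` are all hypothetical names chosen for this example. It assumes a filter that starts in the 'uninitialized' state, where a 'sync' event has no defined transition, and shows one possible way a later attempt could re-initialize the filter instead of failing permanently.

```python
class NoTransition(Exception):
    """Raised when an event has no transition from the current state."""


class FilterFSM:
    """Toy state machine; states and events are illustrative assumptions."""

    # (current state, event) -> next state
    TRANSITIONS = {
        ("uninitialized", "initialize"): "initialized",
        ("initialized", "sync"): "initialized",
        ("initialized", "reset"): "uninitialized",
        ("uninitialized", "reset"): "uninitialized",
    }

    def __init__(self):
        self.state = "uninitialized"

    def process_event(self, event):
        try:
            self.state = self.TRANSITIONS[(self.state, event)]
        except KeyError:
            raise NoTransition(
                "Can not transition from state %r on event %r "
                "(no defined transition)" % (self.state, event)
            ) from None


def sync_with_recovery(fsm, init_backend):
    """Sync the filter, initializing it first if it is still uninitialized."""
    if fsm.state == "uninitialized":
        # init_backend may raise on a transient failure (for example, the
        # database being unavailable); the state then stays 'uninitialized',
        # so a later call can simply try again.
        init_backend()
        fsm.process_event("initialize")
    fsm.process_event("sync")


# Hypothetical backend that fails on the first call and works afterwards.
calls = {"n": 0}


def flaky_init():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("database unavailable")


fsm = FilterFSM()
try:
    sync_with_recovery(fsm, flaky_init)
except ConnectionError:
    pass  # first attempt fails; the filter is left 'uninitialized'

sync_with_recovery(fsm, flaky_init)  # the retry initializes, then syncs
```

Calling `process_event("sync")` directly on a fresh `FilterFSM` raises `NoTransition` with a message of the same shape as the one in the introspection output, while `sync_with_recovery` lets the second attempt proceed once the backend is healthy again.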

Comment 4 Julia Kreger 2021-06-09 13:33:15 UTC
Fix for this issue is in review upstream.

Comment 5 Julia Kreger 2021-10-04 18:05:31 UTC
Backport for this change has been proposed upstream.

Comment 14 errata-xmlrpc 2022-03-24 10:59:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986