Bug 1843175 - Baremetal Deployment OSP16 overcloud stops with json.decoder.JSONDecodeError
Summary: Baremetal Deployment OSP16 overcloud stops with json.decoder.JSONDecodeError
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
: ---
Assignee: Alex Schultz
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-02 18:25 UTC by Jason Grosso
Modified: 2022-08-23 22:49 UTC (History)

Fixed In Version: tripleo-ansible-0.5.1-0.20200706173411.c53bf61.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1850715 (view as bug list)
Environment:
Last Closed: 2021-09-15 07:08:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1882134 0 None None None 2020-06-04 20:18:56 UTC
OpenStack gerrit 733691 0 None MERGED [TRAIN-AND-OLDER] Improve timeout error handling 2021-02-19 21:08:25 UTC
Red Hat Issue Tracker OSP-3038 0 None None None 2022-08-23 22:49:13 UTC
Red Hat Product Errata RHEA-2021:3483 0 None None None 2021-09-15 07:09:06 UTC

Comment 3 Alex Schultz 2020-06-04 20:00:34 UTC
For future reference: when you get this error, it is essentially an execution timeout. In this case NetworkConfig executed, the systems became unavailable, and ansible simply hung. Eventually the mistral -> zaqar connection either ends or a blank response is sent, and the JSON decode error is thrown because the response is empty.
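
As a minimal illustration of why an empty response surfaces as a JSONDecodeError rather than a timeout, the sketch below feeds an empty string to Python's standard json module. The function name `decode_response` is hypothetical and only stands in for whatever client code decodes the message body; it is not taken from tripleo or mistral.

```python
import json


def decode_response(payload: str):
    """Decode a message body the way a naive client would (hypothetical helper)."""
    return json.loads(payload)


try:
    # An empty body is what the client sees when the connection ends
    # or a blank response is sent.
    decode_response("")
except json.JSONDecodeError as exc:
    error_text = str(exc)
    print(f"JSONDecodeError: {error_text}")
```

The exception message only says that a JSON value was expected at the very start of the input; nothing in it points to the timeout that actually caused the empty body.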


To troubleshoot, run `openstack workflow execution list` and check for a 'failed' task. You can then run `openstack workflow execution show <id> -f yaml` to get the error. If you hit this issue, it will tell you to look in ansible.log for the failure. If the last task executing is NetworkConfig, then the situation described above occurred. You will need to hop on the system's console and troubleshoot the network config.
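
The first step above can be sketched as a small filter over the execution listing. This assumes the listing has been captured as JSON text (for example with the CLI's `-f json` output format) and that each row carries `ID` and `State` columns; the column names, the state values `ERROR`/`FAILED`, and the sample rows are all assumptions for illustration and should be checked against real output.

```python
import json


def find_failed_executions(listing_json, failed_states=("ERROR", "FAILED")):
    """Return the IDs of executions whose state looks like a failure.

    The "ID"/"State" keys and the state names are assumptions, not a
    documented schema.
    """
    rows = json.loads(listing_json)
    return [
        row.get("ID")
        for row in rows
        if str(row.get("State", "")).upper() in failed_states
    ]


# Hypothetical sample listing, not real command output.
sample_listing = json.dumps([
    {"ID": "exec-1", "State": "SUCCESS"},
    {"ID": "exec-2", "State": "ERROR"},
])
failed_ids = find_failed_executions(sample_listing)
print(failed_ids)
```

Each returned ID is what you would pass to `openstack workflow execution show <id> -f yaml` to read the error.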


The improvement here might be to improve the 'timeout' messaging by catching this condition and handling the error more clearly. In future versions we've cleaned up this condition by removing the mistral to zaqar interactions.
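
One way such error handling could look is sketched below. This is not the change that was merged for this bug; `DeploymentTimeoutError` and `decode_or_explain` are hypothetical names, and the message wording is only an example of pointing the operator at the likely cause.

```python
import json


class DeploymentTimeoutError(RuntimeError):
    """Hypothetical error type for an empty or undecodable response."""


def decode_or_explain(payload: str):
    """Decode a response body, replacing the bare JSON error with a hint."""
    hint = (
        "No usable response was received; this usually indicates an "
        "execution timeout. Check ansible.log for the last task that ran."
    )
    if not payload or not payload.strip():
        # An empty body would otherwise surface as json.JSONDecodeError.
        raise DeploymentTimeoutError(hint)
    try:
        return json.loads(payload)
    except json.JSONDecodeError as exc:
        # Keep the original exception chained for debugging.
        raise DeploymentTimeoutError(hint) from exc


try:
    decode_or_explain("")
except DeploymentTimeoutError as exc:
    timeout_message = str(exc)
    print(timeout_message)
```

Chaining the original exception with `from exc` keeps the low-level decode error available in the traceback while the top-level message describes the probable timeout.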

Comment 15 David Rosenfeld 2021-07-21 13:16:53 UTC
DF doesn't have baremetal servers. Checked with submitter and problem is no longer seen.

Comment 17 errata-xmlrpc 2021-09-15 07:08:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483

