Bug 2149782 - [RHOSP 13 to 16.2 Upgrades] nova_compute container stuck in restarting state or unhealthy after executing nova_hybrid_state tasks as part of FFU
Summary: [RHOSP 13 to 16.2 Upgrades] nova_compute container stuck in restarting state...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: OSP Team
QA Contact: Khomesh Thakre
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-30 22:40 UTC by Jacob Ansari
Modified: 2023-11-08 19:19 UTC
CC List: 18 users

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230717085025.1608f56.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 19:18:30 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 866238 | 0 | None | MERGED | [TRAIN-ONLY] Passing NovaComputeOpt{Volumes,EnvVars} to hybrid containers | 2023-09-25 10:31:04 UTC
Red Hat Issue Tracker | OSP-20601 | 0 | None | None | None | 2022-11-30 22:52:45 UTC
Red Hat Product Errata | RHBA-2023:6307 | 0 | None | None | None | 2023-11-08 19:19:01 UTC

Description Jacob Ansari 2022-11-30 22:40:06 UTC
Description of problem:

After running [1] as part of [2], every compute has a nova_compute container that is either unhealthy or restarting in a loop (the state varies). This occurs in a Juniper Contrail backed environment.

The errors observed are mainly along the lines of [3]. According to Juniper, they occur because the bind mounts for the Nova container point to /usr/lib/python3.6/site-packages (which does not exist yet) rather than the previous /usr/lib/python2.7/site-packages/.
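
One way to check the explanation above on a compute node is to test whether the host-side source of each bind mount exists. The following is a minimal sketch, not the tooling used in this report: the helper name and the example volume specs are hypothetical, and it assumes the volumes are written in the usual "source:destination[:options]" form.

```python
import os


def missing_bind_sources(volumes):
    """Return the host-side source paths that do not exist.

    Each entry in `volumes` is assumed to be a 'source:destination[:options]'
    string, as commonly used for container bind mounts.
    """
    missing = []
    for spec in volumes:
        source = spec.split(":", 1)[0]
        if not os.path.exists(source):
            missing.append(source)
    return missing


if __name__ == "__main__":
    # Hypothetical specs for illustration only; substitute the volumes
    # actually configured for the nova_compute container.
    example_volumes = [
        "/usr/lib/python2.7/site-packages:/usr/lib/python2.7/site-packages:ro",
        "/usr/lib/python3.6/site-packages:/usr/lib/python3.6/site-packages:ro",
    ]
    for path in missing_bind_sources(example_volumes):
        print("bind-mount source missing on host:", path)
```

If the python3.6 path is reported as missing on a compute that has not yet gone through the operating system upgrade, that would be consistent with the cause suggested by Juniper.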


If the upgrade is continued on a given compute through the Leapp stage and beyond, the nova containers become functional and healthy.

We have the following questions in this context:

1) Is the observed behavior normal/expected (i.e. is it safe to continue the FFU in spite of this)?
2) If not, what is the expected behavior, and what is the actual impact of the observed behavior versus the expected behavior (if any)?
3) If the behavior is normal/expected, should it be documented?
4) Is this possibly Contrail specific, or is it likely to affect other third-party ML2 setups?



[1]
openstack overcloud upgrade run [--stack <stack_name>] --playbook upgrade_steps_playbook.yaml --tags nova_hybrid_state --limit all


[2]
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/framework_for_upgrades_13_to_16.2/index

[3]
ERROR oslo_service.service nova.exception.InternalError: Failure running os_vif plugin plug method: No VIF plugin was found with the name vrouter
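
The error in [3] means that no plugin named "vrouter" was registered when os-vif loaded its plugins. os-vif discovers plugins through Python entry points in the "os_vif" group, so listing that group from the interpreter used inside the nova_compute container shows whether the plugin is visible. This is a rough diagnostic sketch with a hypothetical helper name; it is not part of the fix delivered for this bug.

```python
def vif_plugin_names(group="os_vif"):
    """Return the sorted names of entry points registered under `group`."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        import pkg_resources
        return sorted({ep.name for ep in pkg_resources.iter_entry_points(group)})

    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


if __name__ == "__main__":
    names = vif_plugin_names()
    print("registered os_vif plugins:", names)
    print("vrouter present:", "vrouter" in names)
```

If "vrouter" is absent from the list, the plugin package is not importable from that interpreter's site-packages, which matches the bind-mount explanation given in the description.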


Version-Release number of selected component (if applicable):


How reproducible:
Undetermined


Steps to Reproduce:
1. Run a 13 to 16.2 FFU up to the step "openstack overcloud upgrade run [--stack <stack_name>] --playbook upgrade_steps_playbook.yaml --tags nova_hybrid_state --limit all" on a Juniper Contrail backed environment


Actual results:
The nova_compute containers on all computes are either unhealthy or restarting in a loop (the state varies)


Expected results:
Unclear (BZ created with aim to answer that question and others)



Additional info:

Comment 17 errata-xmlrpc 2023-11-08 19:18:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307

