Bug 2085583

Summary: [RHOS 17] ovs reconnects can block the nova-compute agent
Product: Red Hat OpenStack
Reporter: smooney
Component: python-os-vif
Assignee: OSP Team <rhos-maint>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: high
Docs Contact:
Priority: medium
Version: 17.0 (Wallaby)
CC: alifshit, arcsingh, igallagh, jamsmith, jjoyce, jpretori, jschluet, kthakre, rheslop, slinaber, spower, tvignaud
Target Milestone: z1
Keywords: Patch, Triaged
Target Release: 17.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-os-vif-2.4.0-0.20220907220808.6167658.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, `ovsdb` connection time-outs caused the `nova-compute` agent to become unresponsive. With this update, the issue has been fixed.
Story Points: ---
Clone Of:
: 2085588
Environment:
Last Closed: 2023-01-25 12:28:47 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2085517, 2085588, 2122654

Description smooney 2022-05-13 16:25:17 UTC
This bug was initially created as a copy of Bug #2085517

I am copying this bug because: 



Description of problem: Unable to resize a stopped VM with the latest phase3 puddle, RHOS-17.0-RHEL-9-20220511.n.1. This is the first 17 puddle in which this has happened; the test was previously passing. The test is also failing across a couple of jobs [1,2]. An example build failure and its logs are at [3,4]. Failure logs are attached as well.

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220511.n.1

How reproducible:
This is the first time it has been seen for 17, with the latest puddle. It happened across several jobs.

Steps to Reproduce:
1. Deploy an environment with RHOS-17.0-RHEL-9-20220511.n.1 and execute the tempest test tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
2.
3.
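
For anyone re-running the failing test by hand, the following is a minimal sketch rather than the command the CI jobs use. It assumes the test's dotted ID is the Jenkins testReport name from step 1 with the slashes turned into dots and the trailing "_id" dropped, and that tempest is installed and invoked from an already configured workspace. The helper name build_tempest_command is made up for illustration. It only builds and prints the command line; it does not run tempest.

```python
import re
import shlex

# Assumed dotted test ID, derived from the Jenkins testReport name in step 1.
TEST_ID = (
    "tempest.api.compute.servers.test_server_actions."
    "ServerActionsTestJSON.test_resize_server_confirm_from_stopped"
)


def build_tempest_command(test_id=TEST_ID):
    """Return an argv list that selects a single test via `tempest run --regex`."""
    # --regex takes a regular expression, so escape the dots in the ID
    # to make them match literally instead of matching any character.
    return ["tempest", "run", "--regex", re.escape(test_id)]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal
    # inside the tempest workspace (the workspace itself is an assumption).
    print(shlex.join(build_tempest_command()))
```

Pasting the printed command into a shell on the node where tempest is configured should select only the resize-from-stopped test, which keeps the reproduction short compared with running the whole compute suite.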

Actual results:
VM remains in SHUTOFF state, fails to migrate, and never resizes

Expected results:
VM resizes

Additional info:
[1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com//job/DFG-compute-nova-17.0_director-rhel-virthost-1cont_2comp_3ceph-ipv4-geneve-live-migration-shared-storage-config-drive-vfat-phase3/9//testReport/tempest.api.compute.servers.test_server_actions/ServerActionsTestJSON/test_resize_server_confirm_from_stopped_id_138b131d_66df_48c9_a171_64f45eb92962_/
[2] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com//job/DFG-compute-nova-17.0_director-rhel-virthost-1cont_2comp_3ceph-ipv4-geneve-smt-whitebox-numa-tests-ceph-phase3/8//testReport/tempest.api.compute.servers.test_server_actions/ServerActionsTestJSON/test_resize_server_confirm_from_stopped_id_138b131d_66df_48c9_a171_64f45eb92962_/
[3] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-compute-nova-17.0_director-rhel-virthost-1cont_2comp_3ceph-ipv4-geneve-smt-whitebox-numa-tests-ceph-phase3/8/
[4] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-compute-nova-17.0_director-rhel-virthost-1cont_2comp_3ceph-ipv4-geneve-smt-whitebox-numa-tests-ceph-phase3/8/

Comment 13 errata-xmlrpc 2023-01-25 12:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.0.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0271