Bug 2255253

Summary: [OSP16.2] After upgrade to OSP16.2.6, Octavia Mgmt network amphoras have a random MTU change: smaller MTU (1500) compared to the original value of 8950 (jumbo frames)
Product: Red Hat OpenStack Reporter: John Soliman <jsoliman>
Component: tripleo-ansible    Assignee: Gregory Thiemonge <gthiemon>
Status: CLOSED ERRATA QA Contact: Bruna Bonguardo <bbonguar>
Severity: high Docs Contact: Greg Rakauskas <gregraka>
Priority: high    
Version: 16.2 (Train)    CC: bcafarel, chrisbro, dalvarez, gthiemon, jlibosva, mariel, mburns, midzik, oschwart, pgrist, ralonsoh, tvainio, tvignaud, tweining
Target Milestone: async    Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: tripleo-ansible-0.8.1-2.20230817005024.el8ost    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2257274 (view as bug list) Environment:
Last Closed: 2024-03-26 12:25:57 UTC    Type: Bug
Bug Depends On:    
Bug Blocks: 2257274    

Description John Soliman 2023-12-19 15:28:01 UTC
Description of problem:
Hello, we have a customer (CU) who has upgraded the environment to 16.2.6. The CU uses jumbo frames in the deployment templates.
- Suddenly, after the upgrade, some load balancers ended up in ERROR state.
- Failing them over didn't fix anything; they ended up in PENDING_UPDATE state, and some amphoras went into ERROR state.

- We compared two sosreports from the same node, taken before and after the upgrade:
  sosreport-helpa-compute1r1-prod-2023-11-27 with sosreport-helpa-compute1r1-prod-2023-12-15
- In both there are q-devices with either MTU 1500 or MTU 8950, so not all of them changed after the upgrade.
- Result: Octavia Mgmt is not working for a large number of amphoras.
- The Octavia Management network for some amphoras has been set to MTU 1500, which is too low for Octavia Mgmt health messages.

Many amphora-related interfaces have a small MTU, for example:
ip a | grep eaf87d51-ac 
862: qbreaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
864: qvbeaf87d51-ac@qvoeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbreaf87d51-ac state UP group default qlen 1000
872: tapeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue master qbreaf87d51-ac state UNKNOWN group default qlen 1000
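The undersized interfaces can be enumerated in one pass. A minimal sketch, assuming the q-device name prefixes shown above and 8950 as the deployment's jumbo MTU (both taken from this report):

```shell
# List qbr/qvo/qvb/tap interfaces whose MTU is below the expected jumbo
# value (8950 in this deployment). Parses `ip -o link` output.
ip -o link show | awk -F': ' '
  { name = $2; sub(/@.*/, "", name) }          # strip the @peer suffix
  name ~ /^(qbr|qvo|qvb|tap)/ {
    if (match($0, /mtu [0-9]+/)) {
      mtu = substr($0, RSTART + 4, RLENGTH - 4)
      if (mtu + 0 < 8950) print name, mtu      # flag undersized interfaces
    }
  }'
```

Run on the affected compute node; any amphora-related interface printed is a candidate for the 1500-MTU problem described above.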


Many load balancers are stuck in PENDING_UPDATE, which prevents any actions.

We noticed an upstream release note [1] which states:
~~~
A new parameter octavia_provider_network_mtu is added to set the MTU to 1500 by default. This is important for deployments that allow jumbo frames while setting the management to the standard Ethernet MTU. The MTU can be still changed at any point during the initial octavia deployment or with the openstack network set --mtu command line.
~~~

That variable may be available downstream in [2].
Changing the network manually is also an option, by doing:
 openstack network set --mtu 8950 <LB networks ID>
This should be tried on a test lab first; it has not been tested yet.
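The manual workaround could be sketched as the following check-and-restore sequence (untested, as noted above; `lb-mgmt-net` is the usual default name for the Octavia management network and is an assumption here, so substitute the actual network name or ID):

```shell
# Inspect the current MTU of the Octavia management network
# ("lb-mgmt-net" is assumed; use the actual network ID).
openstack network show lb-mgmt-net -c mtu -f value

# Restore the original jumbo MTU -- try this in a test lab first.
openstack network set --mtu 8950 lb-mgmt-net

# Verify the change took effect.
openstack network show lb-mgmt-net -c mtu -f value
```

Note that changing the network MTU may not automatically propagate to already-plugged ports; affected amphoras may still need a failover afterwards.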

We need to investigate why the Octavia Mgmt network has a smaller MTU, and restore the original value of 8950.

[1] https://docs.openstack.org/releasenotes/openstack-ansible-os_octavia/unreleased.html
[2] https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/864819

Version-Release number of selected component (if applicable):
RHOSP 16.2.6
puppet-octavia-15.5.1-2.20220821005128.a56b33a.el8ost.noarch

Actual results:
The Octavia Mgmt network has a smaller MTU than the original value of 8950; Octavia Mgmt is not working for a large number of amphoras.

Expected results:
The MTU in the Octavia Mgmt network has the original value: 8950


Additional info:
sos-report attached on case from compute and controllers

Comment 3 Jakub Libosvar 2023-12-19 18:25:42 UTC
(In reply to John Soliman from comment #0)
> Description of problem:
> Hello, we have a CU has upgraded the ENV to 16.2.6, CU using jumbo frame in
> deployment templates
> - Suddenly after upgrade some loadbalancers ended up in error state. 
> - Failing them over didn't fix anything they ended up in 'pending_update'
> state and some amphoras went in 'error' state.
> 
> - we tried to compare two sosreport from the same node before and after
> upgrade 
>   sosreport-helpa-compute1r1-prod-2023-11-27 with
> sosreport-helpa-compute1r1-prod-2023-12-15 
> - in both there are q-devices which has either mtu 1500 or 8950 so not all
> are after the upgrade

Can you please share those sosreports?

Can you please also share the output of

`openstack network show lb-mgmt-net` ?

Can you please also share when they did the update?

Comment 41 errata-xmlrpc 2024-03-26 12:25:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:1519