Bug 2255253 - [OSP16.2] After upgrade to OSP16.2.6, Octavia Mgmt network amphoras have a random MTU change: smaller MTU (1500) compared to original value 8950 (jumbo frames)
Summary: [OSP16.2] After upgrade to OSP16.2.6 Octavia Mgmt network amphoras having ran...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.2 (Train)
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Gregory Thiemonge
QA Contact: Bruna Bonguardo
Docs Contact: Greg Rakauskas
URL:
Whiteboard:
Depends On:
Blocks: 2257274
 
Reported: 2023-12-19 15:28 UTC by John Soliman
Modified: 2024-03-26 12:26 UTC
CC List: 14 users

Fixed In Version: tripleo-ansible-0.8.1-2.20230817005024.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2257274 (view as bug list)
Environment:
Last Closed: 2024-03-26 12:25:57 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
- Red Hat Issue Tracker OSP-30951 (last updated 2023-12-19 15:29:47 UTC)
- Red Hat Knowledge Base (Solution) 7053931 (last updated 2024-02-12 22:59:55 UTC)
- Red Hat Product Errata RHBA-2024:1519 (last updated 2024-03-26 12:26:05 UTC)

Description John Soliman 2023-12-19 15:28:01 UTC
Description of problem:
Hello, a customer has upgraded their environment to 16.2.6; the customer uses jumbo frames in the deployment templates.
- Suddenly, after the upgrade, some load balancers ended up in error state.
- Failing them over did not fix anything: they ended up in 'pending_update' state and some amphoras went into 'error' state.

- We compared two sosreports from the same node, taken before and after the upgrade:
  sosreport-helpa-compute1r1-prod-2023-11-27 with sosreport-helpa-compute1r1-prod-2023-12-15
- Both contain q-devices with either MTU 1500 or MTU 8950, so not all of them changed after the upgrade.
- Result: the Octavia Mgmt network is not working for a large number of amphoras.
- For some amphoras, the Octavia Management network MTU has been set to 1500, which is too low for Octavia Mgmt health messages.

Many amphora-related interfaces have a small MTU, for example:
ip a | grep eaf87d51-ac 
862: qbreaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
864: qvbeaf87d51-ac@qvoeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbreaf87d51-ac state UP group default qlen 1000
872: tapeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue master qbreaf87d51-ac state UNKNOWN group default qlen 1000
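To spot affected interfaces on a compute node, a small scan over `ip -o link` output can flag q-devices whose MTU dropped below the jumbo value. A minimal sketch; the embedded sample lines are abbreviated from the listing above, and the 8950 target and the qbr/qvo/qvb/tap prefixes are taken from this environment:

```shell
#!/bin/sh
# Flag amphora-related q-devices whose MTU is below the expected jumbo value.
# On a real node, feed live output instead of the sample:
#   ip -o link | scan_low_mtu
EXPECTED_MTU=8950

scan_low_mtu() {
    awk -v want="$EXPECTED_MTU" '
        $2 ~ /^(qbr|qvo|qvb|tap)/ {
            # the numeric MTU value follows the "mtu" keyword
            for (i = 1; i < NF; i++)
                if ($i == "mtu" && $(i + 1) + 0 < want)
                    print "LOW MTU:", $2, $(i + 1)
        }'
}

# Sample lines abbreviated from the 'ip a' output above
ip_o_link_sample='862: qbreaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
872: tapeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UNKNOWN'

printf '%s\n' "$ip_o_link_sample" | scan_low_mtu
```

Only the two 1500-MTU devices are reported; the tap device already at 8950 passes the check.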


Many load balancers are stuck in 'pending_update', which prevents any further actions.

We have noticed an upstream release note in [1], which states:
~~~
A new parameter octavia_provider_network_mtu is added to set the MTU to 1500 by default. This is important for deployments that allow jumbo frames while setting the management to the standard Ethernet MTU. The MTU can be still changed at any point during the initial octavia deployment or with the openstack network set --mtu command line.
~~~

That variable may be available downstream, per [2].
Changing the network manually is also an option, by running:
 openstack network set --mtu 8950 <LB networks ID>
on a test lab first; this has not been tested yet.
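The manual workaround could be wrapped in a small dry-run script so the exact commands can be reviewed (and first tried in a lab) before touching production. A sketch only; the network ID below is a placeholder, not a value confirmed for this environment:

```shell
#!/bin/sh
# Dry-run wrapper for the proposed MTU workaround.
# LB_MGMT_NET is a placeholder; substitute the real lb-mgmt-net ID.
LB_MGMT_NET="<LB-network-ID>"
TARGET_MTU=8950

run() {
    # Print the command instead of executing it; drop the echo once
    # the change has been validated in a test lab.
    echo "+ $*"
}

run openstack network set --mtu "$TARGET_MTU" "$LB_MGMT_NET"
# Verify the applied value afterwards
run openstack network show "$LB_MGMT_NET" -f value -c mtu
```

The `run` indirection keeps the script side-effect free until the echo is removed.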

We need to investigate why the Octavia Mgmt network has a smaller MTU, and restore the original value of 8950.

[1] https://docs.openstack.org/releasenotes/openstack-ansible-os_octavia/unreleased.html
[2] https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/864819

Version-Release number of selected component (if applicable):
RHOSP 16.2.6
puppet-octavia-15.5.1-2.20220821005128.a56b33a.el8ost.noarch

Actual results:
The Octavia Mgmt network has a smaller MTU than the original value of 8950; the Octavia Mgmt network is not working for a large number of amphoras.

Expected results:
The Octavia Mgmt network MTU has its original value: 8950.


Additional info:
sos-report attached on case from compute and controllers

Comment 3 Jakub Libosvar 2023-12-19 18:25:42 UTC
(In reply to John Soliman from comment #0)
> Description of problem:
> Hello, we have a CU has upgraded the ENV to 16.2.6, CU using jumbo frame in
> deployment templates
> - Suddenly after upgrade some loadbalancers ended up in error state. 
> - Failing them over didn't fix anything they ended up in 'pending_update'
> state and some amphoras went in 'error' state.
> 
> - we tried to compare two sosreport from the same node before and after
> upgrade 
>   sosreport-helpa-compute1r1-prod-2023-11-27 with
> sosreport-helpa-compute1r1-prod-2023-12-15 
> - in both there are q-devices which has either mtu 1500 or 8950 so not all
> are after the upgrade

Can you please share those sosreports?

Can you please also share the output of

`openstack network show lb-mgmt-net` ?

Can you please also share when they did the update?

Comment 41 errata-xmlrpc 2024-03-26 12:25:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:1519

