Description of problem:

Hello, a customer has upgraded their environment to 16.2.6. The customer uses jumbo frames in the deployment templates.

- Suddenly, after the upgrade, some load balancers ended up in error state.
- Failing them over did not fix anything: they ended up in 'pending_update' state and some amphorae went into 'error' state.
- We compared two sosreports from the same node, taken before and after the upgrade: sosreport-helpa-compute1r1-prod-2023-11-27 and sosreport-helpa-compute1r1-prod-2023-12-15.
- In both there are q-devices with an MTU of either 1500 or 8950, so not all of the low-MTU devices date from after the upgrade.
- Result: Octavia management is not working for a large number of amphorae.
- The Octavia management network has been set to an MTU of 1500 for some amphorae, which is too low for Octavia management health messages.

Many amphora-related interfaces have a small MTU, for example:

    ip a | grep eaf87d51-ac
    862: qbreaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
    864: qvbeaf87d51-ac@qvoeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbreaf87d51-ac state UP group default qlen 1000
    872: tapeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue master qbreaf87d51-ac state UNKNOWN group default qlen 1000

Many load balancers are stuck in pending update, which prevents any action on them.

We noticed an upstream release note [1] that states:
~~~
A new parameter octavia_provider_network_mtu is added to set the MTU to 1500 by default. This is important for deployments that allow jumbo frames while setting the management to the standard Ethernet MTU. The MTU can be still changed at any point during the initial octavia deployment or with the openstack network set --mtu command line.
~~~
That variable may be available downstream, see [2].

Changing the network manually is also an option:

    openstack network set --mtu 8950 <LB networks ID>

This has not been tested yet and should be tried in a test lab first. We need to investigate why the Octavia management network has a smaller MTU, and restore the original value of 8950.

[1] https://docs.openstack.org/releasenotes/openstack-ansible-os_octavia/unreleased.html
[2] https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/864819

Version-Release number of selected component (if applicable):
RHOSP 16.2.6
puppet-octavia-15.5.1-2.20220821005128.a56b33a.el8ost.noarch

Actual results:
The Octavia management network has a smaller MTU than the original value of 8950, and Octavia management is not working for a large number of amphorae.

Expected results:
The Octavia management network keeps its original MTU of 8950.

Additional info:
sosreports from the compute and controller nodes are attached to the case.
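To make the before/after sosreport comparison less manual, here is a minimal sketch that lists interfaces whose MTU is below the expected value in saved `ip a` output. The function name `low_mtu_interfaces` and the default of 8950 are assumptions for illustration; the sample text is the output quoted in this report.

```python
import re

# Header line of an interface in `ip a` output, for example:
# "863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,...> mtu 1500 qdisc ..."
IFACE_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>\s+mtu\s+(?P<mtu>\d+)"
)


def low_mtu_interfaces(ip_output, expected_mtu=8950):
    """Return (name, mtu) pairs for interfaces with an MTU below expected_mtu.

    expected_mtu defaults to 8950, the value this deployment is assumed to use.
    """
    found = []
    for line in ip_output.splitlines():
        match = IFACE_RE.match(line.strip())
        if match and int(match.group("mtu")) < expected_mtu:
            found.append((match.group("name"), int(match.group("mtu"))))
    return found


# Output quoted in this bug report.
SAMPLE = """\
862: qbreaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
863: qvoeaf87d51-ac@qvbeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
864: qvbeaf87d51-ac@qvoeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbreaf87d51-ac state UP group default qlen 1000
872: tapeaf87d51-ac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue master qbreaf87d51-ac state UNKNOWN group default qlen 1000
"""

if __name__ == "__main__":
    for name, mtu in low_mtu_interfaces(SAMPLE):
        print(f"{name}: mtu {mtu}")
```

Running the same function over the `ip a` capture from each sosreport and comparing the two result lists would show which low-MTU devices already existed before the upgrade.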
(In reply to John Soliman from comment #0)
> Description of problem:
> Hello, we have a CU has upgraded the ENV to 16.2.6, CU using jumbo frame in
> deployment templates
> - Suddenly after upgrade some loadbalancers ended up in error state.
> - Failing them over didn't fix anything they ended up in 'pending_update'
> state and some amphoras went in 'error' state.
>
> - we tried to compare two sosreport from the same node before and after
> upgrade
> sosreport-helpa-compute1r1-prod-2023-11-27 with
> sosreport-helpa-compute1r1-prod-2023-12-15
> - in both there are q-devices which has either mtu 1500 or 8950 so not all
> are after the upgrade

Can you please share those sosreports?

Can you please also share the output of `openstack network show lb-mgmt-net`?

Can you please also share when they did the update?
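For the `openstack network show lb-mgmt-net` request, a minimal sketch of checking the reported MTU against the expected value, assuming the command is run with `-f json` and that the output contains an `mtu` field. The helper names and the sample JSON are hypothetical and are not taken from the affected environment.

```python
import json


def network_mtu(show_json):
    """Extract the MTU from `openstack network show <net> -f json` output."""
    return int(json.loads(show_json)["mtu"])


def mtu_matches(show_json, expected_mtu=8950):
    """True when the network MTU equals expected_mtu (8950 is assumed here)."""
    return network_mtu(show_json) == expected_mtu


# Hypothetical, trimmed example of the JSON output.
SAMPLE_SHOW = '{"name": "lb-mgmt-net", "mtu": 1500}'

if __name__ == "__main__":
    print("MTU:", network_mtu(SAMPLE_SHOW))
    print("matches expected 8950:", mtu_matches(SAMPLE_SHOW))
```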
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.2.6 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:1519