Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1931661

Summary: OSP 16.1 ovn octavia load balancers in error state - Pool <UUID> is immutable and cannot be updated
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: python-ovn-octavia-provider
Assignee: Brian Haley <bhaley>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Bruna Bonguardo <bbonguar>
Severity: urgent
Priority: urgent
Version: 16.1 (Train)
CC: afariasa, bhaley, cgoncalves, dhill, eolivare, gthiemon, igarciam, ihrachys, kfida, knakai, lpeer, ltomasbo, majopela, mdemaced, mdulko, pveiga, ramishra, scohen
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2021-04-26 14:18:25 UTC

Description Matt Flusche 2021-02-22 21:30:50 UTC
Description of problem:
OVN Octavia load balancers stop working. octavia.log contains the following errors:

Client-side error: Pool <UUID> is immutable and cannot be updated. format_exception /usr/lib/python3.6/site-packages/wsme/api.py:222

Perhaps: https://bugs.launchpad.net/neutron/+bug/1900763
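For context, the "immutable" message is Octavia's standard API response when an object is not in a mutable provisioning state. A minimal sketch of that check (all names here are illustrative, not Octavia's actual code):

```python
# Simplified sketch (illustrative names, not Octavia's real code) of the
# mutability check behind "Pool <UUID> is immutable and cannot be updated":
# the API rejects updates to any object whose provisioning_status is not
# ACTIVE, e.g. one stuck in PENDING_UPDATE after a failed provider call.

MUTABLE_STATUSES = {"ACTIVE"}


class ImmutableObject(Exception):
    """Raised when an object is mid-operation and cannot be changed."""


def check_mutable(obj_type, obj_id, provisioning_status):
    if provisioning_status not in MUTABLE_STATUSES:
        raise ImmutableObject(
            f"{obj_type} {obj_id} is immutable and cannot be updated")


# A pool left in PENDING_UPDATE rejects every further update:
try:
    check_mutable("Pool", "<UUID>", "PENDING_UPDATE")
except ImmutableObject as exc:
    print(exc)  # Pool <UUID> is immutable and cannot be updated
```

Until the object returns to ACTIVE (on its own or via a failover), every update attempt will log this same error, which is consistent with the repeating messages seen here.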

Version-Release number of selected component (if applicable):
16.1.x

How reproducible:
This specific environment


Additional info:
I'll provide additional details in private comments.

Comment 31 Rabi Mishra 2021-03-08 12:10:30 UTC
https://access.redhat.com/solutions/5858391 seems relevant.

Comment 37 Brian Haley 2021-03-15 19:47:01 UTC
Hi Ignacio,

Unfortunately these errors were happening even in the oldest logs in the sosreports I found on supportshell, so I can't tell exactly what operation might have triggered them.

One possibility is that the subnet was not created properly, and when Kuryr/CNO went to allocate a VIP with that IP, it was already in-use.
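If the VIP address was indeed already in use, that could be confirmed by listing the Neutron ports holding it (the IP address below is a placeholder, and this assumes access to the affected cloud):

```shell
# Show any existing port that already owns the candidate VIP address.
openstack port list --fixed-ip ip-address=192.0.2.10
```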

Another is that there was a loadbalancer create operation that was somehow stopped and left a stale DB entry.

I know there were some manual DB operations to clean things up, followed by a forced failover of (I think) this loadbalancer which cleared things up.
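For reference, the forced failover mentioned above can be issued from the CLI (the load balancer ID is a placeholder; this requires appropriate credentials against the affected cloud):

```shell
# Rebuild the load balancer's backing resources via the Octavia API.
openstack loadbalancer failover <lb-id>

# Then watch the statuses until provisioning_status returns to ACTIVE.
openstack loadbalancer show <lb-id> -c provisioning_status -c operating_status
```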

All the log files I've looked at in the case show this failure (AllocateVIPException). To get further in the RCA, I think we'd need the logs from when the failure stopped, so we could see what operation happened at that point in time.

Thanks,

-Brian