Bug 1931661 - OSP 16.1 ovn octavia load balancers in error state - Pool <UUID> is immutable and cannot be updated
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ovn-octavia-provider
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Brian Haley
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-22 21:30 UTC by Matt Flusche
Modified: 2021-11-17 18:58 UTC
CC: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-26 14:18:25 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-278 0 None None None 2021-11-17 18:58:55 UTC
Red Hat Knowledge Base (Solution) 4251821 0 None None None 2021-02-26 20:31:43 UTC

Description Matt Flusche 2021-02-22 21:30:50 UTC
Description of problem:
OVN Octavia load balancers stop working. octavia.log contains the following errors:

Client-side error: Pool <UUID> is immutable and cannot be updated. format_exception /usr/lib/python3.6/site-packages/wsme/api.py:222

Perhaps: https://bugs.launchpad.net/neutron/+bug/1900763

Version-Release number of selected component (if applicable):
16.1.x

How reproducible:
This specific environment


Additional info:
I'll provide additional details in private comments.
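
A "Pool <UUID> is immutable" rejection generally means the pool's provisioning_status is stuck in ERROR or a PENDING_* transitional state, so the API refuses further updates. As a hedged diagnostic sketch (the `<LB_ID>` and `<POOL_ID>` values are placeholders, and the commands assume the standard python-octaviaclient CLI against a live deployment):

```shell
# Check the provisioning/operating status of the load balancer and
# the affected pool; a stuck PENDING_* or ERROR provisioning_status
# will cause "immutable" rejections on subsequent update calls.
openstack loadbalancer show <LB_ID> -c provisioning_status -c operating_status
openstack loadbalancer pool show <POOL_ID> -c provisioning_status -c operating_status
```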

Comment 31 Rabi Mishra 2021-03-08 12:10:30 UTC
https://access.redhat.com/solutions/5858391 seems relevant.

Comment 37 Brian Haley 2021-03-15 19:47:01 UTC
Hi Ignacio,

Unfortunately these errors were happening even in the oldest logs in the sosreports I found on supportshell, so I can't tell exactly what operation might have triggered them.

One possibility is that the subnet was not created properly, and when Kuryr/CNO went to allocate a VIP with that IP, it was already in-use.

Another is that there was a loadbalancer create operation that was somehow stopped and left a stale DB entry.

I know there were some manual DB operations to clean things up, followed by a forced failover of (I think) this loadbalancer which cleared things up.

All the log files I've looked at in the case show this failure (AllocateVIPException). I think that to get further in the RCA we'd need the logs from when the failures stopped, so we could see what operation happened at that point in time.

Thanks,

-Brian
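
The recovery path described above (manual DB cleanup followed by a forced failover) corresponds roughly to the sketch below. `<LB_ID>` is a placeholder; `openstack loadbalancer failover` is a standard Octavia CLI command, but whether a failover behaves the same way with the OVN provider should be verified for the specific release:

```shell
# After any stale DB entries have been cleaned up, force a failover
# of the affected load balancer; this rebuilds it and can clear a
# stuck provisioning_status.
openstack loadbalancer failover <LB_ID>

# Watch the status until it returns to ACTIVE.
openstack loadbalancer show <LB_ID> -c provisioning_status
```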

