Bug 1927348 - [OVN] port update cause sbctl error
Summary: [OVN] port update cause sbctl error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Jakub Libosvar
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 1927369
Blocks:
 
Reported: 2021-02-10 15:23 UTC by Yariv
Modified: 2022-03-25 10:34 UTC
CC List: 9 users

Fixed In Version: python-networking-ovn-7.3.1-1.20220113183500.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1927369
Environment:
Last Closed: 2022-03-25 10:34:42 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-496 | 0 | None | None | None | 2021-11-18 14:21:49 UTC

Description Yariv 2021-02-10 15:23:33 UTC
Description of problem:

Updating the binding profile of a direct port causes the following error:

 2021-02-10T09:13:23.817Z|01290|ovsdb_idl|WARN|transaction error: {"details":"cannot delete HA_Chassis_Group row 24f09c3b-3031-419e-92e0-d57843ca684e because of 1 remaining reference(s)","error":"referential integrity violation"}

After that, the VM is active but no IPs are acquired.
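The warning above is logged by the `ovsdb_idl` module when the database rejects a transaction. As a hedged sketch, Python can pull the JSON error out of such a line, for example to scan logs for this failure. The helper names `parse_txn_error` and `blocked_row` are hypothetical, and the `timestamp|sequence|module|level|message` layout is inferred from this single log line only.

```python
import json
import re

# Log line copied verbatim from the report above.
LOG_LINE = (
    '2021-02-10T09:13:23.817Z|01290|ovsdb_idl|WARN|transaction error: '
    '{"details":"cannot delete HA_Chassis_Group row '
    '24f09c3b-3031-419e-92e0-d57843ca684e because of 1 remaining reference(s)",'
    '"error":"referential integrity violation"}'
)

PREFIX = "transaction error: "


def parse_txn_error(line):
    """Return (timestamp, module, level, error_dict), or None if the line
    is not an ovsdb transaction error (layout assumed from the line above)."""
    # Assumed layout: timestamp|sequence|module|level|message
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, _seq, module, level, message = parts
    if not message.startswith(PREFIX):
        return None
    return timestamp, module, level, json.loads(message[len(PREFIX):])


def blocked_row(error):
    """Extract (table, row UUID) from the 'details' text, if present."""
    m = re.search(r"cannot delete (\w+) row ([0-9a-f-]{36})",
                  error.get("details", ""))
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    ts, module, level, err = parse_txn_error(LOG_LINE)
    print(level, err["error"])  # WARN referential integrity violation
    print(blocked_row(err))     # ('HA_Chassis_Group', '24f09c3b-3031-419e-92e0-d57843ca684e')
```

The regex only recognizes the "cannot delete <table> row <uuid>" wording seen in this report; other OVSDB error texts would need their own patterns.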

Version-Release number of selected component (if applicable):

RHOS-16.1-RHEL-8-20210129.n.0 (overcloud)
ovn2.13-20.09.0-17.el8fdp.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Create a direct port:
 openstack port create  --network hwoffload_net_nic1_129 --vnic-type direct port_name

2. openstack port set --binding-profile "capabilities=['switchdev']" port_name
The error above is logged.

3. openstack port delete port_name
The error is logged again.

4. openstack server create ... --nic port-id=port_name test server

Actual results:
The VM is up but has no IPs.

Expected results:
No errors.
The VM is up with IPs.

Additional info:

When the port is created the regular way, with the binding profile set at creation time, the VM comes up with IPs:

openstack port create  --network network_id --vnic-type direct --binding-profile '{"capabilities": ["switchdev"]}' port_name

Comment 2 Jakub Libosvar 2021-02-10 15:40:48 UTC
The reason this happens is as follows:

  openstack port create  --network hwoffload_net_nic1_129 --vnic-type direct port_name

networking-ovn treats this as an SR-IOV port. Because such a port is not plugged through br-int, we create an external HA port on the controllers for the DHCP and metadata services. Hence the corresponding logical switch port in the OVN NB DB is type: external and has an associated ha_chassis_group. The port is also bound to the controller with the highest priority, which means there is a port_binding entry in the SB DB that also references the ha_chassis_group.
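The decision described here, together with the switchdev case explained in the next paragraph, can be summarized in a simplified Python model. This is a hypothetical sketch, not networking-ovn's code: the class, the function names, and the assumption that every external port uses the group named `default_ha_chassis_group` are for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed name of the shared HA chassis group (illustrative only).
DEFAULT_HA_CHASSIS_GROUP = "default_ha_chassis_group"


@dataclass
class LogicalSwitchPort:
    name: str
    type: str = ""                        # "" = regular port plugged into br-int
    ha_chassis_group: Optional[str] = None


def is_switchdev(profile):
    """True if the binding profile lists the 'switchdev' capability."""
    return "switchdev" in (profile or {}).get("capabilities", [])


def build_lsp(name, vnic_type, profile=None):
    """Hypothetical mapping from a Neutron port to the NB port shape in this bug."""
    if vnic_type == "direct" and not is_switchdev(profile):
        # SR-IOV VF bypasses br-int: DHCP/metadata come from an external
        # port scheduled on the controllers through an HA chassis group.
        return LogicalSwitchPort(name, type="external",
                                 ha_chassis_group=DEFAULT_HA_CHASSIS_GROUP)
    # switchdev (OVS hardware offload): a representor port sits on br-int,
    # so the hosting hypervisor serves DHCP/metadata itself.
    return LogicalSwitchPort(name)
```

In this model, running `openstack port set` with the switchdev capability flips the result of `build_lsp` from an external port with a group to a plain port without one, which is the transition that leaves the SB DB in the state described below.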

  openstack port set --binding-profile "capabilities=['switchdev']" port_name

This means the port will be plugged through br-int and will have a representor port. Because of that, we no longer need the external port: DHCP and metadata can be served directly on the hosting hypervisor. The update triggers an NB DB transaction that removes the port's type: external and deletes its ha_chassis_group, because neither is needed anymore. If this was the last logical switch port associated with default_ha_chassis_group, OVN northd tries to delete the group from the SB DB because it is no longer used. However, the port_binding for the external port still exists and still references that ha_chassis_group, so northd's attempt to delete the ha_chassis_group fails and is retried. Because northd is stuck retrying the deletion, the port binding for the external port can't be removed either, and northd holds a lock over the SB DB. As a result, no ovn-controller in the cluster can create new entries, such as port_binding or mac_binding rows, in the database. The cluster stays unusable until someone deletes the port_binding of the no-longer-existing external logical switch port.
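The ordering problem can be reproduced with a toy in-memory model. This is plain Python, not OVSDB, and the class and method names are hypothetical. It treats the Port_Binding reference to HA_Chassis_Group as one that blocks deletion, which is what the "referential integrity violation" error implies, and shows that the group can only be deleted once the stale port binding is gone.

```python
class ReferentialIntegrityError(Exception):
    pass


class ToySouthboundDB:
    """Toy model of two SB tables: HA_Chassis_Group rows and Port_Binding
    rows referencing them. Not OVSDB; just enough to show the ordering."""

    def __init__(self):
        self.ha_chassis_groups = set()  # group row UUIDs
        self.port_bindings = {}         # logical_port -> referenced group UUID

    def delete_ha_chassis_group(self, uuid):
        refs = sum(1 for g in self.port_bindings.values() if g == uuid)
        if refs:
            # Same wording as the error in the bug report.
            raise ReferentialIntegrityError(
                f"cannot delete HA_Chassis_Group row {uuid} "
                f"because of {refs} remaining reference(s)")
        self.ha_chassis_groups.discard(uuid)

    def delete_port_binding(self, logical_port):
        self.port_bindings.pop(logical_port, None)


GROUP = "24f09c3b-3031-419e-92e0-d57843ca684e"

db = ToySouthboundDB()
db.ha_chassis_groups.add(GROUP)
db.port_bindings["port_name"] = GROUP  # binding left over from the external port

try:
    db.delete_ha_chassis_group(GROUP)  # the delete northd keeps retrying
except ReferentialIntegrityError as exc:
    print(exc)

db.delete_port_binding("port_name")    # remove the stale binding first...
db.delete_ha_chassis_group(GROUP)      # ...then the group delete succeeds
print(GROUP in db.ha_chassis_groups)   # False
```

The model deliberately leaves out locking and retries; it only captures why deleting the group before the binding fails.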

Comment 8 Jakub Libosvar 2022-01-26 21:58:35 UTC
Fixed in ovn2.13-20.12.0-85

Comment 11 Vadim Khitrin 2022-02-02 10:53:16 UTC
Hi,

The NFV team has verified this on compose 'RHOS-16.1-RHEL-8-20211126.n.1' with OVN 'ovn2.13-20.12.0-189.el8fdp.x86_64'.

Comment 12 OSP Team 2022-03-25 10:34:42 UTC
According to our records, this should be resolved by python-networking-ovn-7.3.1-1.20220113183502.el8ost.  This build is available now.

