Numan said: It looks like the IDL contents and the actual DB are out of sync. The IDL will not try to create a port binding row that causes referential integrity issues unless there is a mismatch. Also, when ovn-northd tries to delete a datapath, the deletion fails because the DB still holds port binding rows for that datapath that ovn-northd is unaware of. We should work on detecting this in ovn-northd and forcing a complete re-sync of the DB. Since this is the root cause of the issue, I'm repurposing this bug to track detecting and correcting IDL consistency issues in northd automatically.
A patch series that makes ovn-northd resolve such inconsistencies automatically has been sent upstream for review: https://patchwork.ozlabs.org/project/openvswitch/list/?series=173578
*** Bug 1828343 has been marked as a duplicate of this bug. ***
Reproduced the first inconsistency on ovn2.13.0-21 with the following script:

[root@kvm-04-guest09 bz1828637]# cat rep1.sh
/usr/share/ovn/scripts/ovn-ctl start_northd
# Create two logical switches with one port each.
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 p1
ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 p2
ovn-nbctl --wait=sb sync
# At this point PB for p1 has tunnel_key=1
ovn-sbctl list datapath
# At this point PB for p2 has tunnel_key=2
# Simulate the SB db going away (could be network
# issues or crash or some other event).
/usr/share/ovn/scripts/ovn-ctl stop_sb_ovsdb
# CMS decides to move p2 from ls2 to ls1 and removes
# ls2 completely.
ovn-nbctl ls-del ls2
ovn-nbctl lsp-add ls1 p2
# Simulate SB DB coming back online.
/usr/share/ovn/scripts/ovn-ctl start_sb_ovsdb
cat /var/log/ovn/ovn-northd.log

[root@kvm-04-guest09 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-host-2.13.0-21.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-central-2.13.0-21.el8fdp.x86_64
ovn2.13-2.13.0-21.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64

Error in /var/log/ovn/ovn-northd.log:

2020-05-12T02:49:51.819Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (ad24f326-5d37-4ad0-966f-a07121e0b4bb and 1) for index on columns \"datapath\" and \"tunnel_key\". First row, with UUID ee1c4a5e-502f-4555-b5e3-c30240ef2719, had the following index values before the transaction: 79ce696f-6579-4aad-93f5-00a70c5140e2 and 1. Second row, with UUID 44631d9e-ada8-4266-8bfb-60e87533d264, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

Reproduced the second inconsistency on ovn2.13.0-21 with the following script:

/usr/share/ovn/scripts/ovn-ctl start_northd
# Create a logical router with one router port.
ovn-nbctl lr-add lr
ovn-nbctl lrp-add lr p 00:00:00:00:00:01 1.1.1.1/24
# Simulate that a mac binding was created for the router
# port.
dp=$(ovn-sbctl --bare --columns _uuid list datapath .)
ovn-sbctl create mac_binding logical_port="p" ip="1.1.1.2" datapath="$dp"
ovn-nbctl --wait=sb sync
# Simulate the SB db going away (could be network
# issues or crash or some other event).
/usr/share/ovn/scripts/ovn-ctl stop_sb_ovsdb
# CMS decides to delete lr.
ovn-nbctl lr-del lr
# CMS decides to re-add lr and router port.
ovn-nbctl lr-add lr
ovn-nbctl lrp-add lr p 00:00:00:00:00:01 1.1.1.1/24
# Simulate SB DB coming back online.
/usr/share/ovn/scripts/ovn-ctl start_sb_ovsdb
cat /var/log/ovn/ovn-northd.log

Error in /var/log/ovn/ovn-northd.log:

2020-05-12T02:52:29.274Z|00013|ovsdb_idl|WARN|transaction error: {"details":"cannot delete Datapath_Binding row 63dfd8b4-e904-4004-a7ed-2068593c6de4 because of 1 remaining reference(s)","error":"referential integrity violation"}

Verified on ovn2.13.0-27: no error in /var/log/ovn/ovn-northd.log after running the two scripts:

[root@kvm-04-guest09 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-2.13.0-27.el8fdp.x86_64
ovn2.13-host-2.13.0-27.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64
ovn2.13-central-2.13.0-27.el8fdp.x86_64
Reproduced on the RHEL 7 version:

2020-05-12T03:12:56.526Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (05fa7425-20c5-49a0-8cf9-d75e189af722 and 1) for index on columns \"datapath\" and \"tunnel_key\". First row, with UUID eb25335d-e058-4f7b-8daf-bfe9dd8c9415, had the following index values before the transaction: 4134a573-df11-415f-bcf1-ff9f0312fbe9 and 1. Second row, with UUID 4e82b8f7-d2e8-4cbb-9221-1d8e5a4c50e1, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

[root@hpe-dl380pgen8-02-vm-13 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-17.el7fdp.x86_64
ovn2.13-2.13.0-21.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-21.el7fdp.x86_64
ovn2.13-central-2.13.0-21.el7fdp.x86_64

2020-05-12T03:13:55.476Z|00013|ovsdb_idl|WARN|transaction error: {"details":"cannot delete Datapath_Binding row d2759a31-9239-43a4-9371-4a25a259d4ca because of 1 remaining reference(s)","error":"referential integrity violation"}

Verified on ovn2.13.0-27:

[root@hpe-dl380pgen8-02-vm-13 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-17.el7fdp.x86_64
ovn2.13-central-2.13.0-27.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-2.13.0-27.el7fdp.x86_64
ovn2.13-host-2.13.0-27.el7fdp.x86_64
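For anyone re-running the reproducers, the manual "cat the log and look for errors" step could be scripted. The following is only a minimal sketch: check_northd_log is a hypothetical helper, not part of OVN, and it assumes the two error signatures quoted in this report ("constraint violation" and "referential integrity violation" on ovsdb_idl transaction errors) are the only ones of interest.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OVN): scan an ovn-northd log for the
# two ovsdb_idl transaction errors quoted in this bug report.
# Returns 0 if none are found, 1 if at least one is found, 2 if the log
# cannot be read. Matching lines are printed to stdout.
check_northd_log() {
    log="${1:-/var/log/ovn/ovn-northd.log}"

    if [ ! -r "$log" ]; then
        echo "cannot read log file: $log" >&2
        return 2
    fi

    # [|] matches a literal pipe, the field separator in OVS log lines.
    pattern='ovsdb_idl[|]WARN[|]transaction error:.*"error":"(constraint violation|referential integrity violation)"'

    if grep -E "$pattern" "$log"; then
        return 1
    fi
    return 0
}
```

Usage after running either reproducer script: `check_northd_log /var/log/ovn/ovn-northd.log && echo "no IDL inconsistency errors"`. On the -21 builds above it should print the offending lines and return 1; on the -27 builds it is expected to return 0.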
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2317