Bug 1828637
| Summary: | [Telco] northd: detect and correct DB consistency and referential integrity issues | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dan Williams <dcbw> | |
| Component: | ovn2.12 | Assignee: | Dumitru Ceara <dceara> | |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | FDP 20.B | CC: | ctrautma, dblack, dceara, eminguez, i.maximets, jhuddles, jishi, mmichels, ralongi, smalleni | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | All | |||
| Whiteboard: | Telco | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1837257 | Environment: | ||
| Last Closed: | 2020-05-26 14:07:18 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1828639, 1836305 | |||
| Bug Blocks: | 1837257 | |||
Comment 1
Dan Williams
2020-04-28 14:40:41 UTC
A patch series that makes ovn-northd resolve such inconsistencies automatically was sent upstream for review: https://patchwork.ozlabs.org/project/openvswitch/list/?series=173578
*** Bug 1828343 has been marked as a duplicate of this bug. ***
Reproduced the first inconsistency scenario on ovn2.13.0-21 with the following script:
[root@kvm-04-guest09 bz1828637]# cat rep1.sh
/usr/share/ovn/scripts/ovn-ctl start_northd
# Create two logical switches with one port each.
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 p1
ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 p2
ovn-nbctl --wait=sb sync
# At this point PB for p1 has tunnel_key=1
ovn-sbctl list datapath
# At this point PB for p2 has tunnel_key=2
# Simulate the SB db going away (could be network
# issues or crash or some other event).
/usr/share/ovn/scripts/ovn-ctl stop_sb_ovsdb
# CMS decides to move p2 from ls2 to ls1 and removes
# ls2 completely.
ovn-nbctl ls-del ls2
ovn-nbctl lsp-add ls1 p2
# Simulate SB DB coming back online.
/usr/share/ovn/scripts/ovn-ctl start_sb_ovsdb
cat /var/log/ovn/ovn-northd.log
[root@kvm-04-guest09 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
ovn2.13-host-2.13.0-21.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-central-2.13.0-21.el8fdp.x86_64
ovn2.13-2.13.0-21.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64
Error in /var/log/ovn/ovn-northd.log:
2020-05-12T02:49:51.819Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (ad24f326-5d37-4ad0-966f-a07121e0b4bb and 1) for index on columns \"datapath\" and \"tunnel_key\". First row, with UUID ee1c4a5e-502f-4555-b5e3-c30240ef2719, had the following index values before the transaction: 79ce696f-6579-4aad-93f5-00a70c5140e2 and 1. Second row, with UUID 44631d9e-ada8-4266-8bfb-60e87533d264, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
Reproduced the second inconsistency scenario on ovn2.13.0-21 with the following script:
/usr/share/ovn/scripts/ovn-ctl start_northd
# Create a logical router with one router port.
ovn-nbctl lr-add lr
ovn-nbctl lrp-add lr p 00:00:00:00:00:01 1.1.1.1/24
# Simulate that a mac binding was created for the router
# port.
dp=$(ovn-sbctl --bare --columns _uuid list datapath .)
ovn-sbctl create mac_binding logical_port="p" ip="1.1.1.2" datapath="$dp"
ovn-nbctl --wait=sb sync
# Simulate the SB db going away (could be network
# issues or crash or some other event).
/usr/share/ovn/scripts/ovn-ctl stop_sb_ovsdb
# CMS decides to delete lr.
ovn-nbctl lr-del lr
# CMS decides to re-add lr and the router port.
ovn-nbctl lr-add lr
ovn-nbctl lrp-add lr p 00:00:00:00:00:01 1.1.1.1/24
# Simulate SB DB coming back online.
/usr/share/ovn/scripts/ovn-ctl start_sb_ovsdb
cat /var/log/ovn/ovn-northd.log
Error in /var/log/ovn/ovn-northd.log:
2020-05-12T02:52:29.274Z|00013|ovsdb_idl|WARN|transaction error: {"details":"cannot delete Datapath_Binding row 63dfd8b4-e904-4004-a7ed-2068593c6de4 because of 1 remaining reference(s)","error":"referential integrity violation"}
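The error above means the Datapath_Binding row cannot be deleted while another row still refers to it; in this reproducer the manually created mac_binding row points at that datapath. A rough sketch of finding such referencing rows is shown below. The function name and the "table row_uuid datapath_uuid" input format are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# rows_referencing (hypothetical helper): given a datapath UUID as $1 and
# lines of the form "<table> <row_uuid> <datapath_uuid>" on stdin, print
# "<table> <row_uuid>" for every row that still points at that datapath.
rows_referencing() {
    awk -v dp="$1" '$3 == dp { print $1 " " $2 }'
}
```

Any row it prints would have to be removed or repointed before the datapath row itself can be deleted.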
Verified on ovn2.13.0-27: no errors in /var/log/ovn/ovn-northd.log after running the two scripts:
[root@kvm-04-guest09 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch
ovn2.13-2.13.0-27.el8fdp.x86_64
ovn2.13-host-2.13.0-27.el8fdp.x86_64
openvswitch2.13-2.13.0-18.el8fdp.x86_64
ovn2.13-central-2.13.0-27.el8fdp.x86_64
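The verification step boils down to checking that neither kind of transaction error appears in the northd log. A small sketch of that check follows; the function name is an assumption, and the patterns simply match the two log messages quoted in this report.

```shell
#!/bin/sh
# count_txn_errors (hypothetical helper): count the two kinds of ovsdb_idl
# transaction errors seen in this bug in the log file given as $1 and
# print the result as "constraint=N referential=M".
count_txn_errors() {
    log=$1
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps the function usable under "set -e".
    c=$(grep -c 'transaction error:.*"error":"constraint violation"' "$log" || true)
    r=$(grep -c 'transaction error:.*"error":"referential integrity violation"' "$log" || true)
    echo "constraint=$c referential=$r"
}
```

For example, `count_txn_errors /var/log/ovn/ovn-northd.log` would be expected to report zero for both counters on a fixed build after running the two scripts.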
Reproduced on the RHEL 7 version:
2020-05-12T03:12:56.526Z|00013|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (05fa7425-20c5-49a0-8cf9-d75e189af722 and 1) for index on columns \"datapath\" and \"tunnel_key\". First row, with UUID eb25335d-e058-4f7b-8daf-bfe9dd8c9415, had the following index values before the transaction: 4134a573-df11-415f-bcf1-ff9f0312fbe9 and 1. Second row, with UUID 4e82b8f7-d2e8-4cbb-9221-1d8e5a4c50e1, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
[root@hpe-dl380pgen8-02-vm-13 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-17.el7fdp.x86_64
ovn2.13-2.13.0-21.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-host-2.13.0-21.el7fdp.x86_64
ovn2.13-central-2.13.0-21.el7fdp.x86_64
2020-05-12T03:13:55.476Z|00013|ovsdb_idl|WARN|transaction error: {"details":"cannot delete Datapath_Binding row d2759a31-9239-43a4-9371-4a25a259d4ca because of 1 remaining reference(s)","error":"referential integrity violation"}
Verified on ovn2.13.0-27:
[root@hpe-dl380pgen8-02-vm-13 bz1828637]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.13-2.13.0-17.el7fdp.x86_64
ovn2.13-central-2.13.0-27.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
ovn2.13-2.13.0-27.el7fdp.x86_64
ovn2.13-host-2.13.0-27.el7fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2317