Description of problem: In an OVN deployment, when an ovsdb-server failover happens, some ovn-controllers can end up connected to the ovsdb-server master in read-only mode. Once the ovsdb-server is promoted to master, ovn-controller should ideally reconnect and get read-write access to the db. Sometimes, however, the connection is not reset and these ovn-controllers remain connected to the ovsdb-server in read-only mode, so they cannot write to the SB db. This causes VM boot failures and mac_binding write failures.
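To tell which role a given ovsdb-server currently holds, its sync status can be queried over the control socket. The snippet below is a minimal sketch, not part of the original report: db_role is a hypothetical helper name, the ovnsb_db.ctl socket path is assumed by analogy with the ovnnb_db.ctl path used later in this thread, and the parser assumes the ovsdb-server/sync-status output contains a line of the form "state: active" or "state: backup".

```shell
# Hypothetical helper: read "ovsdb-server/sync-status" output on stdin and
# print the reported role ("active" or "backup"), or "unknown" if no
# "state:" line is present. The output format is an assumption.
db_role() {
    awk '/^state:/ { print $2; found = 1; exit }
         END { if (!found) print "unknown" }'
}

# Example use against a live server (socket path is an assumption):
#   ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status | db_role
```

A server that reports "backup" serves its clients in read-only mode, which is the state the affected ovn-controllers stay stuck in.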
Hi Numan, can this bug happen every time the ovsdb-server changes? Or is there an easy way to reproduce it? We haven't hit the bug when running our HA test case, which uses a cluster. Thanks!
(In reply to haidong li from comment #2)
> Hi Numan,can this bug happen everytime when change the ovsdb-server? Or is
> there an easy way to reproduce it? We haven't found the bug when we run our
> ha case which use cluster to test.Thanks!

Hi Haidong Li,

This doesn't happen all the time. There is a small timing window in which it can happen.

Prior to the fix which addresses this issue, whenever ovsdb-server changed state from active to standby or vice versa, it closed the existing socket connections. With the fix, it doesn't close the existing connections.

You can probably verify this in 2 ways.

1. Without pacemaker
   a. Start the OVN ovsdb-servers.
   b. Run ovn-nbctl --detach. Save the unix socket path it prints in a shell variable, NB_PATH.
   c. Run: ovs-appctl -t $NB_PATH run ls-add tst
   d. Run: ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/set-active-ovsdb-server tcp:192.0.2.254:6641
   e. Run: ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/connect-active-ovsdb-server
   f. Run: ovs-appctl -t $NB_PATH run ls-add tst1 -> This should fail.
   g. Run: ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/disconnect-active-ovsdb-server
   h. Run: ovs-appctl -t $NB_PATH run ls-add tst2 -> This should fail.

2. With pacemaker setup
   This is tricky and you may or may not be able to reproduce it, but this is how I reproduced it. (Use an openvswitch version which doesn't have this fix.)
   a. Increase the monitor interval by running
      - pcs resource show <ovn-resource-name>
      - pcs resource update <ovn-resource-name> op monitor interval=600
      (I don't know the exact command, but you can copy it from the pcs resource show output.)
   b. In one window run: watch -n1 'netstat -tunlpa | grep 6642'
      Make sure that ovn-controller is connected to the active ovsdb-server.
   c. On the master node, keep running the commands below:
      - /usr/share/openvswitch/scripts/ovn-ctl demote_ovnsb --db-sb-sync-from-addr=192.0.2.254
        When you run this command, you should see the socket connection in the "netstat" output getting reset.
      - /usr/share/openvswitch/scripts/ovn-ctl promote_ovnsb
        Again, when you run this command, the socket connection should be reset.

If you keep running the above commands, it can happen that ovsdb-server doesn't reset the socket connection. When this happens, you can verify whether ovn-controller is able to write to the SB db. For this you can bind a port, like:

ovs-vsctl add-port br-int tst -- set interface tst type=internal
ovs-vsctl set Interface $name external_ids:iface-id=$<SOME_LOGICAL_PORT_ID>

Hope this helps.

Thanks
Numan
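The demote/promote cycle from the pacemaker steps can be scripted rather than typed by hand. The following is a minimal sketch under stated assumptions: cycle_sb_role, RUN and PAUSE are hypothetical names introduced here, the ovn-ctl path and sync address are the ones quoted in the comment above, and the one-second pause is an arbitrary choice, not a tuned value.

```shell
# Hypothetical sketch: repeat the demote/promote cycle N times.
# OVN_CTL and SYNC_FROM default to the values quoted in the comment above.
OVN_CTL=${OVN_CTL:-/usr/share/openvswitch/scripts/ovn-ctl}
SYNC_FROM=${SYNC_FROM:-192.0.2.254}
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

cycle_sb_role() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Demote the SB db to standby, then promote it back to active.
        $RUN "$OVN_CTL" demote_ovnsb --db-sb-sync-from-addr="$SYNC_FROM"
        sleep "${PAUSE:-1}"
        $RUN "$OVN_CTL" promote_ovnsb
        sleep "${PAUSE:-1}"
        i=$((i + 1))
    done
}

# Example: RUN=echo cycle_sb_role 3   (prints six ovn-ctl command lines)
```

While the loop runs, the netstat window from step b shows whether the ovn-controller connection is reset on each role change.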
Verified on the latest version:

[root@dell-per740-18 ~]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ~]# rpm -qa | grep ovn
ovn2.11-2.11.1-8.el7fdp.x86_64
ovn2.11-central-2.11.1-8.el7fdp.x86_64
ovn2.11-host-2.11.1-8.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ~]#
[root@dell-per740-18 ~]# ovn-nbctl --detach
/var/run/openvswitch/ovn-nbctl.24631.ctl
[root@dell-per740-18 ~]# NB_PATH=/var/run/openvswitch/ovn-nbctl.24631.ctl
[root@dell-per740-18 ~]# ovs-appctl -t $NB_PATH run ls-add test
[root@dell-per740-18 ~]# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/set-active-ovsdb-server tcp:192.0.2.254:6641
[root@dell-per740-18 ~]# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/connect-active-ovsdb-server
[root@dell-per740-18 ~]# ovs-appctl -t $NB_PATH run ls-add test1
transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
ovs-appctl: /var/run/openvswitch/ovn-nbctl.24631.ctl: server returned an error
[root@dell-per740-18 ~]# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/disconnect-active-ovsdb-server
[root@dell-per740-18 ~]# ovs-appctl -t $NB_PATH run ls-add test2
[root@dell-per740-18 ~]# ovn-nbctl show
switch dc71bf7c-adb1-44f4-bc4e-511067edcf34 (test2)
switch f8313504-6647-4d1d-97f3-9c60a553ea91 (test)
[root@dell-per740-18 ~]#
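When scripting this check, the read-only rejection can be recognized by its message text. The sketch below is only an illustration: is_read_only_error is a hypothetical helper name, and it matches the exact error wording shown in the transcript above, which may differ in other versions.

```shell
# Hypothetical helper: exit 0 if the text on stdin contains the read-only
# transaction error seen in the verification transcript, non-zero otherwise.
is_read_only_error() {
    grep -q 'not allowed when database server is in read only mode'
}

# Example use (NB_PATH as set in the transcript above):
#   ovs-appctl -t "$NB_PATH" run ls-add test1 2>&1 | is_read_only_error \
#       && echo "server is read-only"
```

Matching on the message rather than on the exit status alone separates a read-only rejection from other failures, such as an unreachable control socket.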
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3719
* Wed Jul 15 2020 Flavio Leitner <fbl> - 2.11.3-58
- spec: Fix configure to use dpdkdir without version. [583acc91dd782f1e73cc20a27b7cbd8bb5a7bc98]
* Mon Jul 13 2020 Flavio Leitner <fbl> - 2.11.3-57
- redhat: Rename OVSCI job name. [cbcaa831188b77f253f718203dc743904538464a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-56
- This is fast-datapath-rhel-8 [98f312f126a245f2609a8dcea9604e09832181f0]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-55
- bus/pci: fix VF memory access (#1851170) [fa4d90db57191665037114e4098f3d1f6b6ea9c7]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-54
- vhost: fix vring index check (#1831391) [8e33084d85d80cea72d02de0abf36c142dcefa2a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-53
- vhost: check log mmap offset and size overflow (#1831391) [753ae0cf66553e8fd71b8e76642900d9fb62c406]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-52
- vhost: add device op when notification to guest is sent (#1726579) [92715cf99cbebdb6d13e223872cdd44f822a4ebe]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-51
- net/i40e: re-program promiscuous mode on VF interface (#1733402) [0fe1f42b5f3bc0b714f063d57cc79215459d28dc]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-50
- bus/pci: always check IOMMU capabilities (#1711739) [0815c39d39c0b34dd7456bde23077e1f25250dec]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-49
- eal: fix IOVA mode selection as VA for PCI drivers (#1711739) [11fbef3c85f71b257dc37dd9b570025ad4a24dfa]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-48
- bus/pci: consider only usable devices for IOVA mode (#1711739) [69f5cb4c56c59505c76d4599cb0117b9fd6bfc11]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-47
- eal: compute IOVA mode based on PA availability (#1711739) [d5e1d2fa507875898bae71762c84c4f1d63ed972]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-46
- netdev-linux: Update LAG in all cases. (#1812892) [276351180996d21a96b6539671e4eed4e636f65d]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-45
- netdev-offload-tc: Re-fetch block ID after probing. (#1812892) [83cebd3221538df693d7170c3a17ed9a381911c6]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-44
- netdev-offload-tc: Flush rules on ingress block when init tc flow api (#1812892) [e5d7d5ec243b68d65383ca5075d7128f13e8aebc]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-43
- netdev-vport: Use the dst_port in tunnel netdev name (#1727599) [f4a6fb757441ee0ba5bf808a18cd8bf7a65a9124]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-42
- lib/tc: Fix flow dump for tunnel id equal zero (#1732305) [765ba1d1c0898446d3c05d9c7d3e92134647787a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-41
- lib/tc: Support optional tunnel id (#1732305) [42f09fe96f8664a4165261c935d0a4117f0675d1]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-40
- tc: Set 'no_percpu' flag for compatible actions (#1780690) [42f07f6bd81f65f52b84bb7a0011c5bb21af71ce]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-39
- rhel: let *-ctl handle runtime directory (#1785586) [c3763ec916aef757d113a73fb402cf89753e92a7]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-38
- rhel: set useropts optional for ovsdb-server (#1785586) [77bed8f0e4c0a3b7396a219d4680d585e88caf95]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-37
- rhel: run ovn with the same user as ovs (#1785586) [8f5f39b4afcfcfc8f29e79db138629630909352a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-36
- rhel: secure openvswitch useropts (#1785586) [71154ad26f1c22aacc60ab0a1ea335b7b2a6588a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-35
- userspace: Improved packet drop statistics. (#1726568) [a6b7a37be86d9fe990e4511f56b99d23d14f763d]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-34
- netdev-dpdk: Fix sw stats perf drop. (#1790841) [54f4571750280654fa05705b2d4657823dffbf64]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-33
- netdev-dpdk: Detailed packet drop statistics. (#1790841) [1e1b33541a3a832e32d7515b660f2939b251718a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-32
- netdev-dpdk: Reuse vhost function for dpdk ETH custom stats. (#1790841) [e0d00f70c5154535a86295ea58f6ef726e478fc8]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-31
- netdev-dpdk: Refactor vhost custom stats for extensibility. (#1790841) [b084d7a5c2644ac5e6ec667c80ae9c39b3f22350]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-30
- netdev-dpdk: Fix not reporting rx_oversize_errors in stats. (#1790841) [26017f85c82ba01a1e884a031605095b4f64ee69]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-29
- ovsdb replication: Provide option to configure probe interval. (#1788800) [e8a669ead72973ced8bb15d9a18e25b323f05ab0]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-28
- netdev-dpdk: Add coverage counter to count vhost IRQs. (#1726579) [3c3997eb0aa9693f89a6a3083b6fa12772d522dd]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-27
- netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event. (#1719644) [ca1a1a8e1c6ec2b44744876b26630448022b95e9]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-26
- bridge: Allow manual notifications about interfaces' updates. (#1719644) [f58b68088819d4ec8b7bd3a1821929f5fea3170d]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-25
- Shutdown SSL connection before closing socket (#1780745) [aa97017175536816f70d111647b5dc9bedd824ff]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-24
- flake8: also check the ovs-check-dead-ifs script (#1751161) [ecd3a1b407816c629c17f410f95eab868ab68257]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-23
- ovs-check-dead-ifs: unshadow pid variable (#1751161) [a086e7618191f0efc75746c1fe6d4481a397f2ac]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-22
- ovs-check-dead-ifs: python3 print format (#1751161) [d61553f744b42dc05186910be30171ed1f8425e3]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-21
- ovs-tcpundump: exit when getting version (#1764127) [ea9923af222ed5bf398846b553d7b7fe54e10bd6]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-20
- ovs-tcpundump: allow multiple packet lengths (#1764125) [ac3b7794054e2b15b22855930b23ede24b5d5835]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-19
- jsonrpc: increase input buffer size from 512 to 4096 (#1776883) [9c93db837390817b3bae8b2104bec5becbd946cf]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-18
- netdev-dpdk: Track vhost tx contention. (#1740144) [31112a95027735528554c91953de89175f94e191]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-17
- ovsdb-server: Allow replication from older schema version servers. (#1766586) [cb53fe2282c1c260cb7cc98c9d21e0573b304283]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-16
- ovsdb-server: Don't drop all connections on read/write status change. (#1761572) [5a0a77328bcab168ad04fba006158f2c2884befb]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-15
- ofproto-dpif: Fix continuation with patch port (#1761461) [069d4bd4378e02bd61121f32fb2bc18ac316f358]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-14
- vswitch: ratelimit the device add log (#1737146) [052e541d4580fe49d3461c3045755374a0726dd5]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-13
- netdev-dpdk: Enable tx-retries-max config. (#1747531) [734086f5d4608b7cdf03a5d0a182245354e1f6eb]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-12
- netdev-dpdk: Add custom stat for vhost tx retries. (#1747531) [0c238ac414e750fad80ec810ff42395df6c2e540]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-11
- doc: Move vhost tx retry info to separate section. (#1747531) [91d9e4d92b9efe06dccbf22f42faf1ae183a96e9]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-10
- netdev-vport: Make ip6gre netdev type to use TC rules (#1725623) [d3315b8035a875e9e3b425d72a97191fbcb7e065]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-9
- tunnel: Add layer 2 IPv6 GRE encapsulation support. (#1725623) [0c20e7e83ddb50dbb6e0c37f986216e3953ea12e]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-8
- ovsdb-server: drop all connections on read/write status change (#1720947) [0f0be40ee08c15a114029a5c0e046dc58d38fb09]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-7
- netdev-tc-offloads: Support match on priority tags (#1725623) [895735b3827e2afdd7c968d965e9f4fd9b0e1278]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-6
- rhel: limit stack size to 2M. (#1720315) [79c6209e71801b94396ce4833cff99a2c0969e30]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-5
- Add a new OVS action check_pkt_larger (#1702564) [c899ac57880e4446a00d83a590a5eb60fc081fdc]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-4
- netlink linux: account for the netnsid netlink attr. (#1692812) [ce14b518b702c2401a9a291a0afd654de5cd44a5]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-3
- rhel: Add an example to specify custom options (#1687775) [a7dd6b6eb5e2dfe15d9387f83b614c8661b18bdd]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-2
- ovs-ctl: Permit to specify additional options (#1687775) [b8a874b82e423a87965503da2384c45e84b6509a]
* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-1
- Merge commit 'a4efc599e0244e43fd417b2fb38b7f120eb1ebd4' into fast-datapath-rhel-7 [8da1428afe7a47d5fe02d396ede18d7ecfb60128]
- Backport "vhost: fix virtqueue not accessible" (#1792399)
- Backport "vhost: prevent zero copy mode if IOMMU is on" (#1792399)
- Backport "vhost: convert buffer addresses to GPA for logging" (#1792399)
- Backport "vhost: translate incoming log address to GPA" (#1792399)
- Backport "vhost: fix vring address handling during live migration" (#1792399)