Bug 1761572 - [RHEL 7] ovsdb-server doesn't apply the db server status change to all the json rpc sessions few times.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.11
Version: FDP 19.G
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: haidong li
URL:
Whiteboard:
Depends On:
Blocks: 1761573 1761575 1761577
Reported: 2019-10-14 18:28 UTC by Numan Siddique
Modified: 2020-07-15 17:51 UTC (History)

Fixed In Version: openvswitch2.11-2.11.3-58.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1761573 1761577
Environment:
Last Closed: 2019-11-06 05:21:19 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:3719 | 0 | None | None | None | 2019-11-06 05:21:32 UTC

Description Numan Siddique 2019-10-14 18:28:10 UTC
Description of problem:
In an OVN deployment, when an ovsdb-server failover happens, some ovn-controllers can end up connected to the ovsdb-server master in read-only mode. Once the ovsdb-server is promoted to master, ovn-controller should ideally reconnect and get read-write access to the db. Sometimes, however, the connection is not reset and these ovn-controllers remain connected to the ovsdb-server in read-only mode, so they cannot write to the SB db. This causes VM boot failures and mac_binding write failures.



Comment 2 haidong li 2019-10-16 03:54:36 UTC
Hi Numan, can this bug happen every time the ovsdb-server changes state? Or is there an easy way to reproduce it? We haven't hit the bug when running our HA case, which uses a cluster for testing. Thanks!

Comment 3 Numan Siddique 2019-10-16 04:59:13 UTC
(In reply to haidong li from comment #2)
> Hi Numan, can this bug happen every time the ovsdb-server changes state? Or is
> there an easy way to reproduce it? We haven't hit the bug when running our
> HA case, which uses a cluster for testing. Thanks!

Hi Haidong Li,

This doesn't happen all the time. There is a small timing window in which this could happen.

Prior to the fix which addresses this issue, whenever ovsdb-server changes the state from active to standby or vice versa, ovsdb-server was closing the existing socket connections.

With the fix, it doesn't close the existing connections.
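
While switching the server between active and standby, it can help to confirm which state it is currently in. The following is a minimal sketch, assuming that "ovs-appctl ... ovsdb-server/sync-status" prints a line of the form "state: active" or "state: backup". The helper names parse_sync_state and db_state, and the default control socket path, are illustrative assumptions and not part of the original report.

```shell
#!/bin/sh
# Sketch only: helper names and the default socket path are assumptions.

# Extract the value of the first "state: ..." line read from stdin.
parse_sync_state() {
    sed -n 's/^state: *//p' | head -n 1
}

# Print the state reported by the ovsdb-server behind the given control
# socket (defaults to the OVN NB db socket used elsewhere in this bug).
db_state() {
    ovs-appctl -t "${1:-/var/run/openvswitch/ovnnb_db.ctl}" \
        ovsdb-server/sync-status | parse_sync_state
}
```

Calling db_state before and after each set-active/connect/disconnect command shows whether the server has actually changed state.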

You can probably verify this in 2 ways:

1. Without pacemaker.

a. Start the OVN ovsdb-servers.
b. Run ovn-nbctl --detach. Note down the unix socket path it prints in a shell variable - NB_PATH
c. Run ovs-appctl -t $NB_PATH run ls-add tst
d. Run "ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/set-active-ovsdb-server tcp:192.0.2.254:6641"
e. Run "ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/connect-active-ovsdb-server"
f. Run - ovs-appctl -t $NB_PATH run ls-add tst1 -> This should fail, because the server is now in read-only (standby) mode.
g. Run "ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/disconnect-active-ovsdb-server"
h. Run - ovs-appctl -t $NB_PATH run ls-add tst2 -> This should succeed, because the server is active again.
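
The sequence without pacemaker can be sketched as a small shell helper. This is only an illustration under stated assumptions: the control socket path uses the .ctl suffix, and the expected outcomes (tst1 rejected, tst2 accepted) follow the verification transcript in comment 4. DRY_RUN, run, try_ls_add and verify_without_pacemaker are hypothetical names introduced here. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Assumed defaults; override them from the environment.
DB_CTL=${DB_CTL:-/var/run/openvswitch/ovnnb_db.ctl}
ACTIVE_REMOTE=${ACTIVE_REMOTE:-tcp:192.0.2.254:6641}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run "ls-add NAME" through the detached ovn-nbctl daemon and compare the
# outcome with the expectation ("ok" or "fail").
try_ls_add() {
    name=$1
    expected=$2
    if run ovs-appctl -t "$NB_PATH" run ls-add "$name"; then
        actual=ok
    else
        actual=fail
    fi
    [ "$DRY_RUN" = 1 ] && return 0
    if [ "$actual" = "$expected" ]; then
        echo "PASS: ls-add $name -> $actual"
    else
        echo "FAIL: ls-add $name -> $actual (expected $expected)"
        return 1
    fi
}

verify_without_pacemaker() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ ovn-nbctl --detach"
        NB_PATH='$NB_PATH'          # placeholder shown in the printed commands
    else
        NB_PATH=$(ovn-nbctl --detach) || return 1
    fi
    try_ls_add tst ok || return 1
    # Make this server a standby of ACTIVE_REMOTE (read-only mode).
    run ovs-appctl -t "$DB_CTL" ovsdb-server/set-active-ovsdb-server "$ACTIVE_REMOTE" || return 1
    run ovs-appctl -t "$DB_CTL" ovsdb-server/connect-active-ovsdb-server || return 1
    try_ls_add tst1 fail || return 1
    # Return the server to active (read-write) mode.
    run ovs-appctl -t "$DB_CTL" ovsdb-server/disconnect-active-ovsdb-server || return 1
    try_ls_add tst2 ok
}
```

To use it, source the file and call verify_without_pacemaker; set DRY_RUN=0 first to execute the commands against a running OVN NB ovsdb-server instead of printing them.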

The other is with pacemaker. This is tricky and you may or may not be able to reproduce it, but this is how
I reproduced it. (Use an openvswitch version which doesn't have this fix.)

2. With pacemaker setup
a. Increase the monitor interval period by running
  - pcs resource show <ovn-resource-name>
  - pcs resource update <ovn-resource-name> op monitor interval 600 (I don't know the exact command, but you can copy it from the pcs resource show output.)

b. In one window run - "watch -n1 'netstat -tunlpa | grep 6642'". Make sure that ovn-controller is connected to the active ovsdb-server.
c. On the master node, keep running the below commands
  - /usr/share/openvswitch/scripts/ovn-ctl demote_ovnsb --db-sb-sync-from-addr=192.0.2.254
    When you run the above command, you should see the socket connection in the "netstat" output being reset.
  - /usr/share/openvswitch/scripts/ovn-ctl promote_ovnsb
    Again, when you run this command, the socket connection should be reset.

If you keep running the above commands, it can happen that ovsdb-server doesn't reset the socket connection. When this happens, you can verify whether ovn-controller
is able to write to the SB db. For this you can bind a port,
 like - ovs-vsctl add-port br-int tst -- set interface tst type=internal
           ovs-vsctl set Interface $name external_ids:iface-id=<SOME_LOGICAL_PORT_ID>
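
The demote/promote loop can also be watched from a script instead of by eye. The sketch below rests on an assumption not stated in the report: a reset connection shows up in netstat as the client reconnecting from a new source port, so a peer address:port that is present both before and after a demote/promote cycle is treated as a connection that was not reset. The names established_peers, surviving_peers, flip_and_check and SETTLE_SECS are hypothetical.

```shell
#!/bin/sh
# Sketch only. Paths, port and address are taken from the steps above;
# override them from the environment as needed.
OVN_CTL=${OVN_CTL:-/usr/share/openvswitch/scripts/ovn-ctl}
SB_PORT=${SB_PORT:-6642}
SYNC_FROM=${SYNC_FROM:-192.0.2.254}

# Read "netstat -tn"-style output on stdin and print the remote address:port
# of every ESTABLISHED TCP connection whose local port is $1.
established_peers() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { print $5 }
    ' | sort
}

# Print the peers that appear in both newline-separated lists $1 and $2.
surviving_peers() {
    [ -n "$1" ] && [ -n "$2" ] || return 0
    printf '%s\n' "$2" | grep -Fx -e "$1" || true
}

# Run one demote/promote cycle and report peers whose connection survived.
flip_and_check() {
    before=$(netstat -tn | established_peers "$SB_PORT")
    "$OVN_CTL" demote_ovnsb --db-sb-sync-from-addr="$SYNC_FROM" || return 1
    "$OVN_CTL" promote_ovnsb || return 1
    sleep "${SETTLE_SECS:-5}"      # give clients time to reconnect
    after=$(netstat -tn | established_peers "$SB_PORT")
    kept=$(surviving_peers "$before" "$after")
    if [ -n "$kept" ]; then
        echo "connections not reset for:"
        echo "$kept"
    else
        echo "no surviving connections seen"
    fi
}
```

Running flip_and_check repeatedly on the master node and then binding a port on a chassis listed under "connections not reset" is one way to check whether that ovn-controller can still write to the SB db.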

Hope this helps.

Thanks
Numan

Comment 4 haidong li 2019-10-22 06:22:32 UTC
verified on the latest version:
[root@dell-per740-18 ~]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ~]# rpm -qa | grep ovn
ovn2.11-2.11.1-8.el7fdp.x86_64
ovn2.11-central-2.11.1-8.el7fdp.x86_64
ovn2.11-host-2.11.1-8.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ~]#
[root@dell-per740-18 ~]# ovn-nbctl --detach
/var/run/openvswitch/ovn-nbctl.24631.ctl
[root@dell-per740-18 ~]# NB_PATH=/var/run/openvswitch/ovn-nbctl.24631.ctl
[root@dell-per740-18 ~]# ovs-appctl -t $NB_PATH run ls-add test
[root@dell-per740-18 ~]# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/set-active-ovsdb-server tcp:192.0.2.254:6641
[root@dell-per740-18 ~]#  ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/connect-active-ovsdb-server
[root@dell-per740-18 ~]#  ovs-appctl -t $NB_PATH run ls-add test1
transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
ovs-appctl: /var/run/openvswitch/ovn-nbctl.24631.ctl: server returned an error
[root@dell-per740-18 ~]# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/disconnect-active-ovsdb-server
[root@dell-per740-18 ~]#  ovs-appctl -t $NB_PATH run ls-add test2
[root@dell-per740-18 ~]# ovn-nbctl show
switch dc71bf7c-adb1-44f4-bc4e-511067edcf34 (test2)
switch f8313504-6647-4d1d-97f3-9c60a553ea91 (test)
[root@dell-per740-18 ~]#

Comment 6 errata-xmlrpc 2019-11-06 05:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3719

Comment 7 OvS team 2020-07-15 17:51:07 UTC
* Wed Jul 15 2020 Flavio Leitner <fbl> - 2.11.3-58
- spec: Fix configure to use dpdkdir without version.
  [583acc91dd782f1e73cc20a27b7cbd8bb5a7bc98]

* Mon Jul 13 2020 Flavio Leitner <fbl> - 2.11.3-57
- redhat: Rename OVSCI job name.
  [cbcaa831188b77f253f718203dc743904538464a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-56
- This is fast-datapath-rhel-8
  [98f312f126a245f2609a8dcea9604e09832181f0]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-55
- bus/pci: fix VF memory access (#1851170)
  [fa4d90db57191665037114e4098f3d1f6b6ea9c7]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-54
- vhost: fix vring index check (#1831391)
  [8e33084d85d80cea72d02de0abf36c142dcefa2a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-53
- vhost: check log mmap offset and size overflow (#1831391)
  [753ae0cf66553e8fd71b8e76642900d9fb62c406]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-52
- vhost: add device op when notification to guest is sent (#1726579)
  [92715cf99cbebdb6d13e223872cdd44f822a4ebe]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-51
- net/i40e: re-program promiscuous mode on VF interface (#1733402)
  [0fe1f42b5f3bc0b714f063d57cc79215459d28dc]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-50
- bus/pci: always check IOMMU capabilities (#1711739)
  [0815c39d39c0b34dd7456bde23077e1f25250dec]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-49
- eal: fix IOVA mode selection as VA for PCI drivers (#1711739)
  [11fbef3c85f71b257dc37dd9b570025ad4a24dfa]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-48
- bus/pci: consider only usable devices for IOVA mode (#1711739)
  [69f5cb4c56c59505c76d4599cb0117b9fd6bfc11]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-47
- eal: compute IOVA mode based on PA availability (#1711739)
  [d5e1d2fa507875898bae71762c84c4f1d63ed972]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-46
- netdev-linux: Update LAG in all cases. (#1812892)
  [276351180996d21a96b6539671e4eed4e636f65d]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-45
- netdev-offload-tc: Re-fetch block ID after probing. (#1812892)
  [83cebd3221538df693d7170c3a17ed9a381911c6]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-44
- netdev-offload-tc: Flush rules on ingress block when init tc flow api (#1812892)
  [e5d7d5ec243b68d65383ca5075d7128f13e8aebc]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-43
- netdev-vport: Use the dst_port in tunnel netdev name (#1727599)
  [f4a6fb757441ee0ba5bf808a18cd8bf7a65a9124]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-42
- lib/tc: Fix flow dump for tunnel id equal zero (#1732305)
  [765ba1d1c0898446d3c05d9c7d3e92134647787a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-41
- lib/tc: Support optional tunnel id (#1732305)
  [42f09fe96f8664a4165261c935d0a4117f0675d1]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-40
- tc: Set 'no_percpu' flag for compatible actions (#1780690)
  [42f07f6bd81f65f52b84bb7a0011c5bb21af71ce]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-39
- rhel: let *-ctl handle runtime directory (#1785586)
  [c3763ec916aef757d113a73fb402cf89753e92a7]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-38
- rhel: set useropts optional for ovsdb-server (#1785586)
  [77bed8f0e4c0a3b7396a219d4680d585e88caf95]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-37
- rhel: run ovn with the same user as ovs (#1785586)
  [8f5f39b4afcfcfc8f29e79db138629630909352a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-36
- rhel: secure openvswitch useropts (#1785586)
  [71154ad26f1c22aacc60ab0a1ea335b7b2a6588a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-35
- userspace: Improved packet drop statistics. (#1726568)
  [a6b7a37be86d9fe990e4511f56b99d23d14f763d]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-34
- netdev-dpdk: Fix sw stats perf drop. (#1790841)
  [54f4571750280654fa05705b2d4657823dffbf64]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-33
- netdev-dpdk: Detailed packet drop statistics. (#1790841)
  [1e1b33541a3a832e32d7515b660f2939b251718a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-32
- netdev-dpdk: Reuse vhost function for dpdk ETH custom stats. (#1790841)
  [e0d00f70c5154535a86295ea58f6ef726e478fc8]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-31
- netdev-dpdk: Refactor vhost custom stats for extensibility. (#1790841)
  [b084d7a5c2644ac5e6ec667c80ae9c39b3f22350]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-30
- netdev-dpdk: Fix not reporting rx_oversize_errors in stats. (#1790841)
  [26017f85c82ba01a1e884a031605095b4f64ee69]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-29
- ovsdb replication: Provide option to configure probe interval. (#1788800)
  [e8a669ead72973ced8bb15d9a18e25b323f05ab0]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-28
- netdev-dpdk: Add coverage counter to count vhost IRQs. (#1726579)
  [3c3997eb0aa9693f89a6a3083b6fa12772d522dd]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-27
- netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event. (#1719644)
  [ca1a1a8e1c6ec2b44744876b26630448022b95e9]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-26
- bridge: Allow manual notifications about interfaces' updates. (#1719644)
  [f58b68088819d4ec8b7bd3a1821929f5fea3170d]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-25
- Shutdown SSL connection before closing socket (#1780745)
  [aa97017175536816f70d111647b5dc9bedd824ff]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-24
- flake8: also check the ovs-check-dead-ifs script (#1751161)
  [ecd3a1b407816c629c17f410f95eab868ab68257]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-23
- ovs-check-dead-ifs: unshadow pid variable (#1751161)
  [a086e7618191f0efc75746c1fe6d4481a397f2ac]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-22
- ovs-check-dead-ifs: python3 print format (#1751161)
  [d61553f744b42dc05186910be30171ed1f8425e3]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-21
- ovs-tcpundump: exit when getting version (#1764127)
  [ea9923af222ed5bf398846b553d7b7fe54e10bd6]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-20
- ovs-tcpundump: allow multiple packet lengths (#1764125)
  [ac3b7794054e2b15b22855930b23ede24b5d5835]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-19
- jsonrpc: increase input buffer size from 512 to 4096 (#1776883)
  [9c93db837390817b3bae8b2104bec5becbd946cf]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-18
- netdev-dpdk: Track vhost tx contention. (#1740144)
  [31112a95027735528554c91953de89175f94e191]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-17
- ovsdb-server: Allow replication from older schema version servers. (#1766586)
  [cb53fe2282c1c260cb7cc98c9d21e0573b304283]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-16
- ovsdb-server: Don't drop all connections on read/write status change. (#1761572)
  [5a0a77328bcab168ad04fba006158f2c2884befb]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-15
- ofproto-dpif: Fix continuation with patch port (#1761461)
  [069d4bd4378e02bd61121f32fb2bc18ac316f358]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-14
- vswitch: ratelimit the device add log (#1737146)
  [052e541d4580fe49d3461c3045755374a0726dd5]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-13
- netdev-dpdk: Enable tx-retries-max config. (#1747531)
  [734086f5d4608b7cdf03a5d0a182245354e1f6eb]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-12
- netdev-dpdk: Add custom stat for vhost tx retries. (#1747531)
  [0c238ac414e750fad80ec810ff42395df6c2e540]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-11
- doc: Move vhost tx retry info to separate section. (#1747531)
  [91d9e4d92b9efe06dccbf22f42faf1ae183a96e9]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-10
- netdev-vport: Make ip6gre netdev type to use TC rules (#1725623)
  [d3315b8035a875e9e3b425d72a97191fbcb7e065]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-9
- tunnel: Add layer 2 IPv6 GRE encapsulation support. (#1725623)
  [0c20e7e83ddb50dbb6e0c37f986216e3953ea12e]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-8
- ovsdb-server: drop all connections on read/write status change (#1720947)
  [0f0be40ee08c15a114029a5c0e046dc58d38fb09]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-7
- netdev-tc-offloads: Support match on priority tags (#1725623)
  [895735b3827e2afdd7c968d965e9f4fd9b0e1278]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-6
- rhel: limit stack size to 2M. (#1720315)
  [79c6209e71801b94396ce4833cff99a2c0969e30]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-5
- Add a new OVS action check_pkt_larger (#1702564)
  [c899ac57880e4446a00d83a590a5eb60fc081fdc]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-4
- netlink linux: account for the netnsid netlink attr. (#1692812)
  [ce14b518b702c2401a9a291a0afd654de5cd44a5]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-3
- rhel: Add an example to specify custom options (#1687775)
  [a7dd6b6eb5e2dfe15d9387f83b614c8661b18bdd]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-2
- ovs-ctl: Permit to specify additional options (#1687775)
  [b8a874b82e423a87965503da2384c45e84b6509a]

* Fri Jul 10 2020 Timothy Redaelli <tredaelli> - 2.11.3-1
- Merge commit 'a4efc599e0244e43fd417b2fb38b7f120eb1ebd4' into fast-datapath-rhel-7
  [8da1428afe7a47d5fe02d396ede18d7ecfb60128]

- Backport "vhost: fix virtqueue not accessible" (#1792399)
- Backport "vhost: prevent zero copy mode if IOMMU is on" (#1792399)
- Backport "vhost: convert buffer addresses to GPA for logging" (#1792399)
- Backport "vhost: translate incoming log address to GPA" (#1792399)
- Backport "vhost: fix vring address handling during live migration" (#1792399)

