In large-scale tests, ovn-controller disconnects from ovsdb due to the inactivity probe timer. There is an upstream effort to disable this probe: https://patchwork.ozlabs.org/patch/1264446/
This BZ tracks landing that change downstream.
* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-24
- raft: Disable RAFT jsonrpc inactivity probe. (#1822290) [b12acf45a6872dda85642cbc73dd86eb529be17e]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-23
- raft: Fix leak of the incomplete command. (#1835729) [bb552cffb89104c2bb19b8aff749b8b825a6db13]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-22
- raft: Fix the problem of stuck in candidate role forever. (#1828639) [c5937276691bb90f99fad1871b5e3ca4ac9391e7]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-21
- raft: Fix next_index in install_snapshot reply handling. (#1828639) [09ac3c327ec678f36cd9df451b7846acdf734c0f]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-20
- raft: Avoid busy loop during leader election. (#1828639) [19683b041e19a49e275a4b42f5bb5b0528de898a]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-19
- raft: Fix raft_is_connected() when there is no leader yet. (#1828639) [2dae730162e5e1b084ac0d1fc339d2f09bd8cddb]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-18
- ovsdb-server: Don't disconnect clients after raft install_snapshot. (#1828639) [da9680c6095df8d6c477aa10e29baa8f00dc2e25]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-17
- raft-rpc: Fix message format. (#1828639) [e9bb63d6190925db63b4cad83e57a945c4ac0629]
Following the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1836308#c4, reproduced on openvswitch2.13.0-17.el7:

[root@dell-per740-12 bz1822290]# tcpdump -r 17-6643.pcap -nn -A | grep echo
reading from file 17-6643.pcap, link-type EN10MB (Ethernet)
...B.k..{"id":"echo","method":"echo","params":[]}
.k(....B{"id":"echo","method":"echo","params":[]}
.k(....B{"id":"echo","result":[],"error":null}
...C.k(.{"id":"echo","result":[],"error":null}
.w. ...\{"id":"echo","method":"echo","params":[]}
.....w. {"id":"echo","method":"echo","params":[]}
.....w. {"id":"echo","result":[],"error":null}
.w. ....{"id":"echo","result":[],"error":null}
.....k)'{"id":"echo","method":"echo","params":[]}
.k<.....{"id":"echo","method":"echo","params":[]}
.k<.....{"id":"echo","result":[],"error":null}
.....k<.{"id":"echo","result":[],"error":null}
[root@dell-per740-12 bz1822290]# tcpdump -r 17-6644.pcap -nn -A | grep echo
reading from file 17-6644.pcap, link-type EN10MB (Ethernet)
..k..j.8{"id":"echo","method":"echo","params":[]}
.j....k.{"id":"echo","method":"echo","params":[]}
.j....k.{"id":"echo","result":[],"error":null}
..k..j..{"id":"echo","result":[],"error":null}
.w^.....{"id":"echo","method":"echo","params":[]}
...U.wK1{"id":"echo","method":"echo","params":[]}
...U.w^.{"id":"echo","result":[],"error":null}
.w^....U{"id":"echo","result":[],"error":null}
...d.j..{"id":"echo","method":"echo","params":[]}
.k. ...d{"id":"echo","method":"echo","params":[]}
.k. ...d{"id":"echo","result":[],"error":null}
...e.k. {"id":"echo","result":[],"error":null}

<=== echo packets sent

Verified on openvswitch2.13.0-30.el7:

[root@dell-per740-12 bz1822290]# rpm -qa | grep "openvswitch|ovn" -E
ovn2.13-2.13.0-34.el7fdp.x86_64
ovn2.13-central-2.13.0-34.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
openvswitch2.13-2.13.0-30.el7fdp.x86_64
ovn2.13-host-2.13.0-34.el7fdp.x86_64
[root@dell-per740-12 bz1822290]# tcpdump -r 30-6643.pcap -nn -A | grep echo
reading from file 30-6643.pcap, link-type EN10MB (Ethernet)
[root@dell-per740-12 bz1822290]# tcpdump -r 30-6644.pcap -nn -A | grep echo
reading from file 30-6644.pcap, link-type EN10MB (Ethernet)

<=== no echo packets
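For repeated verification runs, the manual grep check could be wrapped in a small helper that prints counts instead of raw packet dumps. This is only a sketch and was not part of the original verification: the function name `count_echo` is hypothetical, and it assumes each JSON-RPC echo message appears on a single line of `tcpdump -A` output, as in the captures above.

```shell
# count_echo: hypothetical helper, not part of the original test steps.
# Reads ASCII packet dumps (e.g. from `tcpdump -A`) on stdin and counts
# lines carrying JSON-RPC echo requests and echo replies.
count_echo() {
    awk '
        /"method":"echo"/      { req++ }   # echo request (inactivity probe)
        /"id":"echo","result"/ { rep++ }   # echo reply
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example usage against a capture file:
#   tcpdump -r 30-6643.pcap -nn -A 2>/dev/null | count_echo
```

With the probe disabled, both counts are expected to be 0 on the RAFT ports. A non-zero count would point to a build that still sends inactivity probes.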
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2944