Description of problem:
This BZ tracks the backport of the fix for the RAFT incomplete command memory leak to openvswitch2.13.

Upstream patch: https://mail.openvswitch.org/pipermail/ovs-dev/2020-May/370105.html
* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-24
- raft: Disable RAFT jsonrpc inactivity probe. (#1822290)
  [b12acf45a6872dda85642cbc73dd86eb529be17e]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-23
- raft: Fix leak of the incomplete command. (#1835729)
  [bb552cffb89104c2bb19b8aff749b8b825a6db13]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-22
- raft: Fix the problem of stuck in candidate role forever. (#1828639)
  [c5937276691bb90f99fad1871b5e3ca4ac9391e7]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-21
- raft: Fix next_index in install_snapshot reply handling. (#1828639)
  [09ac3c327ec678f36cd9df451b7846acdf734c0f]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-20
- raft: Avoid busy loop during leader election. (#1828639)
  [19683b041e19a49e275a4b42f5bb5b0528de898a]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-19
- raft: Fix raft_is_connected() when there is no leader yet. (#1828639)
  [2dae730162e5e1b084ac0d1fc339d2f09bd8cddb]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-18
- ovsdb-server: Don't disconnect clients after raft install_snapshot. (#1828639)
  [da9680c6095df8d6c477aa10e29baa8f00dc2e25]

* Thu May 14 2020 Dumitru Ceara <dceara> - 2.13.0-17
- raft-rpc: Fix message format. (#1828639)
  [e9bb63d6190925db63b4cad83e57a945c4ac0629]
I verified the fix on the downstream package by running the ovsdb-cluster testsuite under valgrind, essentially the test case "OVSDB cluster - txn on follower-2, follower-2 crash before sending execReq, reconnect to follower-3".

On openvswitch2.13-2.13.0-17 (without the fix):

# Clone the openvswitch2.13 dist-git repo.
# Check out the revision corresponding to openvswitch2.13-2.13.0-17.
$ rhpkg prep
$ cd ovs-2.13.0 && ./boot.sh && ./configure
$ make check-ovsdb-cluster-valgrind TESTSUITEFLAGS="124"
[...]
124: OVSDB cluster - txn on follower-2, follower-2 crash before sending execReq, reconnect to follower-3 ok
[...]
$ grep "are definitely lost" tests/ovsdb-cluster-testsuite.dir/*/valgrind.*
tests/ovsdb-cluster-testsuite.dir/124/valgrind.14805:==14805== 72 bytes in 1 blocks are definitely lost in loss record 452 of 642

On openvswitch2.13-2.13.0-29 (with the fix):

$ make check-ovsdb-cluster-valgrind TESTSUITEFLAGS="126"
[...]
126: OVSDB cluster - txn on follower-2, follower-2 crash before sending execReq, reconnect to follower-3 ok
[...]
$ grep "are definitely lost" tests/ovsdb-cluster-testsuite.dir/*/valgrind.*
$

Jianlin, could you please move this to VERIFIED? I don't seem to have the rights to do that.

Thanks,
Dumitru
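For anyone repeating the valgrind check from the verification comment, the grep step can be wrapped in a small helper that returns a non-zero status when a leak is reported. This is only a sketch: the function name report_definite_leaks is hypothetical and not part of OVS, and it assumes the valgrind logs are laid out as in that comment (one numbered directory per test, containing valgrind.<pid> files).

```shell
# Hypothetical helper, not part of the OVS tree.
# Lists valgrind logs that report "definitely lost" blocks.
# Returns 0 if there are none, 1 if any log reports a leak,
# and 2 if no valgrind logs were found at all.
report_definite_leaks() {
    dir=${1:-tests/ovsdb-cluster-testsuite.dir}

    # Expand the glob into the positional parameters; if nothing
    # matches, "$1" is the unexpanded pattern and does not exist.
    set -- "$dir"/*/valgrind.*
    if [ ! -e "$1" ]; then
        echo "no valgrind logs under $dir" >&2
        return 2
    fi

    # grep -l prints only the names of the files that match.
    if grep -l "are definitely lost" "$@"; then
        return 1
    fi
    echo "no definite leaks"
}
```

Run from the top of the build tree after the make check-ovsdb-cluster-valgrind step, "report_definite_leaks" with no argument checks the same directory as the grep in the comment. Distinguishing "no logs" from "no leaks" avoids a false pass when the testsuite was not actually run under valgrind.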
Thanks, Dumitru, for running valgrind. Setting to VERIFIED per comment 5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2944