While forcefully transferring leadership, leader completes all the current 'in-flight' commands, so the higher layers can re-try corresponding transactions when the new leader is elected. However, most of these commands are already sent to followers, hence they will actually be committed by the majority of the cluster members, i.e. will be treated as committed by the new leader, unless there is an actual network problem between servers. However, the old leader will decline append replies, since it's not the leader anymore and commands are already completed with RAFT_CMD_LOST_LEADERSHIP status. New leader will replicate the commit index back to the old leader. Old leader, will re-try the previously completed transaction anyway, because the "cluster error"s are temporary. If transaction had some prerequisites that doesn't allow double committing or there are other database constraints (like indexes) that will not allow transaction to be committed twice, the server will reply to the client with a false-negative transaction result. If there are no prerequisites or additional database constraints, the server will execute the same transaction second time as a follower. E.g. in the OVN case, this may result in creation of duplicated logical switches / routers / load balancers. I.e. resources with the same non-indexed name. That may cause issues later where ovn-nbctl will not be able to add ports to these switches or some other unpleasant consequences. This BZ is to investigate and, probably, fix the most common case, where the leadership is transferred voluntarily under normal operation. The idea is to not complete all commands in case of of compaction-triggered leadership transfer and let them timeout or be committed by advancing the commit index after receiving messages from a new leader ("Command completed without reply"). Similar issue was fixed before for the transaction executed through the follower during a leader failover: 5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover.") There are sill, probably, rare cases where 'at-least-once' semantics might still be a thing. E.g. in corner cases involving the client re-connection and actual network / communication issues between raft servers. But this needs much deeper analysis. The change, AFAIU, should eliminate duplicated transaction commits and false-negatives in normal operation.
Patch sent for review: https://patchwork.ozlabs.org/project/openvswitch/patch/20220505122439.523610-1-i.maximets@ovn.org/
* Thu May 26 2022 Open vSwitch CI <ovs-ci> - 2.13.0-184 - Merging upstream branch-2.13 [RH git: 9d3ca376af] Commit list: 91d3fc87ac classifier: Make find_match_wc() prototype and definition match. 78a5ad8051 ovsdb: raft: Fix transaction double commit due to lost leadership. (#2046340) d0171d3b47 Revert "odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP." 5ae081baeb ofproto-dpif: Trigger revalidation if ct tp changes.
the reproducer didn't fail on the old version openvwitch2.13-180. and logs on the fixed version openswitch2.13-184: [root@wsfd-advnetlab18 bz2046340]# grep "Transferring leadership" -A 2 /var/log/ovn/ovsdb-server-nb.log 2022-08-03T05:45:54.075Z|00035|raft|INFO|Transferring leadership to write a snapshot. 2022-08-03T05:45:54.084Z|00036|raft|INFO|server 92d9 is leader for term 2 2022-08-03T05:45:58.420Z|00037|memory|INFO|peak resident set size grew 106% in last 400.2 seconds, from 96268 kB to 198732 kB -- 2022-08-03T06:04:00.294Z|00043|raft|INFO|Transferring leadership to write a snapshot. 2022-08-03T06:04:01.173Z|00044|raft|INFO|server 92d9 is leader for term 4 2022-08-03T06:04:01.174Z|00045|raft|INFO|command completed after role change from leader to follower (eid: 9ae2bd2b-fd3f-4458-a61a-8c3af2a4c41e, commit index: 87301) -- 2022-08-03T06:27:54.826Z|00074|raft|INFO|Transferring leadership to write a snapshot. 2022-08-03T06:27:54.826Z|00075|raft|INFO|rejected append_reply (not leader) 2022-08-03T06:27:54.826Z|00076|raft|INFO|rejected append_reply (not leader) <=== Ilya, does the log show that the issue still exist on the new version?
(In reply to Jianlin Shi from comment #5) > > 2022-08-03T06:27:54.826Z|00074|raft|INFO|Transferring leadership to write a > snapshot. > 2022-08-03T06:27:54.826Z|00075|raft|INFO|rejected append_reply (not leader) > > 2022-08-03T06:27:54.826Z|00076|raft|INFO|rejected append_reply (not leader) > > <=== Ilya, does the log show that the issue still exist on the new version? The log itself only indicates the condition in which the issue can happen. If you have the log like above, but the transaction didn't fail, it means that issue is fixed, i.e. the fix is working as expected.
set VERIFIED + SanityOnly per comment 6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:6367