2046340 – ovsdb-server: raft: Transaction double commit on leadership transfer

The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 2046340 - ovsdb-server: raft: Transaction double commit on leadership transfer

Summary: ovsdb-server: raft: Transaction double commit on leadership transfer

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Fast Datapath
Classification:	Red Hat
Component:	ovsdb2.13
Sub Component:
Version:	FDP 22.A
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Ilya Maximets
QA Contact:	Jianlin Shi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2022-01-26 16:05 UTC by Ilya Maximets
Modified:	2022-09-06 18:48 UTC (History)
CC List:	5 users (show)
Fixed In Version:	openvswitch2.13-2.13.0-184.el8fdp
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-09-06 18:48:41 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	FD-1744	0	None	None	None	2022-01-26 16:11:17 UTC
Red Hat Product Errata	RHBA-2022:6367	0	None	None	None	2022-09-06 18:48:43 UTC

Description Ilya Maximets 2022-01-26 16:05:26 UTC

While forcefully transferring leadership, leader completes all the current
'in-flight' commands, so the higher layers can re-try corresponding transactions
when the new leader is elected.  However, most of these commands are already
sent to followers, hence they will actually be committed by the majority of
the cluster members, i.e. will be treated as committed by the new leader, unless
there is an actual network problem between servers.  However, the old leader
will decline append replies, since it's not the leader anymore and commands
are already completed with RAFT_CMD_LOST_LEADERSHIP status.

New leader will replicate the commit index back to the old leader.
Old leader, will re-try the previously completed transaction anyway, because
the "cluster error"s are temporary.  If transaction had some prerequisites
that doesn't allow double committing or there are other database
constraints (like indexes) that will not allow transaction to be committed
twice, the server will reply to the client with a false-negative transaction
result.  If there are no prerequisites or additional database constraints,
the server will execute the same transaction second time as a follower.
E.g. in the OVN case, this may result in creation of duplicated logical
switches / routers / load balancers.  I.e. resources with the same non-indexed
name.  That may cause issues later where ovn-nbctl will not be able to add
ports to these switches or some other unpleasant consequences.

This BZ is to investigate and, probably, fix the most common case, where the
leadership is transferred voluntarily under normal operation.  The idea is
to not complete all commands in case of of compaction-triggered leadership
transfer and let them timeout or be committed by advancing the commit index
after receiving messages from a new leader ("Command completed without reply").

Similar issue was fixed before for the transaction executed through the
follower during a leader failover:
  5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover.")

There are sill, probably, rare cases where 'at-least-once' semantics might
still be a thing. E.g. in corner cases involving the client re-connection
and actual network / communication issues between raft servers.  But this
needs much deeper analysis.  The change, AFAIU, should eliminate duplicated
transaction commits and false-negatives in normal operation.

Comment 1 Ilya Maximets 2022-05-05 12:27:13 UTC

Patch sent for review:
  https://patchwork.ozlabs.org/project/openvswitch/patch/20220505122439.523610-1-i.maximets@ovn.org/

Comment 2 OvS team 2022-05-26 15:35:32 UTC

* Thu May 26 2022 Open vSwitch CI <ovs-ci> - 2.13.0-184
- Merging upstream branch-2.13 [RH git: 9d3ca376af]
    Commit list:
    91d3fc87ac classifier: Make find_match_wc() prototype and definition match.
    78a5ad8051 ovsdb: raft: Fix transaction double commit due to lost leadership. (#2046340)
    d0171d3b47 Revert "odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP."
    5ae081baeb ofproto-dpif: Trigger revalidation if ct tp changes.

Comment 5 Jianlin Shi 2022-08-03 07:38:17 UTC

the reproducer didn't fail on the old version openvwitch2.13-180. and logs on the fixed version openswitch2.13-184:

[root@wsfd-advnetlab18 bz2046340]# grep "Transferring leadership" -A 2 /var/log/ovn/ovsdb-server-nb.log  
2022-08-03T05:45:54.075Z|00035|raft|INFO|Transferring leadership to write a snapshot.                 
2022-08-03T05:45:54.084Z|00036|raft|INFO|server 92d9 is leader for term 2                             
2022-08-03T05:45:58.420Z|00037|memory|INFO|peak resident set size grew 106% in last 400.2 seconds, from 96268 kB to 198732 kB
--                                                                                                    
2022-08-03T06:04:00.294Z|00043|raft|INFO|Transferring leadership to write a snapshot.                 
2022-08-03T06:04:01.173Z|00044|raft|INFO|server 92d9 is leader for term 4                             
2022-08-03T06:04:01.174Z|00045|raft|INFO|command completed after role change from leader to follower (eid: 9ae2bd2b-fd3f-4458-a61a-8c3af2a4c41e, commit index: 87301)
--                                                                                                    
2022-08-03T06:27:54.826Z|00074|raft|INFO|Transferring leadership to write a snapshot.                 
2022-08-03T06:27:54.826Z|00075|raft|INFO|rejected append_reply (not leader)                           
2022-08-03T06:27:54.826Z|00076|raft|INFO|rejected append_reply (not leader) 

<=== Ilya, does the log show that the issue still exist on the new version?

Comment 6 Ilya Maximets 2022-08-03 09:41:03 UTC

(In reply to Jianlin Shi from comment #5)
> 
> 2022-08-03T06:27:54.826Z|00074|raft|INFO|Transferring leadership to write a
> snapshot.                 
> 2022-08-03T06:27:54.826Z|00075|raft|INFO|rejected append_reply (not leader) 
> 
> 2022-08-03T06:27:54.826Z|00076|raft|INFO|rejected append_reply (not leader) 
> 
> <=== Ilya, does the log show that the issue still exist on the new version?

The log itself only indicates the condition in which the issue can happen.
If you have the log like above, but the transaction didn't fail, it means
that issue is fixed, i.e. the fix is working as expected.

Comment 7 Jianlin Shi 2022-08-03 09:42:36 UTC

set VERIFIED + SanityOnly per comment 6

Comment 9 errata-xmlrpc 2022-09-06 18:48:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6367

Note You need to log in before you can comment on or make changes to this bug.