Bug 1821360 - [OVN SCALE] HA/RAFT not integrated correctly into ovsdb
Summary: [OVN SCALE] HA/RAFT not integrated correctly into ovsdb
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: OVN Team
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-06 16:37 UTC by Anton Ivanov
Modified: 2023-07-13 07:31 UTC
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-556 0 None None None 2022-02-16 14:50:31 UTC

Description Anton Ivanov 2020-04-06 16:37:07 UTC
This was found when analysing failures in the early prototypes of the async IO patchset.

Initial condition:

A database is removed from ovsdb-server due to an admin request or a RAFT HA decision.

This triggers monitor-cancel notifications for all clients subscribed to the database. These notifications are enqueued for transmission to the clients, which takes a finite amount of time.

Any client that issues a transaction against the database during this window receives a "syntax error" JSON reply.
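The race can be sketched in miniature: the server has already removed the database and queued the monitor_canceled notification, but the client has not drained it yet and still sends a transact. This is an illustrative model, not ovsdb-server code; the message shapes and the handle_transact helper are hypothetical.

```python
from collections import deque

server_to_client = deque()   # messages queued but not yet delivered to the client
db_exists = False            # the DB was just removed (admin request / RAFT decision)

# Server side: removing the DB enqueues the cancel notification.
server_to_client.append({"method": "monitor_canceled",
                         "params": ["OVN_Southbound"]})

# Client side: unaware of the pending notification, it issues a transact.
def handle_transact(request):
    # The database is gone, so the server rejects the request;
    # the observed symptom is a "syntax error" JSON reply.
    if not db_exists:
        return {"id": request["id"], "error": "syntax error", "result": None}
    return {"id": request["id"], "error": None, "result": []}

reply = handle_transact({"id": 1, "method": "transact",
                         "params": ["OVN_Southbound"]})
# The client sees the error while the cancel notification is still in flight.
```

The key point is the ordering: the error reply reaches the client before the notification that would have told it to stop using the database.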

This will be extremely difficult to fix without major API additions: there is no mandatory flush, and there is no "upper level" means of triggering a JSON-RPC "echo" and waiting for the echo reply to ensure that anything on the wire between the server and the client has been flushed.

This is likely to be an issue ONLY at scale when there are a lot of pending requests and a lot of pending notifications to transmit.

It is somewhat mitigated by the ovsdb connection being effectively half-duplex: the server does not invoke the jsonrpc session receive path while there is pending transmit. This narrows the window but does not close it, and if ovsdb is optimized for throughput in any way, the race is likely to become easier to reproduce.
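The half-duplex mitigation amounts to a service loop like the following sketch (names and queue shapes are hypothetical, not the ovsdb-server implementation): output is drained before any new request is read, so a client usually sees the cancel notification before its next transact is processed.

```python
def service_session(tx_queue, rx_queue, handle):
    """Drain pending transmit before receiving; return handled replies."""
    replies = []
    while tx_queue or rx_queue:
        if tx_queue:
            tx_queue.pop(0)                  # flush queued output first
        else:
            replies.append(handle(rx_queue.pop(0)))  # only then receive
    return replies

# Pending cancel notifications are sent before the stale transact is read.
handled = []
service_session(["monitor_canceled"], ["transact"],
                lambda msg: handled.append(msg) or msg)
```

Because the receive path only runs once the transmit queue is empty, the window exists only when notifications pile up faster than they can be flushed, which is exactly the at-scale condition described above.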

