Bug 1765619

Summary: corosync can corrupt messages under heavy load and large messages [rhel-8.1.0.z]
Product: Red Hat Enterprise Linux 8 Reporter: Oneata Mircea Teodor <toneata>
Component: corosync Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: high Docs Contact:
Priority: high    
Version: 8.0 CC: aherr, ccaulfie, cfeist, cluster-maint, cluster-qe, coughlan, fdinitto, jfriesse, mjuricek, phagara, toneata
Target Milestone: rc Keywords: ZStream
Target Release: 8.0 Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: corosync-3.0.2-3.el8_1.1 Doc Type: Bug Fix
Doc Text:
Cause: Corosync forms a new membership and tries to send messages during recovery. Consequence: Messages are not fully sent and other nodes receive them corrupted. Fix: The maximum message size is now set properly. Result: Messages are always fully sent, so other nodes receive them correctly.
Story Points: ---
Clone Of: 1765025 Environment:
Last Closed: 2019-12-17 10:46:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1765025    
Bug Blocks: 1765617    
Attachments:
Description Flags
8.1.z-bz1765619-1-totemsrp-Reduce-MTU-to-left-room-second-mcast none

Comment 2 Jan Friesse 2019-10-30 13:39:37 UTC
Created attachment 1630613 [details]
8.1.z-bz1765619-1-totemsrp-Reduce-MTU-to-left-room-second-mcast

totemsrp: Reduce MTU to left room second mcast

Messages sent during the recovery phase are encapsulated, so each such
message carries the extra size of an mcast structure. This is not a big
problem for UDPU, because most switches are able to fragment and
defragment packets, but it is a problem for knet, because totempg uses
the maximum packet size (65536 bytes), and when another header is added
during retransmission the packet becomes too large.

The solution is to reduce the MTU by 2 * sizeof (struct mcast).
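
The arithmetic behind the fix can be sketched as follows. This is only an
illustration, not the patch itself: the struct mcast layout here is a
made-up stand-in (the real one is defined in totemsrp.c), and the helper
names usable_payload and recovery_packet_size are hypothetical. The only
values taken from the commit message are the 65536-byte limit and the
2 * sizeof (struct mcast) reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for corosync's struct mcast. The real structure
 * has a different layout; only its size matters for this sketch. */
struct mcast {
    uint32_t header[4];
    uint32_t seq;
    uint32_t ring_seq;
    uint32_t node_id;
    uint32_t guarantee;
};

/* Maximum packet size quoted in the commit message. */
#define MAX_PACKET_SIZE 65536u

/* Payload limit after reserving room for two mcast headers: the
 * message's own header plus the one added when it is encapsulated
 * during recovery. */
static size_t usable_payload(size_t mtu)
{
    const size_t reserve = 2 * sizeof(struct mcast);

    assert(mtu > reserve);
    return mtu - reserve;
}

/* On-wire size of a payload once it has been wrapped a second time
 * for retransmission in the recovery phase. */
static size_t recovery_packet_size(size_t payload)
{
    return payload + 2 * sizeof(struct mcast);
}
```

With the reduced limit, a maximum-size payload still fits after the second
header is added. Reserving only one header, as before the fix, would let an
encapsulated message exceed the 65536-byte limit.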

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Fabio M. Di Nitto <fdinitto>

Comment 10 errata-xmlrpc 2019-12-17 10:46:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4264