Bug 1765025

| Summary: | corosync can corrupt messages under heavy load and large messages | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Christine Caulfield <ccaulfie> |
| Component: | corosync | Assignee: | Jan Friesse <jfriesse> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.0 | CC: | aherr, ccaulfie, cfeist, cluster-maint, coughlan, fdinitto, phagara, toneata |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | corosync-3.0.2-4.el8 | Doc Type: | Bug Fix |
| Doc Text: | Cause: Corosync forms a new membership and tries to send messages during recovery. Consequence: Messages are not fully sent, so other nodes receive them corrupted. Fix: Properly set the maximum message size. Result: Messages are always fully sent, so other nodes receive them correctly. | Story Points: | --- |
| Clone Of: | | | |
| : | 1765617 1765619 (view as bug list) | Environment: | |
| Last Closed: | 2020-04-28 15:56:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1765617, 1765619 | | |
| Attachments: | totemsrp: Reduce MTU to leave room for second mcast (attachment 1629053) | | |
Description
Christine Caulfield
2019-10-24 07:52:02 UTC
I should add that this only happens when using knet as the transport, because the MTU is set larger than the allowed knet buffer size for some large packets.

Do these messages not get re-sent until successfully received, or are they always corrupt? Do nodes get fenced because of this? In other words, what's the actual impact from an admin's point of view?

> do these messages not get re-sent until successfully received, or are they always corrupt? do nodes get fenced because of this?
> in other words, what's the actual impact from admin's point of view?

TBH I've never seen it in the wild, only on my test rig, so I don't know for sure what the impact would be. It depends on which message gets corrupted, so possible effects could be unexpected behaviour, hangs or fencing. If it's a pacemaker message then it will probably be resent, as I believe pacemaker checksums its messages.
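To make the failure mode concrete, here is a minimal arithmetic sketch of the scenario described above; the constants and names are assumptions for illustration, not corosync's or knet's actual definitions.

```c
/* Illustrative arithmetic only; the names and sizes below are assumptions,
 * not corosync's or knet's real definitions. */
#include <stdio.h>

#define FRAME_SIZE_MAX   65536  /* maximum packet size used by totempg */
#define KNET_BUFFER_MAX  65536  /* assumed knet per-packet limit */
#define MCAST_HDR_SIZE   48     /* assumed size of the extra recovery header */

int main(void)
{
    unsigned int recovery_frame = FRAME_SIZE_MAX + MCAST_HDR_SIZE;

    if (recovery_frame > KNET_BUFFER_MAX) {
        /* an already maximum-size frame plus one more header no longer
         * fits, so the message cannot be sent in full */
        printf("recovery frame of %u bytes exceeds the %u byte limit\n",
               recovery_frame, (unsigned int)KNET_BUFFER_MAX);
    }
    return 0;
}
```

The point is simply that a frame already at the maximum size cannot grow by another header and still fit into a single knet packet.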
Created attachment 1629053 [details]
totemsrp: Reduce MTU to leave room for second mcast
totemsrp: Reduce MTU to leave room for second mcast
Messages sent during the recovery phase are encapsulated, so such a message
carries the extra size of an mcast structure. This is not a big problem for
UDPU, because most switches are able to fragment and reassemble the packet,
but it is a problem for knet, because totempg uses the maximum packet size
(65536 bytes), and when another header is added during retransmission the
packet becomes too large.
The solution is to reduce the MTU by 2 * sizeof (struct mcast).
Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Fabio M. Di Nitto <fdinitto>
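As a rough sketch of the idea stated in the commit message (reduce the MTU by 2 * sizeof (struct mcast)); this is not the upstream diff, and the structure below is a stand-in, not corosync's actual definition.

```c
/* Rough sketch of the fix's idea, not the upstream patch: reserve room for
 * two mcast headers so a message that is re-encapsulated during recovery
 * still fits into one frame. struct mcast_stub stands in for totemsrp's
 * struct mcast, whose real size differs. */
#include <stddef.h>

struct mcast_stub {
    unsigned char header[48];   /* placeholder; not the real layout */
};

static size_t effective_mtu(size_t configured_mtu)
{
    /* as in the commit message: mtu - 2 * sizeof (struct mcast) */
    return configured_mtu - 2 * sizeof(struct mcast_stub);
}
```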
before
======

> [root@virt-034 ~]# rpm -q corosync
> corosync-3.0.2-3.el8.x86_64

set the following options in corosync.conf in order to make it easier to trigger the bug:

> token: 400
> token_coefficient: 0

start cluster:

> [root@virt-034 ~]# pcs status
> Cluster name: STSRHTS30715
> Stack: corosync
> Current DC: virt-065 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
> Last updated: Wed Jan 29 17:51:46 2020
> Last change: Wed Jan 29 17:29:48 2020 by root via cibadmin on virt-034
>
> 13 nodes configured
> 13 resources configured
>
> Online: [ virt-034 virt-056 virt-057 virt-058 virt-059 virt-060 virt-061 virt-062 virt-065 virt-068 virt-069 virt-070 virt-074 ]
>
> Full list of resources:
>
> fence-virt-034 (stonith:fence_xvm): Started virt-034
> fence-virt-056 (stonith:fence_xvm): Started virt-056
> fence-virt-057 (stonith:fence_xvm): Started virt-057
> fence-virt-058 (stonith:fence_xvm): Started virt-058
> fence-virt-059 (stonith:fence_xvm): Started virt-059
> fence-virt-060 (stonith:fence_xvm): Started virt-060
> fence-virt-061 (stonith:fence_xvm): Started virt-061
> fence-virt-062 (stonith:fence_xvm): Started virt-062
> fence-virt-065 (stonith:fence_xvm): Started virt-065
> fence-virt-068 (stonith:fence_xvm): Started virt-068
> fence-virt-069 (stonith:fence_xvm): Started virt-069
> fence-virt-070 (stonith:fence_xvm): Started virt-070
> fence-virt-074 (stonith:fence_xvm): Started virt-074
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled

start `cpghum` on all but one node, then `cpghum -f` on the remaining one. some nodes report CRC mismatches:

> [root@virt-070 test]# ./cpghum
> cpghum: 119 messages received, 4096 bytes per write. RTT min/avg/max: 4312/51378/468873
> cpghum: counters don't match. got 1021, expected 1920 from node 1
> cpghum: CRCs don't match. got b8366af3, expected b1f6f778 from nodeid 1
> cpghum: counters don't match. got 1921, expected 1022 from node 1
> cpghum: counters don't match. got 1021, expected 2225 from node 1
> cpghum: CRCs don't match. got b8366af3, expected 36e6d7d5 from nodeid 1
> cpghum: counters don't match. got 2226, expected 1022 from node 1
> cpghum: counters don't match. got 1021, expected 2613 from node 1
> cpghum: CRCs don't match. got b8366af3, expected 5528d35e from nodeid 1
> cpghum: counters don't match. got 2614, expected 1022 from node 1
> cpghum: counters don't match. got 1021, expected 3071 from node 1
> cpghum: CRCs don't match. got b8366af3, expected 9d355ebf from nodeid 1
> cpghum: counters don't match. got 3072, expected 1022 from node 1
> cpghum: 47847 messages received, 4096 bytes per write. RTT min/avg/max: 3226/354354/2989521
> cpghum: counters don't match. got 55191, expected 53275 from node 1

result: CRC errors occur

after
=====

> [root@virt-034 test]# rpm -q corosync
> corosync-3.0.3-2.el8.x86_64

set the following options in corosync.conf in order to make it easier to trigger the bug:

> token: 400
> token_coefficient: 0

start cluster:

> [root@virt-034 test]# pcs status
> Cluster name: STSRHTS12010
> Cluster Summary:
> * Stack: corosync
> * Current DC: virt-069 (version 2.0.3-4.el8-4b1f869f0f) - partition with quorum
> * Last updated: Wed Jan 29 20:00:15 2020
> * Last change: Wed Jan 29 19:48:03 2020 by root via cibadmin on virt-034
> * 13 nodes configured
> * 13 resource instances configured
>
> Node List:
> * Online: [ virt-034 virt-056 virt-057 virt-058 virt-059 virt-060 virt-061 virt-062 virt-065 virt-068 virt-069 virt-070 virt-074 ]
>
> Full List of Resources:
> * fence-virt-034 (stonith:fence_xvm): Started virt-034
> * fence-virt-056 (stonith:fence_xvm): Started virt-056
> * fence-virt-057 (stonith:fence_xvm): Started virt-057
> * fence-virt-058 (stonith:fence_xvm): Started virt-061
> * fence-virt-059 (stonith:fence_xvm): Started virt-062
> * fence-virt-060 (stonith:fence_xvm): Started virt-065
> * fence-virt-061 (stonith:fence_xvm): Started virt-069
> * fence-virt-062 (stonith:fence_xvm): Started virt-070
> * fence-virt-065 (stonith:fence_xvm): Started virt-074
> * fence-virt-068 (stonith:fence_xvm): Started virt-058
> * fence-virt-069 (stonith:fence_xvm): Started virt-060
> * fence-virt-070 (stonith:fence_xvm): Started virt-068
> * fence-virt-074 (stonith:fence_xvm): Started virt-059
>
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: active/enabled

start `cpghum` on all but one node, then `cpghum -f` on the remaining one:

> [root@virt-034 test]# ./cpghum -f
> 131289 messages received 64 bytes per write 10.000 Seconds runtime 13128.854 TP/s 0.840 MB/s RTT for this size (min/avg/max) 2473/155467/2129302
> 79488 messages received 320 bytes per write 10.010 Seconds runtime 7941.084 TP/s 2.541 MB/s RTT for this size (min/avg/max) 4630/164556/2555632
> cpghum: counters don't match. got 663, expected 660 from node 12
> cpghum: counters don't match. got 660, expected 658 from node 13
> cpghum: counters don't match. got 693, expected 692 from node 2
> cpghum: counters don't match. got 690, expected 689 from node 3
> cpghum: counters don't match. got 688, expected 687 from node 4
> cpghum: counters don't match. got 684, expected 683 from node 5
> cpghum: counters don't match. got 683, expected 682 from node 6
> cpghum: counters don't match. got 682, expected 679 from node 7
> cpghum: counters don't match. got 676, expected 672 from node 9
> cpghum: counters don't match. got 672, expected 669 from node 10
> cpghum: counters don't match. got 667, expected 663 from node 11
> 13931 messages received 1600 bytes per write 10.011 Seconds runtime 1391.575 TP/s 2.227 MB/s RTT for this size (min/avg/max) 2892/734898/6191690
> 8479 messages received 8000 bytes per write 10.020 Seconds runtime 846.194 TP/s 6.770 MB/s RTT for this size (min/avg/max) 4032/32442/195605
> cpghum: counters don't match. got 715, expected 713 from node 2
> cpghum: counters don't match. got 680, expected 678 from node 13
> cpghum: counters don't match. got 712, expected 710 from node 3
> cpghum: counters don't match. got 710, expected 708 from node 4
> cpghum: counters don't match. got 706, expected 704 from node 5
> cpghum: counters don't match. got 705, expected 703 from node 6
> cpghum: counters don't match. got 702, expected 700 from node 7
> cpghum: counters don't match. got 699, expected 697 from node 8
> cpghum: counters don't match. got 695, expected 693 from node 9
> cpghum: counters don't match. got 692, expected 690 from node 10
> cpghum: counters don't match. got 686, expected 684 from node 11
> cpghum: counters don't match. got 683, expected 681 from node 12
> 756 messages received 40000 bytes per write 10.001 Seconds runtime 75.595 TP/s 3.024 MB/s RTT for this size (min/avg/max) 4710/714697/5581197
> cpghum: counters don't match. got 695, expected 691 from node 12
> cpghum: counters don't match. got 692, expected 688 from node 13
> cpghum: counters don't match. got 727, expected 724 from node 2
> 51 messages received 200000 bytes per write 10.758 Seconds runtime 4.741 TP/s 0.948 MB/s RTT for this size (min/avg/max) 11723/1103590/10031384
> cpghum: counters don't match. got 725, expected 719 from node 3
> cpghum: counters don't match. got 723, expected 717 from node 4
> cpghum: counters don't match. got 719, expected 713 from node 5
> cpghum: counters don't match. got 718, expected 711 from node 6
> cpghum: counters don't match. got 714, expected 708 from node 7
> cpghum: counters don't match. got 711, expected 705 from node 8
> cpghum: counters don't match. got 707, expected 702 from node 9
> cpghum: counters don't match. got 705, expected 699 from node 10
> cpghum: counters don't match. got 698, expected 693 from node 11
> 127 messages received 1000000 bytes per write 10.005 Seconds runtime 12.694 TP/s 12.694 MB/s RTT for this size (min/avg/max) 109680/10589399/12703485
>
> Stats:
> packets sent: 233447
> send failures: 0
> send retries: 2871038
> length errors: 0
> packets recvd: 234121
> sequence errors: 35
> crc errors: 0
> min RTT: 2473
> max RTT: 12703486
> avg RTT: 164851

The flooding node prints sequence errors, which is expected due to a known bug in cpghum (see comment#16). Similar sequence errors appear on the listening nodes, eg.:

> cpghum: 39948 messages received, 4096 bytes per write. RTT min/avg/max: 925/331992/17183573
> cpghum: counters don't match. got 216434, expected 216039 from node 1
> cpghum: 13271 messages received, 4096 bytes per write. RTT min/avg/max: 925/330413/17183573
> cpghum: counters don't match. got 232838, expected 232808 from node 1
> cpghum: 3445 messages received, 4096 bytes per write. RTT min/avg/max: 925/332751/17183573
> cpghum: 639 messages received, 4096 bytes per write. RTT min/avg/max: 925/336903/17183573
> cpghum: counters don't match. got 233420, expected 233419 from node 1

No CRC or other errors detected on any node. Marking verified in 3.0.3-2.el8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1674
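For reference, a minimal sketch of the kind of per-message check behind the "counters don't match" / "CRCs don't match" lines in the cpghum output above, assuming each test message carries a per-sender counter and a payload CRC; the layout and names are illustrative, not cpghum's actual code.

```c
/* Illustrative sketch only; not cpghum's actual implementation. It assumes
 * each test message carries the sender's sequence counter and a CRC of the
 * payload, which the receiver recomputes and compares. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 256                     /* illustrative bound */

struct hum_msg {                          /* hypothetical message layout */
    uint32_t nodeid;
    uint32_t counter;                     /* incremented for every send */
    uint32_t crc;                         /* CRC computed by the sender */
};

static uint32_t next_counter[MAX_NODES];  /* expected counter per sender */

void check_msg(const struct hum_msg *m, uint32_t recomputed_crc)
{
    if (m->nodeid >= MAX_NODES)
        return;

    if (m->counter != next_counter[m->nodeid])
        printf("counters don't match. got %u, expected %u from node %u\n",
               m->counter, next_counter[m->nodeid], m->nodeid);

    if (m->crc != recomputed_crc)
        printf("CRCs don't match. got %x, expected %x from nodeid %u\n",
               recomputed_crc, m->crc, m->nodeid);

    next_counter[m->nodeid] = m->counter + 1;
}
```

A corrupted recovery-phase message shows up as a CRC mismatch, while the counter check catches dropped or reordered messages, which is why the fixed build still reports occasional sequence errors but no CRC errors.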