682771 – RFE: remove 1M message size limit

Summary: RFE: remove 1M message size limit

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	corosync
Sub Component:
Version:	7.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Christine Caulfield
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:	975903
Blocks:	1133060 1174884 1205796 1251103

Reported:	2011-03-07 14:50 UTC by Florian Haas
Modified:	2019-06-13 07:50 UTC
CC List:	9 users
Fixed In Version:	corosync-2.3.4-6.el7
Doc Type:	Enhancement
Doc Text:	Feature: The maximum size of a message that could be transferred using corosync CPG messaging facility was previously limited to 1MB. This limit has now been lifted. Reason: Pacemaker uses corosync CPG messaging to communicate changes in cluster state, and with larger numbers of resources this amount of information could get quite large and, even with data compression, exceed the maximum size allowed by corosync. Result: There is now no limit on the size of the data packets sent using CPG messaging in corosync. It is still necessary to configure pacemaker in /etc/sysconfig/pacemaker to allow larger messages to be sent.
Clone Of:
Clones:	975903
Environment:
Last Closed:	2015-11-19 11:41:06 UTC
Target Upstream Version:
Embargoed:

Attachments
cpg: Add support for messages larger than 1Mb (23.08 KB, patch) 2015-04-01 13:16 UTC, Jan Friesse
Really add cpghum (13.01 KB, patch) 2015-06-22 08:28 UTC, Jan Friesse
Don't link with libz when not needed (2.60 KB, patch) 2015-06-22 14:05 UTC, Jan Friesse

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:2354	0	normal	SHIPPED_LIVE	corosync bug fix and enhancement update	2015-11-19 10:28:00 UTC

Description Florian Haas 2011-03-07 14:50:39 UTC

Description of problem:
IIUC, Corosync currently has a 1M message size limit due to a hardcoded default in TOTEM buffer allocation. This may become a problem as Pacemaker clusters become more complex, with cluster sizes upward of 16 nodes and CIBs exceeding perhaps dozens of resources.

Version-Release number of selected component (if applicable):
1.2.3

Expected results:

Make the message size limit configurable, or (if this is possible) remove the hard limit altogether.

Comment 3 Steven Dake 2011-03-24 17:40:19 UTC

The client->server ipc portion of this RFE could be addressed by using the zero-copy feature to allocate buffers when the requested buffer size is greater than 1MB (and then do a memcpy). From server to client, an additional message type could be added to indicate the buffer is a freshly mmapped buffer needing special attention by the dispatch code. The totempg code could then have a memory allocation that takes place if a new message is received that will be larger than 1MB. It all sounds pretty complicated, though, and prone to breakage.

Do you have customers that have run into this limit?

Regards
-steve

Comment 4 Steven Dake 2011-03-24 17:41:04 UTC

Angus, please comment on how this RFE would be achieved in the libqb corosync 2.0+ case.

Comment 5 Angus Salkeld 2011-03-26 11:23:53 UTC

Are you sending XML text? Is it possible to compress the text
(it should compress well)?

Another option is to automatically fragment the message between
client and server. I'd need to look into it a bit more, though.

Comment 6 Andrew Beekhof 2011-04-13 07:42:56 UTC

It is XML that is being sent and we do compress it already.
However the status section can get really big so hitting the limit is still conceivable.

I don't think we necessarily need to remove the limit completely, just allow it to be tuned from corosync.conf (_before_ startup) by those that find it necessary.  

This would have the nice property of also allowing it to be tuned down, thus lowering corosync's memory footprint in situations not needing large messages.
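
The trade-off described above (compression helps, but some payloads can still exceed the limit) can be illustrated with a small Python sketch. It is only an illustration: zlib stands in for whichever compressor Pacemaker actually uses, the 1 MB LIMIT constant is taken from this bug's title, and the sample payloads and the helper name fits_after_compression are made up for the example.

```python
import os
import zlib
from typing import Tuple

LIMIT = 1024 * 1024  # the 1 MB message size limit discussed in this bug


def fits_after_compression(payload: bytes, limit: int = LIMIT) -> Tuple[bool, int]:
    """Compress payload with zlib and report (fits_under_limit, compressed_size)."""
    compressed = zlib.compress(payload, 9)
    return len(compressed) <= limit, len(compressed)


# Repetitive XML-like status text (hypothetical content): compresses very well.
repetitive = b'<lrm_rsc_op id="FAKE_monitor_0" rc-code="7"/>\n' * 50000

# Hex encoding of random bytes: each character carries only 4 bits of
# information, so compression can at best roughly halve the size.
random_hex = os.urandom(1_500_000).hex().encode()

for name, data in (("repetitive", repetitive), ("random_hex", random_hex)):
    fits, size = fits_after_compression(data)
    print(f"{name}: {len(data)} bytes raw, {size} bytes compressed, fits={fits}")
```

The repetitive payload shrinks to a tiny fraction of its raw size, while the high-entropy payload stays well above the limit even after compression, which is the situation a tunable or removed limit is meant to address.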

Comment 9 Steven Dake 2011-08-10 19:24:12 UTC

Will propose as a 2.0 feature (rhel7 timeframe).

Comment 12 Jan Friesse 2013-06-19 14:57:41 UTC

IPC is now handled by LibQB. According to https://github.com/asalkeld/libqb/issues/14, that problem still exists. There is also another problem, https://github.com/asalkeld/libqb/issues/71. Once these two issues are resolved, support in corosync should be seamless. Cloning this bug: this bug will be used for corosync, and the clone, Bug 975903, for LibQB.

Comment 16 Christine Caulfield 2015-03-06 08:40:44 UTC

commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd
Author: Christine Caulfield <ccaulfie>
Date:   Thu Mar 5 16:45:15 2015 +0000

    cpg: Add support for messages larger than 1Mb

Comment 18 Jan Friesse 2015-04-01 13:16:35 UTC

Created attachment 1009656
cpg: Add support for messages larger than 1Mb

If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.

cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.

The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.

New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
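
The behaviour described in the commit message above can be modelled roughly as follows. This Python sketch is not corosync code: MAX_ATOMIC is a made-up stand-in for the value cpg_max_atomic_msgsize_get() would report, Interrupted stands in for CS_ERR_INTERRUPT, and FakeRing, fragment and send_with_retry are hypothetical names. In the real library the fragmentation happens inside cpg, and only the resend after an interrupt is the caller's responsibility.

```python
from typing import Iterator, List, Optional

# Hypothetical value: "slightly less" than 1 MB to leave room for internal buffers.
MAX_ATOMIC = 1024 * 1024 - 4096


class Interrupted(Exception):
    """Stand-in for CS_ERR_INTERRUPT (membership changed during the send)."""


class FakeRing:
    """Toy receiver that reassembles fragments into whole messages."""

    def __init__(self, fail_on_chunk: Optional[int] = None) -> None:
        self.partial: List[bytes] = []    # fragments of the message in flight
        self.delivered: List[bytes] = []  # fully reassembled messages
        self._fail_on = fail_on_chunk     # simulate a disruption on the Nth chunk
        self._sent = 0

    def begin(self) -> None:
        self.partial = []  # drop fragments left over from an interrupted send

    def send_chunk(self, chunk: bytes) -> None:
        self._sent += 1
        if self._sent == self._fail_on:
            raise Interrupted()
        self.partial.append(chunk)

    def end(self) -> None:
        self.delivered.append(b"".join(self.partial))
        self.partial = []


def fragment(payload: bytes, max_atomic: int = MAX_ATOMIC) -> Iterator[bytes]:
    """Yield consecutive chunks of at most max_atomic bytes, in order."""
    for offset in range(0, len(payload), max_atomic):
        yield payload[offset:offset + max_atomic]


def send_with_retry(ring: FakeRing, payload: bytes, max_attempts: int = 3) -> int:
    """Send the whole message, resending from the start if interrupted."""
    for attempt in range(1, max_attempts + 1):
        ring.begin()
        try:
            for chunk in fragment(payload):
                ring.send_chunk(chunk)
        except Interrupted:
            continue  # the message needs to be resent as a whole
        ring.end()
        return attempt
    raise RuntimeError("message could not be sent")


payload = bytes(range(256)) * 10240  # 2.5 MiB, larger than MAX_ATOMIC
ring = FakeRing(fail_on_chunk=2)
attempts = send_with_retry(ring, payload)
print(attempts, len(ring.delivered), ring.delivered[0] == payload)
```

In this toy run the second fragment of the first attempt is interrupted, so the message is resent in full and delivered intact on the second attempt.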

Comment 25 Jan Friesse 2015-06-22 08:28:19 UTC

Created attachment 1041624
Really add cpghum

Signed-off-by: Jan Friesse <jfriesse>

Comment 27 Jan Friesse 2015-06-22 14:05:07 UTC

Created attachment 1041839
Don't link with libz when not needed

Commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd added a check for libz,
which resulted in all libraries being linked with libz. This is not the
expected behavior. The patch solves it by defining an automake conditional,
so cpghum is linked only if libz is available and the LIBS variable is not
modified at all.

Signed-off-by: Jan Friesse <jfriesse>

Comment 28 Nate Straz 2015-08-14 16:02:34 UTC

I'm not able to get through the test case David used in bug 1174462 comment 8. Is there a configuration change that's needed too?

[root@host-026 ~]# for x in `seq 1 40`; do pcs resource create FAKE$x Dummy meta target-role=Stopped fake="`openssl rand -hex 32000`" || break; echo $x done; done
1 done
2 done
3 done
4 done
Error: unable to get cib
Error: unable to get cib
[root@host-026 ~]# tail /var/log/messages -n 30
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Initiating action 16: monitor FAKE2_monitor_0 on host-027
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Initiating action 14: monitor FAKE2_monitor_0 on host-026 (local)
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Initiating action 15: probe_complete probe_complete-host-027 on host-027 - no waiting
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Initiating action 17: probe_complete probe_complete-host-028 on host-028 - no waiting
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Operation FAKE2_monitor_0: not running (node=host-026, call=43, rc=7, cib-update=479, confirmed=true)
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Initiating action 13: probe_complete probe_complete-host-026 on host-026 (local) - no waiting
Aug 14 10:59:31 host-026 crmd[13434]:  notice: Transition 381 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
Aug 14 10:59:31 host-026 crmd[13434]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 14 10:59:32 host-026 cibadmin[1527]:  notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Aug 14 10:59:32 host-026 crmd[13434]:  notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 14 10:59:32 host-026 pengine[13433]:  notice: Calculated Transition 382: /var/lib/pacemaker/pengine/pe-input-53.bz2
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 18: monitor FAKE3_monitor_0 on host-028
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 16: monitor FAKE3_monitor_0 on host-027
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 14: monitor FAKE3_monitor_0 on host-026 (local)
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 15: probe_complete probe_complete-host-027 on host-027 - no waiting
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 17: probe_complete probe_complete-host-028 on host-028 - no waiting
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Operation FAKE3_monitor_0: not running (node=host-026, call=47, rc=7, cib-update=481, confirmed=true)
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Initiating action 13: probe_complete probe_complete-host-026 on host-026 (local) - no waiting
Aug 14 10:59:32 host-026 crmd[13434]:  notice: Transition 382 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-53.bz2): Complete
Aug 14 10:59:32 host-026 crmd[13434]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 14 10:59:32 host-026 cibadmin[1547]:  notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Aug 14 10:59:33 host-026 cib[13429]:   error: Compression of 329080 bytes failed: output data will not fit into the buffer provided (-8)
Aug 14 10:59:33 host-026 cib[13429]:   error: Could not compress the message into less than the configured ipc limit (131072 bytes).Set PCMK_ipc_buffer to a higher value (658160 bytes suggested)
Aug 14 10:59:33 host-026 cib[13429]:  notice: Notification failed: Message too long (-90)
Aug 14 10:59:33 host-026 cib[13429]:   error: Compression of 286029 bytes failed: output data will not fit into the buffer provided (-8)
Aug 14 10:59:33 host-026 cib[13429]:   error: Could not compress the message into less than the configured ipc limit (131072 bytes).Set PCMK_ipc_buffer to a higher value (1316320 bytes suggested)
Aug 14 10:59:33 host-026 cib[13429]:  notice: Message to 0x18fdd00[1551] failed: Message too long (-90)
Aug 14 10:59:33 host-026 cib[13429]: warning: A-Sync reply to cibadmin failed: No message of desired type
Aug 14 11:00:01 host-026 systemd: Started Session 1541 of user root.
Aug 14 11:00:01 host-026 systemd: Starting Session 1541 of user root.

[root@host-026 ~]# rpm -q pacemaker corosync libqb
pacemaker-1.1.13-6.el7.x86_64
corosync-2.3.4-7.el7.x86_64
libqb-0.17.1-2.el7.x86_64

Comment 29 Nate Straz 2015-08-14 20:14:28 UTC

I found the setting in /etc/sysconfig/pacemaker to adjust the IPC buffer and set it to 2MB. This allowed me to get through the test cases in bug 1174462 comments 8 and 9.
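
A quick arithmetic check shows why a 2MB buffer is enough for the sizes suggested in the log of comment 28. The Python sketch below is illustrative only: the sysconfig excerpt is hypothetical, 2097152 assumes that "2MB" means 2 MiB, and parse_ipc_buffer is a made-up helper rather than anything Pacemaker ships. The variable name PCMK_ipc_buffer and the byte counts come from the log messages.

```python
import re

DEFAULT_IPC_BUFFER = 131072          # limit reported in the log before tuning
SUGGESTED_SIZES = (658160, 1316320)  # sizes the cib daemon suggested in the log

# Hypothetical excerpt; 2097152 assumes "2MB" means 2 MiB.
SYSCONFIG_TEXT = """\
# Excerpt in the style of /etc/sysconfig/pacemaker
PCMK_ipc_buffer=2097152
"""


def parse_ipc_buffer(text: str, default: int = DEFAULT_IPC_BUFFER) -> int:
    """Return the PCMK_ipc_buffer value found in text, or default if unset."""
    value = default
    for line in text.splitlines():
        match = re.match(r"^\s*PCMK_ipc_buffer=(\d+)\s*$", line)
        if match:
            value = int(match.group(1))  # the last assignment wins
    return value


for label, text in (("default", ""), ("tuned", SYSCONFIG_TEXT)):
    size = parse_ipc_buffer(text)
    ok = all(size >= needed for needed in SUGGESTED_SIZES)
    print(f"{label}: PCMK_ipc_buffer={size}, large enough={ok}")
```

With the untuned value, neither suggested size fits. With the tuned value, both do, which matches the observation that the test cases pass after the change.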

Comment 30 errata-xmlrpc 2015-11-19 11:41:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2354.html
