Description of problem:
IIUC, Corosync currently has a 1 MB message size limit due to a hardcoded default in TOTEM buffer allocation. This may become a problem as Pacemaker clusters grow more complex, with upward of 16 nodes and CIBs containing perhaps dozens of resources or more.

Version-Release number of selected component (if applicable):
1.2.3

Expected results:
Make the message size limit configurable or, if possible, remove the hard limit altogether.
The client->server IPC portion of this RFE could be addressed by using the zero-copy feature to allocate buffers when the requested buffer size is greater than 1 MB (and then doing a memcpy). For server->client, an additional message type could be added to indicate that the buffer is a freshly mmapped buffer needing special attention from the dispatch code. The totempg code could then allocate memory whenever a received message is larger than 1 MB. All of this sounds complicated and prone to breakage, though. Do you have customers who have run into this limit?

Regards
-steve
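The last part of Steve's suggestion (allocate memory in totempg only when a received message exceeds 1 MB) can be sketched in C as an on-demand receive buffer. This is a minimal sketch under stated assumptions: `struct rx_buffer`, `rx_buffer_reserve()` and the other names are hypothetical, not corosync code, and it ignores the harder parts of the proposal (the mmapped IPC buffers and the new dispatch message type).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Size pre-allocated at startup; mirrors today's fixed 1 MB buffer. */
#define RX_DEFAULT_SIZE (1024u * 1024u)

/* Hypothetical receive buffer; not an actual totempg structure. */
struct rx_buffer {
    unsigned char *data;
    size_t capacity;
};

/* Allocate the default-sized buffer once. Returns 0 on success. */
int rx_buffer_init(struct rx_buffer *b)
{
    b->data = malloc(RX_DEFAULT_SIZE);
    b->capacity = (b->data != NULL) ? RX_DEFAULT_SIZE : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Grow only when a message needs more room than we already have.
 * On failure the existing buffer is left untouched. */
int rx_buffer_reserve(struct rx_buffer *b, size_t needed)
{
    assert(b != NULL);
    if (needed <= b->capacity)
        return 0;
    unsigned char *p = realloc(b->data, needed);
    if (p == NULL)
        return -1;
    b->data = p;
    b->capacity = needed;
    return 0;
}

/* Copy an incoming message, growing the buffer first if necessary. */
int rx_buffer_store(struct rx_buffer *b, const void *msg, size_t len)
{
    if (rx_buffer_reserve(b, len) != 0)
        return -1;
    if (len > 0)
        memcpy(b->data, msg, len);
    return 0;
}

void rx_buffer_free(struct rx_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->capacity = 0;
}
```

The design keeps the common case allocation-free: messages under 1 MB never trigger a realloc. A real implementation would also have to decide whether to shrink the buffer after a large message; otherwise a single large CIB update pins the extra memory for the life of the process.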
Angus, please comment on how this RFE could be implemented with libqb in the corosync 2.0+ case.
Are you sending XML text? If so, could you compress it (it should compress well)? Another option is to fragment messages automatically between client and server, but I'd need to look into that a bit more.
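Automatic fragmentation can be illustrated with a small C sketch of the sender side: split the payload into pieces no larger than a fixed maximum, tag each piece with its position and the total count, and hand the pieces to the transport in order. The header layout, `frag_send_all()` and the callback type are hypothetical and do not describe libqb's or corosync's wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment header; not libqb's or corosync's wire format. */
struct frag_hdr {
    uint32_t msg_id; /* which logical message this fragment belongs to */
    uint32_t index;  /* 0-based position of this fragment */
    uint32_t count;  /* total number of fragments in the message */
    uint32_t len;    /* payload bytes carried by this fragment */
};

/* Number of fragments needed; an empty message still takes one fragment. */
uint32_t frag_count(size_t len, size_t max_payload)
{
    assert(max_payload > 0);
    if (len == 0)
        return 1;
    return (uint32_t)((len + max_payload - 1) / max_payload);
}

/* Payload length of fragment `index`; only the last one may be short. */
size_t frag_payload_len(size_t len, size_t max_payload, uint32_t index)
{
    size_t off = (size_t)index * max_payload;
    assert(off <= len);
    size_t left = len - off;
    return left < max_payload ? left : max_payload;
}

/* Transport callback supplied by the caller (hypothetical). */
typedef int (*frag_send_fn)(const struct frag_hdr *hdr,
                            const unsigned char *payload, void *ctx);

/* Split msg into fragments and hand each one to send(), in order. */
int frag_send_all(uint32_t msg_id, const unsigned char *msg, size_t len,
                  size_t max_payload, frag_send_fn send, void *ctx)
{
    uint32_t count = frag_count(len, max_payload);
    for (uint32_t i = 0; i < count; i++) {
        struct frag_hdr hdr = {
            .msg_id = msg_id,
            .index = i,
            .count = count,
            .len = (uint32_t)frag_payload_len(len, max_payload, i),
        };
        const unsigned char *payload =
            (hdr.len > 0) ? msg + (size_t)i * max_payload : msg;
        int rc = send(&hdr, payload, ctx);
        if (rc != 0)
            return rc; /* caller decides whether to resend the whole message */
    }
    return 0;
}
```

Carrying `count` in every fragment lets the receiver detect a truncated or interleaved message without any extra handshake.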
It is XML that is being sent and we do compress it already. However the status section can get really big so hitting the limit is still conceivable. I don't think we necessarily need to remove the limit completely, just allow it to be tuned from corosync.conf (_before_ startup) by those that find it necessary. This would have the nice property of also allowing it to be tuned down, thus lowering corosync's memory footprint in situations not needing large messages.
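A tunable limit of this kind might be read once before startup and validated along the lines below. This is a sketch under stated assumptions: the function name, the 64 KB floor and the 64 MB ceiling are hypothetical, and so is the idea of a specific corosync.conf key feeding it; it only illustrates the parsing and clamping logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Today's hardcoded limit, kept as the default. */
#define MSG_SIZE_DEFAULT (1024UL * 1024UL)
/* Hypothetical bounds for a value read before startup. */
#define MSG_SIZE_MIN (64UL * 1024UL)
#define MSG_SIZE_MAX (64UL * 1024UL * 1024UL)

static_assert(MSG_SIZE_MIN <= MSG_SIZE_DEFAULT && MSG_SIZE_DEFAULT <= MSG_SIZE_MAX,
              "default must lie within the allowed range");

/* Turn a raw config string (from a hypothetical config key) into a
 * message size limit. Missing or malformed values fall back to the
 * default; out-of-range values are clamped, so the limit can be tuned
 * down to save memory as well as up for large CIBs. */
unsigned long msg_size_from_config(const char *value)
{
    /* Missing, empty, signed or whitespace-prefixed values: use the default. */
    if (value == NULL || *value < '0' || *value > '9')
        return MSG_SIZE_DEFAULT;

    char *end;
    errno = 0;
    unsigned long v = strtoul(value, &end, 10);
    if (errno == ERANGE)
        return MSG_SIZE_MAX;     /* too large to represent: clamp */
    if (*end != '\0')
        return MSG_SIZE_DEFAULT; /* trailing junk such as "12abc" */

    if (v < MSG_SIZE_MIN)
        return MSG_SIZE_MIN;
    if (v > MSG_SIZE_MAX)
        return MSG_SIZE_MAX;
    return v;
}
```

Clamping rather than rejecting out-of-range values keeps the daemon starting with a sane limit, while still allowing the value to be lowered to reduce memory use, as suggested above.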
Will propose as a 2.0 feature (rhel7 timeframe).
IPC is now handled by LibQB. According to https://github.com/asalkeld/libqb/issues/14, that problem still exists there. There is also another problem, https://github.com/asalkeld/libqb/issues/71. Once these two issues are fixed, support in corosync should be seamless. I am cloning this bug: this one will track the corosync side, and the clone, Bug 975903, will track LibQB.
commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd
Author: Christine Caulfield <ccaulfie>
Date:   Thu Mar 5 16:45:15 2015 +0000

    cpg: Add support for messages larger than 1Mb
Created attachment 1009656
cpg: Add support for messages larger than 1Mb

If a cpg client sends a message larger than 1Mb (actually slightly less, to allow for internal buffers), cpg will now fragment it into several corosync messages before sending it around the ring.

cpg_mcast_joined() can now return CS_ERR_INTERRUPT, which means that the cpg membership was disrupted during the send operation and the message needs to be resent.

The new API call cpg_max_atomic_msgsize_get() returns the maximum size of a message that will not be fragmented internally.

New test program cpghum was written to stress test this functionality; it checks message integrity and order of receipt.

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
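On the receiving side, it seems reasonable to assume that a partially received fragmented message has to be thrown away when the membership changes mid-send, since the sender gets CS_ERR_INTERRUPT and resends the whole message. The C sketch below shows that per-sender reassembly bookkeeping. `struct reasm`, `reasm_add()` and `reasm_reset()` are hypothetical names and this is not the code from commit 8cc8e513; messages no larger than the size reported by cpg_max_atomic_msgsize_get() would never pass through it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum reasm_status { REASM_INCOMPLETE, REASM_COMPLETE, REASM_ERROR };

/* Hypothetical per-sender reassembly state. */
struct reasm {
    unsigned char *buf;
    size_t used;
    uint32_t next_index; /* fragment index expected next */
    uint32_t count;      /* fragment total announced by fragment 0 */
};

/* Drop any partial message, e.g. from a membership-change handler. */
void reasm_reset(struct reasm *r)
{
    free(r->buf);
    r->buf = NULL;
    r->used = 0;
    r->next_index = 0;
    r->count = 0;
}

/* Append one fragment. Fragment 0 always starts a new message;
 * out-of-order, inconsistent or surplus fragments discard the
 * partial message and report REASM_ERROR. */
enum reasm_status reasm_add(struct reasm *r, uint32_t index, uint32_t count,
                            const void *data, size_t len)
{
    assert(r != NULL);
    if (index == 0) {
        reasm_reset(r);
    } else if (index != r->next_index || count != r->count) {
        reasm_reset(r);
        return REASM_ERROR;
    }
    if (index >= count) {
        reasm_reset(r);
        return REASM_ERROR;
    }
    /* +1 keeps the request non-zero for empty fragments. */
    unsigned char *p = realloc(r->buf, r->used + len + 1);
    if (p == NULL) {
        reasm_reset(r);
        return REASM_ERROR;
    }
    r->buf = p;
    if (len > 0)
        memcpy(r->buf + r->used, data, len);
    r->used += len;
    r->count = count;
    r->next_index = index + 1;
    return (r->next_index == count) ? REASM_COMPLETE : REASM_INCOMPLETE;
}
```

In this sketch the caller consumes `buf`/`used` after REASM_COMPLETE and then calls reasm_reset(); a membership-change (confchg) callback would call reasm_reset() for every sender. The `index != next_index` check is the kind of ordering property cpghum is described as verifying.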
Created attachment 1041624
Really add cpghum

Signed-off-by: Jan Friesse <jfriesse>
Created attachment 1041839
Don't link with libz when not needed

Commit 8cc8e513633a1a8b12c416e32fb5362fcf4d65dd added a check for libz, which resulted in all libraries being linked with libz. This is not the expected behavior. This patch fixes it by defining an automake conditional, so that cpghum is linked only if libz is available and the LIBS variable is not modified at all.

Signed-off-by: Jan Friesse <jfriesse>
I'm not able to get through the test case David used in bug 1174462 comment 8. Is there a configuration change that's needed too?

[root@host-026 ~]# for x in `seq 1 40`; do pcs resource create FAKE$x Dummy meta target-role=Stopped fake="`openssl rand -hex 32000`" || break; echo $x done; done
1 done
2 done
3 done
4 done
Error: unable to get cib
Error: unable to get cib
[root@host-026 ~]# tail /var/log/messages -n 30
Aug 14 10:59:31 host-026 crmd[13434]: notice: Initiating action 16: monitor FAKE2_monitor_0 on host-027
Aug 14 10:59:31 host-026 crmd[13434]: notice: Initiating action 14: monitor FAKE2_monitor_0 on host-026 (local)
Aug 14 10:59:31 host-026 crmd[13434]: notice: Initiating action 15: probe_complete probe_complete-host-027 on host-027 - no waiting
Aug 14 10:59:31 host-026 crmd[13434]: notice: Initiating action 17: probe_complete probe_complete-host-028 on host-028 - no waiting
Aug 14 10:59:31 host-026 crmd[13434]: notice: Operation FAKE2_monitor_0: not running (node=host-026, call=43, rc=7, cib-update=479, confirmed=true)
Aug 14 10:59:31 host-026 crmd[13434]: notice: Initiating action 13: probe_complete probe_complete-host-026 on host-026 (local) - no waiting
Aug 14 10:59:31 host-026 crmd[13434]: notice: Transition 381 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-52.bz2): Complete
Aug 14 10:59:31 host-026 crmd[13434]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 14 10:59:32 host-026 cibadmin[1527]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Aug 14 10:59:32 host-026 crmd[13434]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 14 10:59:32 host-026 pengine[13433]: notice: Calculated Transition 382: /var/lib/pacemaker/pengine/pe-input-53.bz2
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 18: monitor FAKE3_monitor_0 on host-028
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 16: monitor FAKE3_monitor_0 on host-027
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 14: monitor FAKE3_monitor_0 on host-026 (local)
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 15: probe_complete probe_complete-host-027 on host-027 - no waiting
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 17: probe_complete probe_complete-host-028 on host-028 - no waiting
Aug 14 10:59:32 host-026 crmd[13434]: notice: Operation FAKE3_monitor_0: not running (node=host-026, call=47, rc=7, cib-update=481, confirmed=true)
Aug 14 10:59:32 host-026 crmd[13434]: notice: Initiating action 13: probe_complete probe_complete-host-026 on host-026 (local) - no waiting
Aug 14 10:59:32 host-026 crmd[13434]: notice: Transition 382 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-53.bz2): Complete
Aug 14 10:59:32 host-026 crmd[13434]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 14 10:59:32 host-026 cibadmin[1547]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Aug 14 10:59:33 host-026 cib[13429]: error: Compression of 329080 bytes failed: output data will not fit into the buffer provided (-8)
Aug 14 10:59:33 host-026 cib[13429]: error: Could not compress the message into less than the configured ipc limit (131072 bytes).Set PCMK_ipc_buffer to a higher value (658160 bytes suggested)
Aug 14 10:59:33 host-026 cib[13429]: notice: Notification failed: Message too long (-90)
Aug 14 10:59:33 host-026 cib[13429]: error: Compression of 286029 bytes failed: output data will not fit into the buffer provided (-8)
Aug 14 10:59:33 host-026 cib[13429]: error: Could not compress the message into less than the configured ipc limit (131072 bytes).Set PCMK_ipc_buffer to a higher value (1316320 bytes suggested)
Aug 14 10:59:33 host-026 cib[13429]: notice: Message to 0x18fdd00[1551] failed: Message too long (-90)
Aug 14 10:59:33 host-026 cib[13429]: warning: A-Sync reply to cibadmin failed: No message of desired type
Aug 14 11:00:01 host-026 systemd: Started Session 1541 of user root.
Aug 14 11:00:01 host-026 systemd: Starting Session 1541 of user root.
[root@host-026 ~]# rpm -q pacemaker corosync libqb
pacemaker-1.1.13-6.el7.x86_64
corosync-2.3.4-7.el7.x86_64
libqb-0.17.1-2.el7.x86_64
I found the PCMK_ipc_buffer setting in /etc/sysconfig/pacemaker, which adjusts the IPC buffer size, and set it to 2 MB. With that change I was able to get through the test cases in bug 1174462 comments 8 and 9.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2354.html