Bug 459152 - [RFE] add asynchronous version of mcast_joined to CPG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Hardware: All, OS: Linux
Priority: medium, Severity: medium
Target Milestone: beta
Assigned To: Angus Salkeld
QA Contact: Cluster QE
Keywords: FutureFeature, Reopened
Reported: 2008-08-14 14:44 EDT by Alan Conway
Modified: 2016-04-26 11:02 EDT (History)
6 users

Doc Type: Enhancement
Last Closed: 2011-02-15 11:58:54 EST

Attachments
modified cpg.c to make mcast asynchronous (18.84 KB, text/x-csrc)
2008-08-14 14:44 EDT, Alan Conway

Description Alan Conway 2008-08-14 14:44:33 EDT
Created attachment 314341 [details]
modified cpg.c to make mcast asynchronous.

Description of problem:

cpg_mcast_joined is a synchronous call, i.e. the caller has to wait for the IPC round-trip to the aisexec daemon before proceeding. An asynchronous call in the client library would give much better throughput. This is needed to support the MRG project, which wants to use CPG as its replication protocol. Further work is needed to determine the extent of the benefit, but the initial experiment below showed a threefold speedup.

I hacked lib/cpg.c (attached) to ignore the responses in mcast_joined and
instead retrieve and discard them in the dispatch poll. This almost
triples the throughput of the cpgbench test on my laptop.

I'm not proposing this hack as an implementation; it was just to get an
idea of the possible performance impact. At least the following things are
wrong with it:
 - async mcast should be a new API; the existing sync mcast should remain.
 - wrong locking around response_fd.
 - dispatch now polls 2 fds, so cpg_fd_get() is no longer sufficient.
 - cpgbench goes into flow control quickly (not surprising) but then
never gets out; I think I missed something about resetting flow control.

A neater solution might be to drop the responses at the daemon for
successful mcasts and send an error response to the dispatch_fd for
failed mcasts, so there is only one fd that needs to be polled. Flow control changes caused by async mcasts would also need to be sent to the dispatch_fd.

Version-Release number of selected component (if applicable):


Steps to Reproduce:

Run cpgbench against a stock AIS build and then with the attached changes to see the throughput difference. I saw almost a factor of 3 improvement.

Additional info:
Comment 1 Christine Caulfield 2009-01-19 10:39:32 EST
Forward to Steve as he is working on the new IPC layer that is needed to support this.
Comment 2 Steven Dake 2009-03-18 17:36:50 EDT
Reassigning to rhel6 since that is the new upstream version for corosync. If you would like this in a rhel5 release, please open a separate bz.

Comment 3 RHEL Product and Program Management 2009-06-15 16:56:57 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux major release. This request is not yet committed for
inclusion.
Comment 6 Alan Conway 2010-01-13 10:31:55 EST
It would mean a performance improvement for MRG but it's not a critical requirement in the short term.
Comment 8 Steven Dake 2010-03-01 04:13:02 EST
Honza asked to implement this feature.

Comment 10 RHEL Product and Program Management 2010-03-01 04:22:29 EST
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
