Bug 692237

Summary: [RFE] disambiguate cpg_dispatch timeout behavior from function behavior enhacement
Product: [Retired] Corosync Cluster Engine Reporter: dan clark <2clarkd>
Component: unknownAssignee: Jan Friesse <jfriesse>
Status: CLOSED UPSTREAM QA Contact:
Severity: low Docs Contact:
Priority: unspecified    
Version: 1.3CC: 2clarkd, agk, asalkeld, cluster-maint, fdinitto, jfriesse, sdake
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 700517 (view as bug list) Environment:
Last Closed: 2012-02-08 15:06:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 700517    
Attachments:
Description Flags
Proposed patch for add of CS_DISPATCH_ONE_NONBLOCKING dispatch type none

Description dan clark 2011-03-30 18:54:51 UTC
Description of problem:
The cpg_dispatch routine overloads the 'dispatch_types' field to both provide the type of call DISPATCH_ALL vs DISPATCH_ONE or DISPATCH_BLOCK and the timeout behavior (-1 implying wait forever OR 0 implying return immediately).   The code currently blocks on both DISPATCH_ONE and DISPATCH_BLOCK but returns immediately for DISPATCH_ALL (unlike the comment).  

Version-Release number of selected component (if applicable):
3.0


How reproducible:
always

Steps to Reproduce:
1. use cpg_dispatch(handle, CS_DISPATCH_ONE) 

  
Actual results:
actual, block forever


Expected results:
return immediately

Additional info:
ideally a dispatch function would allow the behavior to be explicitly defined by the application instead of overloading the parameters
cpg_dispatch_time(handle, CPG_DISPATCH_ONE, timeout)

short of this it would be fine to have the code correspond with the comments and the perhaps intended functionality in the library:
current:
    /*
     * Timeout instantly for CS_DISPATCH_ONE or CS_DISPATCH_ALL and
     * wait indefinately for CS_DISPATCH_BLOCKING
     */
    if (dispatch_types == CPG_DISPATCH_ALL) {
        timeout = 0;
    }

patch
<    if (dispatch_types == CPG_DISPATCH_ALL) {
>    if (dispatch_types != CPG_DISPATCH_BLOCKING) {

Comment 3 dan clark 2011-04-11 16:39:03 UTC
The cpg_dispatch routine masks the return values when used in the non-blocking mode.  Ideally in a non-blocking mode the application could differentiate between a dispatch successfully occurring, especially in the DISPATCH_ONE case, and thus triggering the need to check again to see if there are more messages to process, and the case when there are no more messages to process, thus it is OK to register the FD associated with the connection for inbound messages in epoll and sleep for future events associated with corosync (or other aspects of the application).  I believe this problem occurs due to the code in the library which over-writes the underlying error of CS_ERR_TRY_AGAIN with CS_OK.
        if (error == CS_ERR_TRY_AGAIN) {
<            error = CS_OK;
            if (dispatch_types == CPG_DISPATCH_ALL) {
                break; /* exit do while cont is 1 loop */
            } else {
                continue; /* next poll */
            }
        }

Comment 5 dan clark 2011-05-23 18:40:40 UTC
can this be addressed in the next release?  Could there be some dialog about the future of the interface?  At present it is difficult to appropriately test for new messages.  This is additionally complicated by the fact that 'group change' messages are handled with the same dispatch routine as are message exchanges.  It would be optimal to allow filtering of the dispatch by type to allow higher level layers to control the flow of messages thus providing copy free semantics back into the application domain.

Comment 6 dan clark 2011-05-23 18:41:09 UTC
can this be re-categorized to fedora?

Comment 7 Jan Friesse 2011-06-28 13:42:24 UTC
*** Bug 700517 has been marked as a duplicate of this bug. ***

Comment 8 Steven Dake 2011-07-21 20:37:34 UTC
Dan,

Earliest we could address this issue is corosync 2.0.  corosync 1.y is frozen.

Comment 9 Jan Friesse 2012-02-08 14:20:54 UTC
Created attachment 560271 [details]
Proposed patch for add of CS_DISPATCH_ONE_NONBLOCKING dispatch type

Dan,
I believe this bug has two parts:
1.) Incorrect comments. In reality, CS_DISPATCH_ONE (as correctly described in manpages) is BLOCKING. Comment was incorrect and included patch fixes that
2.) No non-blocking variant of CS_DISPATCH_ONE exists. This is what this patch is all about.

I hope this patch will fulfill your needs.