Bug 976861

Summary: Possible parallel communication problem with 11+ nodes cluster
Product: Red Hat Enterprise Linux 6 Reporter: Jan Pokorný [poki] <jpokorny>
Component: luci Assignee: Ryan McCabe <rmccabe>
Status: CLOSED WONTFIX QA Contact: cluster-qe <cluster-qe>
Severity: low Docs Contact:
Priority: low    
Version: 6.4 CC: cfeist, cluster-maint, fdinitto, jruemker, rmccabe, rsteiger, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-07 21:40:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Pokorný [poki] 2013-06-21 17:02:09 UTC
Just noticed that the send_batch_parallel function currently cannot cope with
11+ target nodes (provided we only send a single message during
the cluster communication round).

The first 10 items of the communication batch are sent OK, but then the thread
limit kicks in and the rest of the items are silently ignored.

If my (and jrummy's) observation is correct, the algorithm needs to be
slightly extended to be robust enough (coping with at least the 16 supported
nodes is the bare-minimum fix here; a universal one is better).
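
For illustration only, here is a minimal sketch of what a "universal" variant
could look like, assuming the observation above holds. It is not luci's
send_batch_parallel: the names send_to_all, send_one and MAX_THREADS are
hypothetical, and it relies on concurrent.futures from the Python 3 standard
library, which may not match the environment luci actually runs in. The idea
is to keep the cap on concurrent threads but let surplus targets wait for a
free worker instead of being dropped.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed cap, chosen to match the "first 10 items" observation.
MAX_THREADS = 10


def send_to_all(nodes, send_one, max_threads=MAX_THREADS):
    """Call send_one(node) for every node, at most max_threads at a time.

    Returns a dict mapping each node to the value send_one returned for it.
    """
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() submits every node up front; targets beyond the cap simply
        # wait in the pool's internal queue until a worker becomes free.
        outcomes = pool.map(send_one, nodes)
        return dict(zip(nodes, outcomes))
```

With 16 targets and a cap of 10, all 16 calls are made; only the degree of
parallelism is limited.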

Comment 3 Jan Pokorný [poki] 2013-06-26 19:03:19 UTC
OK, attachment 765707 (of [bug 978479]) seems to prove [*] that no
end-point of the "multicast" is ever ignored regardless of the thread limit.
The rest will simply be processed in one of the subsequent rounds until
the queue is empty.

Lowering the priority, but keeping this open until a final statement
is made.

[*] During that experiment, the thread limit was hardcoded to 3, yet
the communication happened across 6 (later 8) nodes.  What can be observed
is that the communication was split into several subsequent
rounds of 3 communication end-points at a time.
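
The round-by-round behaviour described in this comment can be sketched as
follows. This is an assumption-laden illustration, not the actual luci code:
send_in_rounds and send_one are hypothetical names, and the real
implementation may schedule its rounds differently. The sketch only shows how
a queue drained in slices of the thread limit reaches every end-point.

```python
import threading


def send_in_rounds(nodes, send_one, max_threads=3):
    """Drain the node queue in rounds of at most max_threads parallel sends.

    Returns (rounds, results): the list of node groups handled per round and
    a dict mapping each node to the value send_one returned for it.
    """
    queue = list(nodes)
    rounds = []
    results = {}
    lock = threading.Lock()

    def worker(node):
        outcome = send_one(node)
        with lock:  # guard the shared results dict
            results[node] = outcome

    while queue:
        # Take the next slice off the queue; the remainder waits its turn.
        current, queue = queue[:max_threads], queue[max_threads:]
        threads = [threading.Thread(target=worker, args=(n,)) for n in current]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # the round ends once all of its sends have returned
        rounds.append(current)

    return rounds, results
```

With a limit of 3 and 8 end-points, this yields three rounds of 3, 3 and 2
end-points, and every end-point is contacted exactly once.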

Comment 11 Chris Feist 2017-11-07 21:40:39 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Comment 12 Red Hat Bugzilla 2023-09-14 01:46:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days