Bug 976861 - Possible parallel communication problem with 11+ nodes cluster [NEEDINFO]
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: luci
Version: 6.4
Hardware: Unspecified, OS: Unspecified
Priority: low, Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks:
Reported: 2013-06-21 13:02 EDT by Jan Pokorný
Modified: 2017-11-07 16:40 EST
Last Closed: 2017-11-07 16:40:39 EST
Type: Bug
rsteiger: needinfo? (rmccabe)


Attachments: None

Description Jan Pokorný 2013-06-21 13:02:09 EDT
Just noticed that the send_batch_parallel function currently cannot cope with
11+ target nodes (provided we only send a single message during
the cluster communication round).

The first 10 items of the communication batch are sent OK, but then the limit
of threads kicks in and the rest of the items are silently ignored.

If my (and jrummy's) observation is correct, the algorithm needs to be
slightly extended to be robust enough (coping with at least 16 nodes,
as supported, is the bare-minimum fix here; a universal one is better).
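
To make the suspected failure mode concrete, here is a hypothetical sketch.
It is not luci's actual send_batch_parallel; the function name, the node
names, and the limit of 10 (inferred from the "first 10 items" observation)
are all assumptions made for illustration only.

```python
# Hypothetical illustration only; this is NOT luci's send_batch_parallel.
MAX_THREADS = 10  # assumed thread limit, inferred from "first 10 items"


def truncating_round(targets, limit=MAX_THREADS):
    """Model the suspected behaviour: one round, overflow never revisited."""
    handled = targets[:limit]   # items that get a thread in this round
    dropped = targets[limit:]   # items that would be silently ignored
    return handled, dropped


# 11 made-up node names, one more than the assumed limit.
nodes = ["node%02d" % i for i in range(1, 12)]
handled, dropped = truncating_round(nodes)
print(len(handled), dropped)
```

If the suspicion held, the eleventh node would end up in `dropped` without
any error being reported.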
Comment 3 Jan Pokorný 2013-06-26 15:03:19 EDT
Ok, attachment 765707 (of [bug 978479]) seems to prove [*] that no
end-point of the "multicast" is ever ignored regardless of the thread limit.
The rest will simply be processed in one of the subsequent rounds until
the queue is empty.

Lowering the priority, but keeping this open until a final statement
is made.

[*] During that experiment, the thread limit was hardcoded to 3, while
the communication happened across 6 (later 8) nodes.  What can be observed
is that the communication was split into several subsequent
rounds of 3 communication end-points at a time.
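
The round-based behaviour observed in that experiment can be sketched
roughly as follows. This is a minimal model under stated assumptions, not
the code from luci: `send_in_rounds`, the `send` callback, and the node
names are hypothetical, and only the limit of 3 and the 8 nodes come from
the experiment described above.

```python
import threading


def send_in_rounds(targets, send, limit=3):
    """Send to every target, running at most `limit` threads per round."""
    queue = list(targets)
    results = {}
    rounds = []
    lock = threading.Lock()

    def worker(target):
        outcome = send(target)
        with lock:  # protect the shared results dict
            results[target] = outcome

    # Keep taking up to `limit` items until the queue is empty,
    # so no end-point is skipped regardless of the limit.
    while queue:
        current, queue = queue[:limit], queue[limit:]
        threads = [threading.Thread(target=worker, args=(t,)) for t in current]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()  # finish this round before starting the next
        rounds.append(current)
    return results, rounds


# 8 made-up node names and a stand-in for the real per-node message send.
nodes = ["node%d" % i for i in range(1, 9)]
results, rounds = send_in_rounds(nodes, lambda n: "ok:" + n, limit=3)
print([len(r) for r in rounds])  # [3, 3, 2]
```

With 8 targets and a limit of 3, the sketch needs three rounds and every
node ends up in `results`, which matches the observation that nothing is
dropped.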
Comment 11 Chris Feist 2017-11-07 16:40:39 EST
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/
