Bug 976861 - Possible parallel communication problem with 11+ nodes cluster [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: luci
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks:
Reported: 2013-06-21 13:02 EDT by Jan Pokorný
Modified: 2017-06-01 04:51 EDT

Type: Bug
rsteiger: needinfo? (rmccabe)


Attachments: None
Description Jan Pokorný 2013-06-21 13:02:09 EDT
Just noticed that the send_batch_parallel function currently cannot cope
with 11+ target nodes (provided we only send a single message to each
node during the cluster communication round).

The first 10 items of the communication batch are sent OK, but then the
thread limit kicks in and the rest of the items are silently ignored.

If my (and jrummy's) observation is correct, the algorithm needs to be
slightly extended to be robust enough. Coping with at least 16 nodes, as
supported, is the bare-minimum fix here; a universal one that handles
any number of nodes is better.
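
For illustration only, the suspected failure mode can be sketched as follows. This is not luci's code: the function name send_first_round_only and the node names are hypothetical, and the limit of 10 is taken from the observation above. The sketch merely shows what "the rest of the items are silently ignored" would mean for an 11-node batch.

```python
# Hypothetical sketch of the SUSPECTED bug, not the actual luci code.
def send_first_round_only(targets, max_threads=10):
    """Handle a single round of at most max_threads targets, drop the rest."""
    contacted = list(targets[:max_threads])  # would get a worker thread each
    dropped = list(targets[max_threads:])    # never contacted, no error raised
    return contacted, dropped


nodes = ["node%d" % i for i in range(1, 12)]  # an 11-node cluster
contacted, dropped = send_first_round_only(nodes)
print(len(contacted), dropped)  # 10 ['node11']
```

A robust dispatcher would have to return an empty "dropped" list for any number of targets.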
Comment 3 Jan Pokorný 2013-06-26 15:03:19 EDT
OK, attachment 765707 (of bug 978479) seems to prove [*] that no
end-point of the "multicast" is ever ignored, regardless of the thread
limit. The remaining items are simply processed in one of the subsequent
rounds until the queue is empty.

Lowering the priority, but keeping this open until a final statement
is made.

[*] During that experiment, the thread limit was hardcoded to 3, while
the communication happened across 6 (later 8) nodes.  What can be
observed is that the communication was split into several subsequent
rounds of 3 communication end-points at a time.
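
The round-based behavior observed in the experiment can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the send_batch_parallel implementation: the function name send_batch_in_rounds, the send callback, and the node names are all hypothetical. It drains a queue of targets with at most max_threads worker threads per round, so no end-point is skipped.

```python
import threading


def send_batch_in_rounds(targets, send, max_threads=3):
    """Deliver to every target, running at most max_threads threads per round.

    Hypothetical sketch; `send` is any callable taking one target.
    Returns (results keyed by target, list of rounds as they were processed).
    """
    queue = list(targets)
    results = {}
    rounds = []
    lock = threading.Lock()

    def worker(target):
        outcome = send(target)
        with lock:  # protect the shared results dict
            results[target] = outcome

    while queue:  # keep going until the queue is empty
        current, queue = queue[:max_threads], queue[max_threads:]
        threads = [threading.Thread(target=worker, args=(t,)) for t in current]
        for th in threads:
            th.start()
        for th in threads:
            th.join()  # finish this round before starting the next one
        rounds.append(current)
    return results, rounds


nodes = ["node%d" % i for i in range(1, 9)]  # 8 nodes, as in the experiment
results, rounds = send_batch_in_rounds(nodes, lambda n: "ok:" + n, max_threads=3)
print([len(r) for r in rounds])  # [3, 3, 2]
print(len(results))              # 8
```

With a limit of 3 and 8 nodes, the batch is split into rounds of 3, 3, and 2 end-points, and every node receives its message.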
