Red Hat Bugzilla – Bug 247733
bug in flow control accounting can freeze cluster communication IO
Last modified: 2016-04-26 12:32:42 EDT
Description of problem:
Totem has a queue with a fixed number of entries available for queueing
incoming message segments. When a message is accepted, a determination is made
whether it should be flow controlled or sent to the executive handler for
processing. The determination is made using an estimate based upon the library
request size. If the executive message size is larger than the library request
size, totem will sometimes be unable to queue the message, because it does not
have enough room in its queue for the "real" message once it is generated for
transmission, even though the message has already been accepted for
transmission.
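The mismatch between the estimate and the real queue requirement can be
sketched as below. This is an illustration only, not aisexec code: the entry
payload size, the header overhead and all function names are hypothetical
values picked so that a 4035 byte request is estimated at 3 entries while the
executive message needs 4.

```python
import math

# Hypothetical values for illustration; the report does not state the real
# entry size or the overhead added by the executive message.
ENTRY_PAYLOAD_BYTES = 1400   # assumed payload capacity of one queue entry
EXEC_HEADER_BYTES = 200      # assumed extra bytes in the executive message


def entries_needed(size_bytes: int) -> int:
    """Number of queue entries a message of size_bytes occupies."""
    return math.ceil(size_bytes / ENTRY_PAYLOAD_BYTES)


def accept_by_estimate(lib_request_bytes: int, free_entries: int) -> bool:
    """Admission check based only on the library request size (the estimate)."""
    return entries_needed(lib_request_bytes) <= free_entries


def queue_real_message(lib_request_bytes: int, free_entries: int) -> bool:
    """Queue attempt for the larger executive message built from the request."""
    real_size = lib_request_bytes + EXEC_HEADER_BYTES
    return entries_needed(real_size) <= free_entries


if __name__ == "__main__":
    request, free = 4035, 3
    print("accepted by estimate:", accept_by_estimate(request, free))
    print("real message fits:", queue_real_message(request, free))
```

With these assumed sizes the request passes the admission check but the real
message does not fit, which is the window described above.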
In the particular case where a message of size 4035, which requires 4 queue
entries, is sent repeatedly with cpgbench while the queue has 3 entries
available, aisexec sends the request to the executive handler instead of
rejecting the message back to the user with a try again error code. Then, when
the real message is queued with totem, it needs 4 spots in the queue when only
3 are available. Totem protects itself from memory corruption by refusing this
queue operation and returning an error code. All services in aisexec assert
when this error code is returned, since it should never be returned, EXCEPT
cpg, which increments the outstanding reference count for flow control. The
message is then never delivered, and delivery is what reduces the reference
count for flow control. As a result, the flow control value used to determine
when to shut off incoming requests reaches the shutoff point but is never
decremented back to the turn on point, so no new messages are queued into
totem. The net result is a complete IO lockup for the CPG service for various
sizes of messages (and possibly assertions for other services). Note that totem
implements message packing, so any number of normal requests could potentially
generate this scenario.
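The counter leak described above can be modelled with a small simulation. It
is a sketch under stated assumptions, not the cpg service code: the class
name, the threshold values and the method names are all hypothetical.

```python
SHUTOFF_POINT = 4   # hypothetical: stop accepting requests at this count
TURN_ON_POINT = 2   # hypothetical: resume accepting at or below this count


class CpgFlowControl:
    """Toy model of the outstanding reference count used for flow control."""

    def __init__(self) -> None:
        self.outstanding = 0
        self.blocked = False

    def _update(self) -> None:
        # Hysteresis: block at the shutoff point, unblock at the turn on point.
        if self.outstanding >= SHUTOFF_POINT:
            self.blocked = True
        elif self.outstanding <= TURN_ON_POINT:
            self.blocked = False

    def submit(self, queue_ok: bool) -> str:
        if self.blocked:
            return "try again"
        # The count is incremented even when the totem queue operation fails.
        self.outstanding += 1
        self._update()
        return "queued" if queue_ok else "lost"

    def delivered(self) -> None:
        # Only a delivered message decrements the count.
        self.outstanding -= 1
        self._update()


if __name__ == "__main__":
    fc = CpgFlowControl()
    # Every queue operation fails, so nothing is ever delivered.
    print([fc.submit(queue_ok=False) for _ in range(6)])
    print("blocked:", fc.blocked, "outstanding:", fc.outstanding)
```

In this model a message that fails to queue is never delivered, so
`delivered()` is never called for it and the count stays at the shutoff point.
When the messages do queue and are later delivered, the count drops back to
the turn on point and requests are accepted again.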
Version-Release number of selected component (if applicable):
Hard to reproduce with RHCS; however, a modified "cpgbench" can recreate the
problem in my test network with specific byte sizes 100% of the time.
Steps to Reproduce:
1. modify cpgbench to 4035 byte message size
2. run on 2 node gige cluster with netmtu of 8800 using jumbo frame gige
Actual results:
Flow control is enabled for IPC but never disabled.
Expected results:
Flow control should be enabled but then disabled later once the server's output
queue has emptied sufficiently.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.