Bug 507421 - Cluster flow control does not appear to be working properly.
Cluster flow control does not appear to be working properly.
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
1.1.1
All Linux
high Severity high
: 1.3
: ---
Assigned To: Alan Conway
Tomas Rusnak
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-06-22 13:52 EDT by Alan Conway
Modified: 2010-10-14 12:02 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 12:02:10 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to verify read-credit on recent code. (2.79 KB, patch)
2010-10-05 13:30 EDT, Alan Conway
no flags Details | Diff
Patch to verify read-credit on revision 752560. (2.79 KB, patch)
2010-10-05 13:37 EDT, Alan Conway
no flags Details | Diff
Correct patch to verify read-credit on revision 752560. (2.79 KB, patch)
2010-10-06 09:03 EDT, Alan Conway
no flags Details | Diff

  None (edit)
Description Alan Conway 2009-06-22 13:52:56 EDT
Description of problem:

While investigating bug 507388 it appears that cluster flow control is not working properly. By adding log statements to cluster::Connection::decode and giveReadCredit I saw long runs of calls to decode without any credit. These are interleaved with runs where credit appears to be workig properly.
Comment 1 Alan Conway 2009-09-09 17:36:36 EDT
Cluster flow control fixed as of r813100
Comment 2 Alan Conway 2009-09-10 10:57:52 EDT
The fix to cluser flow control requires both of the following commits
813100 and 810591
Comment 4 ppecka 2010-06-07 10:52:43 EDT

Can term "long runs of calls" be specified with some integer approximation. 

does "credit" in "...decode without any credit..." mean, that line "giveReadCredit(1);" is actually not evaluated within qpid::cluster::Connection::decode?

Actually i've tried to check on described that behaviour with r810573 over gdb, started similar configuration as in bug 507388, and then set up break on line where giveReadCredit(1); is inside qpid::cluster::Connection::decode while running perftest --durable=true --size=8
During my last try i saw 914 calls of qpid::cluster::Connection::decode which never resulted in calling giveReadCredit(1);


otherwise when i checked with just one "break qpid::cluster::Connection::giveReadCredit"
then many calls of giveReadCredit method (1500254) were observed

Can you please clarify this issue?
Comment 5 Alan Conway 2010-06-08 08:33:12 EDT
Long runs means > --cluster-read-max which is 10 by default. I was seeing runs of 20-100 I believe. I demonstrated the problem by adding log or printf statements to giveReadCredit and decode. If flow control is working then this should always be true:

0 <= (number_of_decode - number_of_giveReadCredit) <= --cluster-read-max

Not sure what you were seeing in the debugger.
Comment 6 ppecka 2010-06-29 13:03:51 EDT
I'm still not able to see described behaviour, even when I rolled to little bit older version of qpid. 

qpidd-0.5.752581-3.el5
qpidc-rdma-0.5.752581-3.el5
qpidd-acl-0.5.752581-3.el5
qpidd-ssl-0.5.752581-3.el5
qpidc-perftest-0.5.752581-3.el5
qpid-java-qman-0.5.751061-1.el5
python-qpid-0.5.752581-1.el5
qpid-dotnet-0.4.738274-2.el5
qpidc-ssl-0.5.752581-3.el5
ruby-qpid-0.4.749380-2.el5
qpidd-cluster-0.5.752581-3.el5
qpid-java-client-0.5.751061-1.el5
qpidd-devel-0.5.752581-3.el5
qpidc-devel-0.5.752581-3.el5
qpidd-rdma-0.5.752581-3.el5
qpidc-debuginfo-0.5.752581-3.el5
qpidc-0.5.752581-3.el5
qpid-java-common-0.5.751061-1.el5
qpidd-xml-0.5.752581-3.el5

As i don't have such a deep knowledge about whole credit mechanism can you please attach Connection.cpp file containing, as you've said "log or printf statements to giveReadCredit and decode", to show how exactly this issue can be reproduced. I'm using the same reproducer as posted in bug 507388
Comment 8 Frantisek Reznicek 2010-10-05 04:16:28 EDT
Hi Alan,
could you possibly specify in more detail how you would like us to test that cluster flow control is working properly, please?
Petr spent some time ago more than week w/o success.

Thanks upfront.
Comment 9 Alan Conway 2010-10-05 13:30:54 EDT
Created attachment 451736 [details]
Patch to verify read-credit on recent code.
Comment 10 Alan Conway 2010-10-05 13:37:03 EDT
Created attachment 451738 [details]
Patch to verify read-credit on revision 752560.

How to verify this bug: Unfortunately I can't think of any way to verify this bug without instrumenting the code so I've attached 2 patches. 

Apply rccheck-752560.patch to revision 752560, start a clustered broker and run 
perftest --size 8 --durable

You should see a lot of debugging messages and an assertion failure: 

qpidd: ../../qpid/cpp/src/qpid/cluster/Connection.cpp:218: size_t qpid::cluster::Connection::decode(const char*, size_t): Assertion `cc >= 0' failed.

Now apply rccheck-trunk.patch to the version you want to verify and run perftest as before. You should see a lot of debugging messages but no assertion failure.

The patches are functionally the same, but the code has changed so much that  separate patch files are needed for the different revisions.
Comment 11 Tomas Rusnak 2010-10-06 08:23:13 EDT
I tried to compile qpid at revision 752560 with patch from attachment to reproduce the issue:

qpid/cluster/Connection.cpp: In member function 'void qpid::cluster::Connection::giveReadCredit(int)':
qpid/cluster/Connection.cpp:101: error: 'class qpid::cluster::Cluster' has no member named 'getSettings'
make[3]: *** [qpid/cluster/Connection.lo] Error 1

void Connection::giveReadCredit(int credit) {
    if (cluster.getReadMax() && credit) {
        // FIXME aconway 2010-10-05: TEST ONLY
        int cc = (readCreditCheck += credit);
        QPID_LOG(critical, "FIXME " << this << " giveReadCredit(" << credit << ") " << cc);
        assert(cc <= int(cluster.getSettings().readMax));

        output.giveReadCredit(credit);
    }
}

The getSettings method is missing in particular Class. Please could you take a look at it, and found the right revision which can be compiled with your patch?
Comment 12 Alan Conway 2010-10-06 09:03:04 EDT
Created attachment 451884 [details]
Correct patch to verify read-credit on revision 752560.

This is the correct patch for revision 752560, previous one was a mistake.
Comment 13 Tomas Rusnak 2010-10-08 05:50:07 EDT
Restested over current packages on RHEL5/x86 and x86_64 with:

qpid-cpp-server-store-0.7.946106-17.el5
qpid-tests-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
python-qpid-0.7.946106-14.el5
qpid-java-common-0.7.946106-10.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-tools-0.7.946106-11.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5

No regression found.

>>> VERIFIED
Comment 14 Alan Conway 2010-10-12 15:18:54 EDT
Tech note not required, this problem was internally found & fixed there are no known cases of it causing a user-visible problem.
Comment 16 errata-xmlrpc 2010-10-14 12:02:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

Note You need to log in before you can comment on or make changes to this bug.