Bug 726733 - qpid-perftest hangs in high thread-count scenarios
Summary: qpid-perftest hangs in high thread-count scenarios
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: messaging-bugs
QA Contact: Leonid Zhaldybin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-29 16:05 UTC by Kim van der Riet
Modified: 2014-11-09 22:38 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-07 17:41:47 UTC
Target Upstream Version:


Attachments
qpid-stat for another case of mis-sent messages. (2.25 KB, text/plain)
2011-07-29 17:34 UTC, Kim van der Riet

Description Kim van der Riet 2011-07-29 16:05:04 UTC
Running qpid-perftest in high thread-count scenarios causes the test to hang intermittently. The probability of a hang increases with the number of threads.

Version: Trunk (1152240/4468)

Broker (running on blade mrg43; /data is mounted on a Fusion-io SSD):

rm -rf /data/store/*; ./qpidd --auth no --load-module /home/kpvdr/mrg/store.ref/lib/.libs/msgstore.so --store-dir /data/store --jfile-size 512 --num-jfiles 32 --log-enable info+

Client (running on blade mrg42):

./qpid-perftest -b 20.0.10.43 -s --iterations 5 --count 500000 --durable no --npubs 1 --qt 20 --nsubs 1

The test starts to hang at a --qt value of 8 or more. Setting --npubs and --nsubs to higher values makes the problem significantly worse.
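Because the hang only appears above a certain --qt value, one way to find the threshold is to sweep --qt and treat any run that outlives a timeout as hung. The sketch below is a hypothetical helper, not part of the qpid tools: the function names, the default binary path and the timeout are assumptions, and only the qpid-perftest flags are taken from the reproducer above. It also does not reset broker state between runs, whereas the broker command above starts from an empty store.

```python
import subprocess
import sys

# Flags from the reproducer in this report. The broker address (20.0.10.43)
# is the reporter's test host; substitute your own.
BASE_ARGS = ["-b", "20.0.10.43", "-s", "--iterations", "5",
             "--count", "500000", "--durable", "no"]


def perftest_command(qt, npubs=1, nsubs=1, binary="./qpid-perftest"):
    """Build the qpid-perftest argv for one point of a --qt sweep (hypothetical helper)."""
    return [binary, *BASE_ARGS, "--npubs", str(npubs),
            "--qt", str(qt), "--nsubs", str(nsubs)]


def hangs(cmd, timeout_s):
    """Run cmd and return True if it has not finished within timeout_s seconds."""
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return False
    except subprocess.TimeoutExpired:
        return True  # subprocess.run has already killed the child process


def sweep(qt_values, timeout_s, **kwargs):
    """Run one test per --qt value; return {qt: hung?} and log progress to stderr."""
    results = {}
    for qt in qt_values:
        results[qt] = hangs(perftest_command(qt, **kwargs), timeout_s)
        print(f"--qt {qt}: {'HANG' if results[qt] else 'ok'}", file=sys.stderr)
    return results
```

For example, `sweep(range(1, 21), timeout_s=900, npubs=1, nsubs=1)` walks --qt from 1 to 20. The report gives no normal run time, so the timeout is a placeholder that must comfortably exceed how long a healthy run takes on your hardware; otherwise slow runs are misreported as hangs.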

Once the test has hung, stack traces of both the broker and client show normal patterns.

Checking the queue with qpid-stat after a hang shows that the messages bound for queue qpid-perftest17 ended up on queue qpid-perftest12 instead:

[kpvdr@mrg42 store.ref]$ /home/kpvdr/mrg/qpid.ref/tools/src/py/qpid-stat -b 20.0.10.43 -q
Queues
  queue                                        dur  autoDel  excl  msg    msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  =============================================================================================================================
  qpid-perftest17                                                     0      0      0       0      0        0         1     1
  qpid-perftest16                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest15                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest14                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest13                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest12                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest11                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest10                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest_pub_done                                              0     20     20       0    349      349         1     1
  qpid-perftest19                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest18                                                     0    500k   500k      0    512m     512m        1     1
  qpid-perftest_sub_iteration                                         0      0      0       0      0        0        20     1
  qmfc-v2-ui-mrg42.lab.bos.redhat.com.20322.1       Y        Y        0      0      0       0      0        0         1     1
  topic-mrg42.lab.bos.redhat.com.20322.1            Y        Y        0      0      0       0      0        0         1     4
  qmfc-v2-mrg42.lab.bos.redhat.com.20322.1          Y        Y        0     11     11       0   72.5k    72.5k        1     2
  qpid-perftest_sub_done                                              0     19     19       0    342      342         1     1
  reply-mrg42.lab.bos.redhat.com.20322.1            Y        Y        0     58     58       0   22.4k    22.4k        1     2
  qpid-perftest9                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest8                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest7                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest6                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest5                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest4                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest3                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest2                                                    500k  1.00m   500k    512m  1.02g     512m        1     1
  qpid-perftest1                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest0                                                      0    500k   500k      0    512m     512m        1     1
  qpid-perftest_pub_start                                             0     20     20       0    100      100        20     1
  qmfc-v2-hb-mrg42.lab.bos.redhat.com.20322.1       Y        Y        0      0      0       0      0        0         1     2
  qpid-perftest_sub_ready                                             0     20     20       0    100      100         1     1
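Spotting misrouted messages by eye in a 20-queue table is error-prone. The sketch below is a hypothetical helper (not part of qpid-stat) that parses `qpid-stat -q` output like the table above and flags per-publisher queues (qpid-perftest followed by digits) whose msgIn is below or above the value most of them share. Two assumptions are labeled in the code: the k/m/g suffixes are decimal multipliers, and the most common msgIn is the healthy value.

```python
from collections import Counter

# Unit suffixes qpid-stat appends to large values. Assumption: decimal
# multipliers, consistent with 500k messages of ~1 KB showing as 512m bytes.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(text):
    """Convert a qpid-stat value such as '500k', '1.00m' or '20' to an int."""
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def msg_in_by_queue(report):
    """Map queue name -> msgIn for every data row of a `qpid-stat -q` report."""
    result = {}
    for line in report.splitlines():
        tokens = line.split()
        # Data rows end with 8 numeric columns:
        # msg msgIn msgOut bytes bytesIn bytesOut cons bind.
        # The title, header and ==== rule lines do not, so they are skipped.
        if len(tokens) < 9 or not tokens[-1].isdigit():
            continue
        result[tokens[0]] = parse_count(tokens[-7])
    return result


def find_misrouted(report, prefix="qpid-perftest"):
    """Return (starved, overfilled) per-publisher queues relative to the modal msgIn."""
    counts = {
        name: n
        for name, n in msg_in_by_queue(report).items()
        # Keep qpid-perftest0..N; skip control queues such as qpid-perftest_pub_done.
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    }
    if not counts:
        return [], []
    # Assumption: most publisher queues behaved, so the most common msgIn is healthy.
    baseline = Counter(counts.values()).most_common(1)[0][0]
    starved = sorted(q for q, n in counts.items() if n < baseline)
    overfilled = sorted(q for q, n in counts.items() if n > baseline)
    return starved, overfilled
```

Applied to the full table above, the baseline is 500k, so this flags qpid-perftest17 (msgIn 0) as starved and qpid-perftest2 (msgIn 1.00m) as overfilled.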


Could this be a race condition in which a publisher's destination is somehow being muddled or overwritten?
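The suspected race can be made concrete with a deliberately simplified model. The sketch below is hypothetical: it is not qpid-perftest or broker code, and this report does not describe the actual root cause or what the fix changed. It only shows how publisher threads that store their target in one shared, unsynchronized slot produce the observed symptom. A barrier forces the worst-case interleaving, so one queue receives every message and the others receive none, an exaggerated form of qpid-perftest17 staying empty while qpid-perftest2 got a double share.

```python
import threading
from collections import Counter


def run_publishers(queue_names, shared_destination):
    """Let one thread per queue 'publish' a single message; return msgIn per queue.

    shared_destination=True models the suspected bug: all publishers keep
    their target in one shared slot. False gives each thread its own target.
    """
    delivered = Counter()
    delivered_lock = threading.Lock()
    shared = {"dest": None}  # hypothetical state shared by all publishers
    barrier = threading.Barrier(len(queue_names))

    def publisher(my_queue):
        if shared_destination:
            shared["dest"] = my_queue  # every thread overwrites the same slot...
            barrier.wait()             # ...and every write lands before any read,
            dest = shared["dest"]      # so all threads see the last writer's queue
        else:
            barrier.wait()             # same timing, but nothing shared to clobber
            dest = my_queue
        with delivered_lock:
            delivered[dest] += 1

    threads = [threading.Thread(target=publisher, args=(q,)) for q in queue_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {q: delivered[q] for q in queue_names}
```

Calling `run_publishers([f"qpid-perftest{i}" for i in range(3)], shared_destination=True)` delivers all three messages to a single queue; which one wins depends on thread scheduling. With `shared_destination=False`, each queue receives exactly one message.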

Comment 1 Kim van der Riet 2011-07-29 16:07:47 UTC
From Description above: 

Checking the queue with qpid-stat after a hang shows that the messages bound for queue qpid-perftest17 ended up on queue qpid-perftest12 instead:
                                        ^^^^^^^^^^^^^^^

This _should_ be:

Checking the queue with qpid-stat after a hang shows that the messages bound for queue qpid-perftest17 ended up on queue qpid-perftest2 instead:

Comment 2 Kim van der Riet 2011-07-29 17:34:49 UTC
Created attachment 515913
qpid-stat for another case of mis-sent messages.

An additional example in which two queues had their messages misplaced in the same test: the 500k messages for both qpid-perftest4 and qpid-perftest5 were sent to qpid-perftest0 and qpid-perftest2 instead.

Comment 3 Kim van der Riet 2011-08-03 18:58:18 UTC
Fixed by Gordon in r.1152825.

Comment 4 Ted Ross 2012-03-29 19:58:55 UTC
This fix is included in the 0.14 rebase.

Comment 5 Leonid Zhaldybin 2012-04-06 13:18:38 UTC
CLOSED/CRELEASE -> ASSIGNED -> ON_QA
The defect has to go through the QA process.

Comment 6 Leonid Zhaldybin 2012-04-06 13:20:00 UTC
Tested on RHEL5.8 and RHEL6.2 on both main architectures (i386 and x86_64). This problem is fixed.
Packages used for testing:

RHEL5.8
qpid-cpp-client-0.14-14.el5
qpid-cpp-client-devel-0.14-14.el5
qpid-cpp-client-devel-docs-0.14-14.el5
qpid-cpp-client-ssl-0.14-14.el5
qpid-cpp-server-0.14-14.el5
qpid-cpp-server-cluster-0.14-14.el5
qpid-cpp-server-devel-0.14-14.el5
qpid-cpp-server-ssl-0.14-14.el5
qpid-cpp-server-store-0.14-14.el5
qpid-cpp-server-xml-0.14-14.el5

RHEL6.2
qpid-cpp-client-0.14-14.el6_2
qpid-cpp-client-devel-0.14-14.el6_2
qpid-cpp-client-devel-docs-0.14-14.el6_2
qpid-cpp-client-rdma-0.14-14.el6_2
qpid-cpp-client-ssl-0.14-14.el6_2
qpid-cpp-debuginfo-0.14-14.el6_2
qpid-cpp-server-0.14-14.el6_2
qpid-cpp-server-cluster-0.14-14.el6_2
qpid-cpp-server-devel-0.14-14.el6_2
qpid-cpp-server-rdma-0.14-14.el6_2
qpid-cpp-server-ssl-0.14-14.el6_2
qpid-cpp-server-store-0.14-14.el6_2
qpid-cpp-server-xml-0.14-14.el6_2
rh-qpid-cpp-tests-0.14-14.el6_2

-> VERIFIED

