Bug 473936 - perftest hanging during durable perftest
perftest hanging during durable perftest
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
1.1
All Linux
medium Severity medium
: 1.1
: ---
Assigned To: messaging-bugs
Kim van der Riet
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-12-01 09:54 EST by Frantisek Reznicek
Modified: 2015-11-15 19:06 EST (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-12-03 08:49:30 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Frantisek Reznicek 2008-12-01 09:54:47 EST
Description of problem:
While running perftest RHTS tests I can see strange perftest behavior.

perftest hangs one of first perftest<--->qpidd sessions.

Rapidly reproducable on my RHEL5.2 x86_64 using qpid_test_broker_perftest_ecte_bz468428 RHTS test (http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG_Messaging/qpid_test_broker_perftest_ecte_bz468428/).

Part A loops following command:
PERFTEST_TEST_PARAMS="--log-enable info+"
perftest --durable yes --tcp-nodelay ${PERFTEST_TEST_PARAMS} \
    --npubs 4 --nsubs 4 --sub-tx 10



Version-Release number of selected component (if applicable):
rhm-0.3.2804-4.el5, qpidd-0.3.719671-1.el5, qpidc-perftest-0.3.719671-1.el5


How reproducible:
95-100% on quad code Intel(R) Xeon(R) CPU X5460  @ 3.16GHz

Steps to Reproduce:
alpha]
1. Download qpid_test_broker_perftest_ecte_bz468428
2. Run test

beta]
1. Run broker (non-daemon)
2. Run perftest --durable yes --tcp-nodelay --log-enable info+ --npubs 4 --nsubs 4 --sub-tx 10


Actual results:

perftest hangs during perftest-qpidd session (qpid_test_broker_perftest_ecte_bz468428 test):

[root@dhcp-lab-200 qpid_test_broker_perftest_ecte_bz468428]# gps 'make rb'
root     10977  0.0  0.0  59116   864 pts/13   S    12:02   0:00 make rb
root     25699  0.0  0.0  61108   748 pts/13   S+   15:16   0:00 grep make rb
[root@dhcp-lab-200 qpid_test_broker_perftest_ecte_bz468428]# pstree -p 10977
make(10977)───runtest.sh(10981)─┬─runtest.sh(11003)─┬─perftest(11145)─┬─{perftest}(11155)
                                │                   │                 ├─{perftest}(11156)
                                │                   │                 ├─{perftest}(11157)
                                │                   │                 ├─{perftest}(11158)
                                │                   │                 ├─{perftest}(11159)
                                │                   │                 ├─{perftest}(11161)
                                │                   │                 ├─{perftest}(11162)
                                │                   │                 ├─{perftest}(11163)
                                │                   │                 └─{perftest}(11164)
                                │                   └─runtest.sh(11091)───runtest.sh(11092)───qpidd(11099)─┬─{qpidd}(11100)
                                │                                                                          ├─{qpidd}(11101)
                                │                                                                          ├─{qpidd}(11102)
                                │                                                                          ├─{qpidd}(11103)
                                │                                                                          ├─{qpidd}(11104)
                                │                                                                          ├─{qpidd}(11106)
                                │                                                                          ├─{qpidd}(11107)
                                │                                                                          ├─{qpidd}(11109)
                                │                                                                          ├─{qpidd}(11110)
                                │                                                                          ├─{qpidd}(11111)
                                │                                                                          ├─{qpidd}(11112)
                                │                                                                          └─{qpidd}(11113)
                                └─tee(11005)


perftest program stack trace:

[root@dhcp-lab-200 qpid_test_broker_perftest_ecte_bz468428]# pstack 11145
Thread 10 (Thread 1175386432 (LWP 11155)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a5261f2b in qpid::client::TCPConnector::run ()
#4  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1185876288 (LWP 11156)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a5285529 in qpid::sys::Waitable::wait ()
#2  0x00000034a5284108 in qpid::client::LocalQueue::get ()
#3  0x00000034a5284969 in qpid::client::LocalQueue::get ()
#4  0x00000034a5284c29 in qpid::client::LocalQueue::pop ()
#5  0x0000000000419c55 in qpid::log::Options::~Options ()
#6  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#7  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1196366144 (LWP 11157)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a5261f2b in qpid::client::TCPConnector::run ()
#4  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1206856000 (LWP 11158)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a5285529 in qpid::sys::Waitable::wait ()
#2  0x00000034a5284108 in qpid::client::LocalQueue::get ()
#3  0x00000034a5284969 in qpid::client::LocalQueue::get ()
#4  0x00000034a5284c29 in qpid::client::LocalQueue::pop ()
#5  0x0000000000419c55 in qpid::log::Options::~Options ()
#6  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#7  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1217345856 (LWP 11159)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a5261f2b in qpid::client::TCPConnector::run ()
#4  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1227835712 (LWP 11161)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a5285529 in qpid::sys::Waitable::wait ()
#2  0x00000034a5284108 in qpid::client::LocalQueue::get ()
#3  0x00000034a5284969 in qpid::client::LocalQueue::get ()
#4  0x00000034a5284c29 in qpid::client::LocalQueue::pop ()
#5  0x0000000000419c55 in qpid::log::Options::~Options ()
#6  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#7  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1238325568 (LWP 11162)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a5261f2b in qpid::client::TCPConnector::run ()
#4  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1248815424 (LWP 11163)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a5285529 in qpid::sys::Waitable::wait ()
#2  0x00000034a5284108 in qpid::client::LocalQueue::get ()
#3  0x00000034a5284969 in qpid::client::LocalQueue::get ()
#4  0x00000034a5284c29 in qpid::client::LocalQueue::pop ()
#5  0x0000000000419c55 in qpid::log::Options::~Options ()
#6  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#7  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1259305280 (LWP 11164)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a5261f2b in qpid::client::TCPConnector::run ()
#4  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#5  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#6  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47361525085888 (LWP 11145)):
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000034a5285529 in qpid::sys::Waitable::wait ()
#2  0x00000034a5284108 in qpid::client::LocalQueue::get ()
#3  0x00000034a5284969 in qpid::client::LocalQueue::get ()
#4  0x00000034a5284c29 in qpid::client::LocalQueue::pop ()
#5  0x0000000000414ba6 in qpid::optValue<std::string> ()
#6  0x00000000004158b8 in qpid::optValue<std::string> ()
#7  0x000000000040c6d4 in qpid::framing::TransferContent::getData ()
#8  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#9  0x000000000040b7a9 in qpid::framing::TransferContent::getData ()
#10 0x00007fff73ccbcc8 in ?? ()
#11 0x0000000000000000 in ?? ()
#0  0x000000350bc0a496 in pthread_cond_wait@@GLIBC_2.3.2 ()


qpidd program stack trace:

[root@dhcp-lab-200 qpid_test_broker_perftest_ecte_bz468428]# pstack 11099
Thread 13 (Thread 1110358336 (LWP 11100)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00000034a536f2f6 in qpid::broker::Timer::run ()
#2  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 12 (Thread 1120848192 (LWP 11101)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00000034a536f2f6 in qpid::broker::Timer::run ()
#2  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 11 (Thread 1131338048 (LWP 11102)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00000034a536f2f6 in qpid::broker::Timer::run ()
#2  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 10 (Thread 1095444800 (LWP 11103)):
#0  0x000000350bc0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x00000034a536f2f6 in qpid::broker::Timer::run ()
#2  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#3  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 9 (Thread 1141827904 (LWP 11104)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 8 (Thread 1152317760 (LWP 11106)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 7 (Thread 1162807616 (LWP 11107)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 6 (Thread 1173297472 (LWP 11109)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 5 (Thread 1183787328 (LWP 11110)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 4 (Thread 1194277184 (LWP 11111)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 1204767040 (LWP 11112)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 1215256896 (LWP 11113)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a4d7311a in qpid::sys::AbsTime::AbsTime ()
#4  0x000000350bc062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000350b0d1b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 47255519278992 (LWP 11099)):
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6
#1  0x00000034a4d7b3cd in qpid::sys::Poller::wait ()
#2  0x00000034a4dbef18 in qpid::sys::Dispatcher::run ()
#3  0x00000034a52c72de in qpid::broker::Broker::run ()
#4  0x0000000000406858 in qpid::log::Options::~Options ()
#5  0x0000000000405438 in __cxa_pure_virtual ()
#6  0x000000350b01d8b4 in __libc_start_main () from /lib64/libc.so.6
#7  0x0000000000404eb9 in __cxa_pure_virtual ()
#8  0x00007fff223ce3f8 in ?? ()
#9  0x0000000000000000 in ?? ()
#0  0x000000350b0d1f58 in epoll_wait () from /lib64/libc.so.6

Expected results:

No hang. Perftest should exit after some reasonable time.

Additional info:

I performed pstack dumps multiple-times and see the same ones over the time.
Comment 1 Frantisek Reznicek 2008-12-02 11:05:17 EST
Reproduced now also on another machine RHTS 'nec-em15.rhts.bos.redhat.com' which is:
[10:19:39] Machine and user info:
[10:19:39]     os: Red Hat Enterprise Linux Server release 5.2 (Tikanga)
[10:19:39]  uname: Linux nec-em15.rhts.bos.redhat.com 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[10:19:39]   user: root
[10:19:39]    pwd: /root/qpid_test_broker_perftest_ecte_bz468428
[10:19:39] Get CPU information:
processor(s)    : 0...3 {4 in total}
vendor_id       : GenuineIntel
model name      :                   Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz         : 2793.093
cpu cores       : 1
bogomips        : 5588.46
bogomips        : 5585.36
bogomips        : 5585.35
bogomips        : 5585.29


I saw it hang on the first durable perftest (pid:4281)
After 45 minutes I cancelled the test and experiment package is available on
mrg3.lab.bos.redhat.com:/root/qpid_test_broker_perftest_ecte_bz468428_hang_081202.tar.bz2


Both qpidd and perftest switched to trace log mode so log files available in:
qpid_test_broker_perftest_ecte_bz468428/perftest.transcript.log
qpid_test_broker_perftest_ecte_bz468428/qpidd.transcript.log

pstree & ps gave this (afte 40 mins of inactivity)
make(4131)───runtest.sh(4135)─┬─runtest.sh(4150)─┬─perftest(4281)─┬─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                ├─{perftest}
                              │                  │                └─{perftest}
                              │                  └─runtest.sh(4237)───runtest.sh(4238)───qpidd(4241)─┬─{qpidd}(4243)
                              └─tee(4151)


[root@nec-em15 qpid_test_broker_perftest_ecte_bz468428]# ps auxw | egrep '(4131|4135|4150|4151|4281|4237|4238|4241|429[1-9]|424[3-9]|4250)'
root      4131  0.0  0.0  59164   896 pts/0    S+   10:19   0:00 make rb
root      4135  0.0  0.1  64480  1876 pts/0    S+   10:19   0:00 /bin/bash ./runtest.sh
root      4150  0.0  0.1  64480  1408 pts/0    S+   10:19   0:00 /bin/bash ./runtest.sh
root      4151  0.0  0.0  58904   584 pts/0    S+   10:19   0:00 tee /root/qpid_test_broker_perftest_ecte_bz468428/qpid_test_broker_perftest_ecte_bz468428.log
root      4237  0.0  0.1  64480  1068 pts/0    S+   10:19   0:00 /bin/bash ./runtest.sh
root      4238  0.0  0.1  64480  1288 pts/0    S+   10:19   0:00 /bin/bash ./runtest.sh
root      4241  0.7  3.7 394400 38728 pts/0    Sl+  10:19   0:13 /usr/sbin/qpidd --data-dir /root/qpid_test_broker_perftest_ecte_bz468428/rhts_qpidd/20081202_101939/broker.4229 --port 5672 --auth no --log-enable trace+
root      4281  0.2  3.1 389512 32396 pts/0    Sl+  10:19   0:03 /usr/bin/perftest --durable yes --tcp-nodelay --log-enable trace+ --npubs 4 --nsubs 4 --sub-tx 10

pstack gave to me "" output on both PIDs: 4281 (perftest), 4241(qpidd)
I can re-run somewhere else if needed.
Comment 2 Gordon Sim 2008-12-02 12:28:00 EST
I believe this is not-a-bug (at least based on the logs on mrg3).

The publisher is failing due to "PublishThread exception: Enqueue capacity threshold exceeded on queue "perftest0". (JournalImpl.cpp:510)" which is expected as the subscribers will be much slower than the publishers and the queue will grow until the publishers complete. As there is four of them using default count (500,000) and size (1k) the journal capacity needs to be above 2G to avoid running out of capacity.

When the publisher fails midway, the subscribers will never get the required number of messages and will wait for ever.
Comment 3 Frantisek Reznicek 2008-12-03 08:49:30 EST
I performed test with reduced number of messages:
perftest --durable yes --tcp-nodelay --count 5000 --npubs 4 --nsubs 4 --sub-tx 10
and now do not see 
PublishThread exception: Enqueue capacity threshold exceeded on queue "perftest0"
anymore and perftest normally passes also enlarging the journal will make perftest passing.

->CLOSE && NOT-A-BUG

Note You need to log in before you can comment on or make changes to this bug.