Bug 612600
Summary: | clustered qpidd - shutdowns due to memory bloat ('critical Error decoding events, may indicate a broker version mismatch: St9bad_alloc') | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||
Component: | qpid-cpp | Assignee: | Alan Conway <aconway> | ||||
Status: | CLOSED NOTABUG | QA Contact: | MRG Quality Engineering <mrgqe-bugs> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | Development | CC: | esammons | ||||
Target Milestone: | 1.3 | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2010-08-17 13:01:53 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Frantisek Reznicek
2010-07-08 15:21:41 UTC
The broker configuration for above described issue is: [root@mrg-qe-09 fcluster]# cat /etc/qpidd.conf # # Licensed to the Apache Software Foundation (ASF) under one ... # "qpidd --help" or "man qpidd" for more details. cluster-mechanism=ANONYMOUS auth=no log-to-file=/tmp/qpidd.log log-enable=info+ log-enable=debug+:cluster cluster-name=fcluster This is most likely to be a memory problem, what is the memory consumption like? I've investigated and found that there is most probably clustered qpidd memory bloat over time... Let's imagine following situation: 4 cluster nodes: 09 and 10 are RHEL5.5 x86_64; 11 12 are RHEL 5.5 i686. The long stress test using two perftests, tsxtest and various qmf agents lead to following efects: 1] i686 machines are leaving cluster as first ones: 11 and 12 to leave the cluster due to 2010-07-19 17:51:12 critical Error decoding events, may indicate a broker version mismatch: St9bad_alloc ... 2010-07-19 17:51:15 notice cluster(10.34.45.12:1630 LEFT) leaving cluster fcluster 2] then other RHEL 5.5 x86_64 which was NOT STARTED AS FIRST ONE leaves due to memory bloat: just an info about memory usage of first launched cluster node (machine 09): PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 25974 qpidd 15 0 729M 333M 8516 S 0.0 4.1 45:22.32 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon the other 64 bits which is running past 1] (machine 10) PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 21669 qpidd 15 0 11.2G 7042M 3652 S 0.0 87.4 41:08.47 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon Looking via pmap on the qpidd on machine 10 which is eating over 10GB virt memory: [root@mrg-qe-10 fcluster]# cat qpidd.pmap.0929 21669: /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon 0000000000400000 64K r-x-- /usr/sbin/qpidd 000000000060f000 4K rw--- /usr/sbin/qpidd 0000000004494000 983004K rw--- [ anon ] 00000000404aa000 4K ----- [ anon ] 00000000404ab000 10240K rw--- [ anon ] 0000000041094000 4K ----- [ anon ] 0000000041095000 10240K rw--- [ anon ] 0000000041a95000 4K ----- [ anon ] 0000000041a96000 10240K rw--- [ anon ] 0000000042496000 4K ----- [ anon ] 0000000042497000 10240K rw--- [ anon ] 0000000042e97000 4K ----- [ anon ] 0000000042e98000 10240K rw--- [ anon ] 0000000043898000 4K ----- [ anon ] 0000000043899000 10240K rw--- [ anon ] 0000000044299000 4K ----- [ anon ] 000000004429a000 10240K rw--- [ anon ] 0000000044c9a000 4K ----- [ anon ] 0000000044c9b000 10240K rw--- [ anon ] 000000004569b000 4K ----- [ anon ] 000000004569c000 10240K rw--- [ anon ] 000000004609c000 4K ----- [ anon ] 000000004609d000 10240K rw--- [ anon ] 00002aaaaaaab000 196K rw--- [ anon ] 00002aaaaaadc000 24K rw-s- /var/lib/qpidd/rhm/dat/__db.001 00002aaaaaae2000 272K rw-s- /var/lib/qpidd/rhm/dat/__db.002 00002aaaaab26000 288K rw-s- /var/lib/qpidd/rhm/dat/__db.003 00002aaaaab6e000 648K rw-s- /var/lib/qpidd/rhm/dat/__db.004 00002aaaaac10000 16K rw-s- /var/lib/qpidd/rhm/dat/__db.005 00002aaaaac14000 2932K rw-s- [ shmid=0x3c000d ] 00002aaaac000000 65536K rw--- [ anon ] 00002aaab0000000 65536K rw--- [ anon ] 00002aaab4000000 200K rw--- [ anon ] 00002aaab8000000 65484K rw--- [ anon ] 00002aaabbff3000 52K ----- [ anon ] 00002aaabc000000 65536K rw--- [ anon ] 00002aaac0000000 184K rw--- [ anon ] 00002aaac4000000 65536K rw--- [ anon ] 00002aaac8000000 65536K rw--- [ anon ] 00002aaacc000000 65536K rw--- [ anon ] 00002aaad0000000 8K rw--- [ anon ] 00002aaad8000000 131072K rw--- [ anon ] 00002aaae4000000 65536K rw--- [ anon ] 00002aaae8000000 65536K rw--- [ anon ] 00002aaaec000000 65532K rw--- [ anon ] 00002aaaeffff000 4K ----- [ anon ] 00002aaaf0000000 65532K rw--- [ anon ] 00002aaaf3fff000 4K ----- [ anon ] 00002aaaf4000000 65536K rw--- [ anon ] 00002aaaf8000000 131072K rw--- [ anon ] 00002aab00000000 65536K rw--- [ anon ] 00002aab04000000 65536K rw--- [ anon ] 00002aab08000000 65536K rw--- [ anon ] 00002aab0c000000 65536K rw--- [ anon ] 00002aab10000000 65536K rw--- [ anon ] 00002aab14000000 65536K rw--- [ anon ] 00002aab18000000 65536K rw--- [ anon ] 00002aab1c000000 65536K rw--- [ anon ] 00002aab20000000 393220K rw--- [ anon ] 00002aab3c000000 65536K rw--- [ anon ] 00002aab40000000 65536K rw--- [ anon ] 00002aab44000000 65536K rw--- [ anon ] 00002aab48000000 65536K rw--- [ anon ] 00002aab4c000000 56340K rw--- [ anon ] 00002aab4f705000 9196K ----- [ anon ] 00002aab50000000 65536K rw--- [ anon ] 00002aab54000000 65536K rw--- [ anon ] 00002aab58000000 65536K rw--- [ anon ] 00002aab5c000000 65536K rw--- [ anon ] 00002aab60000000 65480K rw--- [ anon ] 00002aab63ff2000 56K ----- [ anon ] 00002aab64000000 65536K rw--- [ anon ] 00002aab68000000 65536K rw--- [ anon ] 00002aab6c000000 65536K rw--- [ anon ] 00002aab70000000 65536K rw--- [ anon ] 00002aab74000000 131072K rw--- [ anon ] 00002aab7c000000 65536K rw--- [ anon ] 00002aab80000000 65536K rw--- [ anon ] 00002aab84000000 65536K rw--- [ anon ] 00002aab88000000 65536K rw--- [ anon ] 00002aab8c000000 65536K rw--- [ anon ] 00002aab90000000 65536K rw--- [ anon ] 00002aab94000000 65480K rw--- [ anon ] 00002aab97ff2000 56K ----- [ anon ] 00002aab98000000 65536K rw--- [ anon ] 00002aab9c000000 65536K rw--- [ anon ] 00002aaba0000000 131036K rw--- [ anon ] 00002aaba7ff7000 36K ----- [ anon ] 00002aaba8000000 65536K rw--- [ anon ] 00002aabac000000 65536K rw--- [ anon ] 00002aabb0000000 65536K rw--- [ anon ] 00002aabb4000000 65476K rw--- [ anon ] 00002aabb7ff1000 60K ----- [ anon ] 00002aabb8000000 65532K rw--- [ anon ] 00002aabbbfff000 4K ----- [ anon ] 00002aabc0000000 65488K rw--- [ anon ] 00002aabc3ff4000 48K ----- [ anon ] 00002aabc4000000 65536K rw--- [ anon ] 00002aabc8000000 65536K rw--- [ anon ] 00002aabcc000000 65536K rw--- [ anon ] 00002aabd0000000 65536K rw--- [ anon ] 00002aabd4000000 65536K rw--- [ anon ] 00002aabd8000000 65536K rw--- [ anon ] 00002aabdc000000 65484K rw--- [ anon ] 00002aabdfff3000 52K ----- [ anon ] 00002aabe0000000 65536K rw--- [ anon ] 00002aabe4000000 65512K rw--- [ anon ] 00002aabe7ffa000 24K ----- [ anon ] 00002aabe8000000 65536K rw--- [ anon ] 00002aabec000000 131072K rw--- [ anon ] 00002aabf4000000 20480K rw--- [ anon ] 00002aabf8000000 65536K rw--- [ anon ] 00002aabfc000000 65536K rw--- [ anon ] 00002aac00000000 17408K rw--- [ anon ] 00002aac04000000 65536K rw--- [ anon ] 00002aac08000000 65536K rw--- [ anon ] 00002aac0c000000 29696K rw--- [ anon ] 00002aac10000000 65536K rw--- [ anon ] 00002aac14000000 65536K rw--- [ anon ] 00002aac18000000 10240K rw--- [ anon ] 00002aac1c000000 65528K rw--- [ anon ] 00002aac1fffe000 8K ----- [ anon ] 00002aac24000000 65516K rw--- [ anon ] 00002aac27ffb000 20K ----- [ anon ] 00002aac28000000 65484K rw--- [ anon ] 00002aac2bff3000 52K ----- [ anon ] 00002aac2c000000 65512K rw--- [ anon ] 00002aac2fffa000 24K ----- [ anon ] 00002aac30000000 65536K rw--- [ anon ] 00002aac34000000 65536K rw--- [ anon ] 00002aac38000000 65536K rw--- [ anon ] 00002aac3c000000 65524K rw--- [ anon ] 00002aac3fffd000 12K ----- [ anon ] 00002aac40000000 65480K rw--- [ anon ] 00002aac43ff2000 56K ----- [ anon ] 00002aac44000000 65536K rw--- [ anon ] 00002aac48000000 65512K rw--- [ anon ] 00002aac4bffa000 24K ----- [ anon ] 00002aac4c000000 65536K rw--- [ anon ] 00002aac50000000 65536K rw--- [ anon ] 00002aac54000000 65536K rw--- [ anon ] 00002aac58000000 65536K rw--- [ anon ] 00002aac5c000000 28672K rw--- [ anon ] 00002aac60000000 65536K rw--- [ anon ] 00002aac64000000 65536K rw--- [ anon ] 00002aac68000000 10240K rw--- [ anon ] 00002aac6c000000 65536K rw--- [ anon ] 00002aac70000000 65536K rw--- [ anon ] 00002aac74000000 65536K rw--- [ anon ] 00002aac78000000 4096K rw--- [ anon ] 00002aac7c000000 65536K rw--- [ anon ] 00002aac80000000 65536K rw--- [ anon ] 00002aac84000000 27648K rw--- [ anon ] 00002aac88000000 65536K rw--- [ anon ] 00002aac8c000000 65480K rw--- [ anon ] 00002aac8fff2000 56K ----- [ anon ] 00002aac90000000 6144K rw--- [ anon ] 00002aac94000000 65532K rw--- [ anon ] 00002aac97fff000 4K ----- [ anon ] 00002aac98000000 65536K rw--- [ anon ] 00002aac9c000000 27648K rw--- [ anon ] 00002aaca0000000 65536K rw--- [ anon ] 00002aaca4000000 65536K rw--- [ anon ] 00002aaca8000000 1602564K rw--- [ anon ] 00002aad0c000000 65484K rw--- [ anon ] 00002aad0fff3000 52K ----- [ anon ] 00002aad10000000 65536K rw--- [ anon ] 00002aad14000000 131072K rw--- [ anon ] 00002aad1c000000 34816K rw--- [ anon ] 00002aad20000000 65536K rw--- [ anon ] 00002aad24000000 65480K rw--- [ anon ] 00002aad27ff2000 56K ----- [ anon ] 00002aad28000000 23552K rw--- [ anon ] 00002aad2c000000 65536K rw--- [ anon ] 00002aad30000000 65492K rw--- [ anon ] 00002aad33ff5000 44K ----- [ anon ] 00002aad34000000 65500K rw--- [ anon ] 00002aad37ff7000 36K ----- [ anon ] 00002aad38000000 65536K rw--- [ anon ] 00002aad3c000000 50176K rw--- [ anon ] 00002aad40000000 65488K rw--- [ anon ] 00002aad43ff4000 48K ----- [ anon ] 00002aad44000000 65536K rw--- [ anon ] 00002aad48000000 1024K rw--- [ anon ] 00002aad4c000000 65536K rw--- [ anon ] 00002aad50000000 65520K rw--- [ anon ] 00002aad53ffc000 16K ----- [ anon ] 00002aad54000000 46080K rw--- [ anon ] 00002aad58000000 65536K rw--- [ anon ] 00002aad5c000000 65536K rw--- [ anon ] 00002aad60000000 27648K rw--- [ anon ] 00002aad64000000 42472K rw--- [ anon ] 00002aad6697a000 23064K ----- [ anon ] 00002aad68000000 65536K rw--- [ anon ] 00002aad6c000000 5120K rw--- [ anon ] 00002aad70000000 54004K rw--- [ anon ] 00002aad734bd000 11532K ----- [ anon ] 00002aad74000000 26876K rw--- [ anon ] 00002aad75a3f000 38660K ----- [ anon ] 00002aad78000000 33792K rw--- [ anon ] 00002aad7c000000 3592K rw--- [ anon ] 00002aad7c382000 61944K ----- [ anon ] 00002aad80000000 16100K rw--- [ anon ] 00002aad80fb9000 49436K ----- [ anon ] 00002b7bb130c000 112K r-x-- /lib64/ld-2.5.so 00002b7bb1328000 8K rw--- [ anon ] 00002b7bb1330000 4K rw--- [ anon ] 00002b7bb1527000 4K r---- /lib64/ld-2.5.so 00002b7bb1528000 4K rw--- /lib64/ld-2.5.so 00002b7bb1529000 2572K r-x-- /usr/lib64/libqpidbroker.so.3.0.0 00002b7bb17ac000 2048K ----- /usr/lib64/libqpidbroker.so.3.0.0 00002b7bb19ac000 76K rw--- /usr/lib64/libqpidbroker.so.3.0.0 00002b7bb19bf000 12K rw--- [ anon ] 00002b7bb19c2000 2456K r-x-- /usr/lib64/libqpidcommon.so.3.0.0 00002b7bb1c28000 2044K ----- /usr/lib64/libqpidcommon.so.3.0.0 00002b7bb1e27000 120K rw--- /usr/lib64/libqpidcommon.so.3.0.0 00002b7bb1e45000 4K rw--- [ anon ] 00002b7bb1e46000 96K r-x-- /usr/lib64/libqpidtypes.so.1.0.0 00002b7bb1e5e000 2048K ----- /usr/lib64/libqpidtypes.so.1.0.0 00002b7bb205e000 4K rw--- /usr/lib64/libqpidtypes.so.1.0.0 00002b7bb205f000 4K rw--- [ anon ] 00002b7bb2060000 216K r-x-- /usr/lib64/libboost_program_options.so.1.33.1 00002b7bb2096000 2044K ----- /usr/lib64/libboost_program_options.so.1.33.1 00002b7bb2295000 8K rw--- /usr/lib64/libboost_program_options.so.1.33.1 00002b7bb2297000 64K r-x-- /usr/lib64/libboost_filesystem.so.1.33.1 00002b7bb22a7000 2044K ----- /usr/lib64/libboost_filesystem.so.1.33.1 00002b7bb24a6000 4K rw--- /usr/lib64/libboost_filesystem.so.1.33.1 00002b7bb24a7000 12K r-x-- /lib64/libuuid.so.1.2 00002b7bb24aa000 2048K ----- /lib64/libuuid.so.1.2 00002b7bb26aa000 4K rw--- /lib64/libuuid.so.1.2 00002b7bb26ab000 4K rw--- [ anon ] 00002b7bb26ac000 8K r-x-- /lib64/libdl-2.5.so 00002b7bb26ae000 2048K ----- /lib64/libdl-2.5.so 00002b7bb28ae000 4K r---- /lib64/libdl-2.5.so 00002b7bb28af000 4K rw--- /lib64/libdl-2.5.so 00002b7bb28b0000 28K r-x-- /lib64/librt-2.5.so 00002b7bb28b7000 2048K ----- /lib64/librt-2.5.so 00002b7bb2ab7000 4K r---- /lib64/librt-2.5.so 00002b7bb2ab8000 4K rw--- /lib64/librt-2.5.so 00002b7bb2ab9000 96K r-x-- /usr/lib64/libsasl2.so.2.0.22 00002b7bb2ad1000 2048K ----- /usr/lib64/libsasl2.so.2.0.22 00002b7bb2cd1000 4K rw--- /usr/lib64/libsasl2.so.2.0.22 00002b7bb2cd2000 4K rw--- [ anon ] 00002b7bb2cd3000 920K r-x-- /usr/lib64/libstdc++.so.6.0.8 00002b7bb2db9000 2044K ----- /usr/lib64/libstdc++.so.6.0.8 00002b7bb2fb8000 24K r---- /usr/lib64/libstdc++.so.6.0.8 00002b7bb2fbe000 12K rw--- /usr/lib64/libstdc++.so.6.0.8 00002b7bb2fc1000 72K rw--- [ anon ] 00002b7bb2fd3000 520K r-x-- /lib64/libm-2.5.so 00002b7bb3055000 2044K ----- /lib64/libm-2.5.so 00002b7bb3254000 4K r---- /lib64/libm-2.5.so 00002b7bb3255000 4K rw--- /lib64/libm-2.5.so 00002b7bb3256000 52K r-x-- /lib64/libgcc_s-4.1.2-20080825.so.1 00002b7bb3263000 2048K ----- /lib64/libgcc_s-4.1.2-20080825.so.1 00002b7bb3463000 4K rw--- /lib64/libgcc_s-4.1.2-20080825.so.1 00002b7bb3464000 4K rw--- [ anon ] 00002b7bb3465000 1336K r-x-- /lib64/libc-2.5.so 00002b7bb35b3000 2044K ----- /lib64/libc-2.5.so 00002b7bb37b2000 16K r---- /lib64/libc-2.5.so 00002b7bb37b6000 4K rw--- /lib64/libc-2.5.so 00002b7bb37b7000 20K rw--- [ anon ] 00002b7bb37bc000 88K r-x-- /lib64/libpthread-2.5.so 00002b7bb37d2000 2044K ----- /lib64/libpthread-2.5.so 00002b7bb39d1000 4K r---- /lib64/libpthread-2.5.so 00002b7bb39d2000 4K rw--- /lib64/libpthread-2.5.so 00002b7bb39d3000 20K rw--- [ anon ] 00002b7bb39d8000 68K r-x-- /lib64/libresolv-2.5.so 00002b7bb39e9000 2048K ----- /lib64/libresolv-2.5.so 00002b7bb3be9000 4K r---- /lib64/libresolv-2.5.so 00002b7bb3bea000 4K rw--- /lib64/libresolv-2.5.so 00002b7bb3beb000 8K rw--- [ anon ] 00002b7bb3bed000 36K r-x-- /lib64/libcrypt-2.5.so 00002b7bb3bf6000 2044K ----- /lib64/libcrypt-2.5.so 00002b7bb3df5000 4K r---- /lib64/libcrypt-2.5.so 00002b7bb3df6000 4K rw--- /lib64/libcrypt-2.5.so 00002b7bb3df7000 192K rw--- [ anon ] 00002b7bb3e27000 216K r-x-- /usr/lib64/qpid/daemon/acl.so 00002b7bb3e5d000 2044K ----- /usr/lib64/qpid/daemon/acl.so 00002b7bb405c000 12K rw--- /usr/lib64/qpid/daemon/acl.so 00002b7bb405f000 40K r-x-- /usr/lib64/qpid/daemon/watchdog.so 00002b7bb4069000 2048K ----- /usr/lib64/qpid/daemon/watchdog.so 00002b7bb4269000 4K rw--- /usr/lib64/qpid/daemon/watchdog.so 00002b7bb426a000 76K r-x-- /usr/lib64/qpid/daemon/rdma.so 00002b7bb427d000 2048K ----- /usr/lib64/qpid/daemon/rdma.so 00002b7bb447d000 4K rw--- /usr/lib64/qpid/daemon/rdma.so 00002b7bb4485000 116K r-x-- /usr/lib64/librdmawrap.so.3.0.0 00002b7bb44a2000 2044K ----- /usr/lib64/librdmawrap.so.3.0.0 00002b7bb46a1000 8K rw--- /usr/lib64/librdmawrap.so.3.0.0 00002b7bb46a3000 48K r-x-- /usr/lib64/libibverbs.so.1.0.0 00002b7bb46af000 2048K ----- /usr/lib64/libibverbs.so.1.0.0 00002b7bb48af000 4K rw--- /usr/lib64/libibverbs.so.1.0.0 00002b7bb48b0000 20K r-x-- /usr/lib64/librdmacm.so.1.0.0 00002b7bb48b5000 2044K ----- /usr/lib64/librdmacm.so.1.0.0 00002b7bb4ab4000 4K rw--- /usr/lib64/librdmacm.so.1.0.0 00002b7bb4ab5000 48K r-x-- /usr/lib64/qpid/daemon/replication_exchange.so 00002b7bb4ac1000 2048K ----- /usr/lib64/qpid/daemon/replication_exchange.so 00002b7bb4cc1000 4K rw--- /usr/lib64/qpid/daemon/replication_exchange.so 00002b7bb4cc2000 928K r-x-- /usr/lib64/qpid/daemon/msgstore.so 00002b7bb4daa000 2048K ----- /usr/lib64/qpid/daemon/msgstore.so 00002b7bb4faa000 24K rw--- /usr/lib64/qpid/daemon/msgstore.so 00002b7bb4fb0000 4K rw--- [ anon ] 00002b7bb4fb8000 1060K r-x-- /usr/lib64/libdb_cxx-4.3.so 00002b7bb50c1000 2044K ----- /usr/lib64/libdb_cxx-4.3.so 00002b7bb52c0000 24K rw--- /usr/lib64/libdb_cxx-4.3.so 00002b7bb52c6000 4K r-x-- /usr/lib64/libaio.so.1.0.1 00002b7bb52c7000 2044K ----- /usr/lib64/libaio.so.1.0.1 00002b7bb54c6000 4K rw--- /usr/lib64/libaio.so.1.0.1 00002b7bb54c7000 80K r-x-- /usr/lib64/qpid/daemon/ssl.so 00002b7bb54db000 2048K ----- /usr/lib64/qpid/daemon/ssl.so 00002b7bb56db000 8K rw--- /usr/lib64/qpid/daemon/ssl.so 00002b7bb56e4000 160K r-x-- /usr/lib64/libsslcommon.so.3.0.0 00002b7bb570c000 2048K ----- /usr/lib64/libsslcommon.so.3.0.0 00002b7bb590c000 8K rw--- /usr/lib64/libsslcommon.so.3.0.0 00002b7bb590e000 1176K r-x-- /usr/lib64/libnss3.so 00002b7bb5a34000 2044K ----- /usr/lib64/libnss3.so 00002b7bb5c33000 28K rw--- /usr/lib64/libnss3.so 00002b7bb5c3a000 4K rw--- [ anon ] 00002b7bb5c3b000 204K r-x-- /usr/lib64/libssl3.so 00002b7bb5c6e000 2048K ----- /usr/lib64/libssl3.so 00002b7bb5e6e000 12K rw--- /usr/lib64/libssl3.so 00002b7bb5e71000 216K r-x-- /usr/lib64/libnspr4.so 00002b7bb5ea7000 2048K ----- /usr/lib64/libnspr4.so 00002b7bb60a7000 8K rw--- /usr/lib64/libnspr4.so 00002b7bb60a9000 12K rw--- [ anon ] 00002b7bb60ac000 100K r-x-- /usr/lib64/libnssutil3.so 00002b7bb60c5000 2044K ----- /usr/lib64/libnssutil3.so 00002b7bb62c4000 20K rw--- /usr/lib64/libnssutil3.so 00002b7bb62c9000 4K rw--- [ anon ] 00002b7bb62ca000 16K r-x-- /usr/lib64/libplc4.so 00002b7bb62ce000 2044K ----- /usr/lib64/libplc4.so 00002b7bb64cd000 4K rw--- /usr/lib64/libplc4.so 00002b7bb64ce000 12K r-x-- /usr/lib64/libplds4.so 00002b7bb64d1000 2044K ----- /usr/lib64/libplds4.so 00002b7bb66d0000 4K rw--- /usr/lib64/libplds4.so 00002b7bb66d1000 80K r-x-- /usr/lib64/libz.so.1.2.3 00002b7bb66e5000 2044K ----- /usr/lib64/libz.so.1.2.3 00002b7bb68e4000 4K rw--- /usr/lib64/libz.so.1.2.3 00002b7bb68e5000 60K r-x-- /usr/lib64/qpid/daemon/replicating_listener.so 00002b7bb68f4000 2044K ----- /usr/lib64/qpid/daemon/replicating_listener.so 00002b7bb6af3000 8K rw--- /usr/lib64/qpid/daemon/replicating_listener.so 00002b7bb6af5000 84K r-x-- /usr/lib64/qpid/daemon/xml.so 00002b7bb6b0a000 2044K ----- /usr/lib64/qpid/daemon/xml.so 00002b7bb6d09000 8K rw--- /usr/lib64/qpid/daemon/xml.so 00002b7bb6d12000 3936K r-x-- /usr/lib64/libxerces-c.so.28.0 00002b7bb70ea000 2044K ----- /usr/lib64/libxerces-c.so.28.0 00002b7bb72e9000 272K rw--- /usr/lib64/libxerces-c.so.28.0 00002b7bb732d000 4K rw--- [ anon ] 00002b7bb732e000 3880K r-x-- /usr/lib64/libxqilla.so.3.0.0 00002b7bb76f8000 2044K ----- /usr/lib64/libxqilla.so.3.0.0 00002b7bb78f7000 412K rw--- /usr/lib64/libxqilla.so.3.0.0 00002b7bb795e000 4K rw--- [ anon ] 00002b7bb795f000 800K r-x-- /usr/lib64/qpid/daemon/cluster.so 00002b7bb7a27000 2048K ----- /usr/lib64/qpid/daemon/cluster.so 00002b7bb7c27000 32K rw--- /usr/lib64/qpid/daemon/cluster.so 00002b7bb7c2f000 8K rw--- [ anon ] 00002b7bb7c38000 16K r-x-- /usr/lib64/openais/libcpg.so.2.0.0 00002b7bb7c3c000 2044K ----- /usr/lib64/openais/libcpg.so.2.0.0 00002b7bb7e3b000 4K rw--- /usr/lib64/openais/libcpg.so.2.0.0 00002b7bb7e3c000 20K r-x-- /usr/lib64/libcman.so.2.0.115 00002b7bb7e41000 2044K ----- /usr/lib64/libcman.so.2.0.115 00002b7bb8040000 4K rw--- /usr/lib64/libcman.so.2.0.115 00002b7bb8041000 872K r-x-- /usr/lib64/libqpidclient.so.3.0.0 00002b7bb811b000 2048K ----- /usr/lib64/libqpidclient.so.3.0.0 00002b7bb831b000 28K rw--- /usr/lib64/libqpidclient.so.3.0.0 00002b7bb8322000 8K rw--- [ anon ] 00007ffffca39000 528K rw--- [ stack ] ffffffffff600000 8192K ----- [ anon ] total 11803876K Difference in last couple of minutes shows: [root@mrg-qe-10 fcluster]# diff -u qpidd.pmap.0920 qpidd.pmap.0929 --- qpidd.pmap.0920 2010-07-20 09:22:19.000000000 +0200 +++ qpidd.pmap.0929 2010-07-20 09:30:49.000000000 +0200 @@ -188,19 +188,21 @@ 00002aad53ffc000 16K ----- [ anon ] 00002aad54000000 46080K rw--- [ anon ] 00002aad58000000 65536K rw--- [ anon ] -00002aad5c000000 54192K rw--- [ anon ] -00002aad5f4ec000 11344K ----- [ anon ] +00002aad5c000000 65536K rw--- [ anon ] 00002aad60000000 27648K rw--- [ anon ] -00002aad64000000 11808K rw--- [ anon ] -00002aad64b88000 53728K ----- [ anon ] -00002aad68000000 30268K rw--- [ anon ] -00002aad69d8f000 35268K ----- [ anon ] +00002aad64000000 42472K rw--- [ anon ] +00002aad6697a000 23064K ----- [ anon ] +00002aad68000000 65536K rw--- [ anon ] 00002aad6c000000 5120K rw--- [ anon ] -00002aad70000000 34620K rw--- [ anon ] -00002aad721cf000 30916K ----- [ anon ] -00002aad74000000 13876K rw--- [ anon ] -00002aad74d8d000 51660K ----- [ anon ] +00002aad70000000 54004K rw--- [ anon ] +00002aad734bd000 11532K ----- [ anon ] +00002aad74000000 26876K rw--- [ anon ] +00002aad75a3f000 38660K ----- [ anon ] 00002aad78000000 33792K rw--- [ anon ] +00002aad7c000000 3592K rw--- [ anon ] +00002aad7c382000 61944K ----- [ anon ] +00002aad80000000 16100K rw--- [ anon ] +00002aad80fb9000 49436K ----- [ anon ] 00002b7bb130c000 112K r-x-- /lib64/ld-2.5.so 00002b7bb1328000 8K rw--- [ anon ] 00002b7bb1330000 4K rw--- [ anon ] @@ -364,4 +366,4 @@ 00002b7bb8322000 8K rw--- [ anon ] 00007ffffca39000 528K rw--- [ stack ] ffffffffff600000 8192K ----- [ anon ] - total 11672804K + total 11803876K To be clear on what is running there, there is at the moment of pmap check running tsxtest on 09 and nothing ran on 10. Just to make sure that the clients are not causing the issue I stopped all of them and saw that memory usage changed like this: machine 09 (first in cluster) PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 25974 qpidd 15 0 729M 332M 8516 S 0.0 4.1 45:26.42 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon machine 10 (second in cluster) PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 21669 qpidd 15 0 11.2G 7074M 3652 S 0.0 87.8 41:09.80 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon This confirms the conclusion, first node memory usage dropped by approx. 1MB, but second one is still incredibly high. The conclusion and hypothesis: Clustered nodes are suffering from memory bloat when not started as first cluster node. This issue causes the 32 bit cluster nodes to exit sooner than 64 bit nodes and finally the 64bits not first-in-cluster nodes fill up the memory and swap. Tested on qpid-cpp-server-0.7.946106-6.el5 and qpid-cpp-server-0.7.946106-7.el5 with same behavior One final stats about memory status of 09 machine (first-in-cluster): [root@mrg-qe-09 fcluster]# cat /proc/25974/status Name: qpidd State: S (sleeping) SleepAVG: 98% Tgid: 25974 Pid: 25974 PPid: 1 TracerPid: 0 Uid: 101 101 101 101 Gid: 39 39 39 39 FDSize: 1024 Groups: 0 39 VmPeak: 801296 kB VmSize: 746840 kB VmLck: 0 kB VmHWM: 345528 kB VmRSS: 340144 kB VmData: 620536 kB VmStk: 536 kB VmExe: 64 kB VmLib: 24048 kB VmPTE: 1032 kB StaBrk: 129dd000 kB Brk: 174d4000 kB StaStk: 7fffa7daeb00 kB Threads: 14 SigQ: 1/69632 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: fffffffe7ffbfeff SigIgn: 0000000000011001 SigCgt: 0000000180004002 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff Mems_allowed: 00000000,00000003 and 10 (second-in-cluster): [root@mrg-qe-10 fcluster]# cat /proc/21669/status Name: qpidd State: S (sleeping) SleepAVG: 98% Tgid: 21669 Pid: 21669 PPid: 1 TracerPid: 0 Uid: 101 101 101 101 Gid: 39 39 39 39 FDSize: 1024 Groups: 0 39 VmPeak: 11861220 kB VmSize: 11795684 kB VmLck: 0 kB VmHWM: 7931468 kB VmRSS: 7245504 kB VmData: 11678700 kB VmStk: 528 kB VmExe: 64 kB VmLib: 22948 kB VmPTE: 21752 kB StaBrk: 04494000 kB Brk: 4048b000 kB StaStk: 7ffffcabc440 kB Threads: 11 SigQ: 1/69632 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: fffffffe7ffbfeff SigIgn: 0000000000011001 SigCgt: 0000000180004002 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff Mems_allowed: 00000000,00000003 Created attachment 433094 [details]
The detailed pmap and process statuses
I've confirmed that busy clustered brokers are growing over time, it's not a matter of queue depth. There appears to be a leak somewhere. Withdraw the previous comment, the bloat I was seeing was queue depth. I ran the following on a 4 node cluster over the weekend, 64 hours so far with no error. Top shows qpidd processes are stable at between 700M and 1024M node1: info_thread.sh node2: tsxtest_thread.sh node3&4: perftest_thread.sh No core files, grep 'ecode:[^0]' in logs found nothing, tail of logs looks normal. This was with trunk r985356 which corresponds to release tags mrg1.3-beta5 qpid-cpp-mrg-0.7.946106-12 After weekend retest I cannot reproduce the issue anymore. In the meantime I updated the test to properly kill the clients connected to the clustered nodes. I verified that the above described situation happened when clients were not properly killed, so there were multiple clients accessing the broker. For example 50 perftests, 20 tsxtests and ~ 40 qpid-stat. I believe that this bug should be closed as CLOSED/NOTABUG due to mistake in the test. Here are current results of the broker VM memory consumption (700 - 1815 MB): [root@mrg-qe-09 fcluster]# ps auxw | grep qpidd | grep -v grep qpidd 8837 18.2 6.2 937940 515472 ? Ssl Aug13 1000:40 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon [root@mrg-qe-10 fcluster]# ps auxw | grep qpidd | grep -v grep qpidd 26503 17.7 15.2 1815592 1257820 ? Ssl Aug13 977:06 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon [root@mrg-qe-11 ~]# [root@mrg-qe-11 fcluster]# ps auxw | grep qpidd | grep -v grep qpidd 7839 20.5 6.2 698616 519712 ? Ssl Aug13 1126:28 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon [root@mrg-qe-12 ~]# [root@mrg-qe-12 fcluster]# ps auxw | grep qpidd | grep -v grep qpidd 31718 25.2 6.2 701124 516080 ? Ssl Aug13 1388:37 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon |