Description of problem:
Our environment consists of 4 VMs (RHEL5, 3 cores, 2048 MB each). When the brokers are heavily stressed with multiple perftests and one of the nodes rejoins the cluster often (see "Steps to Reproduce", step 3), the memory consumption of one of the nodes that permanently remains in the cluster keeps growing. After a few hours, one node runs out of memory (OOM), which leaves the cluster unresponsive and makes qpidd abort.

The node with many restarts had problems joining the cluster after a while:

2011-05-09 14:10:50 notice Initializing CPG
2011-05-09 14:10:51 notice Cluster store state: dirty cluster-id=f3d3c02e-3859-43c6-90ff-3bfa3a39c380
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 debug Shutting down CPG
2011-05-09 14:10:51 critical Unexpected error: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)
2011-05-09 14:10:52 critical Unexpected error: Daemon startup failed: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)

I'm unsure whether this is relevant to this issue, but this was seen in the logs from the OOM node:

2011-05-09 14:10:43 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068
2011-05-09 14:10:50 error cluster(192.168.5.4:12068 READY/error) aborting connection 192.168.5.2:11974-469: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) apply config change to error 626452337: 192.168.5.3:11788 192.168.5.4:12068
2011-05-09 14:10:59 error cluster(192.168.5.4:12068 READY/error) aborting connection 192.168.5.2:11974-469: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) apply config change to error 626452337: 192.168.5.1:17032 192.168.5.3:11788 192.168.5.4:12068
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068

Version-Release number of selected component (if applicable):
# rpm -qa | grep qpid | sort -u
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-6.el5
qpid-cpp-client-devel-0.10-6.el5
qpid-cpp-client-devel-docs-0.10-6.el5
qpid-cpp-client-ssl-0.10-6.el5
qpid-cpp-mrg-debuginfo-0.10-6.el5
qpid-cpp-server-0.10-6.el5
qpid-cpp-server-cluster-0.10-6.el5
qpid-cpp-server-ssl-0.10-6.el5
qpid-cpp-server-store-0.10-6.el5
qpid-cpp-server-xml-0.10-6.el5
qpid-java-client-0.10-4.el5
qpid-java-common-0.10-4.el5
qpid-java-example-0.10-4.el5
qpid-qmf-0.10-2.el5
qpid-qmf-0.10-6.el5
qpid-qmf-devel-0.10-2.el5
qpid-qmf-devel-0.10-6.el5
qpid-tools-0.10-4.el5
rh-qpid-cpp-tests-0.10-6.el5

How reproducible:
It takes a few hours to get there.

Steps to Reproduce:
Set up a cluster of 4 VMs; on each one run:

echo '
cluster-mechanism=ANONYMOUS
auth=yes
cluster-name=mrg-qe-16-virtual-cluster
log-to-file=/tmp/qpidd.log
log-enable=info+
log-enable=debug+:cluster
' >/etc/qpidd.conf
service openais start
service qpidd start

1. ON EVERY VM
while true; do
  qpid-perftest \
    --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) -s \
    --base-name "$(uname -n)_001" \
    --password guest --username guest
  sleep $(( ${RANDOM} % 3 ))
done | tee ./random1_perftest.log

2. ON EVERY VM
while true; do
  qpid-perftest \
    --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) -s \
    --base-name "$(uname -n)_002" \
    --password guest --username guest
  sleep $(( ${RANDOM} % 3 ))
done | tee ./random2_perftest.log

3. ON ONE NODE ONLY (this stops and starts the qpidd service at random intervals)
while true; do
  date +%Y%m%d_%H%M
  service qpidd stop
  TIMEDOWN=$(( ${RANDOM} % 20 ))
  echo "TIMEDOWN=${TIMEDOWN}"
  sleep ${TIMEDOWN}
  service qpidd start
  TIMEUP=$(( ${RANDOM} % 600 ))
  echo "TIMEUP=${TIMEUP}"
  sleep ${TIMEUP}
done | tee service_restart_$(date +%Y%m%d_%H%M).log

Actual results:
One of the qpidd nodes runs out of memory, the cluster becomes unresponsive, and all qpidd instances abort the cluster.

Expected results:
The cluster is responsive all the time.

Additional info:
The VMs' network interface for openais is a virtio device.
Average RX/TX on the network interfaces dedicated to openais was 5.3 / 2.9 TiB.
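Since the symptom is slow memory growth over several hours, it may help to record the broker's resident memory alongside the reproducer. The following is only a minimal sketch and was not part of the original test: the function names `rss_kb` and `log_rss`, the sampling interval, and the output path are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original reproducer.

# Print the resident set size (RSS) of process $1 in KiB,
# or nothing if the process no longer exists.
rss_kb() {
    ps -o rss= -p "$1" 2>/dev/null | tr -d ' '
}

# Append "timestamp pid rss_kb" to a file once per interval
# for as long as the process exists.
# $1 = pid, $2 = interval in seconds, $3 = output file (assumed path)
log_rss() {
    local pid=$1 interval=$2 out=$3 rss
    while rss=$(rss_kb "$pid") && [ -n "$rss" ]; do
        echo "$(date +%Y%m%d_%H%M%S) $pid $rss" >> "$out"
        sleep "$interval"
    done
}

# Demonstration on the current shell, which always exists.
echo "RSS of this shell: $(rss_kb $$) KiB"
```

On a broker node this could be started in the background with something like `log_rss "$(pidof qpidd)" 60 /tmp/qpidd_rss.log &`, so that the growth rate can be compared between the node that is restarted and the nodes that stay in the cluster.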
Trying to reproduce this I've seen openais hang and crash. I've raised bug 706278. Some suspicion that this may be related to the fact that these are VMs. Are rhel5 guests on rhel6.1 (beta?) a supported configuration?
The VMs should be RHEL 5.6 guests on a RHEL 5.6 host.
I've been testing qpid-cpp-server-0.10-7.el5 on the same boxes for about 11 hours with no error.
I have been running the reproducer against qpid-cpp-server-0.10-7.el5 on the same boxes since Wed May 25 13:04, which is 5 days now. The brokers are all running and responding and the clients are running without error. I think this issue is fixed in the 0.10-7 packages.
I have been running the test since yesterday, and my latest observations suggest that the issue should be broken into two parts:

1.1) Sometimes the qpid-perftest process is responsible for the OOM. This might be because we are running two instances of qpid-perftest from each node, both with the default message count (500000) and byte size (1024). This means that in the worst case there are 8 perftests running against one cluster member. The VMs are provided with only 2 GB of memory. I'm going to retest with message size 128, but I have no clue what makes perftest so memory intensive (in that case it eats ~10 MB/s).
There is a "handmade" qpid-perftest core file @10.34.37.125:/root/fcluster/core.4518

1.2) Consequently, qpidd on that node aborts the cluster. On the next attempt to rejoin the cluster we see in the logs:

2011-05-30 08:07:40 notice Cluster store state: dirty cluster-id=71798f90-8219-4bf2-941d-c9fbaae48233
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 debug Shutting down CPG
2011-05-30 08:07:41 critical Unexpected error: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)
2011-05-30 08:07:41 critical Unexpected error: Daemon startup failed: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)

Because one of the four qpidd nodes is randomly toggled, it fails to start and join the cluster with the same results as the previously described node. The other two remaining nodes are then unresponsive; at this time they need to deal with the load generated by the perftests, which should be spread among four nodes.
Handmade core files of the surviving node and aisexec:
@10.34.37.128 /root/fcluster/core.4153 /root/fcluster/core.4170

2.0) We lowered the message size to 128 B (to avoid the situation described in item 1.1).

A = 10.34.37.124
B = 10.34.37.127
C = 10.34.37.128
D = 10.34.37.125

ALL NODES A, B, C, D were running:
while true; do
  qpid-perftest --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) \
    -s --base-name "$(uname -n)_001" --password guest --username guest \
    --size 128
  sleep $(( ${RANDOM} % 3 ))
done | tee ./random1_perftest.log

while true; do
  qpid-perftest --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) \
    -s --base-name "$(uname -n)_002" --password guest --username guest \
    --size 128
  sleep $(( ${RANDOM} % 3 ))
done | tee ./random2_perftest.log

ONLY NODE A:
while true; do
  date +%Y%m%d_%H%M
  service qpidd stop
  TIMEDOWN=$(( ${RANDOM} % 20 ))
  echo "TIMEDOWN=${TIMEDOWN}"
  sleep ${TIMEDOWN}
  service qpidd start
  TIMEUP=$(( ${RANDOM} % 600 ))
  echo "TIMEUP=${TIMEUP}"
  sleep ${TIMEUP}
done | tee service_restart_$(date +%Y%m%d_%H%M).log

ONLY NODES B, C, D:
while true; do
  qpid-cluster
  sleep $(( ${RANDOM} % 7 ))
done | tee ./log_qpid_cluster.log

2.1) Possible OOM observed on node B
After less than 24 hours, we found that (excluding node A, which is forced to leave and reconnect to the cluster) one node, running on 10.34.37.127 (B), is close to OOM. Handmade core file dumped:
B @10.34.37.127:/root/core.5046
The remaining nodes were also dumped by gdb:
D @10.34.37.125:/root/core.26929
C @10.34.37.128:/root/core.5204

NODE B) /tmp/qpidd.log
...
2011-05-31 05:29:36 error Execution exception: not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) channel error 3962761764 on 127.0.0.1:5672-127.0.0.1:51081(192.168.5.4:5171-9873 shadow) must be resolved with: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 : not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:30:22 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.1:5542 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:30:22 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
...
2011-05-31 05:31:30 warning QueueCleaner task overran 1 times by 600ms (taking 829221000ns) on average.
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.1:5731 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 resolved with 192.168.5.2:5046
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:40:25 critical cluster(192.168.5.2:5046 READY/error) local error 3962761764 did not occur on member 192.168.5.3:5204: not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:40:25 critical Error delivering frames: local error did not occur on all cluster members : not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-31 05:40:25 notice cluster(192.168.5.2:5046 LEFT/error) leaving cluster mrg-qe-16-virtual-cluster
2011-05-31 05:40:28 debug Shutting down CPG
2011-05-31 05:40:36 notice Shut down

node A) /tmp/qpidd.log
2011-05-31 05:27:24 notice Store directory /var/lib/qpidd/rhm was pushed down (saved) into directory /var/lib/qpidd/_cluster.bak.00e4.
2011-05-31 05:27:24 notice Journal "TplStore": Created
2011-05-31 05:27:25 info SASL enabled
2011-05-31 05:27:25 notice Listening on TCP port 5672
2011-05-31 05:27:25 info SSL plugin not enabled, you must set --ssl-cert-db to enable it.
2011-05-31 05:27:25 info Policy file not specified. ACL Disabled, no ACL checking being done!
2011-05-31 05:27:25 debug cluster(192.168.5.1:5731 INIT) initial status map complete.
2011-05-31 05:27:25 debug cluster(192.168.5.1:5731 INIT) elders: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171
2011-05-31 05:27:25 info cluster(192.168.5.1:5731 INIT) not active for links.
2011-05-31 05:27:25 notice cluster(192.168.5.1:5731 INIT) cluster-uuid = 71798f90-8219-4bf2-941d-c9fbaae48233
2011-05-31 05:27:25 notice cluster(192.168.5.1:5731 JOINER) joining cluster mrg-qe-16-virtual-cluster
2011-05-31 05:27:25 debug Shutting down CPG
2011-05-31 05:27:25 notice Shut down
2011-05-31 05:27:25 critical Unexpected error: Error writing to parent.

node C) /tmp/qpidd.log
2011-05-31 05:06:57 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 10000ns) on average.
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:09 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 14000ns) on average.
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:10 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:11 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:13 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:19 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:29 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 11500ns) on average.
...
2011-05-31 05:29:36 debug cluster(192.168.5.3:5204 READY) error 3962761764 did not occur locally.
2011-05-31 05:29:37 warning JournalInactive:TplStore task late and overran 3 times: late 2ms, overran 2ms (taking 9666ns) on average.

node D) /tmp/qpidd.log
2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:10 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:11 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:13 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:19 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
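As a rough sanity check of items 1.1 and 2.0 in the comment above, the payload volume of a single qpid-perftest run can be estimated as message count times message size. The sketch below uses only the figures quoted in the comment (500000 messages, 1024 or 128 bytes, up to 8 perftests against one broker). It assumes one publisher per run, which has not been verified for this version, and it measures bytes sent, not memory actually held by the broker or the client; the helper name `payload_bytes` is made up for illustration.

```shell
#!/bin/bash
# Total payload bytes published by one perftest run: count * size.
payload_bytes() {
    echo $(( $1 * $2 ))
}

count=500000      # default message count quoted in item 1.1
perftests=8       # worst case from item 1.1: every loop picks the same broker

for size in 1024 128; do
    per_run=$(payload_bytes "$count" "$size")
    total=$(( per_run * perftests ))
    echo "size=${size}B per_run=$(( per_run / 1048576 ))MiB worst_case=$(( total / 1048576 ))MiB"
done
```

With 1024-byte messages the worst-case payload directed at one broker is larger than the 2 GB of RAM in each VM if nothing is consumed in the meantime, whereas with 128-byte messages it is well below that. This is consistent with the decision to retest with `--size 128`, although it does not explain why perftest itself grows.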
Created attachment 502255 [details] How to run qpidd with cman for quorum I'm wondering if the virtual cluster is suffering a virtual network partition & split-brain. I have seen this in another site running on VMs (bug 684783) Can you try running the test on a 3 node cluster (or 5 node, has to be odd though) as described in the attached PDF? If this is split brain you will see failures but they should be due to loss of quorum.
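The request for an odd number of nodes follows from majority-based quorum. The sketch below is an illustration only: it assumes one vote per node and a simple majority rule (floor(n/2) + 1), and it does not reproduce the configuration from the attached PDF. The function names are hypothetical.

```shell
#!/bin/bash
# Votes needed for quorum under a simple majority rule: floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if a partition holding $2 of $1 total votes keeps quorum.
has_quorum() {
    [ "$2" -ge "$(quorum "$1")" ]
}

for nodes in 3 4 5; do
    echo "nodes=$nodes quorum=$(quorum "$nodes")"
done
```

With 4 nodes, a 2/2 network partition leaves neither side with the 3 votes needed, so the whole cluster loses quorum. With 3 or 5 nodes, any two-way split leaves exactly one side with a majority, which is why failures under split-brain should then show up as a loss of quorum on the minority side.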
(In reply to comment #7)
> Created attachment 502255 [details]
> How to run qpidd with cman for quorum
>
> I'm wondering if the virtual cluster is suffering a virtual network partition &
> split-brain. I have seen this in another site running on VMs (bug 684783)
> Can you try running the test on a 3 node cluster (or 5 node, has to be odd
> though) as described in the attached PDF? If this is split brain you will see
> failures but they should be due to loss of quorum.

I made a pmap of the leaking qpidd before it was killed, which shows that this qpidd is using far more memory than the ones on the other nodes. The locations of all the interesting core files can be found in comment #6.

Since yesterday we have been running a "soft" version of our test: we do not run qpid-perftest when the load on the VM is over 4.00. Two questions are still open:
1) why qpid-perftest leaks (if it's not caused by high IO)
2) what makes qpidd leak

I'll recheck with CMAN on this cluster once our new "soft" test has been running for over 36 hours.
two days from now 5046: /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon 0000000000400000 64K r-x-- /usr/sbin/qpidd 000000000060f000 80K rw--- /usr/sbin/qpidd 000000000e831000 776136K rw--- [ anon ] 000000004087b000 4K ----- [ anon ] 000000004087c000 10240K rw--- [ anon ] 000000004189f000 4K ----- [ anon ] 00000000418a0000 10240K rw--- [ anon ] 00000000422a0000 4K ----- [ anon ] 00000000422a1000 10240K rw--- [ anon ] 0000000042ca1000 4K ----- [ anon ] 0000000042ca2000 10240K rw--- [ anon ] 00000000436a2000 4K ----- [ anon ] 00000000436a3000 10240K rw--- [ anon ] 00000000440a3000 4K ----- [ anon ] 00000000440a4000 10240K rw--- [ anon ] 0000000044aa4000 4K ----- [ anon ] 0000000044aa5000 10240K rw--- [ anon ] 00000000454a5000 4K ----- [ anon ] 00000000454a6000 10240K rw--- [ anon ] 0000000045ea6000 4K ----- [ anon ] 0000000045ea7000 10240K rw--- [ anon ] 0000003a43600000 100K r-x-- /usr/lib64/libqpidtypes.so.1.2.0 0000003a43619000 2044K ----- /usr/lib64/libqpidtypes.so.1.2.0 0000003a43818000 4K rw--- /usr/lib64/libqpidtypes.so.1.2.0 0000003a43a00000 2520K r-x-- /usr/lib64/libqpidcommon.so.5.0.0 0000003a43c76000 2044K ----- /usr/lib64/libqpidcommon.so.5.0.0 0000003a43e75000 124K rw--- /usr/lib64/libqpidcommon.so.5.0.0 0000003a43e94000 4K rw--- [ anon ] 0000003a44000000 2908K r-x-- /usr/lib64/libqpidbroker.so.5.0.0 0000003a442d7000 2044K ----- /usr/lib64/libqpidbroker.so.5.0.0 0000003a444d6000 92K rw--- /usr/lib64/libqpidbroker.so.5.0.0 0000003a444ed000 12K rw--- [ anon ] 0000003a66400000 112K r-x-- /lib64/ld-2.5.so 0000003a6661b000 4K r---- /lib64/ld-2.5.so 0000003a6661c000 4K rw--- /lib64/ld-2.5.so 0000003a66800000 1336K r-x-- /lib64/libc-2.5.so 0000003a6694e000 2048K ----- /lib64/libc-2.5.so 0000003a66b4e000 16K r---- /lib64/libc-2.5.so 0000003a66b52000 4K rw--- /lib64/libc-2.5.so 0000003a66b53000 20K rw--- [ anon ] 0000003a66c00000 8K r-x-- /lib64/libdl-2.5.so 0000003a66c02000 2048K ----- /lib64/libdl-2.5.so 0000003a66e02000 4K r---- /lib64/libdl-2.5.so 
0000003a66e03000 4K rw--- /lib64/libdl-2.5.so 0000003a67000000 88K r-x-- /lib64/libpthread-2.5.so 0000003a67016000 2044K ----- /lib64/libpthread-2.5.so 0000003a67215000 4K r---- /lib64/libpthread-2.5.so 0000003a67216000 4K rw--- /lib64/libpthread-2.5.so 0000003a67217000 16K rw--- [ anon ] 0000003a67400000 520K r-x-- /lib64/libm-2.5.so 0000003a67482000 2044K ----- /lib64/libm-2.5.so 0000003a67681000 4K r---- /lib64/libm-2.5.so 0000003a67682000 4K rw--- /lib64/libm-2.5.so 0000003a67800000 216K r-x-- /usr/lib64/libboost_program_options.so.1.33.1 0000003a67836000 2044K ----- /usr/lib64/libboost_program_options.so.1.33.1 0000003a67a35000 8K rw--- /usr/lib64/libboost_program_options.so.1.33.1 0000003a68400000 28K r-x-- /lib64/librt-2.5.so 0000003a68407000 2048K ----- /lib64/librt-2.5.so 0000003a68607000 4K r---- /lib64/librt-2.5.so 0000003a68608000 4K rw--- /lib64/librt-2.5.so 0000003a68800000 64K r-x-- /usr/lib64/libboost_filesystem.so.1.33.1 0000003a68810000 2044K ----- /usr/lib64/libboost_filesystem.so.1.33.1 0000003a68a0f000 4K rw--- /usr/lib64/libboost_filesystem.so.1.33.1 0000003a69400000 68K r-x-- /lib64/libresolv-2.5.so 0000003a69411000 2048K ----- /lib64/libresolv-2.5.so 0000003a69611000 4K r---- /lib64/libresolv-2.5.so 0000003a69612000 4K rw--- /lib64/libresolv-2.5.so 0000003a69613000 8K rw--- [ anon ] 0000003a6c800000 52K r-x-- /lib64/libgcc_s-4.1.2-20080825.so.1 0000003a6c80d000 2048K ----- /lib64/libgcc_s-4.1.2-20080825.so.1 0000003a6ca0d000 4K rw--- /lib64/libgcc_s-4.1.2-20080825.so.1 0000003a6f800000 920K r-x-- /usr/lib64/libstdc++.so.6.0.8 0000003a6f8e6000 2044K ----- /usr/lib64/libstdc++.so.6.0.8 0000003a6fae5000 24K r---- /usr/lib64/libstdc++.so.6.0.8 0000003a6faeb000 12K rw--- /usr/lib64/libstdc++.so.6.0.8 0000003a6faee000 72K rw--- [ anon ] 0000003a76200000 36K r-x-- /lib64/libcrypt-2.5.so 0000003a76209000 2044K ----- /lib64/libcrypt-2.5.so 0000003a76408000 4K r---- /lib64/libcrypt-2.5.so 0000003a76409000 4K rw--- /lib64/libcrypt-2.5.so 
0000003a7640a000 184K rw--- [ anon ] 0000003a79200000 12K r-x-- /usr/lib64/libplds4.so 0000003a79203000 2044K ----- /usr/lib64/libplds4.so 0000003a79402000 4K rw--- /usr/lib64/libplds4.so 0000003a79600000 16K r-x-- /usr/lib64/libplc4.so 0000003a79604000 2044K ----- /usr/lib64/libplc4.so 0000003a79803000 4K rw--- /usr/lib64/libplc4.so 0000003a79e00000 100K r-x-- /usr/lib64/libnssutil3.so 0000003a79e19000 2044K ----- /usr/lib64/libnssutil3.so 0000003a7a018000 24K rw--- /usr/lib64/libnssutil3.so 0000003a7a200000 1176K r-x-- /usr/lib64/libnss3.so 0000003a7a326000 2044K ----- /usr/lib64/libnss3.so 0000003a7a525000 28K rw--- /usr/lib64/libnss3.so 0000003a7a52c000 4K rw--- [ anon ] 0000003a7a600000 216K r-x-- /usr/lib64/libnspr4.so 0000003a7a636000 2048K ----- /usr/lib64/libnspr4.so 0000003a7a836000 8K rw--- /usr/lib64/libnspr4.so 0000003a7a838000 12K rw--- [ anon ] 0000003a7ae00000 12K r-x-- /lib64/libuuid.so.1.2 0000003a7ae03000 2048K ----- /lib64/libuuid.so.1.2 0000003a7b003000 4K rw--- /lib64/libuuid.so.1.2 0000003a7b600000 204K r-x-- /usr/lib64/libssl3.so 0000003a7b633000 2048K ----- /usr/lib64/libssl3.so 0000003a7b833000 12K rw--- /usr/lib64/libssl3.so 0000003a7c600000 96K r-x-- /usr/lib64/libsasl2.so.2.0.22 0000003a7c618000 2048K ----- /usr/lib64/libsasl2.so.2.0.22 0000003a7c818000 4K rw--- /usr/lib64/libsasl2.so.2.0.22 00002aaaaaaab000 196K rw--- [ anon ] 00002aaaaaadc000 24K rw-s- /var/lib/qpidd/rhm/dat/__db.001 00002aaaaaae2000 272K rw-s- /var/lib/qpidd/rhm/dat/__db.002 00002aaaaab26000 288K rw-s- /var/lib/qpidd/rhm/dat/__db.003 00002aaaaab6e000 648K rw-s- /var/lib/qpidd/rhm/dat/__db.004 00002aaaaac10000 16K rw-s- /var/lib/qpidd/rhm/dat/__db.005 00002aaaaac14000 2932K rw-s- [ shmid=0x2a0005 ] 00002aaaaaef1000 860K r-x-- /usr/lib64/sasl2/libsasldb.so.2.0.22 00002aaaaafc8000 2044K ----- /usr/lib64/sasl2/libsasldb.so.2.0.22 00002aaaab1c7000 16K rw--- /usr/lib64/sasl2/libsasldb.so.2.0.22 00002aaaab1cb000 16K r-x-- /usr/lib64/sasl2/libanonymous.so.2.0.22 
00002aaaab1cf000 2044K ----- /usr/lib64/sasl2/libanonymous.so.2.0.22 00002aaaab3ce000 4K rw--- /usr/lib64/sasl2/libanonymous.so.2.0.22 00002aaaab3cf000 16K r-x-- /usr/lib64/sasl2/liblogin.so.2.0.22 00002aaaab3d3000 2044K ----- /usr/lib64/sasl2/liblogin.so.2.0.22 00002aaaab5d2000 4K rw--- /usr/lib64/sasl2/liblogin.so.2.0.22 00002aaaab5d3000 16K r-x-- /usr/lib64/sasl2/libplain.so.2.0.22 00002aaaab5d7000 2044K ----- /usr/lib64/sasl2/libplain.so.2.0.22 00002aaaab7d6000 4K rw--- /usr/lib64/sasl2/libplain.so.2.0.22 00002aaaac000000 65536K rw--- [ anon ] 00002aaab1801000 104K r-x-- /usr/lib64/qpid/client/sslconnector.so 00002aaab181b000 2044K ----- /usr/lib64/qpid/client/sslconnector.so 00002aaab1a1a000 8K rw--- /usr/lib64/qpid/client/sslconnector.so 00002aaab4000000 65536K rw--- [ anon ] 00002aaab8000000 65536K rw--- [ anon ] 00002aaac0000000 65536K rw--- [ anon ] 00002aaac4000000 65536K rw--- [ anon ] 00002aaacc000000 131072K rw--- [ anon ] 00002aaad4000000 65520K rw--- [ anon ] 00002aaad7ffc000 16K ----- [ anon ] 00002aaad8000000 65504K rw--- [ anon ] 00002aaadbff8000 32K ----- [ anon ] 00002aaadc000000 65536K rw--- [ anon ] 00002aaae0000000 65536K rw--- [ anon ] 00002aaae4000000 40772K rw--- [ anon ] 00002aaae67d1000 24764K ----- [ anon ] 00002aaae8000000 131072K rw--- [ anon ] 00002aaaf0000000 196608K rw--- [ anon ] 00002aaafc000000 196612K rw--- [ anon ] 00002aab0c000000 18192K rw--- [ anon ] 00002aab0d1c4000 47344K ----- [ anon ] 00002aab18000000 22800K rw--- [ anon ] 00002aab19644000 42736K ----- [ anon ] 00002aab1c000000 196612K rw--- [ anon ] 00002b772ae3f000 12K rw--- [ anon ] 00002b772ae54000 28K rw--- [ anon ] 00002b772ae5b000 40K r-x-- /usr/lib64/qpid/daemon/watchdog.so 00002b772ae65000 2048K ----- /usr/lib64/qpid/daemon/watchdog.so 00002b772b065000 4K rw--- /usr/lib64/qpid/daemon/watchdog.so 00002b772b066000 852K r-x-- /usr/lib64/qpid/daemon/cluster.so 00002b772b13b000 2048K ----- /usr/lib64/qpid/daemon/cluster.so 00002b772b33b000 36K rw--- 
/usr/lib64/qpid/daemon/cluster.so 00002b772b344000 4K rw--- [ anon ] 00002b772b358000 16K r-x-- /usr/lib64/openais/libcpg.so.2.0.0 00002b772b35c000 2044K ----- /usr/lib64/openais/libcpg.so.2.0.0 00002b772b55b000 4K rw--- /usr/lib64/openais/libcpg.so.2.0.0 00002b772b55c000 20K r-x-- /usr/lib64/libcman.so.2.0.115 00002b772b561000 2044K ----- /usr/lib64/libcman.so.2.0.115 00002b772b760000 4K rw--- /usr/lib64/libcman.so.2.0.115 00002b772b761000 852K r-x-- /usr/lib64/libqpidclient.so.5.0.0 00002b772b836000 2044K ----- /usr/lib64/libqpidclient.so.5.0.0 00002b772ba35000 28K rw--- /usr/lib64/libqpidclient.so.5.0.0 00002b772ba3c000 8K rw--- [ anon ] 00002b772ba3e000 104K r-x-- /usr/lib64/qpid/daemon/xml.so 00002b772ba58000 2048K ----- /usr/lib64/qpid/daemon/xml.so 00002b772bc58000 8K rw--- /usr/lib64/qpid/daemon/xml.so 00002b772bc6d000 3936K r-x-- /usr/lib64/libxerces-c.so.28.0 00002b772c045000 2044K ----- /usr/lib64/libxerces-c.so.28.0 00002b772c244000 272K rw--- /usr/lib64/libxerces-c.so.28.0 00002b772c288000 4K rw--- [ anon ] 00002b772c289000 3880K r-x-- /usr/lib64/libxqilla.so.3.0.0 00002b772c653000 2044K ----- /usr/lib64/libxqilla.so.3.0.0 00002b772c852000 412K rw--- /usr/lib64/libxqilla.so.3.0.0 00002b772c8b9000 4K rw--- [ anon ] 00002b772c8ba000 60K r-x-- /usr/lib64/qpid/daemon/replicating_listener.so 00002b772c8c9000 2044K ----- /usr/lib64/qpid/daemon/replicating_listener.so 00002b772cac8000 8K rw--- /usr/lib64/qpid/daemon/replicating_listener.so 00002b772caca000 48K r-x-- /usr/lib64/qpid/daemon/replication_exchange.so 00002b772cad6000 2048K ----- /usr/lib64/qpid/daemon/replication_exchange.so 00002b772ccd6000 4K rw--- /usr/lib64/qpid/daemon/replication_exchange.so 00002b772ccd7000 84K r-x-- /usr/lib64/qpid/daemon/ssl.so 00002b772ccec000 2044K ----- /usr/lib64/qpid/daemon/ssl.so 00002b772ceeb000 8K rw--- /usr/lib64/qpid/daemon/ssl.so 00002b772cf00000 164K r-x-- /usr/lib64/libsslcommon.so.5.0.0 00002b772cf29000 2044K ----- /usr/lib64/libsslcommon.so.5.0.0 
00002b772d128000 8K rw--- /usr/lib64/libsslcommon.so.5.0.0 00002b772d12a000 80K r-x-- /usr/lib64/libz.so.1.2.3 00002b772d13e000 2044K ----- /usr/lib64/libz.so.1.2.3 00002b772d33d000 4K rw--- /usr/lib64/libz.so.1.2.3 00002b772d33e000 940K r-x-- /usr/lib64/qpid/daemon/msgstore.so 00002b772d429000 2048K ----- /usr/lib64/qpid/daemon/msgstore.so 00002b772d629000 24K rw--- /usr/lib64/qpid/daemon/msgstore.so 00002b772d62f000 4K rw--- [ anon ] 00002b772d643000 1060K r-x-- /usr/lib64/libdb_cxx-4.3.so 00002b772d74c000 2044K ----- /usr/lib64/libdb_cxx-4.3.so 00002b772d94b000 24K rw--- /usr/lib64/libdb_cxx-4.3.so 00002b772d951000 4K r-x-- /usr/lib64/libaio.so.1.0.1 00002b772d952000 2044K ----- /usr/lib64/libaio.so.1.0.1 00002b772db51000 4K rw--- /usr/lib64/libaio.so.1.0.1 00002b772db52000 216K r-x-- /usr/lib64/qpid/daemon/acl.so 00002b772db88000 2044K ----- /usr/lib64/qpid/daemon/acl.so 00002b772dd87000 12K rw--- /usr/lib64/qpid/daemon/acl.so 00007fffa4540000 528K rw--- [ stack ] 00007fffa45fc000 16K r-x-- [ anon ] ffffffffff600000 8192K ----- [ anon ] total 2635896K
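In the listing above, most of the 2635896K total appears to sit in writable anonymous mappings (the 776136K region and the many 65536K to 196612K blocks) rather than in shared libraries. A small filter can total those regions so that nodes can be compared. This is a minimal sketch, assuming the pmap column layout shown above (address, size with a K suffix, permissions, mapping name); the function name `anon_rw_kb` is hypothetical.

```shell
#!/bin/bash
# Sum the sizes (in KiB) of writable anonymous mappings in pmap output on stdin.
# Expects lines of the form: ADDRESS SIZEK PERMS [ anon ]
anon_rw_kb() {
    awk '$3 ~ /^rw/ && $4 == "[" && $5 == "anon" { sub(/K$/, "", $2); sum += $2 }
         END { print sum + 0 }'
}

# Demonstration on a few lines copied from the listing above.
anon_rw_kb <<'EOF'
000000000e831000 776136K rw--- [ anon ]
000000004087b000 4K ----- [ anon ]
000000004087c000 10240K rw--- [ anon ]
0000003a43600000 100K r-x-- /usr/lib64/libqpidtypes.so.1.2.0
EOF
```

On a live node the same filter could be fed directly, for example `pmap 5046 | anon_rw_kb`, and the result compared with the value from a broker that is not leaking.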
(In reply to comment #7)
> Created attachment 502255 [details]
> How to run qpidd with cman for quorum
>
> I'm wondering if the virtual cluster is suffering a virtual network partition &
> split-brain. I have seen this in another site running on VMs (bug 684783)
> Can you try running the test on a 3 node cluster (or 5 node, has to be odd
> though) as described in the attached PDF? If this is split brain you will see
> failures but they should be due to loss of quorum.

In the latest scenario, where the load is kept at or below 4.00, no OOM has been observed for more than 30 hours. I'll retest with CMAN.
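The exact script used for the "soft" test is not shown in this report. One way such load gating could look is sketched below, assuming the 1-minute load average from /proc/loadavg is the value compared against 4.00; the function names `load_ok` and `current_load` are made up for illustration.

```shell
#!/bin/bash
# Succeeds if the load average $1 is at or below the limit $2.
# awk does the comparison because bash arithmetic is integer-only.
load_ok() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load <= limit) }'
}

# 1-minute load average from /proc/loadavg (Linux-specific).
current_load() {
    cut -d' ' -f1 /proc/loadavg
}

# Demonstration: decide whether a perftest round would be started right now.
if [ -r /proc/loadavg ]; then
    if load_ok "$(current_load)" 4.00; then
        echo "load OK, would start qpid-perftest"
    else
        echo "load too high, skipping this round"
    fi
fi
```

In the perftest loops from the earlier comments, the `qpid-perftest` call would then be wrapped in `if load_ok "$(current_load)" 4.00; then ...; fi`, leaving the random sleep in place so the loop keeps polling.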
Any further developments?
No OOM was observed with CMAN.
According to the documentation, CMAN is not mandatory when running a cluster. Also, once all connections are closed, memory consumption stays high for hours even though the queues are empty. I believe this issue should be fixed.
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.