Bug 703590 - OOM on clustered qpidd node leads to cluster abort
Summary: OOM on clustered qpidd node leads to cluster abort
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: messaging-bugs
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-10 18:23 UTC by ppecka
Modified: 2025-02-10 03:13 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned As: 726379
Environment:
Last Closed: 2025-02-10 03:13:41 UTC
Target Upstream Version:
Embargoed:


Attachments
How to run qpidd with cman for quorum (13.53 KB, application/pdf)
2011-06-01 12:55 UTC, Alan Conway


Links
System ID                 Priority  Status  Summary                                    Last Updated
Red Hat Bugzilla 723809   medium    CLOSED  Possible memory leak in qpid-cpp-server    2021-02-22 00:41:40 UTC
Red Hat Bugzilla 726379   urgent    CLOSED  Qpidd possible memory leaks                2021-02-22 00:41:40 UTC

Internal Links: 723809 726379

Description ppecka 2011-05-10 18:23:42 UTC
Description of problem:
Our environment consists of 4 VMs (RHEL5, 3 cores, 2048 MB each). When the brokers are heavily stressed with multiple perftests and one node rejoins the cluster frequently (see "Steps to Reproduce", step 3), one of the nodes that permanently remains in the cluster keeps growing in memory consumption.
After a few hours that node hits OOM, which in turn leaves the cluster unresponsive and makes qpidd abort.


After a while, the node with many restarts had problems joining the cluster:

2011-05-09 14:10:50 notice Initializing CPG
2011-05-09 14:10:51 notice Cluster store state: dirty cluster-id=f3d3c02e-3859-43c6-90ff-3bfa3a39c380
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 info Retrying cpg_join
2011-05-09 14:10:51 debug Shutting down CPG
2011-05-09 14:10:51 critical Unexpected error: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)
2011-05-09 14:10:52 critical Unexpected error: Daemon startup failed: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)



I'm unsure whether this is relevant to this issue or not, but the following was seen in the logs of the OOM node:

2011-05-09 14:10:43 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068 
2011-05-09 14:10:50 error cluster(192.168.5.4:12068 READY/error) aborting connection 192.168.5.2:11974-469: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) apply config change to error 626452337: 192.168.5.3:11788 192.168.5.4:12068 
2011-05-09 14:10:59 error cluster(192.168.5.4:12068 READY/error) aborting connection 192.168.5.2:11974-469: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068 
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) apply config change to error 626452337: 192.168.5.1:17032 192.168.5.3:11788 192.168.5.4:12068 
2011-05-09 14:10:59 debug cluster(192.168.5.4:12068 READY/error) error 626452337 must be resolved with 192.168.5.4:12068 





Version-Release number of selected component (if applicable):
# rpm -qa | grep qpid | sort -u
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-6.el5
qpid-cpp-client-0.10-6.el5
qpid-cpp-client-devel-0.10-6.el5
qpid-cpp-client-devel-docs-0.10-6.el5
qpid-cpp-client-ssl-0.10-6.el5
qpid-cpp-mrg-debuginfo-0.10-6.el5
qpid-cpp-server-0.10-6.el5
qpid-cpp-server-cluster-0.10-6.el5
qpid-cpp-server-ssl-0.10-6.el5
qpid-cpp-server-store-0.10-6.el5
qpid-cpp-server-xml-0.10-6.el5
qpid-java-client-0.10-4.el5
qpid-java-common-0.10-4.el5
qpid-java-example-0.10-4.el5
qpid-qmf-0.10-2.el5
qpid-qmf-0.10-6.el5
qpid-qmf-devel-0.10-2.el5
qpid-qmf-devel-0.10-6.el5
qpid-tools-0.10-4.el5
rh-qpid-cpp-tests-0.10-6.el5


How reproducible:
It takes a few hours to get there.





Steps to Reproduce:
Set up a cluster of 4 VMs; on each:

echo ' 
cluster-mechanism=ANONYMOUS
auth=yes
cluster-name=mrg-qe-16-virtual-cluster
log-to-file=/tmp/qpidd.log
log-enable=info+
log-enable=debug+:cluster
' >/etc/qpidd.conf 



service openais start
service qpidd start


1. ON EVERY VM
while true; do qpid-perftest \
--broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) -s \
--base-name "$(uname -n)_001" \
--password guest --username guest;\
sleep $(( ${RANDOM} % 3 )); done | tee ./random2_perftest.log


2. ON EVERY VM
while true; do qpid-perftest \
--broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) -s \
--base-name "$(uname -n)_002" \
--password guest --username guest;\
sleep $(( ${RANDOM} % 3 )); done | tee ./random2_perftest.log



=========
3. on ONE NODE ONLY run (this starts and stops the qpidd service at random intervals)
=========
while true; do date +%Y%m%d_%H%M; service qpidd stop ; TIMEDOWN=$(( ${RANDOM} % 20 )); echo "TIMEDOWN=${TIMEDOWN}"; sleep ${TIMEDOWN}; service qpidd start; TIMEUP=$(( ${RANDOM} % 600 )); echo "TIMEUP=${TIMEUP}"; sleep ${TIMEUP}; done | tee service_restart_$(date +%Y%m%d_%H%M).log
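
To watch the growth while the reproducer runs, a simple RSS sampler on each node is enough (a minimal sketch, not part of the original reproducer; any periodic ps/pmap sampling would do):

# sample qpidd resident memory once a minute
while true; do
  echo "$(date +%Y%m%d_%H%M) qpidd_rss_kB=$(ps -o rss= -C qpidd | awk '{s+=$1} END {print s}')"
  sleep 60
done | tee ./qpidd_rss_$(uname -n).log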
  


Actual results:
One of the qpidd nodes hits OOM, the cluster becomes unresponsive, and all qpidd brokers abort.


Expected results:
The cluster remains responsive the whole time.


Additional info:
The VMs' network interface for openais is a virtio device.
Average RX/TX on the network interfaces dedicated to openais was 5.3 / 2.9 TiB.

Comment 2 Alan Conway 2011-05-19 22:20:59 UTC
Trying to reproduce this I've seen openais hang and crash; I've raised bug 706278.
There is some suspicion that this may be related to the fact that these are VMs.

Are RHEL 5 guests on a RHEL 6.1 (beta?) host a supported configuration?

Comment 3 ppecka 2011-05-23 09:47:50 UTC
The VMs should be RHEL 5.6 guests on a RHEL 5.6 host.

Comment 4 Alan Conway 2011-05-26 16:17:34 UTC
I've been testing qpid-cpp-server-0.10-7.el5 on the same boxes for about 11 hours with no error.

Comment 5 Alan Conway 2011-05-30 16:57:57 UTC
I have been running the reproducer against qpid-cpp-server-0.10-7.el5 on the same boxes since Wed May 25 13:04, which is 5 days now. The brokers are all running and responding, and the clients are running without error. I think this issue is fixed in the 0.10-7 packages.

Comment 6 ppecka 2011-05-31 11:30:03 UTC
I have been running since yesterday, and my latest observations show that this issue should be broken into two parts:

1.1) Sometimes the qpid-perftest process is responsible for the OOM. This might be because we are running two instances of qpid-perftest from each node, both with the default message count (500000) and message size (1024 bytes). In the worst case this means there are 8 perftests running against one cluster member, while the VMs have only 2 GB of memory. I'm going to retest with a message size of 128, but I have no clue what makes perftest so memory-intensive (in such a case it eats ~10 MB/s).


there is "handmade" qpid-perftest core file @10.34.37.125:/root/fcluster/core.4518


1.2) Consequently, qpidd on such a node aborts the cluster. On the next attempt to rejoin the cluster we see in the logs:

2011-05-30 08:07:40 notice Cluster store state: dirty cluster-id=71798f90-8219-4bf2-941d-c9fbaae48233
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 info Retrying cpg_join
2011-05-30 08:07:40 debug Shutting down CPG
2011-05-30 08:07:41 critical Unexpected error: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)
2011-05-30 08:07:41 critical Unexpected error: Daemon startup failed: Cannot join CPG group mrg-qe-16-virtual-cluster: try again (6)

Because one of the four qpidd nodes is randomly toggled, it fails to start and join the cluster, with the same results as for the previously described node.

The two remaining nodes are then unresponsive; at this point they have to handle the load from perftests that should be spread among four nodes.
Handmade core files of the surviving node's qpidd and aisexec @10.34.37.128:
/root/fcluster/core.4153
/root/fcluster/core.4170








2.0)
We lowered the message size to 128 B (to avoid the situation described in bullet 1.1).

A = 10.34.37.124
B = 10.34.37.127
C = 10.34.37.128
D = 10.34.37.125

ALL NODES A,B,C,D were running:
while true; do qpid-perftest --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) \
 -s --base-name "$(uname -n)_001" --password guest --username guest \
 --size 128;sleep $(( ${RANDOM} % 3 )); done | tee ./random1_perftest.log

while true; do qpid-perftest --broker 192.168.5.$(( 1 + ${RANDOM} % 4 )) \
 -s --base-name "$(uname -n)_002" --password guest --username guest \
 --size 128;sleep $(( ${RANDOM} % 3 )); done | tee ./random2_perftest.log


ONLY NODE A:
while true; do date +%Y%m%d_%H%M; service qpidd stop ; \
 TIMEDOWN=$(( ${RANDOM} % 20 )); echo "TIMEDOWN=${TIMEDOWN}"; sleep ${TIMEDOWN}; service qpidd start; TIMEUP=$(( ${RANDOM} % 600 )); echo "TIMEUP=${TIMEUP}"; sleep ${TIMEUP}; done | tee service_restart_$(date +%Y%m%d_%H%M).log

ONLY NODES B,C,D:
while true; do qpid-cluster; sleep $(( ${RANDOM} % 7 )); \
done | tee ./log_qpid_cluster.log


2.1) possible OOM observed on node B
After less than 24 hours we found that (excluding node A, which is forced to leave and rejoin the cluster)
the node running on 10.34.37.127 (B) is close to OOM - a handmade core file was dumped:
B @10.34.37.127:/root/core.5046
The remaining nodes were also dumped with gdb:
D @10.34.37.125:/root/core.26929
C @10.34.37.128:/root/core.5204






NODE B) /tmp/qpidd.log
...
2011-05-31 05:29:36 error Execution exception: not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) channel error 3962761764 on 127.0.0.1:5672-127.0.0.1:51081(192.168.5.4:5171-9873 shadow) must be resolved with: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 : not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:29:36 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:30:22 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.1:5542 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:30:22 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
...
2011-05-31 05:31:30 warning QueueCleaner task overran 1 times by 600ms (taking 829221000ns) on average.
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.1:5731 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:24 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) apply config change to error 3962761764: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 resolved with 192.168.5.2:5046
2011-05-31 05:40:25 debug cluster(192.168.5.2:5046 READY/error) error 3962761764 must be resolved with 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:40:25 critical cluster(192.168.5.2:5046 READY/error) local error 3962761764 did not occur on member 192.168.5.3:5204: not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564)
2011-05-31 05:40:25 critical Error delivering frames: local error did not occur on all cluster members : not-found: Unknown destination rdest (qpid/broker/SemanticState.cpp:564) (qpid/cluster/ErrorCheck.cpp:89)
2011-05-31 05:40:25 notice cluster(192.168.5.2:5046 LEFT/error) leaving cluster mrg-qe-16-virtual-cluster
2011-05-31 05:40:28 debug Shutting down CPG
2011-05-31 05:40:36 notice Shut down





node A) /tmp/qpidd.log

2011-05-31 05:27:24 notice Store directory /var/lib/qpidd/rhm was pushed down (saved) into directory /var/lib/qpidd/_cluster.bak.00e4.
2011-05-31 05:27:24 notice Journal "TplStore": Created
2011-05-31 05:27:25 info SASL enabled
2011-05-31 05:27:25 notice Listening on TCP port 5672
2011-05-31 05:27:25 info SSL plugin not enabled, you must set --ssl-cert-db to enable it.
2011-05-31 05:27:25 info Policy file not specified. ACL Disabled, no ACL checking being done!
2011-05-31 05:27:25 debug cluster(192.168.5.1:5731 INIT) initial status map complete. 
2011-05-31 05:27:25 debug cluster(192.168.5.1:5731 INIT) elders: 192.168.5.2:5046 192.168.5.3:5204 192.168.5.4:5171 
2011-05-31 05:27:25 info cluster(192.168.5.1:5731 INIT) not active for links.
2011-05-31 05:27:25 notice cluster(192.168.5.1:5731 INIT) cluster-uuid = 71798f90-8219-4bf2-941d-c9fbaae48233
2011-05-31 05:27:25 notice cluster(192.168.5.1:5731 JOINER) joining cluster mrg-qe-16-virtual-cluster
2011-05-31 05:27:25 debug Shutting down CPG
2011-05-31 05:27:25 notice Shut down
2011-05-31 05:27:25 critical Unexpected error: Error writing to parent.






node C) /tmp/qpidd.log
2011-05-31 05:06:57 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 10000ns) on average.
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:09 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 14000ns) on average.
2011-05-31 05:07:09 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:10 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:11 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:12 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:13 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:19 error cluster(192.168.5.3:5204 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:29 warning JournalInactive:TplStore task late and overran 2 times: late 2ms, overran 2ms (taking 11500ns) on average.

...

2011-05-31 05:29:36 debug cluster(192.168.5.3:5204 READY) error 3962761764 did not occur locally.
2011-05-31 05:29:37 warning JournalInactive:TplStore task late and overran 3 times: late 2ms, overran 2ms (taking 9666ns) on average.






node D) /tmp/qpidd.log

2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:09 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:10 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:11 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Frame size too small 0 (qpid/framing/AMQFrame.cpp:101)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:12 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)
2011-05-31 05:07:13 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Framing version unsupported (qpid/framing/AMQFrame.cpp:93)
2011-05-31 05:07:19 error cluster(192.168.5.4:5171 READY) aborting connection 192.168.5.2:5046-8732: framing-error: Reserved bits not zero (qpid/framing/AMQFrame.cpp:112)

Comment 7 Alan Conway 2011-06-01 12:55:28 UTC
Created attachment 502255 [details]
How to run qpidd with cman for quorum

I'm wondering if the virtual cluster is suffering a virtual network partition and split-brain. I have seen this at another site running on VMs (bug 684783).
Can you try running the test on a 3-node cluster (or 5-node; it has to be odd) as described in the attached PDF? If this is split-brain you will still see failures, but they should be due to loss of quorum.
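
For reference, a minimal sketch of the broker-side change (the attached PDF is the authoritative procedure; this assumes cman is already configured in /etc/cluster/cluster.conf and that this qpid-cpp build supports the cluster plugin's cluster-cman option):

echo '
cluster-mechanism=ANONYMOUS
auth=yes
cluster-name=mrg-qe-16-virtual-cluster
cluster-cman=yes
log-to-file=/tmp/qpidd.log
log-enable=info+
log-enable=debug+:cluster
' >/etc/qpidd.conf

service cman start    # replaces "service openais start"; cman provides the quorum service
service qpidd start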

Comment 8 ppecka 2011-06-01 16:59:30 UTC
(In reply to comment #7)
> Created attachment 502255 [details]
> How to run qpidd with cman for quorum
> 
> I'm wondering if the virtual cluster is suffering a virtual network partition &
> split-brain. I have seen this in another site running on VMs (bug 684783)
> Can you try running the test on a 3 node cluster (or 5 node, has to be odd
> though) as described in the attached PDF? If this is split brain you will see
> failures but they should be due to loss of quorum.


I made a pmap of the leaking qpidd before it was killed, which shows that qpidd is using much more memory than on the other nodes. The locations of all the interesting core files can be found in comment #6. Since yesterday we have been running a "soft" version of our test: we do not run qpid-perftest when the load on a VM is over 4.00. Still, two questions remain open:
1) why qpid-perftest leaks (if it's not caused by high IO)
2) what makes qpidd leak

I'll recheck with CMAN on this cluster once our new "soft" test has been running for over 36 hours - two days from now.
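
(The pmap below is a single snapshot taken just before the process was killed; to follow the growth over time a periodic capture like this can be used - a minimal sketch, not what was actually run here:)

# save a full pmap snapshot of qpidd every 10 minutes for later comparison
while true; do
  pmap $(pidof qpidd) > ./pmap_qpidd_$(date +%Y%m%d_%H%M).txt
  sleep 600
done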


5046:   /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon
0000000000400000     64K r-x--  /usr/sbin/qpidd
000000000060f000     80K rw---  /usr/sbin/qpidd
000000000e831000 776136K rw---    [ anon ]
000000004087b000      4K -----    [ anon ]
000000004087c000  10240K rw---    [ anon ]
000000004189f000      4K -----    [ anon ]
00000000418a0000  10240K rw---    [ anon ]
00000000422a0000      4K -----    [ anon ]
00000000422a1000  10240K rw---    [ anon ]
0000000042ca1000      4K -----    [ anon ]
0000000042ca2000  10240K rw---    [ anon ]
00000000436a2000      4K -----    [ anon ]
00000000436a3000  10240K rw---    [ anon ]
00000000440a3000      4K -----    [ anon ]
00000000440a4000  10240K rw---    [ anon ]
0000000044aa4000      4K -----    [ anon ]
0000000044aa5000  10240K rw---    [ anon ]
00000000454a5000      4K -----    [ anon ]
00000000454a6000  10240K rw---    [ anon ]
0000000045ea6000      4K -----    [ anon ]
0000000045ea7000  10240K rw---    [ anon ]
0000003a43600000    100K r-x--  /usr/lib64/libqpidtypes.so.1.2.0
0000003a43619000   2044K -----  /usr/lib64/libqpidtypes.so.1.2.0
0000003a43818000      4K rw---  /usr/lib64/libqpidtypes.so.1.2.0
0000003a43a00000   2520K r-x--  /usr/lib64/libqpidcommon.so.5.0.0
0000003a43c76000   2044K -----  /usr/lib64/libqpidcommon.so.5.0.0
0000003a43e75000    124K rw---  /usr/lib64/libqpidcommon.so.5.0.0
0000003a43e94000      4K rw---    [ anon ]
0000003a44000000   2908K r-x--  /usr/lib64/libqpidbroker.so.5.0.0
0000003a442d7000   2044K -----  /usr/lib64/libqpidbroker.so.5.0.0
0000003a444d6000     92K rw---  /usr/lib64/libqpidbroker.so.5.0.0
0000003a444ed000     12K rw---    [ anon ]
0000003a66400000    112K r-x--  /lib64/ld-2.5.so
0000003a6661b000      4K r----  /lib64/ld-2.5.so
0000003a6661c000      4K rw---  /lib64/ld-2.5.so
0000003a66800000   1336K r-x--  /lib64/libc-2.5.so
0000003a6694e000   2048K -----  /lib64/libc-2.5.so
0000003a66b4e000     16K r----  /lib64/libc-2.5.so
0000003a66b52000      4K rw---  /lib64/libc-2.5.so
0000003a66b53000     20K rw---    [ anon ]
0000003a66c00000      8K r-x--  /lib64/libdl-2.5.so
0000003a66c02000   2048K -----  /lib64/libdl-2.5.so
0000003a66e02000      4K r----  /lib64/libdl-2.5.so
0000003a66e03000      4K rw---  /lib64/libdl-2.5.so
0000003a67000000     88K r-x--  /lib64/libpthread-2.5.so
0000003a67016000   2044K -----  /lib64/libpthread-2.5.so
0000003a67215000      4K r----  /lib64/libpthread-2.5.so
0000003a67216000      4K rw---  /lib64/libpthread-2.5.so
0000003a67217000     16K rw---    [ anon ]
0000003a67400000    520K r-x--  /lib64/libm-2.5.so
0000003a67482000   2044K -----  /lib64/libm-2.5.so
0000003a67681000      4K r----  /lib64/libm-2.5.so
0000003a67682000      4K rw---  /lib64/libm-2.5.so
0000003a67800000    216K r-x--  /usr/lib64/libboost_program_options.so.1.33.1
0000003a67836000   2044K -----  /usr/lib64/libboost_program_options.so.1.33.1
0000003a67a35000      8K rw---  /usr/lib64/libboost_program_options.so.1.33.1
0000003a68400000     28K r-x--  /lib64/librt-2.5.so
0000003a68407000   2048K -----  /lib64/librt-2.5.so
0000003a68607000      4K r----  /lib64/librt-2.5.so
0000003a68608000      4K rw---  /lib64/librt-2.5.so
0000003a68800000     64K r-x--  /usr/lib64/libboost_filesystem.so.1.33.1
0000003a68810000   2044K -----  /usr/lib64/libboost_filesystem.so.1.33.1
0000003a68a0f000      4K rw---  /usr/lib64/libboost_filesystem.so.1.33.1
0000003a69400000     68K r-x--  /lib64/libresolv-2.5.so
0000003a69411000   2048K -----  /lib64/libresolv-2.5.so
0000003a69611000      4K r----  /lib64/libresolv-2.5.so
0000003a69612000      4K rw---  /lib64/libresolv-2.5.so
0000003a69613000      8K rw---    [ anon ]
0000003a6c800000     52K r-x--  /lib64/libgcc_s-4.1.2-20080825.so.1
0000003a6c80d000   2048K -----  /lib64/libgcc_s-4.1.2-20080825.so.1
0000003a6ca0d000      4K rw---  /lib64/libgcc_s-4.1.2-20080825.so.1
0000003a6f800000    920K r-x--  /usr/lib64/libstdc++.so.6.0.8
0000003a6f8e6000   2044K -----  /usr/lib64/libstdc++.so.6.0.8
0000003a6fae5000     24K r----  /usr/lib64/libstdc++.so.6.0.8
0000003a6faeb000     12K rw---  /usr/lib64/libstdc++.so.6.0.8
0000003a6faee000     72K rw---    [ anon ]
0000003a76200000     36K r-x--  /lib64/libcrypt-2.5.so
0000003a76209000   2044K -----  /lib64/libcrypt-2.5.so
0000003a76408000      4K r----  /lib64/libcrypt-2.5.so
0000003a76409000      4K rw---  /lib64/libcrypt-2.5.so
0000003a7640a000    184K rw---    [ anon ]
0000003a79200000     12K r-x--  /usr/lib64/libplds4.so
0000003a79203000   2044K -----  /usr/lib64/libplds4.so
0000003a79402000      4K rw---  /usr/lib64/libplds4.so
0000003a79600000     16K r-x--  /usr/lib64/libplc4.so
0000003a79604000   2044K -----  /usr/lib64/libplc4.so
0000003a79803000      4K rw---  /usr/lib64/libplc4.so
0000003a79e00000    100K r-x--  /usr/lib64/libnssutil3.so
0000003a79e19000   2044K -----  /usr/lib64/libnssutil3.so
0000003a7a018000     24K rw---  /usr/lib64/libnssutil3.so
0000003a7a200000   1176K r-x--  /usr/lib64/libnss3.so
0000003a7a326000   2044K -----  /usr/lib64/libnss3.so
0000003a7a525000     28K rw---  /usr/lib64/libnss3.so
0000003a7a52c000      4K rw---    [ anon ]
0000003a7a600000    216K r-x--  /usr/lib64/libnspr4.so
0000003a7a636000   2048K -----  /usr/lib64/libnspr4.so
0000003a7a836000      8K rw---  /usr/lib64/libnspr4.so
0000003a7a838000     12K rw---    [ anon ]
0000003a7ae00000     12K r-x--  /lib64/libuuid.so.1.2
0000003a7ae03000   2048K -----  /lib64/libuuid.so.1.2
0000003a7b003000      4K rw---  /lib64/libuuid.so.1.2
0000003a7b600000    204K r-x--  /usr/lib64/libssl3.so
0000003a7b633000   2048K -----  /usr/lib64/libssl3.so
0000003a7b833000     12K rw---  /usr/lib64/libssl3.so
0000003a7c600000     96K r-x--  /usr/lib64/libsasl2.so.2.0.22
0000003a7c618000   2048K -----  /usr/lib64/libsasl2.so.2.0.22
0000003a7c818000      4K rw---  /usr/lib64/libsasl2.so.2.0.22
00002aaaaaaab000    196K rw---    [ anon ]
00002aaaaaadc000     24K rw-s-  /var/lib/qpidd/rhm/dat/__db.001
00002aaaaaae2000    272K rw-s-  /var/lib/qpidd/rhm/dat/__db.002
00002aaaaab26000    288K rw-s-  /var/lib/qpidd/rhm/dat/__db.003
00002aaaaab6e000    648K rw-s-  /var/lib/qpidd/rhm/dat/__db.004
00002aaaaac10000     16K rw-s-  /var/lib/qpidd/rhm/dat/__db.005
00002aaaaac14000   2932K rw-s-    [ shmid=0x2a0005 ]
00002aaaaaef1000    860K r-x--  /usr/lib64/sasl2/libsasldb.so.2.0.22
00002aaaaafc8000   2044K -----  /usr/lib64/sasl2/libsasldb.so.2.0.22
00002aaaab1c7000     16K rw---  /usr/lib64/sasl2/libsasldb.so.2.0.22
00002aaaab1cb000     16K r-x--  /usr/lib64/sasl2/libanonymous.so.2.0.22
00002aaaab1cf000   2044K -----  /usr/lib64/sasl2/libanonymous.so.2.0.22
00002aaaab3ce000      4K rw---  /usr/lib64/sasl2/libanonymous.so.2.0.22
00002aaaab3cf000     16K r-x--  /usr/lib64/sasl2/liblogin.so.2.0.22
00002aaaab3d3000   2044K -----  /usr/lib64/sasl2/liblogin.so.2.0.22
00002aaaab5d2000      4K rw---  /usr/lib64/sasl2/liblogin.so.2.0.22
00002aaaab5d3000     16K r-x--  /usr/lib64/sasl2/libplain.so.2.0.22
00002aaaab5d7000   2044K -----  /usr/lib64/sasl2/libplain.so.2.0.22
00002aaaab7d6000      4K rw---  /usr/lib64/sasl2/libplain.so.2.0.22
00002aaaac000000  65536K rw---    [ anon ]
00002aaab1801000    104K r-x--  /usr/lib64/qpid/client/sslconnector.so
00002aaab181b000   2044K -----  /usr/lib64/qpid/client/sslconnector.so
00002aaab1a1a000      8K rw---  /usr/lib64/qpid/client/sslconnector.so
00002aaab4000000  65536K rw---    [ anon ]
00002aaab8000000  65536K rw---    [ anon ]
00002aaac0000000  65536K rw---    [ anon ]
00002aaac4000000  65536K rw---    [ anon ]
00002aaacc000000 131072K rw---    [ anon ]
00002aaad4000000  65520K rw---    [ anon ]
00002aaad7ffc000     16K -----    [ anon ]
00002aaad8000000  65504K rw---    [ anon ]
00002aaadbff8000     32K -----    [ anon ]
00002aaadc000000  65536K rw---    [ anon ]
00002aaae0000000  65536K rw---    [ anon ]
00002aaae4000000  40772K rw---    [ anon ]
00002aaae67d1000  24764K -----    [ anon ]
00002aaae8000000 131072K rw---    [ anon ]
00002aaaf0000000 196608K rw---    [ anon ]
00002aaafc000000 196612K rw---    [ anon ]
00002aab0c000000  18192K rw---    [ anon ]
00002aab0d1c4000  47344K -----    [ anon ]
00002aab18000000  22800K rw---    [ anon ]
00002aab19644000  42736K -----    [ anon ]
00002aab1c000000 196612K rw---    [ anon ]
00002b772ae3f000     12K rw---    [ anon ]
00002b772ae54000     28K rw---    [ anon ]
00002b772ae5b000     40K r-x--  /usr/lib64/qpid/daemon/watchdog.so
00002b772ae65000   2048K -----  /usr/lib64/qpid/daemon/watchdog.so
00002b772b065000      4K rw---  /usr/lib64/qpid/daemon/watchdog.so
00002b772b066000    852K r-x--  /usr/lib64/qpid/daemon/cluster.so
00002b772b13b000   2048K -----  /usr/lib64/qpid/daemon/cluster.so
00002b772b33b000     36K rw---  /usr/lib64/qpid/daemon/cluster.so
00002b772b344000      4K rw---    [ anon ]
00002b772b358000     16K r-x--  /usr/lib64/openais/libcpg.so.2.0.0
00002b772b35c000   2044K -----  /usr/lib64/openais/libcpg.so.2.0.0
00002b772b55b000      4K rw---  /usr/lib64/openais/libcpg.so.2.0.0
00002b772b55c000     20K r-x--  /usr/lib64/libcman.so.2.0.115
00002b772b561000   2044K -----  /usr/lib64/libcman.so.2.0.115
00002b772b760000      4K rw---  /usr/lib64/libcman.so.2.0.115
00002b772b761000    852K r-x--  /usr/lib64/libqpidclient.so.5.0.0
00002b772b836000   2044K -----  /usr/lib64/libqpidclient.so.5.0.0
00002b772ba35000     28K rw---  /usr/lib64/libqpidclient.so.5.0.0
00002b772ba3c000      8K rw---    [ anon ]
00002b772ba3e000    104K r-x--  /usr/lib64/qpid/daemon/xml.so
00002b772ba58000   2048K -----  /usr/lib64/qpid/daemon/xml.so
00002b772bc58000      8K rw---  /usr/lib64/qpid/daemon/xml.so
00002b772bc6d000   3936K r-x--  /usr/lib64/libxerces-c.so.28.0
00002b772c045000   2044K -----  /usr/lib64/libxerces-c.so.28.0
00002b772c244000    272K rw---  /usr/lib64/libxerces-c.so.28.0
00002b772c288000      4K rw---    [ anon ]
00002b772c289000   3880K r-x--  /usr/lib64/libxqilla.so.3.0.0
00002b772c653000   2044K -----  /usr/lib64/libxqilla.so.3.0.0
00002b772c852000    412K rw---  /usr/lib64/libxqilla.so.3.0.0
00002b772c8b9000      4K rw---    [ anon ]
00002b772c8ba000     60K r-x--  /usr/lib64/qpid/daemon/replicating_listener.so
00002b772c8c9000   2044K -----  /usr/lib64/qpid/daemon/replicating_listener.so
00002b772cac8000      8K rw---  /usr/lib64/qpid/daemon/replicating_listener.so
00002b772caca000     48K r-x--  /usr/lib64/qpid/daemon/replication_exchange.so
00002b772cad6000   2048K -----  /usr/lib64/qpid/daemon/replication_exchange.so
00002b772ccd6000      4K rw---  /usr/lib64/qpid/daemon/replication_exchange.so
00002b772ccd7000     84K r-x--  /usr/lib64/qpid/daemon/ssl.so
00002b772ccec000   2044K -----  /usr/lib64/qpid/daemon/ssl.so
00002b772ceeb000      8K rw---  /usr/lib64/qpid/daemon/ssl.so
00002b772cf00000    164K r-x--  /usr/lib64/libsslcommon.so.5.0.0
00002b772cf29000   2044K -----  /usr/lib64/libsslcommon.so.5.0.0
00002b772d128000      8K rw---  /usr/lib64/libsslcommon.so.5.0.0
00002b772d12a000     80K r-x--  /usr/lib64/libz.so.1.2.3
00002b772d13e000   2044K -----  /usr/lib64/libz.so.1.2.3
00002b772d33d000      4K rw---  /usr/lib64/libz.so.1.2.3
00002b772d33e000    940K r-x--  /usr/lib64/qpid/daemon/msgstore.so
00002b772d429000   2048K -----  /usr/lib64/qpid/daemon/msgstore.so
00002b772d629000     24K rw---  /usr/lib64/qpid/daemon/msgstore.so
00002b772d62f000      4K rw---    [ anon ]
00002b772d643000   1060K r-x--  /usr/lib64/libdb_cxx-4.3.so
00002b772d74c000   2044K -----  /usr/lib64/libdb_cxx-4.3.so
00002b772d94b000     24K rw---  /usr/lib64/libdb_cxx-4.3.so
00002b772d951000      4K r-x--  /usr/lib64/libaio.so.1.0.1
00002b772d952000   2044K -----  /usr/lib64/libaio.so.1.0.1
00002b772db51000      4K rw---  /usr/lib64/libaio.so.1.0.1
00002b772db52000    216K r-x--  /usr/lib64/qpid/daemon/acl.so
00002b772db88000   2044K -----  /usr/lib64/qpid/daemon/acl.so
00002b772dd87000     12K rw---  /usr/lib64/qpid/daemon/acl.so
00007fffa4540000    528K rw---    [ stack ]
00007fffa45fc000     16K r-x--    [ anon ]
ffffffffff600000   8192K -----    [ anon ]
 total          2635896K

Comment 9 ppecka 2011-06-02 14:36:47 UTC
(In reply to comment #7)
> Created attachment 502255 [details]
> How to run qpidd with cman for quorum
> 
> I'm wondering if the virtual cluster is suffering a virtual network partition &
> split-brain. I have seen this in another site running on VMs (bug 684783)
> Can you try running the test on a 3 node cluster (or 5 node, has to be odd
> though) as described in the attached PDF? If this is split brain you will see
> failures but they should be due to loss of quorum.

In the latest scenario, with the load kept <= 4.00, no OOM has been observed for more than 30 hours. I'll retest with CMAN.

Comment 10 Alan Conway 2011-06-22 19:46:00 UTC
Any further developments?

Comment 11 ppecka 2011-06-29 14:45:04 UTC
No OOM observed with CMAN.

Comment 12 ppecka 2011-07-21 16:54:20 UTC
According to the documentation, CMAN is not mandatory when running a cluster. Also, after all connections are closed, the memory consumption stays high for hours even though the queues are empty. I believe this issue should be fixed.
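
(A quick way to confirm this observation - broker memory staying high while the queues are empty after all clients disconnect - is, as a sketch assuming the default local broker and the installed qpid-tools:)

qpid-stat -q                    # queue depths on the local broker; all should be at or near 0
ps -o rss,vsz -C qpidd          # broker resident/virtual memory after the clients are gone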

Comment 13 Red Hat Bugzilla 2025-02-10 03:13:41 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.

