Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1151269

Summary: Qpid-ha broker memory bloat due to a memory leak with RDMA.
Product: Red Hat Enterprise MRG
Reporter: euroford <an.euroford>
Component: qpid-cpp
Assignee: messaging-bugs <messaging-bugs>
Status: CLOSED UPSTREAM
QA Contact: Messaging QE <messaging-qe-bugs>
Severity: urgent
Docs Contact:
Priority: medium
Version: 3.0
CC: an.euroford, astitcher, jross, yanzhenjiao
Target Milestone: ---
Keywords: Reproducer
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2025-02-10 03:43:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
cluster.conf (none)
qpidd.conf (none)
valgrind log (none)

Description euroford 2014-10-10 01:57:43 UTC
Description of problem:
The broker's memory bloats and it eventually crashes.

Version-Release number of selected component (if applicable):
qpid-ha-0.22-49.el6
kernel-rt-3.10.33-rt32.51.el6rt

How reproducible:


Steps to Reproduce:
1. qpid-receive -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}' --print-content no  -f
2. qpid-send -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --messages 1000 --content-size 5120000 --send-rate 300 --report-total --report-header=no --timestamp=yes --sequence=no --durable False --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}'
3. Wait until the broker's memory bloats.
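For step 3, the growth can be logged rather than eyeballed. The following is a minimal monitoring sketch (not part of the original report) that samples a process's resident set size from /proc; point it at the qpidd PID while the reproducer runs. With no arguments it samples its own shell, as a demonstration.

```shell
#!/bin/sh
# Sample the resident set size (VmRSS) of a process a fixed number of times,
# so the memory bloat described in step 3 can be logged over time.
# Usage: ./watch_rss.sh <pid> [samples] [interval_seconds]

rss_kb() {
    # The VmRSS line in /proc/<pid>/status looks like: "VmRSS:  123456 kB"
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

pid=${1:-$$}        # default: sample this shell itself (demonstration)
samples=${2:-3}
interval=${3:-1}

i=0
while [ "$i" -lt "$samples" ] && kill -0 "$pid" 2>/dev/null; do
    printf '%s %s kB\n' "$(date +%s)" "$(rss_kb "$pid")"
    sleep "$interval"
    i=$((i + 1))
done
```

Run against the broker, e.g. `./watch_rss.sh "$(pidof qpidd)" 720 5` for an hour of 5-second samples.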

Actual results:
The broker's memory bloats and it eventually crashes.

Expected results:
The broker keeps working.

Additional info:

Comment 1 Justin Ross 2014-10-10 11:08:37 UTC
Alan, please assess.

Comment 2 euroford 2014-10-10 14:36:02 UTC
In an FDR InfiniBand environment, qpid-ha crashes in under 10 minutes when sending 5 MB messages at 300 Hz.

The bigger the message, the faster qpid-ha crashes.

My server has 64 GB of RAM; I expect qpid-ha to crash even faster on servers with less RAM.

Comment 3 euroford 2014-10-10 14:46:17 UTC
Created attachment 945661 [details]
cluster.conf

Comment 4 euroford 2014-10-10 14:47:31 UTC
Created attachment 945663 [details]
qpidd.conf

Comment 5 Alan Conway 2014-10-21 14:44:15 UTC
I tried to reproduce this with TCP. Running overnight, I saw memory grow from 350 MB to 650 MB, but it appears to stabilize at that point. This is nothing like the growth reported above. It seems likely that RDMA is a factor; further investigation with RDMA is required.

Comment 6 euroford 2014-10-27 14:03:37 UTC
When qpid-ha runs in an IB environment, it receives messages in RDMA mode and relays them to the replica broker asymmetrically, without any flow control. I think that may be the reason.

Comment 7 Andrew Stitcher 2014-10-27 14:46:04 UTC
Perhaps related is this long standing RDMA federation problem:
Bug 468932.

Comment 8 Alan Conway 2014-10-27 16:08:02 UTC
HA does use federation links, but Bug 468932 says the leak is per link. euroford, does your test involve a lot of broker failures/disconnects? There would be a new link for each reconnect. From the description it sounds like you are just running the sender/receiver against a static cluster with no failures, in which case a leak per link wouldn't explain the bloat. Even so, there may still be a relationship between the bugs, but it would require another way of triggering the leak, or something else causing a lot of reconnects.

Comment 9 euroford 2014-10-31 03:42:17 UTC
Hi Alan,
Yes, I just run the sender/receiver against a two-node qpid-ha cluster; all the steps and configuration files are posted here.
Could you reproduce this bug in an IB environment?

Comment 10 Alan Conway 2014-10-31 14:44:13 UTC
OK, that's what I thought. So on the surface this does not look like it is directly caused by Bug 468932, but there may be some connection. I haven't had a chance to investigate on IB yet; I will update this bug as soon as I do.

Comment 11 euroford 2014-12-01 09:32:19 UTC
Created attachment 963194 [details]
valgrind log

I'm sure this bug is RDMA related.
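For anyone retracing this, here is a sketch (not from the report) of how such a log can be captured and quickly summarized; the qpidd invocation, config path, and sample log contents below are illustrative, not taken from the attachment.

```shell
# The attached log can be reproduced by running the broker under valgrind,
# e.g. (qpidd path and options illustrative):
#   valgrind --leak-check=full --log-file=qpidd-valgrind.log \
#            qpidd --config /etc/qpid/qpidd.conf
#
# Given such a log, pull out the "definitely lost" byte count.
leak_bytes() {
    # LEAK SUMMARY lines look like:
    #   ==1234==    definitely lost: 1,024 bytes in 8 blocks
    awk '/definitely lost:/ {gsub(/,/, "", $4); print $4}' "$1"
}

# Illustrative sample log, not the attached one:
cat > sample-valgrind.log <<'EOF'
==1234== LEAK SUMMARY:
==1234==    definitely lost: 1,024 bytes in 8 blocks
==1234==    indirectly lost: 0 bytes in 0 blocks
EOF

leak_bytes sample-valgrind.log   # prints: 1024
```

Repeated runs at increasing message sizes should show whether the lost-bytes count scales with traffic, which would be consistent with the RDMA hypothesis.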

Comment 12 euroford 2014-12-01 09:42:31 UTC
My hardware environment is an IBM Flex System x240 with a Mellanox ConnectX-3 56 Gb/s dual-port IB card, using libibverbs-rocee-1.1.7-1.1.el6_5 + libmlx4-rocee-1.0.5-1.1.el6_5 from the MRG repo.

Comment 13 Red Hat Bugzilla 2025-02-10 03:43:16 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.