Description of problem:
Broker memory bloats and the broker eventually crashes.

Version-Release number of selected component (if applicable):
qpid-ha-0.22-49.el6
kernel-rt-3.10.33-rt32.51.el6rt

How reproducible:

Steps to Reproduce:
1. qpid-receive -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}' --print-content no -f
2. qpid-send -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --messages 1000 --content-size 5120000 --send-rate 300 --report-total --report-header=no --timestamp=yes --sequence=no --durable False --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}'
3. Wait until the broker memory bloats.

Actual results:
Broker memory bloats and the broker eventually crashes.

Expected results:
The broker keeps working.

Additional info:
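For step 3, a simple way to watch the broker's resident memory while the test runs (a sketch, assuming the broker process is named qpidd and standard procps tools are available):

  # Print the qpidd RSS (in KB) every 10 seconds
  while true; do ps -o rss= -C qpidd; sleep 10; done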
Alan, please assess.
In an FDR IB environment, qpid-ha crashes within 10 minutes when sending 5 MB messages at 300 Hz. The bigger the message, the faster qpid-ha crashes. My server has 64 GB of RAM; I think qpid-ha would crash even faster on servers with less RAM.
Created attachment 945661 [details] cluster.conf
Created attachment 945663 [details] qpidd.conf
I tried to reproduce this with TCP. Running overnight, I saw memory grow from 350M to 650M, but it appears to stabilize at that point. This is nothing like the growth reported above. It seems likely that RDMA is a factor; further investigation with RDMA is required.
When qpid-ha works in an IB environment, it receives messages in RDMA mode and relays them to the replica server in an unbalanced way without any flow control; I think that may be the reason.
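If that is the case, one possible diagnostic (not a fix) would be to bound the test queue and see whether the growth persists. qpid.max_size and qpid.policy_type are standard C++ broker queue arguments, but whether such limits also constrain the HA replication path is an assumption that would need verifying. For example, the address in step 1 could be changed to something like:

  'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all,'qpid.max_size':104857600,'qpid.policy_type':ring}}}}'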
Perhaps related is this long-standing RDMA federation problem: Bug 468932.
HA does use federation links, but Bug 468932 says the leak is per link. euroford, does your test involve a lot of broker failures/disconnects? There would be a new link for each reconnect. From the description it sounds like you are just running a sender/receiver against a static cluster with no failures, in which case a per-link leak wouldn't explain the bloat. Even so, there may still be a relationship between the bugs, but that would require either another way of triggering the leak or something else causing a lot of reconnects.
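One way to check whether links are being created repeatedly (a suggestion, assuming the qpid-tools package is installed and the broker is reachable on the default port) is to list the broker's federation links periodically and watch whether the list keeps growing:

  # A steadily growing link list would indicate repeated reconnects
  qpid-route link list qpid-ha-server-ip:5672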
Hi Alan, yes, I just run a sender/receiver against a two-node qpid-ha cluster; the method and configuration files are all posted here. Could you reproduce this bug in an IB environment?
OK, that's what I thought. So on the surface this does not look like it is directly caused by Bug 468932, but there may be some connection. I haven't had a chance to investigate on IB yet; I will update this bug as soon as I do.
Created attachment 963194 [details] valgrind log

I'm sure this bug is RDMA-related.
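For reference, a typical invocation for producing a leak log like this (the exact options used for the attached log are not stated here, and the config path is assumed) would be:

  # Run the broker under valgrind and write a full leak report to a file
  valgrind --leak-check=full --show-reachable=yes --log-file=qpidd-valgrind.log qpidd --config /etc/qpid/qpidd.conf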
My hardware environment is an IBM Flex System x240 with a Mellanox ConnectX-3 56 Gbps dual-port IB card, using libibverbs-rocee-1.1.7-1.1.el6_5 and libmlx4-rocee-1.0.5-1.1.el6_5 from the MRG repo.