Bug 1151269
| Summary: | Qpid-ha broker memory bloat due to leak of memory with RDMA. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | euroford <an.euroford> |
| Component: | qpid-cpp | Assignee: | messaging-bugs <messaging-bugs> |
| Status: | CLOSED UPSTREAM | QA Contact: | Messaging QE <messaging-qe-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | an.euroford, astitcher, jross, yanzhenjiao |
| Target Milestone: | --- | Keywords: | Reproducer |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-02-10 03:43:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Alan, please assess. In an FDR InfiniBand environment, qpid-ha crashes within 10 minutes when sending 5 MB messages at 300 Hz. The bigger the messages, the faster qpid-ha crashes. My server has 64 GB of RAM; I expect qpid-ha to crash even faster on servers with less RAM.

Created attachment 945661 [details]
cluster.conf

Created attachment 945663 [details]
qpidd.conf
I tried to reproduce this with TCP. Running overnight, I saw memory grow from 350 MB to 650 MB, but it appears to stabilize at that point. This is nothing like the growth reported above, so RDMA is likely a factor; further investigation with RDMA is required.

When qpid-ha works in an IB environment, it receives messages in RDMA mode and relays them to the replica server in an unbalanced way without any flow control; I think that may be the reason.

Perhaps related is this long-standing RDMA federation problem: Bug 468932. HA does use federation links, but Bug 468932 says the leak is per link. euroford, does your test involve a lot of broker failures/disconnects? There would be a new link for each reconnect. From the description it sounds like you are just running sender/receiver against a static cluster with no failures, in which case a leak per link wouldn't explain the bloat. Even so, there may still be a relationship between the bugs, but that would require either another way of triggering the leak or something else causing a lot of reconnects.

Hi Alan, yes, I just run sender/receiver against a two-node qpid-ha cluster; all the commands and configuration files are posted here. Could you reproduce this bug in an IB environment?

OK, that's what I thought. So, on the surface, this does not look like it is directly caused by Bug 468932, but there may be some connection. I haven't had a chance to investigate on IB yet; I will update this bug as soon as I do.

Created attachment 963194 [details]
valgrind log
I'm sure this bug is RDMA related. My hardware environment is an IBM Flex System x240 with a Mellanox ConnectX-3 56 Gb/s dual-port IB card, using libibverbs-rocee-1.1.7-1.1.el6_5 and libmlx4-rocee-1.0.5-1.1.el6_5 from the MRG repo.

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.
Description of problem:
Broker memory bloats and the broker eventually crashes.

Version-Release number of selected component (if applicable):
qpid-ha-0.22-49.el6
kernel-rt-3.10.33-rt32.51.el6rt

How reproducible:

Steps to Reproduce:
1. qpid-receive -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}' --print-content no -f
2. qpid-send -b 'amqp:rdma:qpid-ha-server-ip' -a 'benchmark-0;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}' --messages 1000 --content-size 5120000 --send-rate 300 --report-total --report-header=no --timestamp=yes --sequence=no --durable False --connection-options '{tcp-nodelay:true,reconnect:true,heartbeat:1}'
3. Wait until the broker memory bloats.

Actual results:
Broker memory bloats and the broker eventually crashes.

Expected results:
The broker keeps working.

Additional info: