Bug 1004411 - memory leak in newHA after active broker shutdown
memory leak in newHA after active broker shutdown
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
3.0
All Linux
medium Severity high
: 3.0
: ---
Assigned To: Alan Conway
Eric Sammons
: TestCaseProvided
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-09-04 11:02 EDT by Pavel Moravec
Modified: 2015-01-21 07:54 EST (History)
4 users (show)

See Also:
Fixed In Version: qpid-cpp-0.22-35
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-01-21 07:54:40 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Full valgrind log (37.46 KB, application/x-compressed-tar)
2013-09-04 11:05 EDT, Pavel Moravec
no flags Details

  None (edit)
Description Pavel Moravec 2013-09-04 11:02:47 EDT
Description of problem:
There is a memory leak in failing over HA link of ready broker, after active broker shutdown.


Version-Release number of selected component (if applicable):
qpid-cpp 0.22-12


How reproducible:
100%


Steps to Reproduce:
1. (in first terminal on node1)
service qpidd start
qpid-ha promote
qpid-send -a "testQueue; {create:always}" -m 1000 --content-size=100

2. (in second terminal on node2)
valgrind --leak-check=full --show-reachable=yes $(which qpidd) > valgrind.txt 2>&1

3. (in first terminal on node1)
service qpidd stop
sleep 10

4. (in second terminal on node2)
(stop valgrind)


Actual results:
==24677== 1,193,876 (480 direct, 1,193,396 indirect) bytes in 1 blocks are definitely lost in loss record 452 of 452
==24677==    at 0x4C285BC: operator new(unsigned long) (vg_replace_malloc.c:298)
==24677==    by 0x5062B58: qpid::broker::LinkRegistry::declare(std::string const&, std::string const&, unsigned short, std::string const&, bool, std::string const&, std::string const&, std::string const&, bool) (LinkRegistry.cpp:137)
==24677==    by 0x7A9D819: qpid::ha::Backup::setBrokerUrl(qpid::Url const&) (Backup.cpp:75)
==24677==    by 0x7ABE2E4: qpid::ha::HaBroker::setBrokerUrl(qpid::Url const&) (HaBroker.cpp:196)
==24677==    by 0x7AC1C7D: qpid::ha::HaBroker::initialize() (HaBroker.cpp:122)
==24677==    by 0x550F340: void qpid::(anonymous namespace)::each_plugin<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > >(boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > const&) (mem_fn_template.hpp:162)
==24677==    by 0x550F381: qpid::Plugin::initializeAll(qpid::Plugin::Target&) (Plugin.cpp:91)
==24677==    by 0x4FD3459: qpid::broker::Broker::Broker(qpid::broker::Broker::Options const&) (Broker.cpp:354)
==24677==    by 0x4072D5: qpid::broker::QpiddBroker::execute(qpid::broker::QpiddOptions*) (in /usr/sbin/qpidd)
==24677==    by 0x40CB03: qpid::broker::run_broker(int, char**, bool) (in /usr/sbin/qpidd)
==24677==    by 0x674BCDC: (below main) (in /lib64/libc-2.12.so)
==24677==
==24677== LEAK SUMMARY:
==24677==    definitely lost: 480 bytes in 1 blocks
==24677==    indirectly lost: 1,193,396 bytes in 10,337 blocks
==24677==      possibly lost: 1,253 bytes in 16 blocks
==24677==    still reachable: 78,604 bytes in 511 blocks
==24677==         suppressed: 0 bytes in 0 blocks
==24677==
==24677== For counts of detected and suppressed errors, rerun with: -v
==24677== ERROR SUMMARY: 15 errors from 15 contexts (suppressed: 130 from 9)


Expected results:
No definitely lost bytes.


Additional info:
creating the queue is essential for the reproducer, as well as waiting some time ("sleep 10") to let node2 broker to fail-over "qpid.ha.link.<uuid>" link first
Comment 1 Pavel Moravec 2013-09-04 11:05:26 EDT
Created attachment 793720 [details]
Full valgrind log

And I forgot to add /etc/qpid/qpidd.conf:

log-to-file=/tmp/qpidd.log
acl-file=/etc/qpid/qpidd.acl
ha-replicate=all
ha-cluster=yes
ha-brokers-url=train1,train2,train3
ha-username=guest
ha-password=guest
ha-mechanism=PLAIN

(acl-file has just "acl allow all all", authentication might not be required)
Comment 2 Pavel Moravec 2013-09-05 08:14:13 EDT
Playing with the reproducer little more:

- sufficient scenario is start&promote a broker, do some provisioning, start a backup broker under valgrind and once it gets "active" state, shut it down. 
- No primary broker shutdown is required. While provisioning updated from primary to backup is MUST.
- just the 480 bytes are lost, regardless of number of repetitions of any test case like above - that seems like the very first link creation is the guilty one

Anyway I feel valgrind is wrong here, as:
- it complains Link object to be forgotten
- that is crated from LinkRegistry.cpp (LinkRegistry::declare method) only, where it is wrapped in Link::shared_ptr. Therefore boost library should destroy Link object after removing the latest reference to it (in LinkRegistry::destroy) - moreover checking where&how Link::shared_ptr is used, I see no reason to leaving the Link object unreferenced..
- if there would be a leak in link itself, a simple federation route would trigger it as well - but I tested it is not the case; BUT Link usage outside link itself is only in Backup::setBrokerUrl where the shared_ptr reference is local variable.. Or could be the reference somehow mis-used in BrokerReplicator called there?
Comment 3 Alan Conway 2014-02-06 11:53:10 EST
Fixed upstream:

Commit 1565340 from Alan Conway in branch 'qpid/trunk'
[ https://svn.apache.org/r1565340 ]

QPID-5544: HA memory leak in backup broker after shutdown.

The memory leaks were due to shared_ptr cycles between the QueueReplicator,
Bridge and Link objects. This patch breaks the cycles using weak_ptrs
in the appropriate places.
Comment 5 Alan Conway 2014-02-07 10:35:38 EST
Done on branch: 

http://git.app.eng.bos.redhat.com/rh-qpid.git/log/?h=0.22-mrg-aconway-bz1004411

Note there are 2 commits on this branch:

aconway@mrg32 rh-qpid (0.22-mrg-aconway-bz1004411)$ git log 0.22-mrg..
dd15a27 QPID-5544: HA memory leak in backup broker after shutdown.
bbf82ec NO-JIRA: Minor refactor to improve code safety: calling shared_from_this on creation.
Comment 6 Leonid Zhaldybin 2014-02-24 11:32:53 EST
Tested on RHEL6.5, this issue has been fixed.

Packages used for testing:
python-qpid-0.22-11.el6
python-qpid-qmf-0.22-27.el6
qpid-cpp-client-0.22-35.el6
qpid-cpp-client-devel-0.22-35.el6
qpid-cpp-client-devel-docs-0.22-35.el6
qpid-cpp-client-rdma-0.22-35.el6
qpid-cpp-server-0.22-35.el6
qpid-cpp-server-devel-0.22-35.el6
qpid-cpp-server-ha-0.22-35.el6
qpid-cpp-server-linearstore-0.22-35.el6
qpid-cpp-server-rdma-0.22-35.el6
qpid-cpp-server-xml-0.22-35.el6
qpid-cpp-tar-0.22-11.el6
qpid-proton-c-0.6-1.el6
qpid-qmf-0.22-27.el6
qpid-qmf-devel-0.22-27.el6
qpid-tools-0.22-8.el6

-> VERIFIED

Note You need to log in before you can comment on or make changes to this bug.