Bug 1136294

Summary: Memory leak when purging priority ring queue
Product: Red Hat Enterprise MRG Reporter: Pavel Moravec <pmoravec>
Component: qpid-cppAssignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA QA Contact: Zdenek Kraus <zkraus>
Severity: high Docs Contact:
Priority: high    
Version: 2.5CC: aconway, iboverma, jross, lzhaldyb, mcressma, pematous, zkraus
Target Milestone: 2.5.5   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: qpid-cpp-0.18-31 Doc Type: Bug Fix
Doc Text:
It was discovered that when messages were purged from a priority ring queue, the message counters were not properly maintained. This caused messages to be deleted incorrectly, which caused memory to be consumed unnecessarily. The message counters are now properly maintained for this particular scenario. Messages are purged properly, and memory no longer accumulates during repeated purging operations.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-22 16:34:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1140368, 1140369    
Attachments:
Description Flags
Patch proposal none

Description Pavel Moravec 2014-09-02 09:46:53 UTC
Description of problem:
See Summary and reproducer. Valgrind does not show any leak. I suspect RingQueuePolicy accumulates the memory as gdb shows:

(gdb) p (*(qpid::broker::RingQueuePolicy *)(policy))
$11 = {<qpid::broker::QueuePolicy> = {_vptr.QueuePolicy = 0x7f62d2ebaef0, static defaultMaxSize = 104857600, maxCount = 5000, 
    maxSize = 104857600, type = "ring", count = 31409, size = 157045000, policyExceeded = true, queue = 0x7f62b405c640, 
..

See maxCount being smaller than count, and "queue" has some members with size_ in thousands.


Version-Release number of selected component (if applicable):
qpid-cpp-server 0.18-30


How reproducible:
100%


Steps to Reproduce:
(see https://access.redhat.com/site/solutions/70456 for purge_queue)

rm -rf /var/lib/qpidd/*
service qpidd restart

qpid-config add queue PrioRing --argument x-qpid-priorities=10 --max-queue-count=1000 --limit-policy=ring

while true; do
        qpid-send -a PrioRing -m 700 --content-size=50 #--ttl=10
        ./purge_queue PrioRing 0 > /dev/null 2>&1
        ps aux | grep qpidd | grep daemon
done


Actual results:
RSS and also CPU usage of qpidd process grows in every iteration.


Expected results:
RSS stabilises after a while, CPU usage as well (though that is rather a side effect)


Additional info:
- priority queue is essential for reproducer. Ring queue as well but just due to bz1136029
- it is probably a regression caused by bz860691
- another reproducer (worth to verify as well) relies on purging expired messages:

echo "queue-purge-interval=1" > /etc/qpidd.conf
rm -rf /var/lib/qpidd/*
service qpidd restart

qpid-config add queue PrioRing --argument x-qpid-priorities=10 --max-queue-count=10000 --limit-policy=ring

while true; do
        qpid-send -a PrioRing -m 7000 --content-size=50 --ttl=10
	sleep 1.3
	qpid-receive -a "PrioRing; {mode:browse}" -m 100000 | sed "s/X//g" | nl # should be empty output, just to confirm the queue is really empty
        ps aux | grep qpidd | grep daemon
done

Comment 1 Pavel Moravec 2014-09-03 12:28:02 UTC
Yet another reproducer, again relying on Queue::purge method: use QMF method reroute. Reproducer:

qpid-config add exchange fanout AltForQueue1
qpid-config add exchange fanout AltForQueue2
for q in Queue1 Queue2; do
	qpid-config add queue $q --argument x-qpid-priorities=10 --max-queue-count=5000 --limit-policy=ring --alternate-exchange=AltFor${q}
done
qpid-config bind AltForQueue2 Queue1
qpid-config bind AltForQueue1 Queue2

qpid-send -a Queue1 -m 3000 --content-size=100

while true; do
	for q in Queue1 Queue2; do
	        ps aux | grep qpid | grep daemon
		./reroute_queue $q AltFor${q} 0 false > /dev/null 2>&1
	done
done

Comment 2 Pavel Moravec 2014-09-03 12:59:20 UTC
Root cause in source code found.

Queue::purge method executes:

1) messages->removeIf. Here, PriorityQueue::removeIf calls "fifo.clean()".

2) dequeue(0, *qmsg); is called for all dequeued messages. Queue::dequeue executes:

            deleted = removed || messages->deleted(msg);
            if (deleted) {
              if (policy.get()) policy->dequeued(msg);

but PriorityQueue::deleted checks it has empty fifo so it returns false. And queue policy is _not_ updated, keeping itself reference to the message and not updating QMF counters.

resolving this fixes also bz1136029.

Comment 3 Pavel Moravec 2014-09-03 13:39:44 UTC
Quite probable patch (just building test packages):

diff -rup _qpid-cpp-0.18-30_orig/cpp/src/qpid/broker/PriorityQueue.cpp _qpid-cpp-0.18-30_new/cpp/src/qpid/broker/PriorityQueue.cpp
--- _qpid-cpp-0.18-30_orig/cpp/src/qpid/broker/PriorityQueue.cpp	2014-09-02 10:44:20.398153124 +0200
+++ _qpid-cpp-0.18-30_new/cpp/src/qpid/broker/PriorityQueue.cpp	2014-09-03 15:37:47.693868410 +0200
@@ -140,7 +140,7 @@ void PriorityQueue::removeIf(Predicate p
     for (int priority = 0; priority < levels; ++priority) {
         for (Deque::iterator i = messages[priority].begin(); i != messages[priority].end();) {
             if ((*i)->status == QueuedMessage::AVAILABLE && p(**i)) {
-                (*i)->status = QueuedMessage::DELETED; // Updates fifo index
+                (*i)->status = QueuedMessage::REMOVED; // Updates fifo index
                 i = messages[priority].erase(i);
                 clearCache();
             } else {

Comment 4 Pavel Moravec 2014-09-03 14:15:01 UTC
Created attachment 934111 [details]
Patch proposal

PriorityQueue::removeIf method has to mark removed messages as REMOVED and not DELETED (cf MessageDequeue::removeIf method), otherwise the messages are removed immediatelly and queue policy is not updated.

Comment 5 Alan Conway 2014-09-03 17:13:42 UTC
(In reply to Pavel Moravec from comment #4)
> Created attachment 934111 [details]
> Patch proposal
> 
> PriorityQueue::removeIf method has to mark removed messages as REMOVED and
> not DELETED (cf MessageDequeue::removeIf method), otherwise the messages are
> removed immediatelly and queue policy is not updated.

This patch looks good to me, provided that the testing bears it out.

Comment 6 Pavel Moravec 2014-09-03 17:50:34 UTC
*** Bug 1136029 has been marked as a duplicate of this bug. ***

Comment 8 Zdenek Kraus 2014-10-02 14:01:42 UTC
This was tested on RHEL5, RHEL6 && i686, x86_64 with following packages:

python-qpid-0.18-13
python-qpid-qmf-0.18-27
qpid-cpp-client-0.18-32
qpid-cpp-client-devel-0.18-32
qpid-cpp-client-devel-docs-0.18-32
qpid-cpp-client-ssl-0.18-32
qpid-cpp-server-0.18-32
qpid-cpp-server-devel-0.18-32
qpid-cpp-server-ssl-0.18-32
qpid-cpp-server-store-0.18-32
qpid-cpp-server-xml-0.18-32
qpid-java-client-0.18-8
qpid-java-common-0.18-8
qpid-java-example-0.18-8
qpid-jca-0.18-8
qpid-jca-xarecovery-0.18-8
qpid-qmf-0.18-27
qpid-tools-0.18-10


->VERIFIED

Comment 11 errata-xmlrpc 2014-10-22 16:34:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2014-1682.html