Bug 748738 - The pure-python QMF console unnecessarily retains references to query results -- cumin-web process leaks ~800kb/min
Summary: The pure-python QMF console unnecessarily retains references to query results...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-qmf
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 2.1
Assignee: Darryl L. Pierce
QA Contact: Stanislav Graf
URL:
Whiteboard:
Depends On:
Blocks: 743350
 
Reported: 2011-10-25 08:30 UTC by David Sommerseth
Modified: 2016-05-22 23:33 UTC (History)
7 users

Fixed In Version: qpid-qmf-0.10-11.el5
Doc Type: Bug Fix
Doc Text:
When running a QMF console application written in Python, a noticeable amount of memory was lost over time due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. With this update, the clean-up method of the sequence manager is explicitly called for each request once the request has been handled. Unneeded objects are now properly released, and the memory leak no longer occurs.
Clone Of:
Environment:
Last Closed: 2012-01-23 17:29:32 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ps log on cumin processes (memory consumption overview) (5.17 KB, text/plain)
2011-10-25 08:30 UTC, David Sommerseth
no flags Details
ps log on patched cumin processes (5.73 KB, text/plain)
2011-10-25 11:45 UTC, Stanislav Graf
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 743657 0 urgent CLOSED The pure-python QMF console unnecessarily retains references to query results 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2012:0045 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update 2012-01-23 22:22:58 UTC

Internal Links: 743657

Description David Sommerseth 2011-10-25 08:30:27 UTC
Created attachment 530025 [details]
ps log on cumin processes (memory consumption overview)

Description of problem:

It was noticed that cumin-web was leaking memory on a box after a while. Closer investigation showed that it leaked approximately 800 kB per minute. The configuration used had 4 cumin processes running in parallel, which after some days would consume all physical RAM plus swap.

A memory consumption overview is attached to this bz.

Version-Release number of selected component (if applicable):

[root@ooo ~]# rpm -q cumin
cumin-0.1.5068-1.el5
[root@ooo ~]# uname -a
Linux ooo 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@ooo ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.7 (Tikanga)


How reproducible:
Easily and reliably


Steps to Reproduce:
1. Start cumin and let it monitor an active environment (grid components, f.ex)
2. Run the following in a shell:
    while [ 1 ]; do  date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done
3. Observe the RSS column of the output

  
Actual results:
cumin-web consumes approximately 800 kB/min of memory per cumin-web process

Expected results:
No memory leaks
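The ps loop in the reproduction steps produces timestamped snapshots. A small helper (hypothetical, not part of cumin or qpid-qmf) can turn the RSS column of those snapshots into percentage tables of the kind shown in the later comments, normalized to the first sample:

```python
# Hypothetical helper: normalize a series of RSS samples (kB) for one
# process into percentages of the first sample.
def normalize_rss(samples):
    """samples: list of RSS values (kB) over time; returns each value
    as a percentage of the first sample, rounded to 2 decimals."""
    base = samples[0]
    return [round(100.0 * s / base, 2) for s in samples]

# Example: a process leaking ~800 kB/min, sampled every 5 minutes
# starting from ~400 MB RSS.
print(normalize_rss([400000, 404000, 408000, 412000]))
# → [100.0, 101.0, 102.0, 103.0]
```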

Comment 2 Matthew Farrellee 2011-10-25 09:22:29 UTC
Is the root cause bug 743657? It should be possible to manually patch qmf/console.py and verify.

Comment 3 Stanislav Graf 2011-10-25 11:43:57 UTC
# diff -c console.py_backup /usr/lib64/python2.4/site-packages/qmf/console.py
*** console.py_backup	2011-10-25 12:48:36.945613700 +0200
--- /usr/lib64/python2.4/site-packages/qmf/console.py	2011-10-25 13:07:46.433213735 +0200
***************
*** 3284,3289 ****
--- 3284,3290 ----
        self.lock.acquire()
        try:
          self.contextMap.pop(sequence)
+         self.seqMgr._release(sequence)
        except KeyError:
          pass   # @todo - shouldn't happen, log a warning.
      finally:

# while [ 1 ]; do  date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done

Results in %:
    0min    5min   10min   15min
  100.00  100.00  100.00  100.00
  100.00  100.10  100.13  100.20
  100.00  100.12  100.16  100.21
  100.00  100.10  100.15  100.19
  100.00  100.08  100.13  100.19
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04


I see no leaking with patch.
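The one-line patch above releases the sequence from the sequence manager once the reply has been handled. A minimal sketch of the mechanism it fixes (simplified stand-in classes, not the actual qmf/console.py implementation): the console tracks each request in two places, and popping only its own contextMap leaves the sequence manager's internal map growing by one entry per request.

```python
# Simplified sketch of the leak (SeqMgr/Console are illustrative names,
# not the real qmf/console.py classes).
class SeqMgr:
    def __init__(self):
        self._map = {}  # sequence -> per-request data (e.g. object ids)

    def _reserve(self, seq, data):
        self._map[seq] = data

    def _release(self, seq):
        # The clean-up method the patch calls: drop our reference.
        return self._map.pop(seq, None)

class Console:
    def __init__(self):
        self.seqMgr = SeqMgr()
        self.contextMap = {}

    def request(self, seq, context, data):
        self.contextMap[seq] = context
        self.seqMgr._reserve(seq, data)

    def handle_reply(self, seq, patched=True):
        self.contextMap.pop(seq, None)
        if patched:
            # The fix: also release the sequence manager's reference;
            # without this, _map retains one entry per request forever.
            self.seqMgr._release(seq)

console = Console()
for seq in range(1000):
    console.request(seq, context=object(), data=[seq])
    console.handle_reply(seq)
print(len(console.seqMgr._map))  # → 0 (without the fix: 1000)
```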

Comment 4 Stanislav Graf 2011-10-25 11:45:31 UTC
Created attachment 530066 [details]
ps log on patched cumin processes

Comment 5 Stanislav Graf 2011-10-25 12:16:09 UTC
Repaired RSS (mistake in calculation):

  0min    5min   10min   15min   60min
100.00  100.00  100.00  100.00  100.00
100.00  102.62  103.39  104.23  105.02
100.00  102.32  103.31  104.12  105.17
100.00  102.35  103.11  104.15  104.73
100.00  102.20  103.49  104.14  105.02
100.00  100.40  100.48  100.68  100.75
100.00  100.13  100.13  100.13  100.17
100.00  100.36  100.36  100.36  100.36
100.00  100.04  100.06  100.31  100.83
100.00  101.24  101.24  101.24  102.73

Still, the maximum difference over the 60-minute period is 2152 kB.

Comment 8 Matthew Farrellee 2011-10-31 20:15:07 UTC
This is the EL5 counterpart to bug 743657.

Comment 10 Stanislav Graf 2011-11-09 23:07:28 UTC
Reproduce on RHEL5 i386/x86_64
condor-7.6.5-0.6.el5
cumin-0.1.5098-1.el5
qpid-qmf-0.10-10.el5

Cumin memory usage (VSZ) difference between the 5min and 10min runs:
i386:   (604824-579676)/5 ≈ 5000 kB/min memory growth
x86_64: (816812-792412)/5 ≈ 4900 kB/min memory growth

Verify on RHEL5 i386/x86_64 (update qpid-qmf)
condor-7.6.5-0.6.el5
cumin-0.1.5098-1.el5
qpid-qmf-0.10-11.el5

Cumin memory usage (VSZ) difference between the 5min and 10min runs:
i386:   (572748-572432)/5 ≈   63 kB/min memory growth
x86_64: (843320-843320)/5 =    0 kB/min memory growth
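The growth figures above are simply the difference between two VSZ samples divided by the 5 minutes separating them. A one-liner reproduces the arithmetic (values in kB, taken from the runs quoted above; the report rounds the unpatched i386 figure to ~5000):

```python
def growth_kb_per_min(vsz_start, vsz_end, minutes=5):
    """Leak rate in kB/min between two VSZ samples `minutes` apart."""
    return (vsz_end - vsz_start) // minutes

print(growth_kb_per_min(579676, 604824))  # → 5029 (unpatched i386, reported as ~5000)
print(growth_kb_per_min(572432, 572748))  # → 63   (patched i386)
print(growth_kb_per_min(843320, 843320))  # → 0    (patched x86_64)
```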

Verify patch presence:
i386:
# grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib/python2.4/site-packages/qmf/console.py
        self.contextMap.pop(sequence)
        self.seqMgr._release(sequence)
      except KeyError:
[00:05:34] ecode=0

x86_64
# grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib64/python2.4/site-packages/qmf/console.py
        self.contextMap.pop(sequence)
        self.seqMgr._release(sequence)
      except KeyError:
[00:05:35] ecode=0


---> VERIFIED

Comment 11 Darryl L. Pierce 2011-11-16 14:41:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
CAUSE:

After each request, objects were being retained by the sequence manager, but after a request had been settled, the sequence manager was not being cleared out.

CONSEQUENCE:

This caused old request details (such as object ids) to be held long past the life of the request and response.

FIX:

Explicitly call the sequence manager's cleanup method for a request once the request has been handled.

RESULT:

Fixes the memory leak by releasing the unneeded objects.

Comment 12 Darryl L. Pierce 2011-11-16 14:45:39 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,10 +1,12 @@
+When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems.
+
 CAUSE:
 
 After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out.
 
 CONSEQUENCE:
 
-This caused old request details (such as object ids) to be held long past the life of the request and response.
+This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. 
 
 FIX:

Comment 13 Tomas Capek 2011-11-16 14:56:08 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,17 +1 @@
-When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems.
+When running a QMF console application written in Python, after a period of time a noticeable amount of memory was lost due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. This update explicitly calls the clean-up method of the sequence manager for each request once the request had been handled, unneeded objects are now properly released, and the memory leak no longer occurs.
-
-CAUSE:
-
-After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out.
-
-CONSEQUENCE:
-
-This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. 
-
-FIX:
-
-Explicitly call the sequence manager's cleanup method for a request once the request had been handled.
-
-RESULT:
-
-Fixes the memory leak by releasing the unneeded objects.

Comment 14 errata-xmlrpc 2012-01-23 17:29:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html

