Bug 748738

Summary:

The pure-python QMF console unnecessarily retains references to query results -- cumin-web process leaks ~800kb/min

Product:

Red Hat Enterprise MRG

Reporter:

David Sommerseth <davids>

Component:

qpid-qmf

Assignee:

Darryl L. Pierce <dpierce>

Status:

CLOSED ERRATA

QA Contact:

Stanislav Graf <sgraf>

Severity:

high

Docs Contact:

Priority:

high

Version:

2.1

CC:

jneedle, jross, matt, mcressma, ovasik, sgraf, tross

Target Milestone:

2.1

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

qpid-qmf-0.10-11.el5

Doc Type:

Bug Fix

Doc Text:

When running a QMF console application written in Python, after a period of time a noticeable amount of memory was lost due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. This update explicitly calls the clean-up method of the sequence manager for each request once the request had been handled, unneeded objects are now properly released, and the memory leak no longer occurs.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-01-23 17:29:32 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

743350

Attachments:

Description	Flags
ps log on cumin processes (memory consuption overview)	none
ps log on patched cumin processes	none

Description David Sommerseth 2011-10-25 08:30:27 UTC

Created attachment 530025 [details]
ps log on cumin processes (memory consuption overview)

Description of problem:

It was noticed that cumin-web was leaking memory on a box after a while.  Closer investigation showed that it leaked approx 800kb per minute.  The configuration used had 4 cumin processes running in parallel, which after some days would use all physical RAM + swap.

A memory consumption overview is attached to this bz.

Version-Release number of selected component (if applicable):

[root@ooo ~]# rpm -q cumin
cumin-0.1.5068-1.el5
[root@ooo ~]# uname -a
Linux ooo 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@ooo ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.7 (Tikanga)


How reproducible:
Easily and reliable


Steps to Reproduce:
1. Start cumin and let it monitor an active environment (grid components, f.ex)
2. Run the following in a shell:
    while [ 1 ]; do  date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done
3. Observer the RSS column of the output

  
Actual results:
cumin-web eats up approx 800kb/min memory per cumin-web process

Expected results:
No memory leaks

Comment 2 Matthew Farrellee 2011-10-25 09:22:29 UTC

Is the root cause bug 743657? It should be possible to manually patch qmf/console.py and verify.

Comment 3 Stanislav Graf 2011-10-25 11:43:57 UTC

# diff -c console.py_backup /usr/lib64/python2.4/site-packages/qmf/console.py
*** console.py_backup	2011-10-25 12:48:36.945613700 +0200
--- /usr/lib64/python2.4/site-packages/qmf/console.py	2011-10-25 13:07:46.433213735 +0200
***************
*** 3284,3289 ****
--- 3284,3290 ----
        self.lock.acquire()
        try:
          self.contextMap.pop(sequence)
+         self.seqMgr._release(sequence)
        except KeyError:
          pass   # @todo - shouldn't happen, log a warning.
      finally:

# while [ 1 ]; do  date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done

Results in %:
    0min    5min   10min   15min
  100.00  100.00  100.00  100.00
  100.00  100.10  100.13  100.20
  100.00  100.12  100.16  100.21
  100.00  100.10  100.15  100.19
  100.00  100.08  100.13  100.19
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04

RSS =
    0min    5min   10min   15min
  100.00  100.00  100.00  100.00
  100.00  100.10  100.13  100.20
  100.00  100.12  100.16  100.21
  100.00  100.10  100.15  100.19
  100.00  100.08  100.13  100.19
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04
  100.00  100.00  100.00  100.00
  100.00  100.04  100.04  100.04

I see no leaking with patch.

Comment 4 Stanislav Graf 2011-10-25 11:45:31 UTC

Created attachment 530066 [details]
ps log on patched cumin processes

Comment 5 Stanislav Graf 2011-10-25 12:16:09 UTC

Repaired RSS (mistake in calculation):

  0min    5min   10min   15min   60min
100.00  100.00  100.00  100.00  100.00
100.00  102.62  103.39  104.23  105.02
100.00  102.32  103.31  104.12  105.17
100.00  102.35  103.11  104.15  104.73
100.00  102.20  103.49  104.14  105.02
100.00  100.40  100.48  100.68  100.75
100.00  100.13  100.13  100.13  100.17
100.00  100.36  100.36  100.36  100.36
100.00  100.04  100.06  100.31  100.83
100.00  101.24  101.24  101.24  102.73

Still max diff for 60min period is 2152kB.

Comment 8 Matthew Farrellee 2011-10-31 20:15:07 UTC

This is the EL5 counterpart to bug 743657.

Comment 10 Stanislav Graf 2011-11-09 23:07:28 UTC

Reproduce on RHEL5 i386/x86_64
condor-7.6.5-0.6.el5
cumin-0.1.5098-1.el5
qpid-qmf-0.10-10.el5

Cumin memory usage (VSZ) difference between 5min and 10min run:
i386:   (604824-579676)/5 = 5000 kB/min memory grow
x86_64: (816812-792412)/5 = 4900 kB/min memory grow

Verify on RHEL5 i386/x86_64 (update qpid-qmf)
condor-7.6.5-0.6.el5
cumin-0.1.5098-1.el5
qpid-qmf-0.10-11.el5

Cumin memory usage (VSZ) difference between 5min and 10min run:
i386:   (572748-572432)/5 =   63 kB/min memory grow
x86_64: (843320-843320)/5 =    0 kB/min memory grow

Verify patch presence:
i386:
# grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib/python2.4/site-packages/qmf/console.py
        self.contextMap.pop(sequence)
        self.seqMgr._release(sequence)
      except KeyError:
[00:05:34] ecode=0

x86_64
# grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib64/python2.4/site-packages/qmf/console.py
        self.contextMap.pop(sequence)
        self.seqMgr._release(sequence)
      except KeyError:
[00:05:35] ecode=0


---> VERIFIED

Comment 11 Darryl L. Pierce 2011-11-16 14:41:24 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
CAUSE:

After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out.

CONSEQUENCE:

This caused old request details (such as object ids) to be held long past the life of the request and response.

FIX:

Explicitly call the sequence manager's cleanup method for a request once the request had been handled.

RESULT:

Fixes the memory leak by releasing the unneeded objects.

Comment 12 Darryl L. Pierce 2011-11-16 14:45:39 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,10 +1,12 @@
+When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems.
+
 CAUSE:
 
 After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out.
 
 CONSEQUENCE:
 
-This caused old request details (such as object ids) to be held long past the life of the request and response.
+This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. 
 
 FIX:

Comment 13 Tomas Capek 2011-11-16 14:56:08 UTC

    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,17 +1 @@
-When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems.
+When running a QMF console application written in Python, after a period of time a noticeable amount of memory was lost due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. This update explicitly calls the clean-up method of the sequence manager for each request once the request had been handled, unneeded objects are now properly released, and the memory leak no longer occurs.-
-CAUSE:
-
-After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out.
-
-CONSEQUENCE:
-
-This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. 
-
-FIX:
-
-Explicitly call the sequence manager's cleanup method for a request once the request had been handled.
-
-RESULT:
-
-Fixes the memory leak by releasing the unneeded objects.

Comment 14 errata-xmlrpc 2012-01-23 17:29:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html