Created attachment 530025 [details] ps log on cumin processes (memory consuption overview) Description of problem: It was noticed that cumin-web was leaking memory on a box after a while. Closer investigation showed that it leaked approx 800kb per minute. The configuration used had 4 cumin processes running in parallel, which after some days would use all physical RAM + swap. A memory consumption overview is attached to this bz. Version-Release number of selected component (if applicable): [root@ooo ~]# rpm -q cumin cumin-0.1.5068-1.el5 [root@ooo ~]# uname -a Linux ooo 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux [root@ooo ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.7 (Tikanga) How reproducible: Easily and reliable Steps to Reproduce: 1. Start cumin and let it monitor an active environment (grid components, f.ex) 2. Run the following in a shell: while [ 1 ]; do date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done 3. Observer the RSS column of the output Actual results: cumin-web eats up approx 800kb/min memory per cumin-web process Expected results: No memory leaks
Is the root cause bug 743657? It should be possible to manually patch qmf/console.py and verify.
# diff -c console.py_backup /usr/lib64/python2.4/site-packages/qmf/console.py *** console.py_backup 2011-10-25 12:48:36.945613700 +0200 --- /usr/lib64/python2.4/site-packages/qmf/console.py 2011-10-25 13:07:46.433213735 +0200 *************** *** 3284,3289 **** --- 3284,3290 ---- self.lock.acquire() try: self.contextMap.pop(sequence) + self.seqMgr._release(sequence) except KeyError: pass # @todo - shouldn't happen, log a warning. finally: # while [ 1 ]; do date +@%H:%M; ps faxuw | grep ^cumin; sleep 300; done Results in %: 0min 5min 10min 15min 100.00 100.00 100.00 100.00 100.00 100.10 100.13 100.20 100.00 100.12 100.16 100.21 100.00 100.10 100.15 100.19 100.00 100.08 100.13 100.19 100.00 100.04 100.04 100.04 100.00 100.00 100.00 100.00 100.00 100.04 100.04 100.04 100.00 100.00 100.00 100.00 100.00 100.04 100.04 100.04 RSS = 0min 5min 10min 15min 100.00 100.00 100.00 100.00 100.00 100.10 100.13 100.20 100.00 100.12 100.16 100.21 100.00 100.10 100.15 100.19 100.00 100.08 100.13 100.19 100.00 100.04 100.04 100.04 100.00 100.00 100.00 100.00 100.00 100.04 100.04 100.04 100.00 100.00 100.00 100.00 100.00 100.04 100.04 100.04 I see no leaking with patch.
Created attachment 530066 [details] ps log on patched cumin processes
Repaired RSS (mistake in calculation): 0min 5min 10min 15min 60min 100.00 100.00 100.00 100.00 100.00 100.00 102.62 103.39 104.23 105.02 100.00 102.32 103.31 104.12 105.17 100.00 102.35 103.11 104.15 104.73 100.00 102.20 103.49 104.14 105.02 100.00 100.40 100.48 100.68 100.75 100.00 100.13 100.13 100.13 100.17 100.00 100.36 100.36 100.36 100.36 100.00 100.04 100.06 100.31 100.83 100.00 101.24 101.24 101.24 102.73 Still max diff for 60min period is 2152kB.
This is the EL5 counterpart to bug 743657.
Reproduce on RHEL5 i386/x86_64 condor-7.6.5-0.6.el5 cumin-0.1.5098-1.el5 qpid-qmf-0.10-10.el5 Cumin memory usage (VSZ) difference between 5min and 10min run: i386: (604824-579676)/5 = 5000 kB/min memory grow x86_64: (816812-792412)/5 = 4900 kB/min memory grow Verify on RHEL5 i386/x86_64 (update qpid-qmf) condor-7.6.5-0.6.el5 cumin-0.1.5098-1.el5 qpid-qmf-0.10-11.el5 Cumin memory usage (VSZ) difference between 5min and 10min run: i386: (572748-572432)/5 = 63 kB/min memory grow x86_64: (843320-843320)/5 = 0 kB/min memory grow Verify patch presence: i386: # grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib/python2.4/site-packages/qmf/console.py self.contextMap.pop(sequence) self.seqMgr._release(sequence) except KeyError: [00:05:34] ecode=0 x86_64 # grep -A 1 -B 1 'self.seqMgr._release(sequence)' /usr/lib64/python2.4/site-packages/qmf/console.py self.contextMap.pop(sequence) self.seqMgr._release(sequence) except KeyError: [00:05:35] ecode=0 ---> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: CAUSE: After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out. CONSEQUENCE: This caused old request details (such as object ids) to be held long past the life of the request and response. FIX: Explicitly call the sequence manager's cleanup method for a request once the request had been handled. RESULT: Fixes the memory leak by releasing the unneeded objects.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,10 +1,12 @@ +When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems. + CAUSE: After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out. CONSEQUENCE: -This caused old request details (such as object ids) to be held long past the life of the request and response. +This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. FIX:
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,17 +1 @@ -When running a QMF console application written in Python, after a period of time a noticeable amount of memory is lost due to a slow memory leak. After a long enough period the console would use up all available memory and experience problems. +When running a QMF console application written in Python, after a period of time a noticeable amount of memory was lost due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. This update explicitly calls the clean-up method of the sequence manager for each request once the request had been handled, unneeded objects are now properly released, and the memory leak no longer occurs.- -CAUSE: - -After each request, objects were begin retained by the sequence manager. But after a request had been settled, the sequence manager was not being cleared out. - -CONSEQUENCE: - -This caused old request details (such as object ids) to be held long past the life of the request and response. The symptom of this is a linear increase in the amount of memory being leaked. - -FIX: - -Explicitly call the sequence manager's cleanup method for a request once the request had been handled. - -RESULT: - -Fixes the memory leak by releasing the unneeded objects.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0045.html