Bug 527233
| Summary: | shadow process bloat | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Jon Thomas <jthomas> |
| Component: | condor | Assignee: | Matthew Farrellee <matt> |
| Status: | CLOSED ERRATA | QA Contact: | Luigi Toscano <ltoscano> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 1.2 | CC: | iboverma, jneedle, lbrindle, ltoscano, matt, tao |
| Target Milestone: | 1.3 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes need to be high memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2010-10-14 16:08:33 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 527551 | ||
| Attachments: | |||
|
Description
Jon Thomas
2009-10-05 13:49:10 UTC
Created attachment 363737 [details]
7.2.4 memory test results of condor_shadow
Created attachment 363738 [details]
7.3.2 memory test results of condor_shadow
Created attachment 363739 [details]
7.4.0 memory test results of condor_shadow
Performed a simple memory test of 3 different condor builds, comparing the results for condor_shadow: <see attachments>
Tests were:
1.) ps -aux | grep -i condor
2.) pmap <pid_of_condor_shadow>
3.) top -p <pid_of_condor_shadow> (look at RES)
This ticket should actually reference: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=793
In conversing with DanB from the UW, the observed results were ~200k bloat, which should only affect systems that have a large number of shadows running. (This is consistent with the data.)
From the results, it actually appears that the latest 7.4.0 performs better than its predecessors, and this issue no longer exists in the latest upstream.
== Pending further information ==
We probably only really care about memory the shadow can write to, and much of that is tied up in libraries, except...
7.2.4
0000000000831000 124K rw--- [ anon ]
0000000001412000 684K rw--- [ anon ]
7.3.2
0000000000887000 16K rw--- [ anon ]
0000000001ff7000 796K rw--- [ anon ]
7.4.0
0000000000872000 16K rw--- [ anon ]
0000000000944000 544K rw--- [ anon ]
Looks like 7.2.4 ~= 7.3.2 > 7.4.0, with 7.4.0 being a good 250K smaller.
The fix should be in 7.4.0-0.6. It can be verified by running the tests above against 7.4.0-0.5.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Resolved issue with the size of the shadow process being too large (527233)
Created attachment 370081 [details]
Test results (condor_shadow, 7.2.2, 7.3.x, 7.4.x)
Summary of the tests made on RHEL 4.8/5.4, i386/x86_64, condor 7.2.2, 7.3.1/7.3.2 and 7.4.1.
They show:
- strange behaviour on RHEL4/i386 (all versions).
- values obtained from 7.4 (both RES from ps/top and the writable/private field from pmap) are almost the same as those from 7.2.
Thus the needinfo flag.
I don't understand the question. For now, it looks like there is no significant regression for RHEL4, though the size is unfortunately large. It looks like there is no significant regression for RHEL5. It's unknown why 7.3.2 would be better than 7.4.1.
This bug report is about the size of the shadow process. It was reported against MRG 1.1 (condor 7.2.x), and we all agree that condor 7.4 has no significant regression, but also none of the improvements (that were in 7.3) == shadow uses (almost) the same memory == the bug is not fixed at all. I suggest either moving it back to ASSIGNED (and maybe changing the "target milestone" to 1.3) or closing it as NOTABUG. What do you think about it?
Notable differences on el5 between 7.4.2-0.1 and 7.4.2-0.2:
7.4.2-0.1
2528 r-x-- condor_shadow
16 rw--- condor_shadow
16 rw--- [ anon ]
260 rw--- condor_shadow
900 rw--- [ anon ]
168 r-x-- libgsoapssl++.so.0.0.0
2048 ----- libgsoapssl++.so.0.0.0
8 rw--- libgsoapssl++.so.0.0.0
12 rw--- libstdc++.so.6.0.8
40 rw--- [ anon ]
mapped: 66104K writeable/private: 1964K shared: 0K
7.4.2-0.2
2312 r-x-- condor_shadow
12 rw--- condor_shadow
16 rw--- [ anon ]
396 rw--- [ anon ]
12 rw--- libstdc++.so.6.0.8
36 rw--- [ anon ]
mapped: 62892K writeable/private: 1184K shared: 0K
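As a quick cross-check of the two listings above, the following Python sketch sums the writable rows of each excerpt and compares the difference with the change in the reported writeable/private totals. The helper name `writable_kb` and the two string constants are illustrative only, not part of condor or pmap; the rows are copied from the excerpts above, which list only the mappings that differ between the two builds.

```python
def writable_kb(excerpt: str) -> int:
    """Sum the sizes (in K) of rows whose permission field includes 'w'."""
    total = 0
    for line in excerpt.strip().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        size, perms = fields[0], fields[1]
        if size.isdigit() and "w" in perms:
            total += int(size)
    return total


# Rows copied from the 7.4.2-0.1 excerpt in this comment.
OLD_BUILD = """
2528 r-x-- condor_shadow
16 rw--- condor_shadow
16 rw--- [ anon ]
260 rw--- condor_shadow
900 rw--- [ anon ]
168 r-x-- libgsoapssl++.so.0.0.0
2048 ----- libgsoapssl++.so.0.0.0
8 rw--- libgsoapssl++.so.0.0.0
12 rw--- libstdc++.so.6.0.8
40 rw--- [ anon ]
"""

# Rows copied from the 7.4.2-0.2 excerpt in this comment.
NEW_BUILD = """
2312 r-x-- condor_shadow
12 rw--- condor_shadow
16 rw--- [ anon ]
396 rw--- [ anon ]
12 rw--- libstdc++.so.6.0.8
36 rw--- [ anon ]
"""

old_kb = writable_kb(OLD_BUILD)
new_kb = writable_kb(NEW_BUILD)
print(old_kb, new_kb, old_kb - new_kb)  # 1252 472 780

# Totals reported by pmap on the "writeable/private" summary lines.
print(1964 - 1184)  # 780
```

The 780K drop in the listed writable rows matches the 780K drop in the reported writeable/private totals (1964K to 1184K), so the rows shown account for the whole difference between the two builds.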
Created attachment 436310 [details]
Memory comparison (condor_shadow, 7.2.2, 7.3.x, 7.4.1, 7.4.4)
Updated version of the memory usage comparison (see #c12 for the previous version).
This version adds results for condor 7.4.4-0.4.
Many improvements have been implemented: both the writable/private memory reported by pmap and the RES memory have decreased by 25% in the latest version.
Interestingly enough, the worst values shown on RHEL4/32 no longer appear for jobs following the first one.
Given the results of the tests on all supported RHEL architectures, I'm going to move the status of this bug to VERIFIED.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1,3 @@
-Resolved issue with the size of the shadow process being too large (527233)
+Resolved issue with the size of the shadow process being too large (527233)
+
+[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]
Removing this bug from the Technical Notes since a more detailed description was not provided.
Deleted Technical Notes Contents.
Old Contents:
Resolved issue with the size of the shadow process being too large (527233)
[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes need to be high memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
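To put the 20K+ jobs figure from the note above in perspective, here is a back-of-the-envelope sketch in Python. It assumes one shadow per running job and that each shadow holds the full writeable/private amount pmap reported earlier in this bug on el5 (1964K for 7.4.2-0.1, 1184K for 7.4.2-0.2). Both are simplifying assumptions, and `aggregate_gib` is just an illustrative helper name.

```python
KIB_PER_GIB = 1024 * 1024


def aggregate_gib(per_shadow_kib: int, shadows: int) -> float:
    """Total private memory in GiB if every shadow holds per_shadow_kib."""
    return per_shadow_kib * shadows / KIB_PER_GIB


SHADOWS = 20_000  # "20K+ jobs", assuming one shadow per running job

before = aggregate_gib(1964, SHADOWS)  # 7.4.2-0.1 writeable/private
after = aggregate_gib(1184, SHADOWS)   # 7.4.2-0.2 writeable/private

print(f"before: {before:.2f} GiB")          # before: 37.46 GiB
print(f"after:  {after:.2f} GiB")           # after:  22.58 GiB
print(f"saved:  {before - after:.2f} GiB")  # saved:  14.88 GiB
```

This is only an upper-bound style estimate of private memory on a submit node; actual resident usage depends on how much of each mapping is touched.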
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0773.html