Bug 527233 - shadow process bloat
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Assigned To: Matthew Farrellee
QA Contact: Luigi Toscano
Depends On:
Blocks: 527551
Reported: 2009-10-05 09:49 EDT by Jon Thomas
Modified: 2018-10-27 10:09 EDT
CC List: 6 users

Doc Type: Bug Fix
Doc Text:
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes needed to be high-memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
Last Closed: 2010-10-14 12:08:33 EDT


Attachments
7.2.4 memory test results of condor_shadow (7.35 KB, text/plain)
2009-10-05 16:23 EDT, Timothy St. Clair
7.3.2 memory test results of condor_shadow (7.46 KB, text/plain)
2009-10-05 16:24 EDT, Timothy St. Clair
7.4.0 memory test results of condor_shadow (6.68 KB, text/plain)
2009-10-05 16:24 EDT, Timothy St. Clair
Test results (condor_shadow, 7.2.2, 7.3.x, 7.4.x) (9.94 KB, application/vnd.oasis.opendocument.spreadsheet)
2009-11-18 08:34 EST, Luigi Toscano
Memory comparison (condor_shadow, 7.2.2, 7.3.x, 7.4.1, 7.4.4) (10.97 KB, application/vnd.oasis.opendocument.spreadsheet)
2010-08-03 12:00 EDT, Luigi Toscano


External Trackers
Tracker: Red Hat Product Errata
Tracker ID: RHSA-2010:0773
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 11:56:44 EDT

Description Jon Thomas 2009-10-05 09:49:10 EDT
The size of the shadow process is too large, impacting performance.

Shadow process bloat on the Central Manager/Scheduler node.

Additional info: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=787
Comment 1 Timothy St. Clair 2009-10-05 16:23:36 EDT
Created attachment 363737
7.2.4 memory test results of condor_shadow
Comment 2 Timothy St. Clair 2009-10-05 16:24:09 EDT
Created attachment 363738
7.3.2 memory test results of condor_shadow
Comment 3 Timothy St. Clair 2009-10-05 16:24:37 EDT
Created attachment 363739
7.4.0 memory test results of condor_shadow
Comment 4 Timothy St. Clair 2009-10-05 16:41:36 EDT
Performed a simple memory test of three different Condor builds, comparing the memory usage of condor_shadow (see attachments).

Tests were:
1.) ps aux | grep -i condor
2.) pmap <pid_of_condor_shadow>
3.) top -p <pid_of_condor_shadow> (look at the RES column)
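The writable/private total that pmap reports can also be approximated by reading /proc/<pid>/maps directly. A minimal sketch in Python, assuming a Linux /proc filesystem; `summarize_maps` and `summarize_pid` are hypothetical helper names, and the accounting is modeled on the footer that `pmap -d` prints (writable private mappings count as writeable/private, mappings flagged `s` as shared, every mapping toward mapped), not a reimplementation of pmap:

```python
from __future__ import annotations

from pathlib import Path


def summarize_maps(maps_text: str) -> dict[str, int]:
    """Total the mappings in /proc/<pid>/maps text, in K (1024-byte units).

    Accounting modeled on the `pmap -d` footer: writable private ('p')
    mappings count as writeable/private, mappings flagged 's' count as
    shared, and every mapping counts toward mapped.
    """
    shared = private_writeable = private_readonly = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        start_hex, end_hex = fields[0].split("-")
        perms = fields[1]  # e.g. "rw-p": read, write, execute, shared/private
        size = int(end_hex, 16) - int(start_hex, 16)
        if perms[3] == "s":
            shared += size
        elif perms[1] == "w":
            private_writeable += size
        else:
            private_readonly += size
    return {
        "mapped": (shared + private_writeable + private_readonly) >> 10,
        "writeable/private": private_writeable >> 10,
        "shared": shared >> 10,
    }


def summarize_pid(pid: int) -> dict[str, int]:
    """Read /proc/<pid>/maps for a live process (Linux only) and summarize it."""
    return summarize_maps(Path(f"/proc/{pid}/maps").read_text())
```

Calling `summarize_pid()` with the shadow PID found in test 1 yields a single line of totals per build, which is easier to compare across versions than full pmap listings.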

This ticket should actually reference:
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=793 

In conversation with DanB from the UW: the observed bloat was ~200K, which should only affect systems that have a large number of shadows running. (This is consistent with the data.)
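To put the ~200K figure above in perspective, here is a rough worked estimate, assuming one condor_shadow per running job and a hypothetical 20,000 concurrently running jobs:

```latex
N_{\text{shadows}} \times \Delta_{\text{shadow}}
  = 20\,000 \times 200\ \text{KiB}
  = 4\,000\,000\ \text{KiB}
  \approx 3.8\ \text{GiB}
```

With only a handful of shadows the extra memory is negligible, but at this scale it is enough to push a submit node into swap.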

From the results, it actually appears that the latest 7.4.0 performs better than its predecessors, and that this issue no longer exists in the latest upstream.
== Pending further information ==
Comment 7 Matthew Farrellee 2009-10-05 19:53:49 EDT
We probably only really care about memory the shadow can write to, and much of that is tied up in libraries, except...

7.2.4

0000000000831000    124K rw---    [ anon ]
0000000001412000    684K rw---    [ anon ]

7.3.2

0000000000887000     16K rw---    [ anon ]
0000000001ff7000    796K rw---    [ anon ]

7.4.0

0000000000872000     16K rw---    [ anon ]
0000000000944000    544K rw---    [ anon ]


Looks like 7.2.4 ~= 7.3.2 > 7.4.0, with 7.4.0 being a good 250K smaller.
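The comparison above comes down to totaling the two writable anonymous regions quoted for each build. A short Python sketch that redoes the arithmetic from those pmap lines; `PMAP_EXCERPTS` and `anon_rw_kib` are hypothetical names, and only the lines quoted in this comment are included:

```python
from __future__ import annotations

# pmap lines quoted in this comment, grouped by Condor build
PMAP_EXCERPTS = {
    "7.2.4": [
        "0000000000831000    124K rw---    [ anon ]",
        "0000000001412000    684K rw---    [ anon ]",
    ],
    "7.3.2": [
        "0000000000887000     16K rw---    [ anon ]",
        "0000000001ff7000    796K rw---    [ anon ]",
    ],
    "7.4.0": [
        "0000000000872000     16K rw---    [ anon ]",
        "0000000000944000    544K rw---    [ anon ]",
    ],
}


def anon_rw_kib(lines: list[str]) -> int:
    """Sum the sizes (in K) of writable anonymous mappings in default pmap output."""
    total = 0
    for line in lines:
        # columns: address, size such as "124K", mode such as "rw---", mapping name
        fields = line.split()
        size, mode, mapping = fields[1], fields[2], " ".join(fields[3:])
        if mode[1] == "w" and mapping == "[ anon ]":
            total += int(size.rstrip("K"))
    return total


totals = {build: anon_rw_kib(lines) for build, lines in PMAP_EXCERPTS.items()}
print(totals)  # {'7.2.4': 808, '7.3.2': 812, '7.4.0': 560}
print(totals["7.2.4"] - totals["7.4.0"], totals["7.3.2"] - totals["7.4.0"])  # 248 252
```

The totals come out to 808K, 812K, and 560K, so 7.4.0 is 248K to 252K smaller than the two earlier builds.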
Comment 9 Timothy St. Clair 2009-10-06 13:58:19 EDT
fix should be in 7.4.0-0.6
can verify by running tests above against 7.4.0-0.5
Comment 11 Irina Boverman 2009-10-22 16:13:35 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Resolved issue with the size of the shadow process being too large (527233)
Comment 12 Luigi Toscano 2009-11-18 08:34:53 EST
Created attachment 370081
Test results (condor_shadow, 7.2.2, 7.3.x, 7.4.x)

Summary of the tests made on RHEL 4.8/5.4, i386/x86_64, condor 7.2.2, 7.3.1/7.3.2 and 7.4.1.
They show:
- strange behaviour on RHEL4/i386 (all versions).
- values obtained from 7.4 (both RES from ps/top and the writable/private field from pmap) are almost the same as those from 7.2.

Thus the needinfo flag.
Comment 13 Matthew Farrellee 2009-11-18 23:48:29 EST
I don't understand the question.

For now,

There appears to be no significant regression for RHEL4, though the memory usage is unfortunately large.

There appears to be no significant regression for RHEL5.

It's unknown why 7.3.2 would be better than 7.4.1.
Comment 14 Luigi Toscano 2009-11-19 06:16:06 EST
This bug report is about the size of the shadow process.
It was reported against MRG 1.1 (condor 7.2.x), and we all agree that condor 7.4 shows no significant regression, but it also lacks the improvements that were in 7.3: the shadow uses (almost) the same memory, so the bug is not fixed at all.

I suggest either moving it back to ASSIGNED (and maybe changing the "target milestone" to 1.3) or closing it as NOTABUG. What do you think?
Comment 29 Matthew Farrellee 2010-01-12 07:28:10 EST
Notable differences on el5 between 7.4.2-0.1 and 7.4.2-0.2:

7.4.2-0.1
    2528 r-x-- condor_shadow
      16 rw--- condor_shadow
      16 rw---   [ anon ]
     260 rw--- condor_shadow
     900 rw---   [ anon ]
     168 r-x-- libgsoapssl++.so.0.0.0
    2048 ----- libgsoapssl++.so.0.0.0
       8 rw--- libgsoapssl++.so.0.0.0
      12 rw--- libstdc++.so.6.0.8
      40 rw---   [ anon ]
mapped: 66104K    writeable/private: 1964K    shared: 0K

7.4.2-0.2
    2312 r-x-- condor_shadow
      12 rw--- condor_shadow
      16 rw---   [ anon ]
     396 rw---   [ anon ]
      12 rw--- libstdc++.so.6.0.8
      36 rw---   [ anon ]
mapped: 62892K    writeable/private: 1184K    shared: 0K
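Because the listings above keep only the notable mappings, the overall change is easiest to read from the two `pmap -d` summary lines. A Python sketch that parses those lines and computes the absolute and relative drop; `parse_pmap_summary` and `reduction` are hypothetical helpers, and the regex assumes exactly the summary format shown above:

```python
from __future__ import annotations

import re

# Footer format printed by `pmap -d`, e.g. "mapped: 66104K    writeable/private: 1964K    shared: 0K"
SUMMARY_RE = re.compile(
    r"mapped:\s*(\d+)K\s+writeable/private:\s*(\d+)K\s+shared:\s*(\d+)K"
)


def parse_pmap_summary(line: str) -> dict[str, int]:
    """Extract the three totals (in K) from a pmap -d summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a pmap -d summary line: {line!r}")
    mapped, private, shared = (int(g) for g in match.groups())
    return {"mapped": mapped, "writeable/private": private, "shared": shared}


def reduction(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Absolute drop (K) and relative drop (%, one decimal) for each total."""
    result = {}
    for key, old in before.items():
        drop = old - after[key]
        pct = round(100.0 * drop / old, 1) if old else 0.0
        result[key] = (drop, pct)
    return result


old = parse_pmap_summary("mapped: 66104K    writeable/private: 1964K    shared: 0K")  # 7.4.2-0.1
new = parse_pmap_summary("mapped: 62892K    writeable/private: 1184K    shared: 0K")  # 7.4.2-0.2
print(reduction(old, new))
# {'mapped': (3212, 4.9), 'writeable/private': (780, 39.7), 'shared': (0, 0.0)}
```

With these inputs, writeable/private drops by 780K (about 39.7%) while the total mapped size drops by only 3212K (about 4.9%).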
Comment 31 Luigi Toscano 2010-08-03 12:00:15 EDT
Created attachment 436310
Memory comparison (condor_shadow, 7.2.2, 7.3.x, 7.4.1, 7.4.4)

Updated version of the memory usage comparison (see #c12 for the previous version).
This version adds results for condor 7.4.4-0.4.

Many improvements have been implemented: in the latest version, both the writable/private memory reported by pmap and the RES memory have decreased by 25%.

Interestingly, the worst values shown on RHEL4/32 no longer appear for the jobs following the first one.

Given the results of the test on all supported RHEL architectures, I'm moving the status of this bug to VERIFIED.
Comment 32 Douglas Silas 2010-10-06 10:57:21 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-Resolved issue with the size of the shadow process being too large (527233)
+Resolved issue with the size of the shadow process being too large (527233)
+
+[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]
Comment 33 Douglas Silas 2010-10-11 06:13:08 EDT
Removing this bug from the Technical Notes since a more detailed description was not provided.
Comment 34 Douglas Silas 2010-10-11 06:13:08 EDT
Deleted Technical Notes Contents.

Old Contents:
Resolved issue with the size of the shadow process being too large (527233)

[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]
Comment 35 Matthew Farrellee 2010-10-11 09:50:01 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes needed to be high-memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
Comment 37 errata-xmlrpc 2010-10-14 12:08:33 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
