
Bug 527233

Summary: shadow process bloat
Product: Red Hat Enterprise MRG
Reporter: Jon Thomas <jthomas>
Component: condor
Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA
QA Contact: Luigi Toscano <ltoscano>
Severity: high
Docs Contact:
Priority: high
Version: 1.2
CC: iboverma, jneedle, lbrindle, ltoscano, matt, tao
Target Milestone: 1.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes need to be high memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 16:08:33 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 527551    
Attachments:
Description    Flags
7.2.4 memory test results of condor_shadow    none
7.3.2 memory test results of condor_shadow    none
7.4.0 memory test results of condor_shadow    none
Test results (condor_shadow, 7.2.2, 7.3.x, 7.4.x)    none
Memory comparison (condor_shadow, 7.2.2, 7.3.x, 7.4.1, 7.4.4)    none

Description Jon Thomas 2009-10-05 13:49:10 UTC
Size of shadow process is too large, impacting performance.

Shadow process bloat on the Central Manager/Scheduler node.

Additional info: http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=787

Comment 1 Timothy St. Clair 2009-10-05 20:23:36 UTC
Created attachment 363737 [details]
7.2.4 memory test results of condor_shadow

Comment 2 Timothy St. Clair 2009-10-05 20:24:09 UTC
Created attachment 363738 [details]
7.3.2 memory test results of condor_shadow

Comment 3 Timothy St. Clair 2009-10-05 20:24:37 UTC
Created attachment 363739 [details]
7.4.0 memory test results of condor_shadow

Comment 4 Timothy St. Clair 2009-10-05 20:41:36 UTC
Performed a simple memory test of 3 different condor builds, comparing the results for condor_shadow (see attachments).

Tests were:
1.) ps -aux | grep -i condor
2.) pmap <pid_of_condor_shadow>
3.) top -p <pid_of_condor_shadow> look at RES
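
A minimal sketch of how test 3 could be scripted instead of read off top by hand. It assumes a Linux /proc filesystem, where the VmRSS field of /proc/<pid>/status is the resident set size that top reports as RES. The function names are hypothetical and not part of condor.

```python
import re


def rss_kb_from_status(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kb(pid):
    """Read /proc/<pid>/status (Linux only) and return the resident size in kB."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kb_from_status(handle.read())
```

Calling rss_kb(<pid_of_condor_shadow>) for each running shadow would give the same figure as the RES column, in a form that is easy to log across builds.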

This ticket should actually reference:
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=793 

In conversation with DanB from UW, the observed result was ~200k of bloat, which should only affect systems that have a large number of shadows running. (This is consistent with the data.)

From the results, it actually appears that the latest 7.4.0 performs better than its predecessors, and this issue no longer exists in the latest upstream.
== Pending further information ==

Comment 7 Matthew Farrellee 2009-10-05 23:53:49 UTC
We probably only really care about memory the shadow can write to, and much of that is tied up in libraries, except...

7.2.4

0000000000831000    124K rw---    [ anon ]
0000000001412000    684K rw---    [ anon ]

7.3.2

0000000000887000     16K rw---    [ anon ]
0000000001ff7000    796K rw---    [ anon ]

7.4.0

0000000000872000     16K rw---    [ anon ]
0000000000944000    544K rw---    [ anon ]


Looks like 7.2.4 ~= 7.3.2 > 7.4.0, with 7.4.0 being a good 250K smaller.
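
The comparison above can be reproduced by summing the writable anonymous mappings per build. This is only a sketch over the six pmap lines quoted in this comment, not over the full attachments; the helper name is hypothetical.

```python
# pmap lines copied from this comment, keyed by condor version.
PMAP_EXCERPTS = {
    "7.2.4": """\
0000000000831000    124K rw---    [ anon ]
0000000001412000    684K rw---    [ anon ]
""",
    "7.3.2": """\
0000000000887000     16K rw---    [ anon ]
0000000001ff7000    796K rw---    [ anon ]
""",
    "7.4.0": """\
0000000000872000     16K rw---    [ anon ]
0000000000944000    544K rw---    [ anon ]
""",
}


def writable_anon_kb(pmap_text):
    """Sum the sizes (in K) of writable anonymous mappings in pmap output."""
    total = 0
    for line in pmap_text.splitlines():
        fields = line.split()
        # Expected layout: address, size such as "124K", mode, mapping name.
        if len(fields) >= 4 and fields[2].startswith("rw") and "anon" in line:
            total += int(fields[1].rstrip("K"))
    return total


totals = {version: writable_anon_kb(text) for version, text in PMAP_EXCERPTS.items()}
for version, kb in totals.items():
    print(f"{version}: {kb}K")
```

The sums come to 808K, 812K and 560K, which matches the roughly 250K difference noted above.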

Comment 9 Timothy St. Clair 2009-10-06 17:58:19 UTC
Fix should be in 7.4.0-0.6.
Can verify by running the tests above against 7.4.0-0.5.

Comment 11 Irina Boverman 2009-10-22 20:13:35 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Resolved issue with the size of the shadow process being too large (527233)

Comment 12 Luigi Toscano 2009-11-18 13:34:53 UTC
Created attachment 370081 [details]
Test results (condor_shadow, 7.2.2, 7.3.x, 7.4.x)

Summary of the tests made on RHEL 4.8/5.4, i386/x86_64, condor 7.2.2, 7.3.1/7.3.2 and 7.4.1.
They show:
- a strange behaviour on RHEL4/i386 (all versions).
- the values obtained from 7.4 (both RES from ps/top and the writable/private field from pmap) are almost the same as those from 7.2.

Thus the needinfo flag.

Comment 13 Matthew Farrellee 2009-11-19 04:48:29 UTC
I don't understand the question.

For now,

It looks like no significant regression for RHEL4, though it is unfortunately large.

It looks like no significant regression for RHEL5.

It's unknown why 7.3.2 would be better than 7.4.1.

Comment 14 Luigi Toscano 2009-11-19 11:16:06 UTC
This bug report is about the size of the shadow process.
It was reported against MRG 1.1 (condor 7.2.x), and we all agree that condor 7.4 shows no significant regression, but also none of the improvements that were in 7.3. In other words, the shadow uses (almost) the same memory, so the bug is not fixed at all.

I suggest either moving it back to ASSIGNED (and maybe changing the "target milestone" to 1.3) or closing it as NOTABUG. What do you think?

Comment 29 Matthew Farrellee 2010-01-12 12:28:10 UTC
Notable differences, on el5 between 7.4.2-0.1 and 7.4.2-0.2:

7.4.2-0.1
    2528 r-x-- condor_shadow
      16 rw--- condor_shadow
      16 rw---   [ anon ]
     260 rw--- condor_shadow
     900 rw---   [ anon ]
     168 r-x-- libgsoapssl++.so.0.0.0
    2048 ----- libgsoapssl++.so.0.0.0
       8 rw--- libgsoapssl++.so.0.0.0
      12 rw--- libstdc++.so.6.0.8
      40 rw---   [ anon ]
mapped: 66104K    writeable/private: 1964K    shared: 0K

7.4.2-0.2
    2312 r-x-- condor_shadow
      12 rw--- condor_shadow
      16 rw---   [ anon ]
     396 rw---   [ anon ]
      12 rw--- libstdc++.so.6.0.8
      36 rw---   [ anon ]
mapped: 62892K    writeable/private: 1184K    shared: 0K
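
As a cross-check on the two listings above, the following sketch sums the writable (rw) lines in each excerpt and compares the difference with the writeable/private totals reported by pmap. It uses only the numbers quoted in this comment; the variable and function names are hypothetical.

```python
# "Notable differences" excerpts from this comment: size in K, mode, mapping.
BUILD_0_1 = """\
2528 r-x-- condor_shadow
16 rw--- condor_shadow
16 rw--- [ anon ]
260 rw--- condor_shadow
900 rw--- [ anon ]
168 r-x-- libgsoapssl++.so.0.0.0
2048 ----- libgsoapssl++.so.0.0.0
8 rw--- libgsoapssl++.so.0.0.0
12 rw--- libstdc++.so.6.0.8
40 rw--- [ anon ]
"""

BUILD_0_2 = """\
2312 r-x-- condor_shadow
12 rw--- condor_shadow
16 rw--- [ anon ]
396 rw--- [ anon ]
12 rw--- libstdc++.so.6.0.8
36 rw--- [ anon ]
"""

# writeable/private totals from the pmap summary lines, in K.
REPORTED_KB = {"7.4.2-0.1": 1964, "7.4.2-0.2": 1184}


def listed_writable_kb(excerpt):
    """Sum the sizes of the lines whose mode starts with 'rw'."""
    total = 0
    for line in excerpt.splitlines():
        size, mode = line.split()[:2]
        if mode.startswith("rw"):
            total += int(size)
    return total


listed_delta = listed_writable_kb(BUILD_0_1) - listed_writable_kb(BUILD_0_2)
reported_delta = REPORTED_KB["7.4.2-0.1"] - REPORTED_KB["7.4.2-0.2"]
reduction_pct = 100 * reported_delta / REPORTED_KB["7.4.2-0.1"]
print(f"listed delta: {listed_delta}K, reported delta: {reported_delta}K")
print(f"reduction: {reduction_pct:.1f}%")
```

Both deltas come to 780K, so the listed lines account for the whole drop in writeable/private memory between the two builds.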

Comment 31 Luigi Toscano 2010-08-03 16:00:15 UTC
Created attachment 436310 [details]
Memory comparison (condor_shadow, 7.2.2, 7.3.x, 7.4.1, 7.4.4)

Updated version of the memory usage comparison (see #c12 for the previous version).
This version adds results for condor 7.4.4-0.4.

Many improvements have been implemented: both the writable/private memory reported by pmap and the RES memory have decreased by 25% in the latest version.

Interestingly enough, the worst values seen on RHEL4/32 no longer appear for jobs following the first one.

Given the test results on all supported RHEL architectures, I'm moving the status of this bug to VERIFIED.

Comment 32 Douglas Silas 2010-10-06 14:57:21 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-Resolved issue with the size of the shadow process being too large (527233)
+Resolved issue with the size of the shadow process being too large (527233)
+
+[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]

Comment 33 Douglas Silas 2010-10-11 10:13:08 UTC
Removing this bug from the Technical Notes since a more detailed description was not provided.

Comment 34 Douglas Silas 2010-10-11 10:13:08 UTC
Deleted Technical Notes Contents.

Old Contents:
Resolved issue with the size of the shadow process being too large (527233)

[silas: can someone provide a more detailed description of this bug, the fix and the result? Otherwise we may have to omit it. Thanks.]

Comment 35 Matthew Farrellee 2010-10-11 13:50:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Shadow processes maintained a large amount of private (non-shared) memory.
C: Submit nodes need to be high memory machines and could still end up in swap when running 20K+ jobs.
F: Shadow code analysis revealed a number of linked libraries holding private memory that were not necessary. The linkages were removed.
R: Shadow memory usage was significantly reduced.
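
A rough, back-of-the-envelope sketch of what the reduction could mean at the 20K-job scale mentioned above. It assumes one shadow per running job and takes the el5 writeable/private figures from comment 29 (1964K before, 1184K after) as representative of every shadow; both are assumptions, and real totals will vary with platform and workload.

```python
KB_PER_GIB = 1024 * 1024

SHADOWS = 20_000      # assumption: one shadow per running job, 20K jobs
BEFORE_KB = 1964      # writeable/private per shadow, 7.4.2-0.1 (comment 29)
AFTER_KB = 1184       # writeable/private per shadow, 7.4.2-0.2 (comment 29)


def total_private_gib(shadows, kb_per_shadow):
    """Total private memory in GiB for a given number of shadows."""
    return shadows * kb_per_shadow / KB_PER_GIB


before_gib = total_private_gib(SHADOWS, BEFORE_KB)
after_gib = total_private_gib(SHADOWS, AFTER_KB)
print(f"before: {before_gib:.1f} GiB")
print(f"after:  {after_gib:.1f} GiB")
print(f"saved:  {before_gib - after_gib:.1f} GiB")
```

Under these assumptions the private memory held by shadows on a submit node drops from about 37.5 GiB to about 22.6 GiB.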

Comment 37 errata-xmlrpc 2010-10-14 16:08:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html