Bug 742874

Summary: condor_schedd memory consumption (non-leak) with new classads
Product: Red Hat Enterprise MRG
Component: condor
Version: 2.0
Reporter: Stanislav Graf <sgraf>
Assignee: grid-maint-list <grid-maint-list>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: iboverma, matt, mkudlej, tstclair
Status: POST
Severity: medium
Priority: medium
Hardware: All
OS: All
Keywords: Rebase
Doc Type: Rebase: Bug Fixes and Enhancements
Attachments:
- pmap for condor_schedd from MRG1
- pmap for condor_schedd from MRG2
- pmap for condor_schedd from MRG2 (rhel5)
- el5 raw data
- el5.7 raw data
- heap profile for schedd from 100 job submission on 7.6
- heap profile for schedd from 100 job submission on 7.4
- heap profile for schedd from 200 job submission on 7.6

Description Stanislav Graf 2011-10-03 04:12:50 EDT
Description of problem:
I was reproducing Bug 725473 - "Mega should be upper-case M", which involves submitting 1 million jobs to condor. During reproduction I found that with all jobs submitted, condor_schedd consumes 3.4GB (for MRG1) and 4.4GB (for MRG2) of non-swapped physical memory (RSS). See the attachments for memory details and condor versions. I have reproduced this scenario twice and the memory consumption was approximately the same both times.

Are the difference between MRG1 and MRG2, and the overall memory consumption, expected?

MRG1 results:
[root@local-rhel5x ~]# ps l $(pidof condor_schedd)
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    64  3075  2977  18   0 3785700 3590200 -   Ssl  ?          6:31 condor_schedd -f

MRG2 results:
[root@local-rhel6x ~]# ps l $(pidof condor_schedd)
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    64  7709  6506  20   0 4953760 4616580 -   Rsl  ?          8:22 condor_schedd -f

Version-Release number of selected component (if applicable):
condor.x86_64      7.4.5-0.8.el5         rhel-x86_64-server-5-mrg-grid-1
condor.x86_64      7.6.3-0.3.el6         rhel-x86_64-server-6-mrg-grid-2    

How reproducible:
100%

Steps to Reproduce:
1. Setup and start condor
2. Submit 10x100k = 1M jobs
condor_submit << EOF
Executable     = /bin/sleep      
Universe = vanilla               
args = 15h                       
queue 100000                     
EOF                              
3. Verify number of jobs through cumin web interface
4. Check RSS (the non-swapped physical memory that a task has used) with ps command
ps l $(pidof condor_schedd)
  
Actual results:
1. condor_schedd from MRG2 allocates 30% more memory than MRG1
2. condor_schedd allocates several gigabytes of memory

Expected results:
1. condor_schedd from MRG2 allocates the same amount of memory as MRG1
2. condor_schedd allocates smaller amount of memory

Additional info:
Comment 1 Stanislav Graf 2011-10-03 04:13:59 EDT
Created attachment 525994 [details]
pmap for condor_schedd from MRG1
Comment 2 Stanislav Graf 2011-10-03 04:14:47 EDT
Created attachment 525995 [details]
pmap for condor_schedd from MRG2
Comment 3 Matthew Farrellee 2011-10-03 08:58:56 EDT
This comparison is between condor versions, but also between EL versions,

condor.x86_64      7.4.5-0.8.el5         rhel-x86_64-server-5-mrg-grid-1
condor.x86_64      7.6.3-0.3.el6         rhel-x86_64-server-6-mrg-grid-2

Please provide rhel-x86_64-server-5-mrg-grid-2 data for comparison.
Comment 4 Stanislav Graf 2011-10-04 02:40:25 EDT
condor.x86_64       7.6.3-0.3.el5        rhel-x86_64-server-5-mrg-grid-2

[root@local-rhel5x ~]# ps l $(pidof condor_schedd)
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    64  3307  3265  25   0 4735140 4533688 -   Rsl  ?          9:03 condor_schedd -f
[root@local-rhel5x ~]# condor_restart -schedd
Sent "Restart" command to local schedd
...
wait until condor_schedd restarts and loads...
...
[root@local-rhel5x ~]# ps l $(pidof condor_schedd)
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4    64  3945  3265  25   0 4628860 4426728 -   Rsl  ?          1:43 condor_schedd -f

Submitting 100k jobs takes about 1m20s, whether it is the first or the last batch.
Comment 5 Stanislav Graf 2011-10-04 08:45:05 EDT
Created attachment 526238 [details]
pmap for condor_schedd from MRG2 (rhel5)
Comment 7 Matthew Farrellee 2011-10-04 14:24:46 EDT
Created attachment 526294 [details]
el5 raw data

Notes - 7.6.4-0.6.el5 -

No local broker running or configuration for a remote broker.

jobs: echo -e "cmd=/bin/true\nhold=true\nqueue 100000" | condor_submit

condor:
empty   - mapped: 64600K    writeable/private: 1864K     shared: 0K
100K    - mapped: 558324K   writeable/private: 495588K   shared: 0K
200K    - mapped: 903208K   writeable/private: 840472K   shared: 0K
300K    - mapped: 1245816K  writeable/private: 1183080K  shared: 0K
1M      - mapped: 3641496K  writeable/private: 3578760K  shared: 0K
restart - mapped: 2019860K  writeable/private: 1957124K  shared: 0K 

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=TRUE:
empty   - mapped: 208844K   writeable/private: 43576K    shared: 0K
100K    - mapped: 730284K   writeable/private: 565016K   shared: 0K
200K    - mapped: 1103548K  writeable/private: 938280K   shared: 0K
300K    - mapped: 1414944K  writeable/private: 1249688K  shared: 0K
1M      - mapped: 4152100K  writeable/private: 4004396K  shared: 0K
restart - mapped: 4180548K  writeable/private: 3979576K  shared: 0K

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=FALSE:
empty   - mapped: 209108K   writeable/private: 43840K    shared: 0K
100K    -
200K    -
300K    -
1M      - mapped: 3960992K  writeable/private: 3858416K  shared: 0K
restart - mapped: 4259528K  writeable/private: 4138124K  shared: 0K
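As a rough sanity check, the plain-condor numbers above imply a near-linear per-job memory cost. A small sketch (variable names are illustrative, values are the writeable/private column from the "condor:" table above):

```python
# writeable/private memory (KB) reported for plain condor at each job count
writeable_kb = {
    0: 1_864,              # empty queue
    100_000: 495_588,
    200_000: 840_472,
    300_000: 1_183_080,
    1_000_000: 3_578_760,
}

def kb_per_job(jobs):
    """Incremental writeable/private KB per job over the empty baseline."""
    return (writeable_kb[jobs] - writeable_kb[0]) / jobs

for n in (100_000, 200_000, 300_000, 1_000_000):
    print(f"{n:>9} jobs: {kb_per_job(n):.2f} KB/job")
```

The per-job cost falls from roughly 4.9 KB at 100K jobs to roughly 3.6 KB at 1M jobs.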
Comment 8 Matthew Farrellee 2011-10-04 16:48:47 EDT
(In reply to comment #7)

This was all on EL5.4! Best to recreate with EL5.7.
Comment 9 Matthew Farrellee 2011-10-05 14:21:19 EDT
Created attachment 526556 [details]
el5.7 raw data

redhat-release-5Server-5.7.0.3 - condor-7.6.4-0.6.el5 -

condor:
empty   - mapped: 64904K    writeable/private: 2132K     shared: 0K
100K    - mapped: 558488K   writeable/private: 495716K   shared: 0K
200K    - mapped: 903240K   writeable/private: 840468K   shared: 0K
300K    - mapped: 1245716K  writeable/private: 1182944K  shared: 0K
1M      - mapped: 3641792K  writeable/private: 3579020K  shared: 0K
restart - mapped: 3689772K  writeable/private: 3627000K  shared: 0K

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=TRUE:
empty   - mapped: 209140K   writeable/private: 43836K    shared: 0K
100K    - mapped: 698808K   writeable/private: 576932K   shared: 0K
200K    - mapped: 1158584K  writeable/private: 1025004K  shared: 0K
300K    - mapped: 1551296K  writeable/private: 1393452K  shared: 0K
1M      - mapped: 4204324K  writeable/private: 4092480K  shared: 0K
restart - mapped: 4381652K  writeable/private: 4194744K  shared: 0K

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=FALSE:
empty   - mapped: 209144K   writeable/private: 43840K    shared: 0K
100K    - mapped: 722336K   writeable/private: 565940K   shared: 0K
200K    - mapped: 1126144K  writeable/private: 969748K   shared: 0K
300K    - mapped: 1495376K  writeable/private: 1338980K  shared: 0K
1M      - mapped: 4056400K  writeable/private: 3900004K  shared: 0K
restart - mapped: 4311736K  writeable/private: 4125276K  shared: 0K
Comment 10 Matthew Farrellee 2011-10-05 15:27:04 EDT
redhat-release-5Server-5.7.0.3 - condor-7.6.4-0.6.el5 -

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=TRUE + qpidd:
empty   - mapped:  218264K  writeable/private:   44060K  shared: 0K
100K    - mapped:  739744K  writeable/private:  565540K  shared: 0K
200K    - mapped: 1113036K  writeable/private:  938832K  shared: 0K
300K    - mapped: 1483724K  writeable/private: 1358192K  shared: 0K
1M      - mapped: 4248524K  writeable/private: 4082704K  shared: 0K
restart - mapped: 4155808K  writeable/private: 3934476K  shared: 0K

condor + condor-qmf + QMF_PUBLISH_SUBMISSIONS=FALSE + qpidd:
empty   - mapped:  218592K  writeable/private:   44204K  shared: 0K
100K    - mapped:  678256K  writeable/private:  524580K  shared: 0K
200K    - mapped: 1072496K  writeable/private:  935228K  shared: 0K
300K    - mapped: 1467760K  writeable/private: 1301124K  shared: 0K
1M      - mapped: 3970416K  writeable/private: 3858880K  shared: 0K
restart - mapped: 4251748K  writeable/private: 4036804K  shared: 0K
Comment 11 Matthew Farrellee 2011-10-05 15:33:43 EDT
So far I'm not approaching the 4.8GB seen in comment 4. Suggestions welcome.
Comment 12 Stanislav Graf 2011-10-06 03:40:20 EDT
The 1 million jobs you submitted (comment 7) differ from the ones in my original report.

So I reproduced with your jobs on RHEL5, for both MRG1 and MRG2. Again MRG2 reports much more memory; in this case it consumes about 50% more than MRG1:

MRG2 on EL5
Njobs F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  
empty 4    64  1458  1369  15   0 238196 101472 -     
100K  4    64  1458  1369  19   0 674172 537476 -     
200k  4    64  1458  1369  19   0 1111028 907908 -    
300k  4    64  1458  1369  25   0 1483764 1281576 -   
1M    4    64  1458  1369  20   0 4100084 3897164 -   

MRG1 on EL5
Njobs F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  
empty 4    64  3548  3391  15   0 138540  9512 -      
100K  4    64  3548  3391  18   0 534096 405476 -     
200K  4    64  3548  3391  19   0 777904 649304 -     
300K  4    64  3548  3391  25   0 1024884 896264 -    
1M    4    64  3548  3391  25   0 2792672 2597664 -
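The "50% more" claim can be checked directly from the 1M rows above (a sketch; the values are the RSS columns from the two tables):

```python
# RSS (KB) at 1M jobs, taken from the MRG2/MRG1 tables in this comment
rss_mrg2_kb = 3_897_164   # MRG2 (condor 7.6) on EL5
rss_mrg1_kb = 2_597_664   # MRG1 (condor 7.4) on EL5

ratio = rss_mrg2_kb / rss_mrg1_kb
print(f"MRG2 RSS is {ratio:.2f}x MRG1 RSS")  # ~1.50x
```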
Comment 13 Timothy St. Clair 2011-10-06 14:15:20 EDT
Created attachment 526766 [details]
heap profile for schedd from 100 job submission on 7.6
Comment 14 Timothy St. Clair 2011-10-06 14:15:49 EDT
Created attachment 526767 [details]
heap profile for schedd from 100 job submission on 7.4
Comment 15 Timothy St. Clair 2011-10-06 14:35:58 EDT
When performing a heap profile of a ~100 job run (on F15), there is only a marginal difference.

If we extrapolate the delta at peak, (1.973 - 1.885) * 1,000,000 / 100 ~= 880MB difference at peak, assuming the initial delta is *not* a one-time cost (on F15).
It's important to note this drains off rather quickly.

IMHO, the real question is how its memory usage behaves while the schedd is churning the queue down to zero, not during peak load. Once the schedd is burning down the queue, the load should decrease significantly.

Also of interest: when running leak detection on 7.6, I notice nothing appreciable, which means we are not leaking at runtime. The peak delta may just be the difference between the two series, but I have not traced through the details.
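Written out, the extrapolation above is a simple scaling of the 100-job peak-heap delta (a sketch; the units are assumed to be MB, which matches the ~880MB result):

```python
# Extrapolating the 100-job peak heap delta from the comment above to 1M jobs.
# Units assumed to be MB of peak heap for a 100-job run.
peak_76_mb = 1.973   # peak heap at 100 jobs, condor 7.6 (from the comment)
peak_74_mb = 1.885   # peak heap at 100 jobs, condor 7.4 (from the comment)

delta_mb = (peak_76_mb - peak_74_mb) * 1_000_000 / 100
print(f"extrapolated peak delta at 1M jobs: ~{delta_mb:.0f} MB")  # ~880 MB
```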
Comment 16 Timothy St. Clair 2011-10-06 14:57:20 EDT
When looking at the peak delta, it becomes obvious the difference is from new classads (mmmm, tasty crow). But again, watching behavior over time is more important than the burst.
Comment 17 Timothy St. Clair 2011-10-06 16:19:29 EDT
Created attachment 526791 [details]
heap profile for schedd from 200 job submission on 7.6
Comment 19 Matthew Farrellee 2011-10-11 08:47:03 EDT
Given no leak, this is a regression in memory performance between 7.4 and 7.6. It should be addressed as such.
Comment 20 Timothy St. Clair 2011-10-11 16:53:21 EDT
*** Bug 605635 has been marked as a duplicate of this bug. ***
Comment 21 Timothy St. Clair 2011-11-03 09:49:57 EDT
Tracking upstream: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2541
Comment 22 Timothy St. Clair 2011-11-17 12:41:45 EST
Modification made upstream and will be introduced in the 7.7.5 series.
Comment 23 Timothy St. Clair 2012-06-25 11:07:27 EDT
Modification is in 7.9.0
Comment 27 Mike McCune 2016-03-28 18:39:30 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation; please contact mmccune@redhat.com with any questions.