Bug 872190

Summary: Very high memory usage during repo sync
Product: [Retired] Pulp Reporter: Preethi Thomas <pthomas>
Component: rpm-supportAssignee: Pradeep Kilambi <pkilambi>
Status: CLOSED NOTABUG QA Contact: Preethi Thomas <pthomas>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: MasterCC: jason.dobies, mhrivnak, pkilambi, rbarlow, skarmark
Target Milestone: ---Keywords: Triaged
Target Release: Sprint 42   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 873313 (view as bug list) Environment:
Last Closed: 2012-11-27 19:26:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 873313    

Description Preethi Thomas 2012-11-01 14:15:23 UTC
Description of problem:

Seems like memory usage is hovering around 100 % during repo sync. Real small repositories seems to be okay.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run repo sync on a repo
2.
3.
  
Actual results:
In the following case I had already created repo once before and hence the packages existed on the server. Brand new sync after creating took over 8 minutes.
[root@preethi ~]# pulp-admin -u admin -p admin rpm repo create  --repo-id scientific --feed http://ftp1.scientificlinux.org/linux/scientific/6x/x86_64/updates/security/
Successfully created repository [scientific]

[root@preethi ~]# time pulp-admin rpm repo sync run  --repo-id scientific+----------------------------------------------------------------------+
                 Synchronizing Repository [scientific]
+----------------------------------------------------------------------+

This command may be exited by pressing ctrl+c without affecting the actual
operation on the server.

Downloading metadata...
[-]
... completed

Downloading repository content...
[==================================================] 100%
RPMs:       506/506 items
Delta RPMs: 0/0 items
Tree Files: 0/0 items
Files:      0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[\]
... completed

Publishing packages...
[==================================================] 100%
Packages: 0/0 items
... completed

Publishing distributions...
[==================================================] 100%
Distributions: 0/0 items
... completed

Generating metadata
[/]
... completed

Publishing repository over HTTP
[-]
... skipped

Publishing repository over HTTPS
[-]
... completed


real	4m7.349s
user	0m7.587s
sys	0m0.334s
[root@preethi ~]# 


Memory usage data at various stages of sync is below
Metadata download

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
22161 apache    20   0  389m 109m 6852 R 99.4  1.8   0:40.31 python             
22150 root      20   0  279m  20m 5992 S  3.3  0.3   0:02.36 pulp-admin         
20706 apache    20   0 1185m  44m 7392 S  2.0  0.7   1:30.09 httpd              
20707 apache    20   0  210m 9288 3220 S  0.7  0.2   0:00.27 httpd              
20710 apache    20   0  210m 9264 3212 S  0.7  0.2   0:00.27 httpd              
20711 apache    20   0  210m 9288 3232 S  0.7  0.2   0:00.27 httpd              
20712 apache    20   0  210m 9288 3228 S  0.7  0.2   0:00.27 httpd              
20713 apache    20   0  210m 9464 3376 S  0.7  0.2   0:00.27 httpd  


Rpm download

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
20706 apache    20   0 1192m  90m 7392 S 100.0  1.5   1:47.38 httpd             
22161 apache    20   0  523m 243m 6864 S  7.6  4.1   1:53.63 python             
 6477 mongodb   20   0 1486m 139m 105m S  0.7  2.3  62:05.24 mongod             
20714 apache    20   0  210m 9552 3484 S  0.7  0.2   0:00.62 httpd              
22150 root      20   0  279m  20m 5996 S  0.7  0.3   0:05.40 pulp-admin         
    1 root      20   0 61992 3296 1192 S  0.0  0.1   0:11.14 systemd       



Packagegroup download

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
20706 apache    20   0 2203m 759m 7496 S 116.3 12.7   2:55.52 httpd             
22150 root      20   0  279m  20m 5996 S  2.0  0.3   0:06.81 pulp-admin         
 6477 mongodb   20   0 1486m 228m 194m S  1.0  3.8  62:07.40 mongod             
20707 apache    20   0  210m 9552 3484 S  1.0  0.2   0:00.80 httpd              
20708 apache    20   0  210m 9552 3484 S  0.7  0.2   0:00.90 httpd              
20713 apache    20   0  210m 9464 3376 S  0.7  0.2   0:00.79 httpd              
Expected results:


Additional info:

Comment 1 Randy Barlow 2012-11-01 14:21:57 UTC
I also experienced this bug when trying to synchronize with the Red Hat CDN for RHEL 6. My system used ~3 GB and then failed due to exhaustion of memory.

Comment 2 Michael Hrivnak 2012-11-01 15:20:20 UTC
I am able to attribute the excessive memory use to grinder. Specifically, this change increased memory use by at least 10x:

http://git.fedorahosted.org/cgit/grinder.git/commit/?id=5727f5238b3202177a62687a97335092128731d6

You can duplicate this by doing the following in a python shell:

>>> from grinder.YumInfo import YumMetadataObj
>>> y = YumMetadataObj('reposci', 'http://ftp1.scientificlinux.org/linux/scientific/6x/x86_64/updates/security/')
>>> x = y.getDownloadItems()

You will see the python process using ~240MB of RAM, which is exactly what you see pulp do during a repo sync, at the end of the metadata retrieval stage. It's even easier to isolate because grinder spawns a new python process to do the metadata fetching, which is shown with PID 22161 in Preethi's output above. In her second example, you can see it with "243m" in the "RES" column.

Comment 3 Jay Dobies 2012-11-27 19:26:44 UTC
Closing out the master version of this, it's being handled in 2.0.