Description of problem:
Memory usage seems to hover around 100% during a repo sync. Really small repositories seem to be okay.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run a repo sync on a repo
2.
3.

Actual results:
In the following case I had already created the repo once before, so the packages already existed on the server. A brand new sync right after creating the repo took over 8 minutes.

[root@preethi ~]# pulp-admin -u admin -p admin rpm repo create --repo-id scientific --feed http://ftp1.scientificlinux.org/linux/scientific/6x/x86_64/updates/security/
Successfully created repository [scientific]

[root@preethi ~]# time pulp-admin rpm repo sync run --repo-id scientific
+----------------------------------------------------------------------+
                 Synchronizing Repository [scientific]
+----------------------------------------------------------------------+

This command may be exited by pressing ctrl+c without affecting the actual operation on the server.

Downloading metadata...
[-]
... completed

Downloading repository content...
[==================================================] 100%
RPMs:       506/506 items
Delta RPMs: 0/0 items
Tree Files: 0/0 items
Files:      0/0 items
... completed

Importing errata...
[-]
... completed

Importing package groups/categories...
[\]
... completed

Publishing packages...
[==================================================] 100%
Packages: 0/0 items
... completed

Publishing distributions...
[==================================================] 100%
Distributions: 0/0 items
... completed

Generating metadata
[/]
... completed

Publishing repository over HTTP
[-]
... skipped

Publishing repository over HTTPS
[-]
... completed

real    4m7.349s
user    0m7.587s
sys     0m0.334s
[root@preethi ~]#

Memory usage data at various stages of the sync is below.

Metadata download
  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
22161 apache   20  0  389m 109m 6852 R  99.4  1.8  0:40.31 python
22150 root     20  0  279m  20m 5992 S   3.3  0.3  0:02.36 pulp-admin
20706 apache   20  0 1185m  44m 7392 S   2.0  0.7  1:30.09 httpd
20707 apache   20  0  210m 9288 3220 S   0.7  0.2  0:00.27 httpd
20710 apache   20  0  210m 9264 3212 S   0.7  0.2  0:00.27 httpd
20711 apache   20  0  210m 9288 3232 S   0.7  0.2  0:00.27 httpd
20712 apache   20  0  210m 9288 3228 S   0.7  0.2  0:00.27 httpd
20713 apache   20  0  210m 9464 3376 S   0.7  0.2  0:00.27 httpd

RPM download
  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
20706 apache   20  0 1192m  90m 7392 S 100.0  1.5  1:47.38 httpd
22161 apache   20  0  523m 243m 6864 S   7.6  4.1  1:53.63 python
 6477 mongodb  20  0 1486m 139m 105m S   0.7  2.3 62:05.24 mongod
20714 apache   20  0  210m 9552 3484 S   0.7  0.2  0:00.62 httpd
22150 root     20  0  279m  20m 5996 S   0.7  0.3  0:05.40 pulp-admin
    1 root     20  0 61992 3296 1192 S   0.0  0.1  0:11.14 systemd

Package group download
  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
20706 apache   20  0 2203m 759m 7496 S 116.3 12.7  2:55.52 httpd
22150 root     20  0  279m  20m 5996 S   2.0  0.3  0:06.81 pulp-admin
 6477 mongodb  20  0 1486m 228m 194m S   1.0  3.8 62:07.40 mongod
20707 apache   20  0  210m 9552 3484 S   1.0  0.2  0:00.80 httpd
20708 apache   20  0  210m 9552 3484 S   0.7  0.2  0:00.90 httpd
20713 apache   20  0  210m 9464 3376 S   0.7  0.2  0:00.79 httpd

Expected results:

Additional info:
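For reference, the per-stage top samples above can be collected automatically while the sync runs in another shell. Below is a minimal sketch, not from the original report: it polls /proc every few seconds and prints the resident set size of any process whose command line mentions one of the names in WATCH. The WATCH list and the 5-second interval are assumptions and may need adjusting for a given install.

#!/usr/bin/env python
# poll_rss.py - sample resident memory of sync-related processes every few
# seconds while "pulp-admin rpm repo sync run ..." runs in another shell.
# Minimal sketch; the WATCH names and the 5-second interval are assumptions.
from __future__ import print_function

import os
import time

WATCH = ('pulp', 'httpd', 'mongod', 'grinder', 'python')


def cmdline(pid):
    # Command line of a process, with NUL separators turned into spaces.
    try:
        with open('/proc/%s/cmdline' % pid) as fh:
            return fh.read().replace('\0', ' ').strip()
    except IOError:
        return ''


def rss_kb(pid):
    # VmRSS (resident set size, kB) from /proc/<pid>/status, or None.
    try:
        with open('/proc/%s/status' % pid) as fh:
            for line in fh:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
    except (IOError, ValueError):
        pass
    return None


while True:
    stamp = time.strftime('%H:%M:%S')
    for pid in (p for p in os.listdir('/proc') if p.isdigit()):
        if pid == str(os.getpid()):
            continue  # skip this monitoring script itself
        cmd = cmdline(pid)
        if cmd and any(name in cmd for name in WATCH):
            rss = rss_kb(pid)
            if rss is not None:
                print('%s pid=%s rss=%6d kB  %s' % (stamp, pid, rss, cmd[:60]))
    print('---')
    time.sleep(5)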
I also experienced this bug when trying to synchronize with the Red Hat CDN for RHEL 6. My system used ~3 GB and then failed due to memory exhaustion.
I am able to attribute the excessive memory use to grinder. Specifically, this change increased memory use by at least 10x:

http://git.fedorahosted.org/cgit/grinder.git/commit/?id=5727f5238b3202177a62687a97335092128731d6

You can duplicate this by doing the following in a python shell:

>>> from grinder.YumInfo import YumMetadataObj
>>> y = YumMetadataObj('reposci', 'http://ftp1.scientificlinux.org/linux/scientific/6x/x86_64/updates/security/')
>>> x = y.getDownloadItems()

You will see the python process using ~240MB of RAM, which is exactly what you see pulp do during a repo sync, at the end of the metadata retrieval stage. It's even easier to isolate because grinder spawns a new python process to do the metadata fetching, which is shown with PID 22161 in Preethi's output above. In her second example, you can see it with "243m" in the "RES" column.
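For convenience, here is the same reproducer wrapped with a before/after resident-memory readout, so the growth is visible without watching top. Only the YumMetadataObj import, constructor arguments, and getDownloadItems() call come from the comment above; the max_rss_mb helper and the use of the resource module are illustrative additions (ru_maxrss is reported in kB on Linux).

# repro_rss.py - the grinder reproducer above, plus a before/after
# resident-memory readout.  The grinder calls are from the comment;
# max_rss_mb and the resource usage are illustrative additions.
from __future__ import print_function

import resource

from grinder.YumInfo import YumMetadataObj

FEED = ('http://ftp1.scientificlinux.org/linux/scientific/6x/'
        'x86_64/updates/security/')


def max_rss_mb():
    # Peak resident set size of this process; ru_maxrss is kB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


print('before getDownloadItems: %.1f MB' % max_rss_mb())
y = YumMetadataObj('reposci', FEED)
x = y.getDownloadItems()
print('after  getDownloadItems: %.1f MB' % max_rss_mb())  # ~240 MB per the comment above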
Closing out the master version of this; it's being handled in 2.0.