Bug 737523

Summary: RHUA ran out of memory and triggered the oom killer
Product: Red Hat Update Infrastructure for Cloud Providers
Reporter: James Slagle <jslagle>
Component: RHUA
Assignee: John Matthews <jmatthew>
Status: CLOSED ERRATA
QA Contact: dgao
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.0.2
CC: dgao, igulina, jmatthew, kbidarka, sghai, tsanders
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-01 22:05:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 746803    
Attachments (Description / Flags):
httpd log / none
httpd log / none
system log / none
pulp log / none
mongodb log snippet / none
. / none

Description James Slagle 2011-09-12 12:29:21 UTC
Default setup of our RHUA 2.0 in EC2.  The instance type is c1.xlarge, which has 7 GB of memory and 8 CPUs.

The initial sync of all 72 repos and 2 CDSs completed without error; however, on subsequent syncs the RHUA kept consuming more memory and eventually triggered the kernel OOM killer.  Several different processes ended up triggering the OOM killer (as can be seen in the attached log).  Obviously, this left the system in a non-functional state.

Attaching /var/log/messages, and contents of /var/log/pulp, /var/log/httpd, and /var/log/mongodb.

Comment 1 James Slagle 2011-09-12 12:30:39 UTC
Created attachment 522687 [details]
httpd log

Comment 2 James Slagle 2011-09-12 12:33:24 UTC
Created attachment 522688 [details]
httpd log

Comment 3 James Slagle 2011-09-12 12:34:00 UTC
Created attachment 522689 [details]
system log

Comment 4 James Slagle 2011-09-12 12:34:44 UTC
Created attachment 522690 [details]
pulp log

Comment 5 James Slagle 2011-09-12 12:40:23 UTC
Created attachment 522691 [details]
mongodb log snippet

Full mongodb log is too bit to attach.  However, it is full of the errors seen in this snippet.  Lots and lots of messages all similar to:

Sat Sep 10 08:07:11 [conn544]  pulp_database.repos unindex: key too large to index, skipping pulp_database.repos.$packagegroups_-1
Sat Sep 10 08:07:11 [conn544]  pulp_database.repos Btree::insert: key too large to index, skipping pulp_database.repos.$packagegroups_-1 3275 { : { mrg-grid: { mandatory_package_names: [ "condor", "mrg-release" ], description: "Red Hat Enterprise MRG Grid packages", repo_defined: true, default: true, display_order: 1024, immutable: true, user_visible: true, translated_name: {}, translated_description: {}, conditional_package_names: {}, default_package_names: {}, optional_package_names: [ "condor-aviary", "wallaby-utils", "python-boto", "wso2-wsf-cpp-devel", "wso2-rampart", "condor-vm-gahp", "ruby-wallaby", "condor-ec2-enhanced-hooks", "ruby-spqr", "spqr-gen", "condor-ec2-enhanced", "libyaml-devel", "wso2-rampart-devel", "condor-classads", "PyYAML", "condor-job-hooks", "wso2-axis2-devel", "condor-wallaby-client", "wso2-wsf-cpp", "wallaby", "python-condorec2e", "wso2-axis2", "condor-wallaby-base-db", "libyaml", "condor-kbdd", "condor-wallaby-tools", "python-condorutils", "python-wallabyclient", "ruby-rhubarb", "ruby-sqlite3", "condor-qmf", "condor-low-latency" ], langonly: null, _id: "mrg-grid", id: "mrg-grid", name: "MRG Grid" }, mrg-management: { mandatory_package_names: [ "cumin" ], description: "Red Hat Enterprise MRG Management packages", repo_defined: true, default: true, display_order: 1024, immutable: true, user_visible: true, translated_name: {}, translated_description: {}, conditional_package_names: {}, default_package_names: {}, optional_package_names: {}, langonly: null, _id: "mrg-management", id: "mrg-management", name: "MRG Management" }, mrg-realtime: { mandatory_package_names: [ "rtctl", "tuna", "kernel-rt", "rtcheck", "rt-setup" ], description: "Red Hat Enterprise MRG Realtime packages", repo_defined: true, default: true, display_order: 1024, immutable: true, user_visible: true, translated_name: {}, translated_description: {}, conditional_package_names: {}, default_package_names: {}, optional_package_names: [ "kernel-rt-trace-devel", "kernel-rt-debug", "kernel-rt-debug-devel", "kernel-rt-vanilla", "python-schedutils", "oscilloscope", "rteval-loads", "rt-tests", "kernel-rt-vanilla-devel", "kernel-rt-trace", "kernel-rt-firmware", "kernel-rt-doc", "python-numeric", "ibm-prtm", "python-linux-procfs", "kernel-rt-devel", "rteval" ], langonly: null, _id: "mrg-realtime", id: "mrg-realtime", name: "MRG Realtime" }, mrg-messaging: { mandatory_package_names: [ "qpid-cpp-server-store", "mrg-release" ], description: "Red Hat Enterprise MRG Messaging packages", repo_defined: true, default: true, display_order: 1024, immutable: true, user_visible: true, translated_name: {}, translated_description: {}, conditional_package_names: {}, default_package_names: [ "qpid-cpp-client-devel", "qpid-cpp-server-xml", "rhm-docs", "sesame", "ruby-sas
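
(Editorial aside, not from the original report: the "key too large to index" messages above come from MongoDB refusing to index the large packagegroups value in pulp_database.repos.  A minimal pymongo sketch along these lines could be used to spot which repo documents exceed the roughly 1 KB per-key index limit of that era's MongoDB; the database, collection, and field names are taken from the log above, while the connection details and exact limit are assumptions.)

# Illustrative sketch only, not part of RHUI/pulp: list repo documents whose
# 'packagegroups' value is larger than MongoDB's per-key index limit
# (approximately 1024 bytes on the MongoDB versions of this era).
# Assumes a reasonably recent pymongo (for bson.encode) and mongod on localhost.
import bson
from pymongo import MongoClient

KEY_LIMIT = 1024  # approximate limit behind "key too large to index"

repos = MongoClient("localhost", 27017)["pulp_database"]["repos"]
for repo in repos.find({}, {"packagegroups": 1}):
    # Encode just the indexed field to estimate its BSON size.
    size = len(bson.encode({"packagegroups": repo.get("packagegroups", {})}))
    if size > KEY_LIMIT:
        print("repo %s: packagegroups is %d bytes (over %d)"
              % (repo["_id"], size, KEY_LIMIT))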

Comment 6 James Slagle 2011-09-12 12:41:16 UTC
Last comment should say "Full mongodb log is too big to attach."

Comment 7 John Matthews 2011-09-12 16:42:44 UTC

OOM killer invoked at 9/10 @08:27:16
  Sep 10 08:27:16 rhui2 kernel: mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0  
  ...
  Sep 10 08:27:16 rhui2 kernel: Out of memory: kill process 1971 (httpd) score 371268 or a child
  Sep 10 08:27:16 rhui2 kernel: Out of memory: kill process 1971 (httpd) score 371268 or a child
Below is the memory usage over the course of 9/10:

12:00:02 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
12:10:01 AM     36000   7095376     99.50     18612   3010224    755408     10.59
12:20:01 AM     35636   7095740     99.50     18796   3010236    755408     10.59
12:30:01 AM     35380   7095996     99.50     18956   3010248    755408     10.59
12:40:01 AM     35148   7096228     99.51     19156   3010252    755408     10.59
12:50:01 AM     35032   7096344     99.51     19340   3010272    755408     10.59
01:00:01 AM     35024   7096352     99.51     19508   3010280    755408     10.59
01:10:01 AM     33900   7097476     99.52     19688   3011072    745164     10.45
01:20:01 AM     37132   7094244     99.48     19844   3006860    745164     10.45
01:30:01 AM     36636   7094740     99.49     20012   3006876    745164     10.45
01:40:01 AM     36264   7095112     99.49     20220   3006888    745404     10.45
01:50:01 AM     36264   7095112     99.49     20412   3006900    745404     10.45
02:00:01 AM     36396   7094980     99.49     20588   3006908    745404     10.45
02:10:01 AM     34852   7096524     99.51     12816   2316984   1149144     16.11
02:20:01 AM     72776   7058600     98.98     10984   2389616    892552     12.52
02:30:01 AM     37308   7094068     99.48     12760   2025244    822884     11.54
02:40:01 AM     33576   7097800     99.53     34488   1776272   1159716     16.26
02:50:01 AM     36312   7095064     99.49     66172   1774420   1077428     15.11
03:00:01 AM     34452   7096924     99.52     64508   1676116   1094396     15.35
03:10:01 AM     42028   7089348     99.41    178308   1402668   1104860     15.49

03:10:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
03:20:02 AM     35460   7095916     99.50    147940   1477392   1023184     14.35
03:30:01 AM    296552   6834824     95.84    137652   1194420    920648     12.91
03:40:01 AM     35440   7095936     99.50    122224   1330656   1094532     15.35
03:50:01 AM     36328   7095048     99.49    124244   1294960    910768     12.77
04:00:01 AM     35868   7095508     99.50     23112   1122988   1159696     16.26
04:10:02 AM     36904   7094472     99.48      7496   1344064    786256     11.03
04:20:01 AM     39136   7092240     99.45      6804   1343544    786384     11.03
04:30:01 AM     40360   7091016     99.43      6940   1342164    786384     11.03
04:40:01 AM     36152   7095224     99.49      6948   1346624    786384     11.03
04:50:01 AM     33440   7097936     99.53      7192   1348876    786384     11.03
05:00:01 AM     43484   7087892     99.39      6984   1339256    786384     11.03
05:10:01 AM     36904   7094472     99.48      7868   1344432    786380     11.03
05:20:01 AM     41120   7090256     99.42      8496   1339680    786380     11.03
05:30:01 AM     33432   7097944     99.53      9084   1346964    786380     11.03
05:40:01 AM     35960   7095416     99.50      9824   1343496    755648     10.60
05:50:01 AM     35240   7096136     99.51      9984   1343512    755648     10.60
06:00:01 AM     34968   7096408     99.51     10184   1343516    755648     10.60
06:10:01 AM     35356   7096020     99.50     10360   1343536    755648     10.60
06:20:01 AM     34612   7096764     99.51     10536   1343548    755648     10.60

06:20:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
06:30:01 AM     34232   7097144     99.52     10728   1343556    755648     10.60
06:40:01 AM     34496   7096880     99.52     10928   1343568    755648     10.60
06:50:01 AM     33620   7097756     99.53     11088   1343580    755648     10.60
07:00:01 AM     33488   7097888     99.53     11280   1343584    755648     10.60
07:10:01 AM     36728   7094648     99.48     11476   1340412    755920     10.60
07:20:01 AM     36480   7094896     99.49     11664   1340408    755920     10.60
07:30:01 AM     35860   7095516     99.50     11840   1340436    755920     10.60
07:40:01 AM     35356   7096020     99.50     12016   1340448    755920     10.60
07:50:02 AM     35348   7096028     99.50     12184   1340456    755920     10.60
08:00:01 AM     35100   7096276     99.51     12384   1340472    755920     10.60
08:10:01 AM     33988   7097388     99.52     22464    876028    968044     13.57
08:20:01 AM     36240   7095136     99.49     51496    877952   1126204     15.79
08:30:01 AM   6795940    335436      4.70      6236     36492    397352      5.57
08:40:01 AM   6795708    335668      4.71      6428     36688    397360      5.57
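
(Editorial aside: %memused in this output is kbmemused as a percentage of total memory, i.e. kbmemused / (kbmemfree + kbmemused).  A quick check against the 12:10 AM sample, illustrative arithmetic only and assuming the usual sysstat definition:)

# Illustrative arithmetic only: reproduce the %memused figure from the
# 12:10 AM sample above, assuming %memused = kbmemused / (kbmemfree + kbmemused).
kbmemfree, kbmemused = 36000, 7095376
print("%.2f" % (100.0 * kbmemused / (kbmemfree + kbmemused)))  # prints 99.50

(Note also that kbcached shrinks from roughly 3 GB at midnight to under 1 GB by 08:10 while total usage stays pinned at ~99.5%, so the growth is in the processes themselves rather than the page cache; the large drop at 08:30 lines up with the OOM kill at 08:27.)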

Comment 8 John Matthews 2011-09-12 16:49:32 UTC
Notice that the oom-kill happened at 08:27 on 9/10.
At 08:09 a mod_wsgi process crashed; it looks like it complained about running out of disk space:

As of 12:47 PM on 9/12, disk space looks OK:
[root@rhui2 log]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/xvde1            5.7G  3.6G  2.0G  65% /
none                  3.5G     0  3.5G   0% /dev/shm
/dev/xvdl1            296G   99G  182G  36% /var/lib/pulp
/dev/xvdm1             20G  1.2G   18G   7% /var/lib/mongodb
/root/deploy/RHEL-6.1-RHUI-2.0-20110727.2-Server-x86_64-DVD1.iso
                       62M   62M     0 100% /mnt/rhuiso




Sep 10 08:09:02 rhui2 )    : abrt: detected unhandled Python exception in mod_wsgi

from /var/log/httpd/error_log-20110911

*** glibc detected *** (wsgi:pulp)    : double free or corruption (!prev): 0x00007f06f5355480 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75716)[0x7f07605ef716]
/usr/lib64/libnsspem.so(+0x18e72)[0x7f07392dce72]
/usr/lib64/libnsspem.so(+0xa4f4)[0x7f07392ce4f4]
/usr/lib64/libnsspem.so(+0xa64d)[0x7f07392ce64d]
/usr/lib64/libnsspem.so(+0x17e0f)[0x7f07392dbe0f]
/usr/lib64/libnsspem.so(+0x13df1)[0x7f07392d7df1]
/usr/lib64/libnss3.so(PK11_Sign+0xb1)[0x7f075d512b71]
/usr/lib64/libssl3.so(+0xe620)[0x7f075da41620]
/usr/lib64/libssl3.so(+0xf159)[0x7f075da42159]
/usr/lib64/libssl3.so(+0x12860)[0x7f075da45860]
/usr/lib64/libssl3.so(+0x13e30)[0x7f075da46e30]
/usr/lib64/libssl3.so(+0x148cc)[0x7f075da478cc]
/usr/lib64/libssl3.so(+0x17169)[0x7f075da4a169]
/usr/lib64/libssl3.so(+0x1cf15)[0x7f075da4ff15]
/usr/lib64/libssl3.so(SSL_ForceHandshake+0x9f)[0x7f075da5167f]
/usr/lib64/libcurl.so.4(Curl_nss_connect+0xb70)[0x7f074e974e10]
/usr/lib64/libcurl.so.4(Curl_ssl_connect+0x2b)[0x7f074e96c34b]
/usr/lib64/libcurl.so.4(Curl_http_connect+0x5b)[0x7f074e94ae5b]
/usr/lib64/libcurl.so.4(Curl_protocol_connect+0xd2)[0x7f074e951612]
/usr/lib64/libcurl.so.4(Curl_connect+0x2fc)[0x7f074e957abc]
/usr/lib64/libcurl.so.4(Curl_perform+0x80)[0x7f074e95fa50]
/usr/lib64/python2.6/site-packages/pycurl.so(+0x864b)[0x7f074eb8f64b]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5176)[0x7f07554edad6]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCodeEx+0x92c)[0x7f07554ef81c]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5304)[0x7f07554edc64]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCodeEx+0x92c)[0x7f07554ef81c]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5304)[0x7f07554edc64]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x63ef)[0x7f07554eed4f]

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 499, in <module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 496, in main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 453, in process
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe


[Sat Sep 10 08:09:17 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
[Sat Sep 10 08:09:17 2011] [error]     self.run()
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib64/python2.6/threading.py", line 484, in run
[Sat Sep 10 08:09:17 2011] [error]     self.__target(*self.__args, **self.__kwargs)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
[Sat Sep 10 08:09:17 2011] [error]     result = self.callable(*self.args, **self.kwargs)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
[Sat Sep 10 08:09:17 2011] [error]     progress_callback, synchronizer, max_speed, threads)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
[Sat Sep 10 08:09:17 2011] [error]     progress_callback, max_speed, threads)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
[Sat Sep 10 08:09:17 2011] [error]     report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 463, in fetchYumRepo
[Sat Sep 10 08:09:17 2011] [error]     self.yumFetch.getRepoData()
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 146, in getRepoData
[Sat Sep 10 08:09:17 2011] [error]     ftypefile = self.repo.retrieveMD(ftype)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1551, in retrieveMD
[Sat Sep 10 08:09:17 2011] [error]     return self._retrieveMD(mdtype)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 1607, in _retrieveMD
[Sat Sep 10 08:09:17 2011] [error]     size=thisdata.size)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 837, in _getFile
[Sat Sep 10 08:09:17 2011] [error]     size=size
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/mirror.py", line 408, in urlgrab
[Sat Sep 10 08:09:17 2011] [error]     return self._mirror_try(func, url, kw)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/mirror.py", line 394, in _mirror_try
[Sat Sep 10 08:09:17 2011] [error]     return func_ref( *(fullurl,), **kwargs )
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 985, in urlgrab
[Sat Sep 10 08:09:17 2011] [error]     return self._retry(opts, retryfunc, url, filename)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 886, in _retry
[Sat Sep 10 08:09:17 2011] [error]     r = apply(func, (opts,) + args, {})
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 971, in retryfunc
[Sat Sep 10 08:09:17 2011] [error]     fo = PyCurlFileObject(url, filename, opts)
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1066, in __init__
[Sat Sep 10 08:09:17 2011] [error]     self._do_open()
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1360, in _do_open
[Sat Sep 10 08:09:17 2011] [error]     self._do_grab()
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1490, in _do_grab
[Sat Sep 10 08:09:17 2011] [error]     self._do_perform()
[Sat Sep 10 08:09:17 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1285, in _do_perform
[Sat Sep 10 08:09:17 2011] [error]     raise KeyboardInterrupt
[Sat Sep 10 08:09:17 2011] [error] KeyboardInterrupt




[Sat Sep 10 08:09:01 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:09:02 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1097, in _retrieve
[Sat Sep 10 08:09:02 2011] [error]     self.fo.write(buf)
[Sat Sep 10 08:09:02 2011] [error] IOError: [Errno 28] No space left on device



[Sat Sep 10 08:09:18 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:09:18 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1097, in _retrieve
[Sat Sep 10 08:09:18 2011] [error]     self.fo.write(buf)
[Sat Sep 10 08:09:18 2011] [error] IOError: [Errno 28] No space left on device
[Sat Sep 10 08:09:18 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:09:18 2011] [error]   File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1097, in _retrieve
[Sat Sep 10 08:18:12 2011] [error] 2011-09-10 08:18:12,770 pulp.server.tasking.task:ERROR: Task failed: Task cafe976e-dadc-11e0-ad65-12313f023076: _sync(rhel-server-5-5Server-i386, )
[Sat Sep 10 08:18:12 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
[Sat Sep 10 08:18:12 2011] [error]     result = self.callable(*self.args, **self.kwargs)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
[Sat Sep 10 08:18:12 2011] [error]     progress_callback, synchronizer, max_speed, threads)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 308, in fetch_content
[Sat Sep 10 08:18:12 2011] [error]     added_packages = synchronizer.add_packages_from_dir(repo_dir, repo_id, skip_dict)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 131, in add_packages_from_dir
[Sat Sep 10 08:18:12 2011] [error]     package_list = pulp.server.util.get_repo_packages(dir)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/util.py", line 227, in get_repo_packages
[Sat Sep 10 08:18:12 2011] [error]     r.sack.populate(r, 'metadata', None, 0)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 186, in populate
[Sat Sep 10 08:18:12 2011] [error]     dobj = repo_cache_function(xml, csum)
[Sat Sep 10 08:18:12 2011] [error]   File "/usr/lib64/python2.6/site-packages/sqlitecachec.py", line 46, in getPrimary
[Sat Sep 10 08:18:12 2011] [error]     self.repoid))
[Sat Sep 10 08:18:12 2011] [error] TypeError: Can not create requiresname index: disk I/O error
[Sat Sep 10 08:18:12 2011] [error] 
[Sat Sep 10 08:19:24 2011] [error] 2011-09-10 08:19:24,909 pulp.server.tasking.task:ERROR: Task failed: Task cada2d1e-dadc-11e0-ac77-12313f023076: _sync(rhel-server-5-5.7-x86_64, )
[Sat Sep 10 08:19:24 2011] [error] Traceback (most recent call last):
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
[Sat Sep 10 08:19:24 2011] [error]     result = self.callable(*self.args, **self.kwargs)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
[Sat Sep 10 08:19:24 2011] [error]     progress_callback, synchronizer, max_speed, threads)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
[Sat Sep 10 08:19:24 2011] [error]     progress_callback, max_speed, threads)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
[Sat Sep 10 08:19:24 2011] [error]     report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 469, in fetchYumRepo
[Sat Sep 10 08:19:24 2011] [error]     self.prepareRPMS()
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 293, in prepareRPMS
[Sat Sep 10 08:19:24 2011] [error]     pkglist = self.yumFetch.getPackageList(newest=self.newest)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 88, in getPackageList
[Sat Sep 10 08:19:24 2011] [error]     sack.populate(self.repo, 'metadata', None, 0)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib/python2.6/site-packages/yum/yumRepo.py", line 186, in populate
[Sat Sep 10 08:19:24 2011] [error]     dobj = repo_cache_function(xml, csum)
[Sat Sep 10 08:19:24 2011] [error]   File "/usr/lib64/python2.6/site-packages/sqlitecachec.py", line 46, in getPrimary
[Sat Sep 10 08:19:24 2011] [error]     self.repoid))
[Sat Sep 10 08:19:24 2011] [error] TypeError: Can not create packagename index: disk I/O 


[Sat Sep 10 08:09:11 2011] [error] IOError: [Errno 28] No space left on device

Comment 9 James Slagle 2011-09-12 18:43:11 UTC
I've got a test RHUA setup where you can watch the memory continue to grow during a sync.  The usage climbs slowly throughout the sync; it sometimes drops, but the baseline continues to climb.

To reproduce it, first clear out the existing repos:
rm -rf /var/lib/pulp/packages/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.6/i386/os/Packages/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.6/i386/os/repodata/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.6/x86_64/os/Packages/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.6/x86_64/os/repodata/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.7/i386/os/Packages/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.7/i386/os/repodata/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.7/x86_64/os/Packages/* \
  /var/lib/pulp/repos/content/dist/rhel/rhui/server/5/5.7/x86_64/os/repodata/*

Then sync those 4 repos from rhui-manager:

rhui-manager, s, sr, 37, 38, 39, 40, c, y.


The public hostname is ec2-107-20-207-64.compute-1.amazonaws.com, and the firewall is not set up, so you can access it from anywhere.
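
(Editorial aside, not from the original report: a small script along the following lines could be used to watch the climbing baseline described above by sampling the resident set size of the wsgi daemon once a minute during a sync.  Matching processes by a "wsgi" substring in their command line is an assumption and may need adjusting for the actual daemon name.)

# Illustrative monitoring helper, not part of RHUI: sample the resident set
# size (VmRSS) of processes whose command line mentions "wsgi" once a minute,
# to watch the baseline climb during a sync. The "wsgi" match is an assumption.
import glob
import time

def rss_kb(pid):
    # The VmRSS line in /proc/<pid>/status looks like: "VmRSS:   123456 kB"
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def find_pids(pattern):
    pids = []
    for path in glob.glob("/proc/[0-9]*/cmdline"):
        try:
            with open(path) as cmdline:
                if pattern in cmdline.read():
                    pids.append(int(path.split("/")[2]))
        except (IOError, OSError):
            pass  # process exited between listing and reading
    return pids

while True:
    for pid in find_pids("wsgi"):
        print("%s pid=%d rss=%d kB" % (time.strftime("%H:%M:%S"), pid, rss_kb(pid)))
    time.sleep(60)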

Comment 10 James Slagle 2011-09-13 00:04:43 UTC
It's possible that the fix for https://bugzilla.redhat.com/show_bug.cgi?id=737531 will also resolve this issue.  I'll test with the updated grinder tomorrow.

Comment 11 James Slagle 2011-09-19 14:10:34 UTC
It doesn't appear that the fixes for https://bugzilla.redhat.com/show_bug.cgi?id=737531 or https://bugzilla.redhat.com/show_bug.cgi?id=738376 have had any effect on this issue.  

I'm still seeing memory usage increase after each repo sync for the main wsgi process.

In fact, now that those 2 fixes are in and we're actually able to finish syncs and hit our next schedule, this problem is much worse.  After 2 days of letting the scheduled syncs run, I'm seeing the wsgi process consume 4767 MB of memory.  It will not be long before the OOM killer is triggered again.

Comment 12 John Matthews 2011-10-26 13:30:29 UTC
We moved all of the yum metadata fetching to its own ActiveObject, which causes yum metadata operations to execute in their own process.  After execution the process is destroyed, which should work around any memory issues yum was contributing.

Moving this to MODIFIED.

The commit introducing the change is below:
http://git.fedorahosted.org/git/?p=grinder.git;a=commitdiff;h=c77356360f8374310422e2e9ad03270476a58c4c
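
(Editorial aside, not from the original report: the general pattern relied on here is to confine leak-prone work to a short-lived child process so its memory is returned to the kernel when the process exits.  A minimal sketch of that pattern follows; it is an illustration only, not grinder's actual ActiveObject code, and fetch_metadata is a hypothetical placeholder.)

# Sketch of the workaround pattern only, not grinder's ActiveObject:
# run leak-prone work in a throwaway child process so that any memory the
# yum libraries allocate is reclaimed when the child exits.
import multiprocessing

def fetch_metadata(repo_url, queue):
    # Hypothetical placeholder for the yum metadata work; only the small
    # result travels back to the parent, the child's heap dies with it.
    queue.put({"repo": repo_url, "ok": True})

def fetch_in_child(repo_url):
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=fetch_metadata, args=(repo_url, queue))
    child.start()
    result = queue.get()  # block until the child reports back
    child.join()
    return result

if __name__ == "__main__":
    print(fetch_in_child("https://cdn.example.com/repo"))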

Comment 13 dgao 2011-10-27 22:07:32 UTC
Created attachment 530577 [details]
.

Comment 14 dgao 2011-10-27 22:09:06 UTC
The above attachment verifies the bug on pulp from the 10/24 RHUI build.

Comment 18 errata-xmlrpc 2017-03-01 22:05:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0367