Hide Forgot
This issue was initially detected on RHEL6 install of 0.0.159 and subsequently detected on fedora 13. In both instances, failures are detected when the automation tries to install another package after a successful package install. Gofer agent will become unresponsive, causing heartbeat message to stop sending, and consequently hangs the install CLI command. One possible quick fix is to implement a timeout so in the case of gofer agent become unresponsive, the CLI command won't hang indefinitely. agent.log on RHEL6: 2011-03-31 14:24:57,627 [ERROR][test_consumer_remote] __call__() @ dispatcher.py:357 - [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid' Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/gofer/messaging/dispatcher.py", line 354, in __call__ retval = method(*args, **keywords) File "/usr/lib/gofer/plugins/pulp.py", line 183, in bind bind_data['repo'], bind_data['host_urls'], bind_data['gpg_keys']) File "/usr/lib/python2.6/site-packages/pulp/client/repolib.py", line 81, in bind File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 169, in acquire File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 57, in open IOError: [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid' 2011-03-31 14:37:20,419 [INFO][test_consumer_remote] dispatch() @ dispatcher.py:389 - dispatching:{ "classname": "Admin", "kws": {}, "args": [], "method": "hello", "auth": { "secret": null, "uuid": "test_consumer_remote" } } agent.log on f13: 2011-04-01 09:31:17,114 [ERROR][Actions] __call__() @ action.py:117 - Enqueue capacity threshold exceeded on queue "717c5be2-58f8-4afd-9d91-410c61e78445:0.0". (JournalImpl.cpp:621)(501) Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/gofer/agent/action.py", line 115, in __call__ self.target() File "/usr/lib/gofer/plugins/pulp.py", line 76, in heartbeat p.send(topic, ttl=delay, heartbeat=body) File "/usr/lib/python2.6/site-packages/gofer/messaging/producer.py", line 56, in send sender = self.session().sender(address) File "<string>", line 6, in sender File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 576, in sender sender._ewait(lambda: sender.linked) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 783, in _ewait result = self.session._ewait(lambda: self.error or predicate(), timeout) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 550, in _ewait result = self.connection._ewait(lambda: self.error or predicate(), timeout) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 194, in _ewait self.check_error() File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 187, in check_error raise self.error ConnectionError: Enqueue capacity threshold exceeded on queue "717c5be2-58f8-4af
The "Too many open files" appears to be YUM/rpmlib related: [root@pulp-qe-rhel6 ~]# lsof |grep 18867 | grep -v "\.so" python 18867 root cwd DIR 253,0 4096 2 / python 18867 root rtd DIR 253,0 4096 2 / python 18867 root txt REG 253,0 8936 786094 /usr/bin/python python 18867 root mem REG 253,0 753664 396336 /var/lib/rpm/__db.004 python 18867 root mem REG 253,0 1318912 396311 /var/lib/rpm/__db.003 python 18867 root mem REG 253,0 229376 396301 /var/lib/rpm/__db.002 python 18867 root mem REG 253,0 99158752 785437 /usr/lib/locale/locale-archive python 18867 root mem REG 253,0 24576 396300 /var/lib/rpm/__db.001 python 18867 root DEL REG 253,0 1045287 /tmp/ffiUnpDvC python 18867 root 0u CHR 1,3 0t0 3641 /dev/null python 18867 root 1u CHR 1,3 0t0 3641 /dev/null python 18867 root 2u CHR 1,3 0t0 3641 /dev/null python 18867 root 3u CHR 1,3 0t0 3641 /dev/null python 18867 root 4r FIFO 0,8 0t0 145675 pipe python 18867 root 5w FIFO 0,8 0t0 145675 pipe python 18867 root 6w REG 253,0 210719 397009 /var/log/gofer/agent.log python 18867 root 7w REG 253,0 276246 397012 /var/log/pulp/client.log python 18867 root 8u REG 253,0 4096 1045287 /tmp/ffiUnpDvC (deleted) python 18867 root 9u IPv4 145676 0t0 TCP unused:54114->unused:amqp (ESTABLISHED) python 18867 root 10r FIFO 0,8 0t0 145681 pipe python 18867 root 11r CHR 1,9 0t0 3646 /dev/urandom python 18867 root 12w FIFO 0,8 0t0 145681 pipe python 18867 root 13u IPv4 145682 0t0 TCP unused:54115->unused:amqp (ESTABLISHED) python 18867 root 14r FIFO 0,8 0t0 145687 pipe python 18867 root 15w FIFO 0,8 0t0 145687 pipe python 18867 root 16u unix 0xffff88012e761680 0t0 145691 socket python 18867 root 17w REG 253,0 63548 396308 /var/log/yum.log python 18867 root 18ur REG 253,0 15557632 397226 /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910efe5f9035566ed3c1643d0ad3-primary.sqlite python 18867 root 19ur REG 253,0 30720 397395 /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite python 18867 root 20ur REG 253,0 33792 397404 /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite python 18867 root 21ur REG 253,0 20480 397410 /var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite python 18867 root 22ur REG 253,0 24023040 397432 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833f8204b601c83294ebce8973bbb06d7d71387edc6e15-primary.xml.gz.sqlite python 18867 root 23r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 24r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 25r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 26r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 27r REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 28w REG 253,0 63548 396308 /var/log/yum.log python 18867 root 29r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 30ur REG 253,0 20480 523906 /var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite python 18867 root 31r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 32r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 33r REG 253,0 1290240 391690 /var/lib/rpm/Providename python 18867 root 34u IPv4 147907 0t0 TCP unused:39891->a23-1-44-218.deploy.akamaitechnologies.com:https (CLOSE_WAIT) python 18867 root 35r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 36ur REG 253,0 15557632 397226 /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910efe5f9035566ed3c1643d0ad3-primary.sqlite python 18867 root 37ur REG 253,0 30720 397395 /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite python 18867 root 38ur REG 253,0 33792 397404 /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite python 18867 root 39ur REG 253,0 22528 523933 /var/cache/yum/x86_64/6Server/package_group_test_repo/primary.sqlite python 18867 root 40ur REG 253,0 24023040 397432 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833f8204b601c83294ebce8973bbb06d7d71387edc6e15-primary.xml.gz.sqlite python 18867 root 41r REG 253,0 1290240 391690 /var/lib/rpm/Providename python 18867 root 42r REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 43u REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 44u REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 45u REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 46w REG 253,0 63548 396308 /var/log/yum.log python 18867 root 47ur REG 253,0 15557632 397226 /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910efe5f9035566ed3c1643d0ad3-primary.sqlite python 18867 root 48ur REG 253,0 30720 397395 /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite python 18867 root 49u REG 253,0 0 1045703 /tmp/tmprdE8y6 (deleted) python 18867 root 50u REG 253,0 143360 392241 /var/lib/yum/history/history-2011-01-17.sqlite python 18867 root 51ur REG 253,0 33792 397404 /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite python 18867 root 52ur REG 253,0 24023040 397432 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833f8204b601c83294ebce8973bbb06d7d71387edc6e15-primary.xml.gz.sqlite python 18867 root 53u REG 253,0 12288 392246 /var/lib/rpm/Triggername python 18867 root 54u REG 253,0 16384 392243 /var/lib/rpm/Group python 18867 root 55u REG 253,0 204800 392244 /var/lib/rpm/Requirename python 18867 root 56u REG 253,0 1290240 391690 /var/lib/rpm/Providename python 18867 root 57u REG 253,0 409600 392247 /var/lib/rpm/Dirnames python 18867 root 58u REG 253,0 135168 392248 /var/lib/rpm/Requireversion python 18867 root 59u REG 253,0 512000 392249 /var/lib/rpm/Provideversion python 18867 root 60u REG 253,0 16384 392250 /var/lib/rpm/Installtid python 18867 root 61u REG 253,0 49152 392251 /var/lib/rpm/Sigmd5 python 18867 root 62u REG 253,0 77824 392252 /var/lib/rpm/Sha1header python 18867 root 63u REG 253,0 2621440 392253 /var/lib/rpm/Filedigests python 18867 root 64r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 65r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 66r REG 253,0 1290240 391690 /var/lib/rpm/Providename python 18867 root 67r REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 68u REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 69u REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 70u REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 74u REG 253,0 0 1046024 /tmp/tmpj5Fz2b (deleted) python 18867 root 75u REG 253,0 143360 392241 /var/lib/yum/history/history-2011-01-17.sqlite python 18867 root 78u REG 253,0 12288 392246 /var/lib/rpm/Triggername python 18867 root 79u REG 253,0 16384 392243 /var/lib/rpm/Group python 18867 root 80u REG 253,0 204800 392244 /var/lib/rpm/Requirename python 18867 root 81u REG 253,0 1290240 391690 /var/lib/rpm/Providename python 18867 root 82u REG 253,0 409600 392247 /var/lib/rpm/Dirnames python 18867 root 83u REG 253,0 135168 392248 /var/lib/rpm/Requireversion python 18867 root 84u REG 253,0 512000 392249 /var/lib/rpm/Provideversion python 18867 root 85u REG 253,0 16384 392250 /var/lib/rpm/Installtid python 18867 root 86u REG 253,0 49152 392251 /var/lib/rpm/Sigmd5 python 18867 root 87u REG 253,0 77824 392252 /var/lib/rpm/Sha1header python 18867 root 88u REG 253,0 2621440 392253 /var/lib/rpm/Filedigests [root@pulp-qe-rhel6 ~]# lsof |grep 18867 | grep -v "\.so"|more python 18867 root cwd DIR 253,0 4096 2 / python 18867 root rtd DIR 253,0 4096 2 / python 18867 root txt REG 253,0 8936 786094 /usr/bin/python python 18867 root mem REG 253,0 753664 396336 /var/lib/rpm/__db.004 python 18867 root mem REG 253,0 1318912 396311 /var/lib/rpm/__db.003 python 18867 root mem REG 253,0 229376 396301 /var/lib/rpm/__db.002 python 18867 root mem REG 253,0 99158752 785437 /usr/lib/locale/locale-archive python 18867 root mem REG 253,0 24576 396300 /var/lib/rpm/__db.001 python 18867 root DEL REG 253,0 1045287 /tmp/ffiUnpDvC python 18867 root 0u CHR 1,3 0t0 3641 /dev/null python 18867 root 1u CHR 1,3 0t0 3641 /dev/null python 18867 root 2u CHR 1,3 0t0 3641 /dev/null python 18867 root 3u CHR 1,3 0t0 3641 /dev/null python 18867 root 4r FIFO 0,8 0t0 145675 pipe python 18867 root 5w FIFO 0,8 0t0 145675 pipe python 18867 root 6w REG 253,0 210719 397009 /var/log/gofer/agent.log python 18867 root 7w REG 253,0 276246 397012 /var/log/pulp/client.log python 18867 root 8u REG 253,0 4096 1045287 /tmp/ffiUnpDvC (deleted) python 18867 root 9u IPv4 145676 0t0 TCP unused:54114->unused:amqp (ESTABLISHED) python 18867 root 10r FIFO 0,8 0t0 145681 pipe python 18867 root 11r CHR 1,9 0t0 3646 /dev/urandom python 18867 root 12w FIFO 0,8 0t0 145681 pipe python 18867 root 13u IPv4 145682 0t0 TCP unused:54115->unused:amqp (ESTABLISHED) python 18867 root 14r FIFO 0,8 0t0 145687 pipe python 18867 root 15w FIFO 0,8 0t0 145687 pipe python 18867 root 16u unix 0xffff88012e761680 0t0 145691 socket python 18867 root 17w REG 253,0 63548 396308 /var/log/yum.log python 18867 root 18ur REG 253,0 15557632 397226 /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910efe5f9035566ed3c164 3d0ad3-primary.sqlite python 18867 root 19ur REG 253,0 30720 397395 /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite python 18867 root 20ur REG 253,0 33792 397404 /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite python 18867 root 21ur REG 253,0 20480 397410 /var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite python 18867 root 22ur REG 253,0 24023040 397432 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833f8204b601c83294eb ce8973bbb06d7d71387edc6e15-primary.xml.gz.sqlite python 18867 root 23r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 24r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 25r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 26r REG 253,0 24576 391689 /var/lib/rpm/Name python 18867 root 27r REG 253,0 1531904 391702 /var/lib/rpm/Basenames python 18867 root 28w REG 253,0 63548 396308 /var/log/yum.log python 18867 root 29r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 30ur REG 253,0 20480 523906 /var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite python 18867 root 31r REG 253,0 19197952 391688 /var/lib/rpm/Packages python 18867 root 32r REG 253,0 24576 391689 /var/lib/rpm/Name
Here's a better paste. Sorry for the mess ^^ :/ [root@pulp-qe-rhel6 ~]# cat /tmp/goferd.18867.lsof|sort|cut -c45-120 253,0 0 1046108 /tmp/goferd.18867.lsof /dev/null /dev/null /dev/null /dev/null /dev/urandom / / pipe pipe pipe pipe pipe pipe unused:54114->unused:amqp (ESTABLISHED) unused:54115->unused:amqp (ESTABLISHED) unused:39891->a23-1-44-218.deploy.akamaitechnologies.com:https (CLOSE_WAIT) /tmp/tmprdE8y6 (deleted) /tmp/tmpj5Fz2b (deleted) /tmp/ffiUnpDvC /var/lib/rpm/Triggername /var/lib/rpm/Triggername /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/__db.003 /var/lib/rpm/Requireversion /var/lib/rpm/Requireversion /var/lib/yum/history/history-2011-01-17.sqlite /var/lib/yum/history/history-2011-01-17.sqlite /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/lib/rpm/Group /var/lib/rpm/Group /var/lib/rpm/Installtid /var/lib/rpm/Installtid /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Requirename /var/lib/rpm/Requirename /var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite /var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite /var/log/gofer/agent.log /var/cache/yum/x86_64/6Server/package_group_test_repo/primary.sqlite /var/lib/rpm/__db.002 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/__db.001 /var/lib/rpm/Filedigests /var/lib/rpm/Filedigests /var/log/pulp/client.log /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/lib/rpm/Dirnames /var/lib/rpm/Dirnames /tmp/ffiUnpDvC (deleted) /var/lib/rpm/Sigmd5 /var/lib/rpm/Sigmd5 /var/lib/rpm/Provideversion /var/lib/rpm/Provideversion /var/log/yum.log /var/log/yum.log /var/log/yum.log /var/lib/rpm/__db.004 /var/lib/rpm/Sha1header /var/lib/rpm/Sha1header /usr/bin/python /usr/lib/locale/locale-archive socket
I think what happens here is that the bind invokes RepoLib stuff which is concurrency protected by a lockfile. I suspect that when we get "too many open files", this lock acquisition hangs. I'll verify/fix this. But, since gofer executes RMI calls in a single thread, when the bind hangs, it blocks following RMI calls like Package.install().
Running a test on a freshly bounced goferd, we leak ~10 files descriptors per package install RMI request.
Test program: import os from time import sleep from yum import YumBase def install(packageinfo=['zsh']): installed = [] yb = YumBase() for info in packageinfo: if isinstance(info, list): pkgs = yb.pkgSack.returnNewestByNameArch(tuple(info)) else: pkgs = yb.pkgSack.returnNewestByName(info) for p in pkgs: installed.append(str(p)) yb.tsInfo.addInstall(p) yb.resolveDeps() yb.processTransaction() if reboot_suggested: cfg_assumeyes = cfg.client.assumeyes if cfg_assumeyes in ["True", "False"]: assumeyes = eval(cfg_assumeyes) else: assumeyes = assumeyes if assumeyes is True: self.__schedule_reboot() return (installed, {'reboot_performed' :True}) else: return (installed, {'reboot_performed' :False}) return (installed, None) if __name__ == '__main__': print os.getpid() for i in range(0,10): print i sleep(1) for i in range(0,10): try: install() except Exception, e: print e print 'lsof now' sleep(5) RESULTS: [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 7 63 601 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 18 162 1840 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 29 261 3129 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 40 360 4418 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 40 360 4418 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 51 459 5707 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 73 657 8285 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 95 855 10863 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 106 954 12152 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 117 1053 13441