this was caused by a cli installed cmd: 2011-03-31 14:24:57,627 [ERROR][test_consumer_remote] __call__() @ dispatcher.py:357 - [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid' Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/gofer/messaging/dispatcher.py", line 354, in __call__ retval = method(*args, **keywords) File "/usr/lib/gofer/plugins/pulp.py", line 183, in bind bind_data['repo'], bind_data['host_urls'], bind_data['gpg_keys']) File "/usr/lib/python2.6/site-packages/pulp/client/repolib.py", line 81, in bind File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 169, in acquire File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 57, in open IOError: [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid' [root@pulp-qe-rhel6-client gofer]# lsof |grep 3378|grep "/var/lib/rpm" python 3378 root mem REG 253,0 753664 135488 /var/lib/rpm/__db.004 python 3378 root mem REG 253,0 1318912 135487 /var/lib/rpm/__db.003 python 3378 root mem REG 253,0 229376 135486 /var/lib/rpm/__db.002 python 3378 root mem REG 253,0 24576 135485 /var/lib/rpm/__db.001 python 3378 root 18r REG 253,0 20103168 130824 /var/lib/rpm/Packages python 3378 root 19r REG 253,0 24576 130825 /var/lib/rpm/Name python 3378 root 20r REG 253,0 1290240 130826 /var/lib/rpm/Providename python 3378 root 22u REG 253,0 20103168 130824 /var/lib/rpm/Packages python 3378 root 23u REG 253,0 24576 130825 /var/lib/rpm/Name python 3378 root 28u REG 253,0 20103168 130824 /var/lib/rpm/Packages python 3378 root 29r REG 253,0 20103168 130824 /var/lib/rpm/Packages python 3378 root 30r REG 253,0 24576 130825 /var/lib/rpm/Name python 3378 root 31r REG 253,0 1290240 130826 /var/lib/rpm/Providename python 3378 root 32r REG 253,0 1536000 130838 /var/lib/rpm/Basenames python 3378 root 33u REG 253,0 20103168 130824 /var/lib/rpm/Packages python 3378 root 34u REG 253,0 24576 130825 /var/lib/rpm/Name python 3378 root 35u REG 253,0 1536000 130838 /var/lib/rpm/Basenames python 3378 root 43u REG 253,0 12288 131382 /var/lib/rpm/Triggername python 3378 root 44u REG 253,0 16384 131379 /var/lib/rpm/Group python 3378 root 45u REG 253,0 204800 131380 /var/lib/rpm/Requirename python 3378 root 46u REG 253,0 1290240 130826 /var/lib/rpm/Providename python 3378 root 47u REG 253,0 413696 131383 /var/lib/rpm/Dirnames python 3378 root 48u REG 253,0 139264 131384 /var/lib/rpm/Requireversion python 3378 root 49u REG 253,0 520192 131385 /var/lib/rpm/Provideversion python 3378 root 50u REG 253,0 16384 131386 /var/lib/rpm/Installtid python 3378 root 51u REG 253,0 49152 131387 /var/lib/rpm/Sigmd5 python 3378 root 52u REG 253,0 81920 131388 /var/lib/rpm/Sha1header python 3378 root 53u REG 253,0 2621440 131389 /var/lib/rpm/Filedigests . . . [root@pulp-qe-rhel6-client gofer]# lsof |grep 3378|grep "/var/lib/rpm"| wc -l 693 [root@pulp-qe-rhel6-client gofer]# lsof | grep "3378"| wc -l 1137
Here's a better paste. [root@pulp-qe-rhel6 ~]# cat /tmp/goferd.18867.lsof|sort|cut -c45-120 253,0 0 1046108 /tmp/goferd.18867.lsof /dev/null /dev/null /dev/null /dev/null /dev/urandom / / pipe pipe pipe pipe pipe pipe unused:54114->unused:amqp (ESTABLISHED) unused:54115->unused:amqp (ESTABLISHED) unused:39891->a23-1-44-218.deploy.akamaitechnologies.com:https (CLOSE_WAIT) /tmp/tmprdE8y6 (deleted) /tmp/tmpj5Fz2b (deleted) /tmp/ffiUnpDvC /var/lib/rpm/Triggername /var/lib/rpm/Triggername /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/Providename /var/lib/rpm/__db.003 /var/lib/rpm/Requireversion /var/lib/rpm/Requireversion /var/lib/yum/history/history-2011-01-17.sqlite /var/lib/yum/history/history-2011-01-17.sqlite /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/lib/rpm/Basenames /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef /var/lib/rpm/Group /var/lib/rpm/Group /var/lib/rpm/Installtid /var/lib/rpm/Installtid /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Packages /var/lib/rpm/Requirename /var/lib/rpm/Requirename /var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite /var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite /var/log/gofer/agent.log /var/cache/yum/x86_64/6Server/package_group_test_repo/primary.sqlite /var/lib/rpm/__db.002 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833 /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/Name /var/lib/rpm/__db.001 /var/lib/rpm/Filedigests /var/lib/rpm/Filedigests /var/log/pulp/client.log /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite /var/lib/rpm/Dirnames /var/lib/rpm/Dirnames /tmp/ffiUnpDvC (deleted) /var/lib/rpm/Sigmd5 /var/lib/rpm/Sigmd5 /var/lib/rpm/Provideversion /var/lib/rpm/Provideversion /var/log/yum.log /var/log/yum.log /var/log/yum.log /var/lib/rpm/__db.004 /var/lib/rpm/Sha1header /var/lib/rpm/Sha1header /usr/bin/python /usr/lib/locale/locale-archive socket
I think what happens here is that the bind invokes RepoLib stuff which is concurrency protected by a lockfile. I suspect that when we get "too many open files", this lock acquisition hangs. I'll verify/fix this. But, since gofer executes RMI calls in a single thread, when the bind hangs, it blocks following RMI calls like Package.install().
Looks like yum is leaking ~10 file descriptors per yum run: /var/lib/rpm /var/yum/cache I assume (python) rpm is responsible for /var/lib/rpm. I'll get with Florian Festi about the RPM leaks and Seth and/or James about yum. =========================== Test program: import os from time import sleep from yum import YumBase def install(packageinfo=['zsh']): installed = [] yb = YumBase() for info in packageinfo: if isinstance(info, list): pkgs = yb.pkgSack.returnNewestByNameArch(tuple(info)) else: pkgs = yb.pkgSack.returnNewestByName(info) for p in pkgs: installed.append(str(p)) yb.tsInfo.addInstall(p) yb.resolveDeps() yb.processTransaction() if reboot_suggested: cfg_assumeyes = cfg.client.assumeyes if cfg_assumeyes in ["True", "False"]: assumeyes = eval(cfg_assumeyes) else: assumeyes = assumeyes if assumeyes is True: self.__schedule_reboot() return (installed, {'reboot_performed' :True}) else: return (installed, {'reboot_performed' :False}) return (installed, None) if __name__ == '__main__': print os.getpid() for i in range(0,10): print i sleep(1) for i in range(0,10): try: install() except Exception, e: print e print 'lsof now' sleep(5) RESULTS: [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 7 63 601 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 18 162 1840 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 29 261 3129 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 40 360 4418 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 40 360 4418 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 51 459 5707 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 73 657 8285 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 95 855 10863 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 106 954 12152 [jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc 117 1053 13441
This appears to cure the leak in the test program: --- pyleak.py.orig 2011-04-05 16:32:00.000000000 +0300 +++ pyleak.py 2011-04-05 16:32:04.000000000 +0300 @@ -17,6 +17,7 @@ yb.tsInfo.addInstall(p) yb.resolveDeps() yb.processTransaction() + yb.closeRpmDB() if reboot_suggested: cfg_assumeyes = cfg.client.assumeyes if cfg_assumeyes in ["True", "False"]: But that shouldn't really be necessary as yb should fall out of scope and take any rpm(db) references down with it. Needs further investigation...
Looks like after adding: yb.processTransaction() + yb.closeRpmDB() Yum is leaking (1) file descriptor to: /var/log/yum.log which I would assume is a yum issue.
The remaining leak is in the logging setup. The function setFileLog() at loggingLevels.py:240 adds a FileHandler every time it's called. Which means, every time we use YumBase(), another FileHandler is added.
Added hack to purge yum FileHandlers until yum fixes this.
build: 0.161
On F13 [root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l 22 [root@pulp-nightly pulp]# rpm -e patb [root@pulp-nightly pulp]# pulp-admin package install --consumerid=fred -n patb Created task id: 85749b75-7b34-11e0-9720-5452004fdd6d Task is scheduled for: 2011-05-10 14:37 Waiting: [\] [[[u'patb-0.1-1.x86_64'], None]] installed on fred [root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l 23 The leaked file descriptor: python 24692 root 32w REG 253,0 1210293 531303 /var/log/yum.log [root@pulp-nightly pulp]# ps -elf | grep "24692" 0 S root 1581 32178 0 80 0 - 25800 pipe_w 14:34 pts/0 00:00:00 grep 24692 5 S root 24692 1 0 80 0 - 213854 futex_ 10:25 ? 00:02:24 python /usr/bin/goferd So it seems like the agent is still leaking 1 file descriptor per install command.
When goferd starts, there are no file descriptors for /var/log/yum.log. After the first package install, lsof will show (1) open file for /var/log/yum.log. But, subsequent package installs don't result in additional open files for /var/log/yum.log beyond the initial one. They are cleaned up by our hack/patch. So, we don't seem to be leaking one per package install (beyond the first time).
build: 0.200
verified [root@preethi ~]# rpm -q pulp pulp-0.0.213-1.fc14.noarch [root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l0 [root@preethi upload]# [root@preethi upload]# pulp-admin package install -n zsh --consumerid=preethi Created task id: 72d8d48c-b87a-11e0-a230-002564a85a58 Task is scheduled for: None Waiting: [\] [[['zsh-4.3.10-6.fc14.x86_64'], None]] installed on preethi [root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l 1 [root@preethi upload]# [root@preethi upload]# pulp-admin package install -n screen --consumerid=preethi Created task id: e9da265c-b87f-11e0-b2b6-002564a85a58 Task is scheduled for: None Waiting: [\] [[['screen-4.0.3-15.fc12.x86_64', 'screen-4.0.3-15.fc12.x86_64'], None]] installed on preethi [root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l 1 [root@preethi upload]# ps -elf | grep "16724" 5 S root 16724 1 0 80 0 - 229033 futex_ 14:01 ? 00:00:22 python /usr/bin/goferd 0 S root 17258 14908 0 80 0 - 25844 pipe_w 14:45 pts/0 00:00:00 grep --color=auto 16724
Closing with Community Release 15 pulp-0.0.223-4.