Bug 692888
Summary: | gofer becomes unresponsive w/ "Too many open files" error | |
---|---|---|---
Product: | [Retired] Pulp | Reporter: | dgao
Component: | user-experience | Assignee: | Jeff Ortel <jortel>
Status: | CLOSED CURRENTRELEASE | QA Contact: | dgao
Severity: | medium | Priority: | unspecified
Version: | unspecified | CC: | pmatilai, pthomas, skarmark, tsanders
Target Release: | Sprint 25 | Keywords: | Triaged
Hardware: | Unspecified | OS: | Unspecified
Doc Type: | Bug Fix | Bug Blocks: | 641987
Last Closed: | 2011-08-16 14:02:30 UTC | |
Description (dgao, 2011-04-01 14:17:58 UTC)
Here's a better paste.

```
[root@pulp-qe-rhel6 ~]# cat /tmp/goferd.18867.lsof | sort | cut -c45-120
253,0 0 1046108 /tmp/goferd.18867.lsof
/dev/null
/dev/null
/dev/null
/dev/null
/dev/urandom
/
/
pipe
pipe
pipe
pipe
pipe
pipe
unused:54114->unused:amqp (ESTABLISHED)
unused:54115->unused:amqp (ESTABLISHED)
unused:39891->a23-1-44-218.deploy.akamaitechnologies.com:https (CLOSE_WAIT)
/tmp/tmprdE8y6 (deleted)
/tmp/tmpj5Fz2b (deleted)
/tmp/ffiUnpDvC
/var/lib/rpm/Triggername
/var/lib/rpm/Triggername
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/__db.003
/var/lib/rpm/Requireversion
/var/lib/rpm/Requireversion
/var/lib/yum/history/history-2011-01-17.sqlite
/var/lib/yum/history/history-2011-01-17.sqlite
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/lib/rpm/Group
/var/lib/rpm/Group
/var/lib/rpm/Installtid
/var/lib/rpm/Installtid
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Requirename
/var/lib/rpm/Requirename
/var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite
/var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite
/var/log/gofer/agent.log
/var/cache/yum/x86_64/6Server/package_group_test_repo/primary.sqlite
/var/lib/rpm/__db.002
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/__db.001
/var/lib/rpm/Filedigests
/var/lib/rpm/Filedigests
/var/log/pulp/client.log
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/lib/rpm/Dirnames
/var/lib/rpm/Dirnames
/tmp/ffiUnpDvC (deleted)
/var/lib/rpm/Sigmd5
/var/lib/rpm/Sigmd5
/var/lib/rpm/Provideversion
/var/lib/rpm/Provideversion
/var/log/yum.log
/var/log/yum.log
/var/log/yum.log
/var/lib/rpm/__db.004
/var/lib/rpm/Sha1header
/var/lib/rpm/Sha1header
/usr/bin/python
/usr/lib/locale/locale-archive
socket
```

I think what happens here is that the bind invokes the RepoLib code, which is concurrency-protected by a lockfile. I suspect that when we get "too many open files", this lock acquisition hangs. I'll verify/fix this. But since gofer executes RMI calls in a single thread, when the bind hangs it blocks the RMI calls that follow, like Package.install().

Looks like yum is leaking ~10 file descriptors per yum run, mostly under /var/lib/rpm and /var/cache/yum. I assume (python) rpm is responsible for /var/lib/rpm. I'll get with Florian Festi about the RPM leaks and with Seth and/or James about yum.
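[Editor's note] The RepoLib lock code itself isn't shown in this report. As a purely illustrative sketch of the failure mode described above, the following shows how a lockfile acquisition with a timeout would let a wedged bind fail fast instead of hanging the agent's single RMI dispatch thread. Every name here is hypothetical; this is not pulp's actual locking code.

```python
import fcntl
import time

class LockTimeout(Exception):
    pass

def acquire_with_timeout(path, timeout=30, delay=0.5):
    # Illustrative sketch (not pulp's RepoLib lock): poll a
    # non-blocking lock instead of blocking forever, so a wedged
    # lock (or a failed open() when out of descriptors) surfaces
    # as an error rather than hanging the single RMI thread.
    fp = open(path, 'w')
    deadline = time.time() + timeout
    while True:
        try:
            fcntl.lockf(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fp  # caller unlocks and closes when done
        except IOError:
            if time.time() >= deadline:
                fp.close()
                raise LockTimeout(path)
            time.sleep(delay)
```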
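[Editor's note] The ~10-descriptors-per-run figure above was gathered by watching lsof from outside the process. A minimal way to take the same measurement from inside the process on Linux (an illustrative sketch, not part of the original report) is to count the entries in /proc/self/fd around each yum run:

```python
import os

def open_fds():
    # Number of file descriptors this process currently holds open
    # (Linux-specific: /proc/self/fd has one entry per descriptor).
    return len(os.listdir('/proc/self/fd'))

before = open_fds()
# ... perform one YumBase transaction here ...
after = open_fds()
print 'fds leaked by this run: %d' % (after - before)
```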
===========================

Test program:

```python
import os
from time import sleep
from yum import YumBase

def install(packageinfo=['zsh']):
    installed = []
    yb = YumBase()
    for info in packageinfo:
        if isinstance(info, list):
            pkgs = yb.pkgSack.returnNewestByNameArch(tuple(info))
        else:
            pkgs = yb.pkgSack.returnNewestByName(info)
        for p in pkgs:
            installed.append(str(p))
            yb.tsInfo.addInstall(p)
    yb.resolveDeps()
    yb.processTransaction()
    # NOTE: the reboot handling below was carried over from the pulp
    # agent handler; reboot_suggested, cfg, and self are not defined
    # in this standalone script, so this branch raises NameError
    # (caught and printed by the loop in __main__).
    if reboot_suggested:
        cfg_assumeyes = cfg.client.assumeyes
        if cfg_assumeyes in ["True", "False"]:
            assumeyes = eval(cfg_assumeyes)
        else:
            assumeyes = assumeyes
        if assumeyes is True:
            self.__schedule_reboot()
            return (installed, {'reboot_performed': True})
        else:
            return (installed, {'reboot_performed': False})
    return (installed, None)

if __name__ == '__main__':
    print os.getpid()
    for i in range(0, 10):
        print i
        sleep(1)
    for i in range(0, 10):
        try:
            install()
        except Exception, e:
            print e
        print 'lsof now'
        sleep(5)
```

RESULTS:

```
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
      7      63     601
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     18     162    1840
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     29     261    3129
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     40     360    4418
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     40     360    4418
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     51     459    5707
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     73     657    8285
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
     95     855   10863
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
    106     954   12152
[jortel@localhost tmp]$ lsof | grep 6755 | grep -v "\.so" | wc
    117    1053   13441
```

This appears to cure the leak in the test program:

```diff
--- pyleak.py.orig	2011-04-05 16:32:00.000000000 +0300
+++ pyleak.py	2011-04-05 16:32:04.000000000 +0300
@@ -17,6 +17,7 @@
             yb.tsInfo.addInstall(p)
     yb.resolveDeps()
     yb.processTransaction()
+    yb.closeRpmDB()
     if reboot_suggested:
         cfg_assumeyes = cfg.client.assumeyes
         if cfg_assumeyes in ["True", "False"]:
```

But that shouldn't really be necessary: yb should fall out of scope and take any rpm(db) references down with it. Needs further investigation...

Looks like after adding yb.closeRpmDB() after yb.processTransaction(), yum is still leaking (1) file descriptor per run, on /var/log/yum.log, which I would assume is a yum issue.

The remaining leak is in the logging setup. The function setFileLog() at loggingLevels.py:240 adds a FileHandler every time it's called, which means that every time we use YumBase(), another FileHandler is added. Added a hack to purge yum FileHandlers until yum fixes this (a sketch of such a purge follows this comment).

build: 0.161

On F13:

```
[root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l
22
[root@pulp-nightly pulp]# rpm -e patb
[root@pulp-nightly pulp]# pulp-admin package install --consumerid=fred -n patb
Created task id: 85749b75-7b34-11e0-9720-5452004fdd6d
Task is scheduled for: 2011-05-10 14:37
Waiting: [\]
[[[u'patb-0.1-1.x86_64'], None]] installed on fred
[root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l
23
```

The leaked file descriptor:

```
python  24692  root  32w  REG  253,0  1210293  531303  /var/log/yum.log
[root@pulp-nightly pulp]# ps -elf | grep "24692"
0 S root   1581 32178  0 80 0 -  25800 pipe_w 14:34 pts/0 00:00:00 grep 24692
5 S root  24692     1  0 80 0 - 213854 futex_ 10:25 ?     00:02:24 python /usr/bin/goferd
```

So it seems the agent is still leaking 1 file descriptor per install command. When goferd starts, there are no open descriptors for /var/log/yum.log.
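[Editor's note] The purge hack itself isn't included in the report. A minimal sketch of what it could look like, assuming yum's setFileLog() registers its FileHandler on a logger named "yum.filelogging" (the logger name is an assumption here, not confirmed by the report):

```python
import logging

def purge_yum_file_handlers(logger_name='yum.filelogging'):
    # Sketch of a cleanup pass: close and detach any FileHandler
    # that yum's logging setup left behind, so repeated YumBase()
    # setups do not accumulate open handles on /var/log/yum.log.
    # The logger name is assumed; yum may register under others.
    log = logging.getLogger(logger_name)
    for handler in list(log.handlers):
        if isinstance(handler, logging.FileHandler):
            log.removeHandler(handler)
            handler.close()
```

Calling something like this after each YumBase run would keep repeated installs from holding more than one handle on the log file, which matches the behavior verified below.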
After the first package install, lsof will show (1) open file for /var/log/yum.log. But subsequent package installs don't result in additional open files for /var/log/yum.log beyond the initial one; they are cleaned up by our hack/patch. So we don't seem to be leaking one per package install (beyond the first time).

build: 0.200

verified:

```
[root@preethi ~]# rpm -q pulp
pulp-0.0.213-1.fc14.noarch
[root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l
0
[root@preethi upload]# pulp-admin package install -n zsh --consumerid=preethi
Created task id: 72d8d48c-b87a-11e0-a230-002564a85a58
Task is scheduled for: None
Waiting: [\]
[[['zsh-4.3.10-6.fc14.x86_64'], None]] installed on preethi
[root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l
1
[root@preethi upload]# pulp-admin package install -n screen --consumerid=preethi
Created task id: e9da265c-b87f-11e0-b2b6-002564a85a58
Task is scheduled for: None
Waiting: [\]
[[['screen-4.0.3-15.fc12.x86_64', 'screen-4.0.3-15.fc12.x86_64'], None]] installed on preethi
[root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l
1
[root@preethi upload]# ps -elf | grep "16724"
5 S root 16724     1  0 80 0 - 229033 futex_ 14:01 ?     00:00:22 python /usr/bin/goferd
0 S root 17258 14908  0 80 0 -  25844 pipe_w 14:45 pts/0 00:00:00 grep --color=auto 16724
```

Closing with Community Release 15, pulp-0.0.223-4.