Bug 692888 - gofer becomes unresponsive w/ Too many open file error
Summary: gofer becomes unresponsive w/ Too many open file error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: user-experience
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: Sprint 25
Assignee: Jeff Ortel
QA Contact: dgao
URL:
Whiteboard:
Depends On:
Blocks: pulp-verified
TreeView+ depends on / blocked
 
Reported: 2011-04-01 14:17 UTC by dgao
Modified: 2013-09-09 16:29 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-08-16 14:02:30 UTC
Embargoed:


Attachments (Terms of Use)

Description dgao 2011-04-01 14:17:58 UTC
this was caused by a cli installed cmd:

2011-03-31 14:24:57,627 [ERROR][test_consumer_remote] __call__() @ dispatcher.py:357 - [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/gofer/messaging/dispatcher.py", line 354, in __call__
    retval = method(*args, **keywords)
  File "/usr/lib/gofer/plugins/pulp.py", line 183, in bind
    bind_data['repo'], bind_data['host_urls'], bind_data['gpg_keys'])
  File "/usr/lib/python2.6/site-packages/pulp/client/repolib.py", line 81, in bind
  File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 169, in acquire
  File "/usr/lib/python2.6/site-packages/pulp/client/lock.py", line 57, in open
IOError: [Errno 24] Too many open files: '/var/run/subsys/pulp/repolib.pid'


[root@pulp-qe-rhel6-client gofer]# lsof |grep 3378|grep "/var/lib/rpm"
python     3378      root  mem       REG              253,0   753664     135488 /var/lib/rpm/__db.004
python     3378      root  mem       REG              253,0  1318912     135487 /var/lib/rpm/__db.003
python     3378      root  mem       REG              253,0   229376     135486 /var/lib/rpm/__db.002
python     3378      root  mem       REG              253,0    24576     135485 /var/lib/rpm/__db.001
python     3378      root   18r      REG              253,0 20103168     130824 /var/lib/rpm/Packages
python     3378      root   19r      REG              253,0    24576     130825 /var/lib/rpm/Name
python     3378      root   20r      REG              253,0  1290240     130826 /var/lib/rpm/Providename
python     3378      root   22u      REG              253,0 20103168     130824 /var/lib/rpm/Packages
python     3378      root   23u      REG              253,0    24576     130825 /var/lib/rpm/Name
python     3378      root   28u      REG              253,0 20103168     130824 /var/lib/rpm/Packages
python     3378      root   29r      REG              253,0 20103168     130824 /var/lib/rpm/Packages
python     3378      root   30r      REG              253,0    24576     130825 /var/lib/rpm/Name
python     3378      root   31r      REG              253,0  1290240     130826 /var/lib/rpm/Providename
python     3378      root   32r      REG              253,0  1536000     130838 /var/lib/rpm/Basenames
python     3378      root   33u      REG              253,0 20103168     130824 /var/lib/rpm/Packages
python     3378      root   34u      REG              253,0    24576     130825 /var/lib/rpm/Name
python     3378      root   35u      REG              253,0  1536000     130838 /var/lib/rpm/Basenames
python     3378      root   43u      REG              253,0    12288     131382 /var/lib/rpm/Triggername
python     3378      root   44u      REG              253,0    16384     131379 /var/lib/rpm/Group
python     3378      root   45u      REG              253,0   204800     131380 /var/lib/rpm/Requirename
python     3378      root   46u      REG              253,0  1290240     130826 /var/lib/rpm/Providename
python     3378      root   47u      REG              253,0   413696     131383 /var/lib/rpm/Dirnames
python     3378      root   48u      REG              253,0   139264     131384 /var/lib/rpm/Requireversion
python     3378      root   49u      REG              253,0   520192     131385 /var/lib/rpm/Provideversion
python     3378      root   50u      REG              253,0    16384     131386 /var/lib/rpm/Installtid
python     3378      root   51u      REG              253,0    49152     131387 /var/lib/rpm/Sigmd5
python     3378      root   52u      REG              253,0    81920     131388 /var/lib/rpm/Sha1header
python     3378      root   53u      REG              253,0  2621440     131389 /var/lib/rpm/Filedigests

.
.
.
[root@pulp-qe-rhel6-client gofer]# lsof |grep 3378|grep "/var/lib/rpm"| wc -l
693
[root@pulp-qe-rhel6-client gofer]# lsof | grep "3378"| wc -l
1137

Comment 1 Jeff Ortel 2011-04-05 13:06:01 UTC
Here's a better paste.

[root@pulp-qe-rhel6 ~]# cat /tmp/goferd.18867.lsof|sort|cut -c45-120
 253,0        0    1046108 /tmp/goferd.18867.lsof
/dev/null
/dev/null
/dev/null
/dev/null
/dev/urandom
/
/
pipe
pipe
pipe
pipe
pipe
pipe
unused:54114->unused:amqp (ESTABLISHED)
unused:54115->unused:amqp (ESTABLISHED)
unused:39891->a23-1-44-218.deploy.akamaitechnologies.com:https (CLOSE_WAIT)
/tmp/tmprdE8y6 (deleted)
/tmp/tmpj5Fz2b (deleted)
/tmp/ffiUnpDvC
/var/lib/rpm/Triggername
/var/lib/rpm/Triggername
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/Providename
/var/lib/rpm/__db.003
/var/lib/rpm/Requireversion
/var/lib/rpm/Requireversion
/var/lib/yum/history/history-2011-01-17.sqlite
/var/lib/yum/history/history-2011-01-17.sqlite
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/lib/rpm/Basenames
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/cache/yum/x86_64/6Server/epel/8633060596fe26eb3061583aa8fde96a3f69910ef
/var/lib/rpm/Group
/var/lib/rpm/Group
/var/lib/rpm/Installtid
/var/lib/rpm/Installtid
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Packages
/var/lib/rpm/Requirename
/var/lib/rpm/Requirename
/var/cache/yum/x86_64/6Server/package_test_repo/primary.sqlite
/var/cache/yum/x86_64/6Server/errata_pkg_upload_repo/primary.sqlite
/var/log/gofer/agent.log
/var/cache/yum/x86_64/6Server/package_group_test_repo/primary.sqlite
/var/lib/rpm/__db.002
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/cache/yum/x86_64/6Server/rhel-x86_64-server-6.0.z/8c584af3a1bc5d9a36833
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/Name
/var/lib/rpm/__db.001
/var/lib/rpm/Filedigests
/var/lib/rpm/Filedigests
/var/log/pulp/client.log
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/f13test/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/cache/yum/x86_64/6Server/fedora-pulp-testing/primary.xml.gz.sqlite
/var/lib/rpm/Dirnames
/var/lib/rpm/Dirnames
/tmp/ffiUnpDvC (deleted)
/var/lib/rpm/Sigmd5
/var/lib/rpm/Sigmd5
/var/lib/rpm/Provideversion
/var/lib/rpm/Provideversion
/var/log/yum.log
/var/log/yum.log
/var/log/yum.log
/var/lib/rpm/__db.004
/var/lib/rpm/Sha1header
/var/lib/rpm/Sha1header
/usr/bin/python
/usr/lib/locale/locale-archive
socket

Comment 2 Jeff Ortel 2011-04-05 13:06:24 UTC
I think what happens here is that the bind invokes RepoLib stuff which is
concurrency protected by a lockfile.  I suspect that when we get "too many open
files", this lock acquisition hangs.  I'll verify/fix this.  But, since gofer
executes RMI calls in a single thread, when the bind hangs, it blocks following
RMI calls like Package.install().

Comment 3 Jeff Ortel 2011-04-05 13:10:22 UTC
Looks like yum is leaking ~10 file descriptors per yum run:

/var/lib/rpm
/var/yum/cache

I assume (python) rpm is responsible for /var/lib/rpm.  I'll get with Florian Festi about the RPM leaks and Seth and/or James about yum.

===========================

Test program:

import os
from time import sleep
from yum import YumBase

def install(packageinfo=['zsh']):
    installed = []
    yb = YumBase()
    for info in packageinfo:
        if isinstance(info, list):
            pkgs = yb.pkgSack.returnNewestByNameArch(tuple(info))
        else:
            pkgs = yb.pkgSack.returnNewestByName(info)
        for p in pkgs:
            installed.append(str(p))
            yb.tsInfo.addInstall(p)
    yb.resolveDeps()
    yb.processTransaction()
    if reboot_suggested:
        cfg_assumeyes = cfg.client.assumeyes
        if cfg_assumeyes in ["True", "False"]:
            assumeyes = eval(cfg_assumeyes)
        else:
            assumeyes = assumeyes
        if assumeyes is True:
            self.__schedule_reboot()
            return (installed, {'reboot_performed' :True})
        else:
            return (installed, {'reboot_performed' :False})
    return (installed, None)

if __name__ == '__main__':
    print os.getpid()
    for i in range(0,10):
        print i
        sleep(1)
    for i in range(0,10):
        try:
            install()
        except Exception, e:
            print e
        print 'lsof now'
        sleep(5)


RESULTS:

[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
      7      63     601
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     18     162    1840
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     29     261    3129
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     40     360    4418
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     40     360    4418
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     51     459    5707
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     73     657    8285
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
     95     855   10863
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
    106     954   12152
[jortel@localhost tmp]$ lsof | grep 6755 |grep -v "\.so" |wc
    117    1053   13441

Comment 4 Panu Matilainen 2011-04-05 13:33:47 UTC
This appears to cure the leak in the test program:

--- pyleak.py.orig	2011-04-05 16:32:00.000000000 +0300
+++ pyleak.py	2011-04-05 16:32:04.000000000 +0300
@@ -17,6 +17,7 @@
             yb.tsInfo.addInstall(p)
     yb.resolveDeps()
     yb.processTransaction()
+    yb.closeRpmDB()
     if reboot_suggested:
         cfg_assumeyes = cfg.client.assumeyes
         if cfg_assumeyes in ["True", "False"]:

But that shouldn't really be necessary as yb should fall out of scope and take any rpm(db) references down with it. Needs further investigation...

Comment 5 Jeff Ortel 2011-04-05 13:39:11 UTC
Looks like after adding:

     yb.processTransaction()
+   yb.closeRpmDB()

Yum is leaking (1) file descriptor to: /var/log/yum.log which I would assume is a yum issue.

Comment 6 Jeff Ortel 2011-04-05 15:11:59 UTC
The remaining leak is in the logging setup.
The function setFileLog() at loggingLevels.py:240 adds a FileHandler every time it's called.  Which means, every time we use YumBase(), another FileHandler is added.

Comment 7 Jeff Ortel 2011-04-05 15:44:24 UTC
Added hack to purge yum FileHandlers until yum fixes this.

Comment 8 Jeff Ortel 2011-04-06 20:49:20 UTC
build: 0.161

Comment 9 dgao 2011-05-10 19:13:17 UTC
On F13

[root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l
22
[root@pulp-nightly pulp]# rpm -e patb
[root@pulp-nightly pulp]# pulp-admin package install --consumerid=fred -n patb
Created task id: 85749b75-7b34-11e0-9720-5452004fdd6d
Task is scheduled for: 2011-05-10 14:37
Waiting: [\] 
[[[u'patb-0.1-1.x86_64'], None]] installed on fred

[root@pulp-nightly pulp]# lsof | grep "python" | grep "yum.log" | wc -l
23

The leaked file descriptor: 
python    24692    root   32w      REG              253,0   1210293     531303 /var/log/yum.log



[root@pulp-nightly pulp]# ps -elf | grep "24692"
0 S root      1581 32178  0  80   0 - 25800 pipe_w 14:34 pts/0    00:00:00 grep 24692
5 S root     24692     1  0  80   0 - 213854 futex_ 10:25 ?       00:02:24 python /usr/bin/goferd


So it seems like the agent is still leaking 1 file descriptor per install command.

Comment 10 Jeff Ortel 2011-06-28 20:05:54 UTC
When goferd starts, there are no file descriptors for /var/log/yum.log.  After the first package install, lsof will show (1) open file for /var/log/yum.log.  But, subsequent package installs don't result in additional open files for /var/log/yum.log beyond the initial one.  They are cleaned up by our hack/patch.  So, we don't seem to be leaking one per package install (beyond the first time).

Comment 11 Jeff Ortel 2011-06-29 23:25:05 UTC
build: 0.200

Comment 12 Preethi Thomas 2011-07-27 18:37:47 UTC
verified
[root@preethi ~]# rpm -q pulp
pulp-0.0.213-1.fc14.noarch

[root@preethi upload]#  lsof | grep "python" | grep "yum.log" | wc -l0
[root@preethi upload]# 
[root@preethi upload]# pulp-admin package install -n zsh --consumerid=preethi
Created task id: 72d8d48c-b87a-11e0-a230-002564a85a58
Task is scheduled for: None
Waiting: [\] 
[[['zsh-4.3.10-6.fc14.x86_64'], None]] installed on preethi

[root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l
1
[root@preethi upload]# 
[root@preethi upload]# pulp-admin package install -n screen --consumerid=preethi 
Created task id: e9da265c-b87f-11e0-b2b6-002564a85a58
Task is scheduled for: None
Waiting: [\] 
[[['screen-4.0.3-15.fc12.x86_64', 'screen-4.0.3-15.fc12.x86_64'], None]] installed on preethi

[root@preethi upload]# lsof | grep "python" | grep "yum.log" | wc -l
1


[root@preethi upload]#  ps -elf | grep "16724"
5 S root     16724     1  0  80   0 - 229033 futex_ 14:01 ?       00:00:22 python /usr/bin/goferd
0 S root     17258 14908  0  80   0 - 25844 pipe_w 14:45 pts/0    00:00:00 grep --color=auto 16724

Comment 13 Preethi Thomas 2011-08-16 14:02:30 UTC
Closing with Community Release 15

pulp-0.0.223-4.


Note You need to log in before you can comment on or make changes to this bug.