Bug 735350 - Too many open files error sync'ing repos
Summary: Too many open files error sync'ing repos
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: user-experience
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: John Matthews
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On: 734780
Blocks:
 
Reported: 2011-09-02 13:05 UTC by Preethi Thomas
Modified: 2013-09-09 16:36 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 734780
Environment:
Last Closed: 2012-02-24 20:15:30 UTC
Embargoed:



Description Preethi Thomas 2011-09-02 13:05:49 UTC
+++ This bug was initially created as a clone of Bug #734780 +++

Cloning this because I saw the same failure on Pulp when I had repos with scheduled syncs.

RHUA 2.0 defaults to syncing 4 repos at once.  It seems that if the repos are large (RHEL), many of the syncs start to error out with a "Too many open files" error.

Traceback:
2011-08-31 01:05:15,779 1671:140266130777856: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 02009591-d35c-11e0-9eb3-123139078252: _sync(rhui-6-mrg-g-2.0-srpms-6Server-x86_64, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7f9234632d90>, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7f9234632b90>>)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
    progress_callback, synchronizer, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files


[root@domU-12-31-39-07-82-52 pulp]# ulimit -n
1024
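
For reference, the limit being hit is the per-process open file descriptor limit shown by ulimit above. A quick way to confirm it from inside a Python process, a sketch using the standard library resource module (not Pulp code):

    import resource

    # the soft limit is what os.pipe() runs into; 1024 here matches
    # the `ulimit -n` output above
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("nofile soft=%d hard=%d" % (soft, hard))

Raising the soft limit for the apache user would only delay the failure if descriptors are being leaked; the process count itself is the thing to watch.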

Once all the syncing is done, whether successful or errored out, you can sync individual repositories again just fine.  So you would have to be monitoring the system during a scheduled sync to debug the problem live.

One time when the system was experiencing the problem, ps aux | grep grinder showed 434 distinct grinder processes.

--- Additional comment from jslagle on 2011-08-31 08:30:50 EDT ---

1 sync running at a time results in 36 grinder processes

--- Additional comment from jslagle on 2011-08-31 08:44:11 EDT ---

The grinder processes look like:
apache   16266  0.0  0.0  75096  5524 ?        S    07:59   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc

--- Additional comment from jslagle on 2011-08-31 09:10:48 EDT ---

Created attachment 520815
output of 'pulp repo list'

Attaching a list of all the repos I'm trying to sync.

--- Additional comment from jslagle on 2011-08-31 13:37:14 EDT ---

Two commands you can use to capture lsof output from every grinder and every httpd process:

for i in $(ps aux | grep '[g]rinder' | awk '{print $2}'); do lsof -p $i; done > grinder-lsof.txt

for i in $(ps aux | egrep '^apache' | grep httpd | awk '{print $2}'); do lsof -p $i; done > httpd-lsof.txt
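
A rough Python equivalent of those one-liners, written against the Python 2.6 environment from the tracebacks (capture_lsof is a name made up here; assumes lsof is installed):

    import subprocess

    def capture_lsof(pattern, outfile):
        # collect `lsof -p PID` output for every process whose ps line
        # matches pattern, mirroring the shell loops above
        ps_out = subprocess.Popen(["ps", "aux"],
                                  stdout=subprocess.PIPE).communicate()[0]
        out = open(outfile, "w")
        try:
            for line in ps_out.splitlines():
                if pattern in line:
                    pid = line.split()[1]
                    out.write(subprocess.Popen(["lsof", "-p", pid],
                              stdout=subprocess.PIPE).communicate()[0])
        finally:
            out.close()

    capture_lsof("grinder/activeobject", "grinder-lsof.txt")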


--- Additional comment from jslagle on 2011-09-01 08:21:06 EDT ---

Got this error on the CDS during a CDS sync as well.

Error performing repo sync
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 156, in sync
    self._sync_repo(base_url, repo)
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 332, in _sync_repo
    fetch.fetchYumRepo(repo_path)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files

There were 479 grinder processes according to ps.  That output is attached as grinder-ps.txt.

I ran lsof -p for each grinder pid, that output is attached as grinder-lsof.txt.

A quick look through the lsof output doesn't show anything out of the ordinary, such as one process holding an exorbitantly large number of files.  I think the real question is why there are so many grinder processes to begin with.
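
That matches the arithmetic: each ActiveObject child is spawned via subprocess.Popen with stdin/stdout pipes, so every live child pins at least two pipe descriptors in the parent. With 36 children per sync and children apparently outliving their sync, a few hundred of them (479 here) easily exhaust the 1024-descriptor soft limit. A minimal standalone sketch (not grinder code) that reproduces the EMFILE the same way:

    import subprocess
    import sys

    # Spawn long-lived children with pipes attached, roughly the pattern in
    # grinder's activeobject.py __spawn(). Each child pins two pipe fds in
    # the parent, so with a 1024 soft limit this fails around ~500 children.
    children = []
    try:
        for _ in range(1000):
            p = subprocess.Popen(
                [sys.executable, "-c", "import sys; sys.stdin.read()"],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            children.append(p)
    except OSError as e:
        # [Errno 24] Too many open files, same as the tracebacks above
        print("failed after %d children: %s" % (len(children), e))
    finally:
        for p in children:
            p.stdin.close()   # child's stdin.read() returns, child exits
            p.stdout.close()
            p.wait()          # reap the child, releasing its descriptors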

--- Additional comment from jslagle on 2011-09-01 08:22:18 EDT ---

Created attachment 521000
grinder lsof output

--- Additional comment from jslagle on 2011-09-01 08:23:33 EDT ---

Created attachment 521001
grinder ps output

--- Additional comment from jslagle on 2011-09-01 08:24:41 EDT ---

Created attachment 521002
gofer log from the CDS

--- Additional comment from jslagle on 2011-09-01 08:34:00 EDT ---

Created attachment 521003
ps tree

Attaching full ps output from the system in tree format.

Comment 1 John Matthews 2011-09-08 16:45:59 UTC
Fixed in grinder 0.111

Commit here:
http://git.fedorahosted.org/git/?p=grinder.git;a=commitdiff;h=ec9448cb0ff2bb5915372083022c409b5d80562d
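
The commit isn't reproduced here, but a fix consistent with the analysis above would ensure each spawned child is shut down and reaped when its owner finishes, rather than being left alive holding pipes. A hypothetical illustration (toy names, invented for this sketch, not the actual grinder code):

    import subprocess
    import sys

    class PipedChild(object):
        # toy stand-in for grinder's ActiveObject; the names here are
        # invented for illustration and do not appear in grinder
        def __init__(self):
            # same pattern as activeobject.py __spawn(): a long-lived
            # child python process with pipes attached
            self.process = subprocess.Popen(
                [sys.executable, "-c", "import sys; sys.stdin.read()"],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)

        def shutdown(self):
            # close the pipes and reap the child so every descriptor it
            # pinned in the parent is released
            if self.process is None:
                return
            self.process.stdin.close()
            self.process.stdout.close()
            self.process.wait()
            self.process = None

With a shutdown like this called whenever a worker finishes, descriptor usage stays bounded by the number of concurrent workers rather than growing with every scheduled sync.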

Comment 2 Preethi Thomas 2011-09-23 14:33:23 UTC
Verified.

[root@preethi playpen]# rpm -q grinder
grinder-0.0.117-1.fc15.noarch

[root@preethi playpen]# rpm -q pulp
pulp-0.0.233-1.fc15.noarch

Comment 3 Preethi Thomas 2012-02-24 20:15:30 UTC
Pulp v1.0 is released
Closed Current Release.


