Bug 734780 - Too many open files error sync'ing repos
Summary: Too many open files error sync'ing repos
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Update Infrastructure for Cloud Providers
Classification: Red Hat
Component: RHUA
Version: 2.0.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeff Ortel
QA Contact: mkovacik
URL:
Whiteboard:
Depends On:
Blocks: 735350 tracker-rhui-2.0.1
 
Reported: 2011-08-31 12:24 UTC by James Slagle
Modified: 2017-03-01 22:05 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Do not document
Clone Of:
: 735350
Environment:
Last Closed: 2017-03-01 22:05:28 UTC
Target Upstream Version:
Embargoed:


Attachments
output of 'pulp repo list' (59.23 KB, text/plain), 2011-08-31 13:10 UTC, James Slagle
grinder lsof output (3.06 MB, text/plain), 2011-09-01 12:22 UTC, James Slagle
grinder ps output (64.88 KB, text/plain), 2011-09-01 12:23 UTC, James Slagle
gofer log from the CDS (2.09 MB, application/octet-stream), 2011-09-01 12:24 UTC, James Slagle
ps tree (72.32 KB, text/plain), 2011-09-01 12:34 UTC, James Slagle
lsof output from cds1 (348.30 KB, application/octet-stream), 2011-09-02 22:00 UTC, James Slagle
lsof output from cds2 (320.57 KB, application/octet-stream), 2011-09-02 22:01 UTC, James Slagle
gofer log from CDS 1 (5.43 MB, application/octet-stream), 2011-09-02 22:02 UTC, James Slagle
gofer log from CDS 2 (4.32 MB, application/octet-stream), 2011-09-02 22:03 UTC, James Slagle
ps tree CDS 1 (10.15 KB, application/octet-stream), 2011-09-02 22:04 UTC, James Slagle
ps tree CDS 2 (9.23 KB, application/octet-stream), 2011-09-02 22:04 UTC, James Slagle


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0367 0 normal SHIPPED_LIVE Red Hat Update Infrastructure 3.0 Release 2017-03-02 03:05:22 UTC

Description James Slagle 2011-08-31 12:24:31 UTC
RHUA 2.0 defaults to syncing 4 repos at once.  It seems that if the repos are large (RHEL), many of the syncs start to error out with a "Too many open files" error.

Traceback:
2011-08-31 01:05:15,779 1671:140266130777856: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 02009591-d35c-11e0-9eb3-123139078252: _sync(rhui-6-mrg-g-2.0-srpms-6Server-x86_64, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7f9234632d90>, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7f9234632b90>>)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
    progress_callback, synchronizer, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files


[root@domU-12-31-39-07-82-52 pulp]# ulimit -n
1024

Once all the syncing is done, whether successful or errored out, you can sync individual repositories again just fine.  So, you would have to be monitoring the system during a scheduled sync to debug the problem live.

One time when the system was experiencing the problem, ps aux | grep grinder showed 434 distinct processes.
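
For reference, the parent-side cost of each spawned worker can be seen with a short, self-contained Python sketch (illustrative only, not RHUA/grinder code): every subprocess.Popen created with stdin=PIPE and stdout=PIPE keeps two pipe descriptors open in the parent, so with the default ulimit of 1024 the parent typically exhausts its descriptors after roughly 500 concurrent children.

# Illustrative sketch only (not RHUA/grinder code): shows how a parent
# process with the default 1024-descriptor limit runs out of file
# descriptors once it holds a few hundred children spawned with
# stdin=PIPE/stdout=PIPE, since each such child costs the parent two
# pipe descriptors.
import resource
import subprocess
import sys

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process fd limit: soft=%d hard=%d" % (soft, hard))

children = []
try:
    while True:
        children.append(subprocess.Popen(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE))
except OSError as e:
    # With soft=1024 this typically fails after roughly 500 children.
    print("gave up after %d children: %s" % (len(children), e))
finally:
    for child in children:
        child.stdin.close()   # EOF lets the child exit
        child.stdout.close()
        child.wait()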

Comment 1 James Slagle 2011-08-31 12:30:50 UTC
1 sync running at a time results in 36 grinder processes

Comment 2 James Slagle 2011-08-31 12:44:11 UTC
The grinder processes look like:
apache   16266  0.0  0.0  75096  5524 ?        S    07:59   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc

Comment 3 James Slagle 2011-08-31 13:10:48 UTC
Created attachment 520815 [details]
output of 'pulp repo list'

Attaching a list of all repos I'm trying to sync.

Comment 4 James Slagle 2011-08-31 17:37:14 UTC
Two commands you can use to capture lsof output from every grinder and every httpd process:

ps aux | grep grinder | grep -v grep | for i in `awk '{print $2}'`; do lsof -p $i; done > grinder-lsof.txt

ps aux | egrep ^apache | grep httpd | for i in `awk '{print $2}'`; do lsof -p $i; done > httpd-lsof.txt

Comment 5 James Slagle 2011-08-31 17:38:25 UTC
ps aux | grep grinder | grep -v grep | for i in `awk '{print $2}'`; do lsof -p $i; done > grinder-lsof.txt
ps aux | egrep ^apache | grep httpd | for i in `awk '{print $2}'`; do lsof -p $i; done > httpd-lsof.txt

Comment 6 James Slagle 2011-09-01 12:21:06 UTC
Got this error on the CDS during a CDS sync as well.

Error performing repo sync
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 156, in sync
    self._sync_repo(base_url, repo)
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 332, in _sync_repo
    fetch.fetchYumRepo(repo_path)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files

There were 479 grinder processes according to ps.  That output is attached as grinder-ps.txt.

I ran lsof -p for each grinder pid, that output is attached as grinder-lsof.txt.

A quick look through the lsof output doesn't show anything out of the ordinary, such as one process holding on to an exorbitantly large number of files.  I think the real issue is why there are so many grinder processes to begin with.
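
For what it's worth, a rough per-PID tally of the concatenated lsof dump can be done with a small Python sketch (hypothetical helper, not part of Pulp/grinder, assuming lsof's default columns where the second field is the PID):

# Hypothetical helper, not part of Pulp/grinder: count lines per PID in a
# concatenated "lsof -p <pid>" dump (e.g. grinder-lsof.txt), assuming the
# default lsof columns where the second field is the PID.
import sys
from collections import defaultdict

counts = defaultdict(int)
for line in open(sys.argv[1]):
    fields = line.split()
    if len(fields) < 2 or fields[0] == "COMMAND":
        continue  # skip header lines and blanks
    counts[fields[1]] += 1

# Show the processes holding the most entries first.
for pid, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print("pid %s: %d lsof entries" % (pid, n))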

Comment 7 James Slagle 2011-09-01 12:22:18 UTC
Created attachment 521000 [details]
grinder lsof output

Comment 8 James Slagle 2011-09-01 12:23:33 UTC
Created attachment 521001 [details]
grinder ps output

Comment 9 James Slagle 2011-09-01 12:24:41 UTC
Created attachment 521002 [details]
gofer log from the CDS

Comment 10 James Slagle 2011-09-01 12:34:00 UTC
Created attachment 521003 [details]
ps tree

Attaching full ps output from the system in tree format.

Comment 11 James Slagle 2011-09-02 21:57:28 UTC
I updated the CDS to grinder-0.0.111-1.el6.noarch.rpm, which has the latest fix; however, I got this error again on both CDSes. There wasn't a large number of grinder processes this time, but there is one goferd process that has 1000+ pipes open.

You can see the traceback in the attached gofer logs (one from each CDS).  I also captured the lsof output (lsof output attachments).  The goferd process is pid 2261 on the first CDS and pid 2659 on the second CDS.

Comment 12 James Slagle 2011-09-02 22:00:41 UTC
Created attachment 521294 [details]
lsof output from cds1

Comment 13 James Slagle 2011-09-02 22:01:12 UTC
Created attachment 521295 [details]
lsof output from cds2

Comment 14 James Slagle 2011-09-02 22:02:41 UTC
Created attachment 521296 [details]
gofer log from CDS 1

Comment 15 James Slagle 2011-09-02 22:03:57 UTC
Created attachment 521297 [details]
gofer log from CDS 2

Comment 16 James Slagle 2011-09-02 22:04:22 UTC
Created attachment 521298 [details]
ps tree CDS 1

Comment 17 James Slagle 2011-09-02 22:04:39 UTC
Created attachment 521299 [details]
ps tree CDS 2

Comment 18 Jeff Ortel 2011-09-06 15:07:30 UTC
We discontinued using Popen.kill() for Python 2.4 compatibility and replaced it by killing the child process manually.  At that time we also needed to close the stdin & stdout pipes manually, but did not.

Fixed: c5eb8995f72c470c326a0b80e1b984a1db6aec3a

Sorry, forgot to add the bug # to the commit comment.
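
As a rough illustration (a minimal sketch, not the actual grinder code or the committed fix), the pattern is: once Popen.kill() is replaced with a manual os.kill() for Python 2.4 compatibility, the parent's ends of the child's stdin/stdout pipes must also be closed explicitly, otherwise two descriptors leak for every worker that is spawned and killed.

# Minimal sketch, not the actual grinder code: replacing Popen.kill()
# (unavailable on Python 2.4) with a manual os.kill() means the parent's
# ends of the child's pipes must be closed explicitly, or two descriptors
# leak per spawned-and-killed worker.
import os
import signal
import sys
from subprocess import PIPE, Popen

def spawn_worker():
    # Stand-in for grinder's ActiveObject child process.
    return Popen([sys.executable, "-c", "import sys; sys.stdin.read()"],
                 stdin=PIPE, stdout=PIPE)

def shutdown_worker(child):
    os.kill(child.pid, signal.SIGKILL)  # manual replacement for Popen.kill()
    child.wait()                        # reap the child
    # The missing piece that caused this bug: without these two calls the
    # parent's ends of the pipes stay open until the parent itself exits.
    child.stdin.close()
    child.stdout.close()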

Comment 19 Jeff Ortel 2011-09-06 17:10:14 UTC
Fix included in grinder 0.112.

Comment 20 wes hayutin 2011-10-17 20:01:36 UTC
Set tracker bug 746803.

Comment 22 Sachin Ghai 2011-10-21 14:13:37 UTC


I started a sync of multiple repos in one go and checked the grinder processes:

-= Repository Synchronization Status =-

Last Refreshed: 19:32:07
(updated every 5 seconds, ctrl+c to exit)

Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
Red Hat Enterprise Linux Server 5 (RPMs) (5.6-i386)
In Progress                  10-20-2011 20:07             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.6-x86_64)
In Progress                  10-20-2011 20:14             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.7-i386)
In Progress                  10-20-2011 20:07             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.7-x86_64)
In Progress                  10-20-2011 20:15             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5Server-i386)
Awaiting Execution           10-20-2011 20:39             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5Server-x86_64)
Awaiting Execution           10-21-2011 03:15             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 01:15             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 01:11             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 04:54             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 05:18             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 02:43             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 03:04             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 00:18             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 00:39             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 01:03             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 01:01             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 00:47             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 01:01             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 15:42             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 15:40             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 15:40             Success    


                                  Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------


grinder processes: total of 17
==================
[root@dhcp201-152 /]# ps aux | grep grinder
apache   28729  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28730  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28731  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28732  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28733  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28734  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28737  0.1  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28738  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28740  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28741  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28742  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28743  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28745  0.0  0.6  75544  6652 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28746  0.0  0.6  75544  6652 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28747  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28748  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     29068  0.0  0.0 103236   816 pts/8    S+   19:31   0:00 grep grinder
[root@dhcp201-152 /]# ps aux | grep grinder | wc -l
17
[root@dhcp201-152 /]# 


Since the number of grinder processes shouldn't be more than 20, I think the total of 17 grinder processes here is good.

The sync has completely finished and I haven't seen the "Too many open files" issue during or after the repo sync.

Will check the same for the goferd process during the CDS sync.

Comment 23 Sachin Ghai 2011-10-22 03:31:40 UTC
I started the CDS sync for two CDS nodes in parallel and haven't seen the "too many open files" issue on the CDS nodes either.


Here is the status:

-= CDS Synchronization Status =-

Last Refreshed: 08:50:55
(updated every 5 seconds, ctrl+c to exit)


cds171 ...................................................... [  UP  ]
cds207 ...................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds171
In Progress                  Never                        Never       

cds207
In Progress                  10-22-2011 07:16             Success      


                                  Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------



Here are the grinder processes running on CDS 1 (dhcp201-171):

from cds171:

[root@dhcp201-171 ~]# ps aux | grep grinder
root     17558  0.4  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17559  0.3  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17560  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17561  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17562  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17563  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17564  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17565  0.3  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17566  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17567  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17571  0.0  0.0 103240   824 pts/0    S+   08:46   0:00 grep grinder
[root@dhcp201-171 ~]# ps aux | grep grinder| wc -l
11
[root@dhcp201-171 ~]# 


from cds207:
==============

[root@dhcp201-207 ~]# ps aux | grep grinder
root      5749  0.8  1.8 319300 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5750  0.9  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5751  0.9  1.8 319296 18704 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5752  0.9  1.8 319304 18716 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5753  0.8  1.8 319296 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5754  0.9  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5755  0.8  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5756  0.8  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5757  0.9  1.8 319304 18716 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5758  0.8  1.8 319300 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5772  0.0  0.0 103240   824 pts/0    S+   08:52   0:00 grep grinder
[root@dhcp201-207 ~]# ps aux | grep grinder | wc -l
11
[root@dhcp201-207 ~]#

Comment 24 James Slagle 2011-10-31 19:16:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Do not document

Comment 28 errata-xmlrpc 2017-03-01 22:05:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0367

