Bug 734780

Summary:      Too many open files error sync'ing repos
Product:      Red Hat Update Infrastructure for Cloud Providers
Component:    RHUA
Version:      2.0.1
Status:       CLOSED ERRATA
Severity:     unspecified
Priority:     unspecified
Hardware:     Unspecified
OS:           Unspecified
Reporter:     James Slagle <jslagle>
Assignee:     Jeff Ortel <jortel>
QA Contact:   mkovacik
CC:           igulina, jortel, kbidarka, sghai, tsanders
Doc Type:     Bug Fix
Doc Text:     Do not document
Clones:       735350 (view as bug list)
Bug Blocks:   735350, 746803
Last Closed:  2017-03-01 22:05:28 UTC
Attachments:
  Description                  Flags
  output of 'pulp repo list'   none
  grinder lsof output          none
  grinder ps output            none
  gofer log from the CDS       none
  ps tree                      none
  lsof output from cds1        none
  lsof output from cds2        none
  gofer log from CDS 1         none
  gofer log from CDS 2         none
  ps tree CDS 1                none
  ps tree CDS 2                none

Description James Slagle 2011-08-31 12:24:31 UTC
RHUA 2.0 defaults to syncing 4 repos at once.  It seems that if the repos are large (RHEL), many of the syncs start to error out with a "Too many open files" error.

Traceback:
2011-08-31 01:05:15,779 1671:140266130777856: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 02009591-d35c-11e0-9eb3-123139078252: _sync(rhui-6-mrg-g-2.0-srpms-6Server-x86_64, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7f9234632d90>, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7f9234632b90>>)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
    progress_callback, synchronizer, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files


[root@domU-12-31-39-07-82-52 pulp]# ulimit -n
1024

Once all the syncing is done, whether successful or errored out, you can sync individual repositories again just fine.  So, you would have to be on the system monitoring it during a scheduled sync to debug the problem live.

One time when the system was experiencing the problem, a ps aux | grep grinder showed 434 distinct processes.
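
For context, the OSError is raised in the process doing the spawning (the pulp server under httpd here, goferd on a CDS): each grinder ActiveObject child keeps a pair of pipe descriptors open in that parent, so with several large repos syncing at once and a soft limit of 1024, leaked descriptors add up quickly. Below is a minimal diagnostic sketch, not taken from this report; the process-name substrings are assumptions, and the limit printed is the script's own (check /proc/<pid>/limits for the target process).

#!/usr/bin/env python
# Diagnostic sketch (not part of the bug report): count open file
# descriptors per process via /proc. The MATCH substrings are
# illustrative assumptions; adjust for the host being inspected.
import os
import resource

MATCH = ('httpd', 'goferd', 'grinder/activeobject')
soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)

for pid in sorted(p for p in os.listdir('/proc') if p.isdigit()):
    try:
        cmdline = open('/proc/%s/cmdline' % pid).read().replace('\0', ' ')
        if not any(m in cmdline for m in MATCH):
            continue
        nfds = len(os.listdir('/proc/%s/fd' % pid))
        print('pid %s: %d open fds (soft limit here: %d)' % (pid, nfds, soft))
    except (IOError, OSError):
        pass  # process exited between listing and reading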

Comment 1 James Slagle 2011-08-31 12:30:50 UTC
1 sync running at a time results in 36 grinder processes

Comment 2 James Slagle 2011-08-31 12:44:11 UTC
The grinder processes look like:
apache   16266  0.0  0.0  75096  5524 ?        S    07:59   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc

Comment 3 James Slagle 2011-08-31 13:10:48 UTC
Created attachment 520815 [details]
output of 'pulp repo list'

Attaching a list of all repos I'm trying to sync.

Comment 4 James Slagle 2011-08-31 17:37:14 UTC
Two commands you can use to capture lsof output from every grinder and every httpd process:

ps aux | grep grinder | grep -v grep | for i in `awk '{print $2}'`; do lsof -p $i; done > grinder-lsof.txt

ps aux | egrep ^apache | grep httpd | for i in `awk '{print $2}'`; do lsof -p $i; done > httpd-lsof.txt

Comment 5 James Slagle 2011-08-31 17:38:25 UTC
ps aux | grep grinder | grep -v grep | for i in `awk '{print $2}'`; do lsof -p $i; done > grinder-lsof.txt
ps aux | egrep ^apache | grep httpd | for i in `awk '{print $2}'`; do lsof -p $i; done > httpd-lsof.txt

Comment 6 James Slagle 2011-09-01 12:21:06 UTC
Got this error on the CDS during a CDS sync as well.

Error performing repo sync
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 156, in sync
    self._sync_repo(base_url, repo)
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 332, in _sync_repo
    fetch.fetchYumRepo(repo_path)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files

There were 479 grinder processes according to ps.  That output is attached as grinder-ps.txt.

I ran lsof -p for each grinder pid, that output is attached as grinder-lsof.txt.

From a quick look through the lsof output, I don't see anything out of the ordinary, such as one process holding on to an exorbitantly large number of files.  I think the real issue is why there are so many grinder processes to begin with.

Comment 7 James Slagle 2011-09-01 12:22:18 UTC
Created attachment 521000 [details]
grinder lsof output

Comment 8 James Slagle 2011-09-01 12:23:33 UTC
Created attachment 521001 [details]
grinder ps output

Comment 9 James Slagle 2011-09-01 12:24:41 UTC
Created attachment 521002 [details]
gofer log from the CDS

Comment 10 James Slagle 2011-09-01 12:34:00 UTC
Created attachment 521003 [details]
ps tree

Attaching full ps output from the system in tree format.

Comment 11 James Slagle 2011-09-02 21:57:28 UTC
I updated the CDS to grinder-0.0.111-1.el6.noarch.rpm, which has the latest fix; however, I got this error again on both CDSs.  This time there wasn't a large number of grinder processes.  Instead, there is one goferd process that has 1000+ pipes open.

You can see the traceback in the attached gofer logs (one from each CDS).  I also captured the lsof output (lsof.output attachment).  The goferd process is pid 2261 on the first CDS and pid 2659 on the second.
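
To quantify the "1000+ pipes open" observation directly on a CDS, here is a small sketch (not from the report; pid 2261 from this comment is used only as the example default) that counts how many of a process's descriptors are pipes:

#!/usr/bin/env python
# Sketch (not from the report): count how many of one process's open file
# descriptors are pipes by reading the symlinks under /proc/<pid>/fd.
# Pass the pid to inspect as the first argument; 2261 is just the example.
import os
import sys

pid = sys.argv[1] if len(sys.argv) > 1 else '2261'
fd_dir = '/proc/%s/fd' % pid

total = pipes = 0
for fd in os.listdir(fd_dir):
    try:
        target = os.readlink(os.path.join(fd_dir, fd))
    except OSError:
        continue  # descriptor closed while iterating
    total += 1
    if target.startswith('pipe:'):
        pipes += 1

print('pid %s: %d open fds, %d of them pipes' % (pid, total, pipes))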

Comment 12 James Slagle 2011-09-02 22:00:41 UTC
Created attachment 521294 [details]
lsof output from cds1

Comment 13 James Slagle 2011-09-02 22:01:12 UTC
Created attachment 521295 [details]
lsof output from cds2

Comment 14 James Slagle 2011-09-02 22:02:41 UTC
Created attachment 521296 [details]
gofer log from CDS 1

Comment 15 James Slagle 2011-09-02 22:03:57 UTC
Created attachment 521297 [details]
gofer log from CDS 2

Comment 16 James Slagle 2011-09-02 22:04:22 UTC
Created attachment 521298 [details]
ps tree CDS 1

Comment 17 James Slagle 2011-09-02 22:04:39 UTC
Created attachment 521299 [details]
ps tree CDS 2

Comment 18 Jeff Ortel 2011-09-06 15:07:30 UTC
We discontinued using Popen.kill() for Python 2.4 compatibility and replaced it by killing the child process manually.  At that point we also needed to close the stdin and stdout pipes manually, but did not.

Fixed: c5eb8995f72c470c326a0b80e1b984a1db6aec3a

Sorry, forgot to add the bug # to the commit comment.
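
A minimal sketch of the leak and the fix, under the assumption that grinder's actual ActiveObject code differs in detail (the commit above is the authoritative change): when the child is killed manually instead of via Popen.kill(), the parent's ends of the stdin/stdout pipes stay open until the old Popen object happens to be garbage collected, so every respawn leaks descriptors in the spawning process; closing them explicitly releases them immediately.

#!/usr/bin/env python
# Illustrative sketch only; grinder's ActiveObject code (commit c5eb8995)
# is the authoritative fix.
import os
import signal
from subprocess import Popen, PIPE

def kill_child_leaky(p):
    # Pre-fix behaviour: Popen.kill() was avoided for Python 2.4
    # compatibility and the child was killed manually, but the parent's
    # ends of the stdin/stdout pipes were left open. Each respawn then
    # leaks descriptors until GC collects the old Popen object.
    os.kill(p.pid, signal.SIGKILL)
    p.wait()

def kill_child_fixed(p):
    # Post-fix behaviour: also close the pipes explicitly so the
    # descriptors are released immediately.
    os.kill(p.pid, signal.SIGKILL)
    p.wait()
    p.stdin.close()
    p.stdout.close()

if __name__ == '__main__':
    child = Popen(['/bin/cat'], stdin=PIPE, stdout=PIPE)
    kill_child_fixed(child)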

Comment 19 Jeff Ortel 2011-09-06 17:10:14 UTC
Fix included in grinder 0.112.

Comment 20 wes hayutin 2011-10-17 20:01:36 UTC
Set tracker bug 746803.

Comment 22 Sachin Ghai 2011-10-21 14:13:37 UTC


I started a sync of multiple repos in one go and checked the grinder processes:

-= Repository Synchronization Status =-

Last Refreshed: 19:32:07
(updated every 5 seconds, ctrl+c to exit)

Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
Red Hat Enterprise Linux Server 5 (RPMs) (5.6-i386)
In Progress                  10-20-2011 20:07             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.6-x86_64)
In Progress                  10-20-2011 20:14             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.7-i386)
In Progress                  10-20-2011 20:07             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5.7-x86_64)
In Progress                  10-20-2011 20:15             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5Server-i386)
Awaiting Execution           10-20-2011 20:39             Success    

Red Hat Enterprise Linux Server 5 (RPMs) (5Server-x86_64)
Awaiting Execution           10-21-2011 03:15             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 01:15             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 01:11             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 04:54             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 05:18             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 02:43             Success    

Red Hat Enterprise Linux Server 6 (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 03:04             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 00:18             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 00:39             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 01:03             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 01:01             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 00:47             Success    

Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 01:01             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.0-i386)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.0-x86_64)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.1-i386)
Awaiting Execution           10-21-2011 15:39             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6.1-x86_64)
Awaiting Execution           10-21-2011 15:42             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6Server-i386)
Awaiting Execution           10-21-2011 15:40             Success    

Red Hat Update Infrastructure 2.0 (RPMs) (6Server-x86_64)
Awaiting Execution           10-21-2011 15:40             Success    


                                  Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------


grinder processes: total of 17
==================
[root@dhcp201-152 /]# ps aux | grep grinder
apache   28729  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28730  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28731  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28732  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28733  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28734  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28737  0.1  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28738  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28740  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28741  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28742  0.0  0.6  75552  6660 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28743  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28745  0.0  0.6  75544  6652 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28746  0.0  0.6  75544  6652 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28747  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28748  0.0  0.6  75548  6656 ?        S    19:30   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     29068  0.0  0.0 103236   816 pts/8    S+   19:31   0:00 grep grinder
[root@dhcp201-152 /]# ps aux | grep grinder | wc -l
17
[root@dhcp201-152 /]# 


Since there shouldn't be more than 20 grinder processes, I think the total of 17 grinder processes seen here is fine.

The sync has completely finished and I haven't seen the "Too many open files" issue during or after the repo sync.

I will check the same for the goferd process during a CDS sync.

Comment 23 Sachin Ghai 2011-10-22 03:31:40 UTC
I started the CDS sync for two CDS nodes in parallel and haven't seen the "too many open files" issue on the CDS nodes either.


Here is the status:

-= CDS Synchronization Status =-

Last Refreshed: 08:50:55
(updated every 5 seconds, ctrl+c to exit)


cds171 ...................................................... [  UP  ]
cds207 ...................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds171
In Progress                  Never                        Never       

cds207
In Progress                  10-22-2011 07:16             Success      


                                  Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------



Here is the number of grinder processes running on each CDS node.

From CDS-1 (dhcp201-171):

[root@dhcp201-171 ~]# ps aux | grep grinder
root     17558  0.4  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17559  0.3  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17560  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17561  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17562  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17563  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17564  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17565  0.3  0.6 174464  6812 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17566  0.2  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17567  0.3  0.6 174460  6808 ?        S    08:46   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17571  0.0  0.0 103240   824 pts/0    S+   08:46   0:00 grep grinder
[root@dhcp201-171 ~]# ps aux | grep grinder| wc -l
11
[root@dhcp201-171 ~]# 


from cds207:
==============

[root@dhcp201-207 ~]# ps aux | grep grinder
root      5749  0.8  1.8 319300 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5750  0.9  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5751  0.9  1.8 319296 18704 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5752  0.9  1.8 319304 18716 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5753  0.8  1.8 319296 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5754  0.9  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5755  0.8  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5756  0.8  1.8 319304 18712 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5757  0.9  1.8 319304 18716 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5758  0.8  1.8 319300 18708 ?        D    08:51   0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5772  0.0  0.0 103240   824 pts/0    S+   08:52   0:00 grep grinder
[root@dhcp201-207 ~]# ps aux | grep grinder | wc -l
11
[root@dhcp201-207 ~]#

Comment 24 James Slagle 2011-10-31 19:16:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Do not document

Comment 28 errata-xmlrpc 2017-03-01 22:05:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0367