RHUA 2.0 defaults to syncing 4 repos at once. When the repos are large (RHEL), many of the syncs start to error out with a "Too many open files" error. Traceback:

2011-08-31 01:05:15,779 1671:140266130777856: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 02009591-d35c-11e0-9eb3-123139078252: _sync(rhui-6-mrg-g-2.0-srpms-6Server-x86_64, synchronizer=<pulp.server.api.synchronizers.YumSynchronizer object at 0x7f9234632d90>, progress_callback=<bound method RepoSyncTask.progress_callback of <pulp.server.api.repo_sync_task.RepoSyncTask object at 0x7f9234632b90>>)
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 225, in _sync
    progress_callback, synchronizer, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 303, in fetch_content
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/synchronizers.py", line 431, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files

[root@domU-12-31-39-07-82-52 pulp]# ulimit -n
1024

Once all the syncing is done, whether successful or errored out, you can sync individual repositories again just fine. So you would have to be on the system monitoring it during a scheduled sync to debug the problem live. One time when the system was experiencing the problem, ps aux | grep grinder showed 434 distinct processes.
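For context on the errno itself: EMFILE (errno 24) is what os.pipe() raises once a process hits its file-descriptor limit, and each leaked grinder ActiveObject child costs the parent pipe descriptors, so a few hundred of them exhaust the default ulimit of 1024. A minimal reproduction sketch (the limit of 64 is an arbitrary demo value, not anything grinder uses):

```python
import errno
import os
import resource

# Lower the soft RLIMIT_NOFILE so the demo hits the ceiling quickly.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

pipes = []
caught = None
try:
    while True:
        pipes.append(os.pipe())      # two descriptors per call
except OSError as e:
    caught = e                       # EMFILE: "Too many open files"
finally:
    for r, w in pipes:               # release everything we opened
        os.close(r)
        os.close(w)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

print(caught.errno == errno.EMFILE)  # True
```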
One sync running at a time results in 36 grinder processes.
The grinder processes look like:

apache 16266 0.0 0.0 75096 5524 ? S 07:59 0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
Created attachment 520815 [details] output of 'pulp repo list'. Attaching a list of all repos I'm trying to sync.
Two commands you can use to capture lsof output from every grinder and every httpd process:

ps aux | grep grinder | grep -v grep | for i in `awk '{print $2}'`; do lsof -p $i; done > grinder-lsof.txt
ps aux | egrep ^apache | grep httpd | for i in `awk '{print $2}'`; do lsof -p $i; done > httpd-lsof.txt
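If lsof is slow or unavailable, an alternative is to count the entries in /proc/<pid>/fd directly (Linux-specific; the count_fds helper is made up for this sketch, and it is demonstrated against the current shell since no grinder process is assumed to be running):

```shell
# Count open file descriptors of a pid via /proc (Linux only).
count_fds() { ls "/proc/$1/fd" | wc -l | tr -d ' '; }

# Per-grinder report, same idea as the lsof loop above:
# for pid in $(pgrep -f grinder/activeobject); do echo "$pid $(count_fds $pid)"; done

# Demo against the current shell:
n=$(count_fds $$)
echo "fds open in this shell: $n"
```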
Got this error on the CDS during a CDS sync as well.

Error performing repo sync
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 156, in sync
    self._sync_repo(base_url, repo)
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 332, in _sync_repo
    fetch.fetchYumRepo(repo_path)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 454, in fetchYumRepo
    self.fetchPkgs = ParallelFetch(self.yumFetch, self.numThreads, callback=callback)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 61, in __init__
    wt = WorkerThread(self, fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/ParallelFetch.py", line 303, in __init__
    self.fetcher = ActiveObject(fetcher)
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 98, in __init__
    self.__spawn()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 151, in __spawn
    stdout=PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 632, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.6/subprocess.py", line 1047, in _get_handles
    p2cread, p2cwrite = os.pipe()
OSError: [Errno 24] Too many open files

There were 479 grinder processes according to ps; that output is attached as grinder-ps.txt. I ran lsof -p for each grinder pid; that output is attached as grinder-lsof.txt. A quick look through the lsof output doesn't show anything out of the ordinary, such as one process holding an exorbitantly large number of files. I think the real question is why there are so many grinder processes to begin with.
Created attachment 521000 [details] grinder lsof output
Created attachment 521001 [details] grinder ps output
Created attachment 521002 [details] gofer log from the CDS
Created attachment 521003 [details] ps tree Attaching full ps output from the system in tree format.
I updated the CDS to grinder-0.0.111-1.el6.noarch.rpm, which has the latest fix; however, I got this error again on both CDSes. This time there wasn't a large number of grinder processes, but there is one goferd process that has 1000+ pipes open. You can see the traceback in the attached gofer logs (one from each CDS). I also captured the lsof output (lsof.output attachment). The goferd process is pid 2261, and 2659 on the 2nd CDS.
Created attachment 521294 [details] lsof output from cds1
Created attachment 521295 [details] lsof output from cds2
Created attachment 521296 [details] gofer log from CDS 1
Created attachment 521297 [details] gofer log from CDS 2
Created attachment 521298 [details] ps tree CDS 1
Created attachment 521299 [details] ps tree CDS 2
We discontinued using Popen.kill() for Python 2.4 compatibility. It was replaced by killing the child process manually. At that point we also needed to close the stdin & stdout pipes manually, and did not. Fixed: c5eb8995f72c470c326a0b80e1b984a1db6aec3a. Sorry, forgot to add the bug # to the commit comment.
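A sketch of what that fix amounts to, assuming nothing about the actual grinder internals (the shutdown helper and the demo child are made up for illustration): without Popen.kill() (absent in Python 2.4), the child is signaled via os.kill(), and the parent-side stdin/stdout pipe objects must then be closed explicitly, otherwise every killed-and-respawned ActiveObject leaks two descriptors:

```python
import os
import signal
import sys
from subprocess import PIPE, Popen

def shutdown(child):
    """Kill a pipe-connected child and release its descriptors."""
    try:
        os.kill(child.pid, signal.SIGKILL)  # stand-in for Popen.kill() on 2.4
    except OSError:
        pass                                # child already exited
    # The part the original code was missing: close the parent's pipe ends.
    child.stdin.close()
    child.stdout.close()
    child.wait()                            # reap, so no zombie lingers

# Demo child that blocks reading stdin, like an idle worker process.
child = Popen([sys.executable, '-c', 'import sys; sys.stdin.read()'],
              stdin=PIPE, stdout=PIPE)
shutdown(child)
print(child.stdin.closed and child.stdout.closed)  # True
```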
Fix included in grinder 0.112.
Set tracker bug: 746803.
I started a sync of multiple repos in one go and checked the grinder processes:

-= Repository Synchronization Status =-

Last Refreshed: 19:32:07
(updated every 5 seconds, ctrl+c to exit)

                                                                    Next Sync           Last Sync         Last Result
------------------------------------------------------------------------------
Red Hat Enterprise Linux Server 5 (RPMs) (5.6-i386)                 In Progress         10-20-2011 20:07  Success
Red Hat Enterprise Linux Server 5 (RPMs) (5.6-x86_64)               In Progress         10-20-2011 20:14  Success
Red Hat Enterprise Linux Server 5 (RPMs) (5.7-i386)                 In Progress         10-20-2011 20:07  Success
Red Hat Enterprise Linux Server 5 (RPMs) (5.7-x86_64)               In Progress         10-20-2011 20:15  Success
Red Hat Enterprise Linux Server 5 (RPMs) (5Server-i386)             Awaiting Execution  10-20-2011 20:39  Success
Red Hat Enterprise Linux Server 5 (RPMs) (5Server-x86_64)           Awaiting Execution  10-21-2011 03:15  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6.0-i386)                 Awaiting Execution  10-21-2011 01:15  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6.0-x86_64)               Awaiting Execution  10-21-2011 01:11  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6.1-i386)                 Awaiting Execution  10-21-2011 04:54  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6.1-x86_64)               Awaiting Execution  10-21-2011 05:18  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6Server-i386)             Awaiting Execution  10-21-2011 02:43  Success
Red Hat Enterprise Linux Server 6 (RPMs) (6Server-x86_64)           Awaiting Execution  10-21-2011 03:04  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-i386)        Awaiting Execution  10-21-2011 00:18  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.0-x86_64)      Awaiting Execution  10-21-2011 00:39  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-i386)        Awaiting Execution  10-21-2011 01:03  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6.1-x86_64)      Awaiting Execution  10-21-2011 01:01  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-i386)    Awaiting Execution  10-21-2011 00:47  Success
Red Hat Enterprise Linux Server 6 Optional (RPMs) (6Server-x86_64)  Awaiting Execution  10-21-2011 01:01  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6.0-i386)                 Awaiting Execution  10-21-2011 15:39  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6.0-x86_64)               Awaiting Execution  10-21-2011 15:39  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6.1-i386)                 Awaiting Execution  10-21-2011 15:39  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6.1-x86_64)               Awaiting Execution  10-21-2011 15:42  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6Server-i386)             Awaiting Execution  10-21-2011 15:40  Success
Red Hat Update Infrastructure 2.0 (RPMs) (6Server-x86_64)           Awaiting Execution  10-21-2011 15:40  Success

Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------

grinder processes: total of 17
==================
[root@dhcp201-152 /]# ps aux | grep grinder
apache   28729  0.0  0.6  75552  6660 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28730  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28731  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28732  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28733  0.0  0.6  75552  6660 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28734  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28737  0.1  0.6  75552  6660 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28738  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28740  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28741  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28742  0.0  0.6  75552  6660 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28743  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28745  0.0  0.6  75544  6652 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28746  0.0  0.6  75544  6652 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28747  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
apache   28748  0.0  0.6  75548  6656 ?      S   19:30  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     29068  0.0  0.0 103236   816 pts/8  S+  19:31  0:00 grep grinder
[root@dhcp201-152 /]# ps aux | grep grinder | wc -l
17
[root@dhcp201-152 /]#

Since there shouldn't be more than 20 grinder processes, the total of 17 seen here looks good. The sync is completely finished, and I haven't seen the "Too many open files" issue during or after the repo sync. Will check the same for the goferd process during a CDS sync.
I started the CDS sync for two CDS nodes in parallel and haven't seen the "too many open files" issue on the CDS nodes either. Here is the status:

-= CDS Synchronization Status =-

Last Refreshed: 08:50:55
(updated every 5 seconds, ctrl+c to exit)

cds171 ...................................................... [ UP ]
cds207 ...................................................... [ UP ]

         Next Sync    Last Sync         Last Result
------------------------------------------------------------------------------
cds171   In Progress  Never             Never
cds207   In Progress  10-22-2011 07:16  Success

Connected: dhcp201-152.englab.pnq.redhat.com
------------------------------------------------------------------------------

Here is the number of grinder processes running on CDS 1 (dhcp201-171):

[root@dhcp201-171 ~]# ps aux | grep grinder
root     17558  0.4  0.6 174464  6812 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17559  0.3  0.6 174464  6812 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17560  0.3  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17561  0.2  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17562  0.3  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17563  0.3  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17564  0.2  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17565  0.3  0.6 174464  6812 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17566  0.2  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17567  0.3  0.6 174460  6808 ?      S   08:46  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root     17571  0.0  0.0 103240   824 pts/0  S+  08:46  0:00 grep grinder
[root@dhcp201-171 ~]# ps aux | grep grinder| wc -l
11
[root@dhcp201-171 ~]#

From cds207:
==============
[root@dhcp201-207 ~]# ps aux | grep grinder
root      5749  0.8  1.8 319300 18708 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5750  0.9  1.8 319304 18712 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5751  0.9  1.8 319296 18704 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5752  0.9  1.8 319304 18716 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5753  0.8  1.8 319296 18708 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5754  0.9  1.8 319304 18712 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5755  0.8  1.8 319304 18712 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5756  0.8  1.8 319304 18712 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5757  0.9  1.8 319304 18716 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5758  0.8  1.8 319300 18708 ?      D   08:51  0:00 /usr/bin/python /usr/lib/python2.6/site-packages/grinder/activeobject.pyc
root      5772  0.0  0.0 103240   824 pts/0  S+  08:52  0:00 grep grinder
[root@dhcp201-207 ~]# ps aux | grep grinder | wc -l
11
[root@dhcp201-207 ~]#
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Do not document
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0367