Bug 2070611

Summary: [Errno 24] too many open files after some time syncing.
Product: Red Hat Satellite
Reporter: Vedashree Deshpande <vdeshpan>
Component: Pulp
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED NOTABUG
QA Contact: Lai <ltran>
Severity: high
Priority: high
Version: 6.10.4
CC: bbuckingham, bdm, dalley, dkliban, ehelms, faguiard, ggainey, jkrajice, pdwyer, rchan, wclark, zhunting
Target Milestone: 6.12.0   
Last Closed: 2022-08-05 15:10:52 UTC
Type: Bug
Attachments:
open fd count - el8baseos sync pulp 3.14 without proxy (flags: none)

Description Vedashree Deshpande 2022-03-31 14:15:39 UTC
Description of problem:
In Satellite 6.10.4, syncing any repository fails after some time with the error:

[Errno 24] too many open files


Version-Release number of selected component (if applicable):
6.10.4, although the customer reports seeing the issue in 6.9 and 6.10.3 as well.

Steps to Reproduce:
Sync any repository on Satellite 6.10.4.


Actual results:
[Errno 24] too many open files  after some time syncing.

Expected results:
Repo should be synced. 

Additional info:
The customer applied the workaround below, which helped, but a proper fix or a deeper investigation is still needed.

~~~
checking the number of open files for the pulp user with
watch -n1 -t "lsof -u pulp -n | wc -l"

I reach over 4300 open files,

which the pulp processes don't seem to be prepared for:
for i in $(ps waxu|grep ^pulp | awk '{print $2}') ; do prlimit --noheadings --pid $i --nofile; done
returns
1024:4096

manually setting it to a higher value with:
for i in $(ps waxu|grep ^pulp | awk '{print $2}') ; do prlimit --pid $i --nofile=4096:8192; done

~~~
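
The same check can be made per process by reading /proc, which shows how close each process is to its own soft limit rather than one total for the user. This is only a sketch: the function names fd_usage and fd_usage_for_user are made up for illustration, it assumes a Linux /proc filesystem, and reading another user's /proc/PID/fd generally requires root.

```shell
# Print "PID OPEN_FDS SOFT_LIMIT" for one process, reading /proc directly.
# (fd_usage is a hypothetical helper name, not a Pulp or Satellite tool.)
fd_usage() {
    local pid=$1 open soft
    # Each entry in /proc/PID/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # In the "Max open files" row of /proc/PID/limits, field 4 is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    echo "$pid $open ${soft:-unknown}"
}

# Report every process owned by a user (default: pulp).
fd_usage_for_user() {
    local user=${1:-pulp} pid
    for pid in $(pgrep -u "$user"); do
        fd_usage "$pid"
    done
}
```

Running fd_usage_for_user pulp during a sync lists one line per pulp process, so a process approaching its soft limit stands out even when the user-wide total looks unremarkable.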

Comment 3 Daniel Alley 2022-06-08 00:34:15 UTC
While running a sync of RHEL 7, I am seeing it cap out around 1020 open files.  I'm curious whether this can still be reproduced on 6.10.6, and which repo specifically was being tested (I know it said "any" repo, but nonetheless).

We've fixed a few file leak bugs upstream lately, and I believe at least some of those patches are already in 6.10.6.

Comment 20 pulp-infra@redhat.com 2022-07-03 13:28:13 UTC
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

Comment 21 pulp-infra@redhat.com 2022-07-03 13:28:16 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 22 Daniel Alley 2022-07-03 14:35:45 UTC
Setting it back to NEW because, as I said, I don't think the small bug makes any real difference.

Comment 23 pulp-infra@redhat.com 2022-07-03 15:21:35 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 25 Daniel Alley 2022-07-08 18:42:56 UTC
Brian,

I am unable to reproduce any issues without an HTTP proxy in the loop, and combined with the two reports and the traceback, we can probably conclude that the proxy is a necessary component of the bug.

I now have some results (which are attached) from syncing the repo https://cdn.redhat.com/content/dist/rhel8/8/x86_64/baseos/os/ with no HTTP proxy, while recording the open file count with the following command:

watch -t -n 10 "(date '+TIME:%H:%M:%S' ; lsof -a -u pulp -n -d ^mem -d ^cwd -d ^rtd -d ^txt -d ^DEL | wc -l) | tee -a /tmp/pulp_nofile_sync"

Could you get results of the same command running through the duration of the sync, syncing the same repository, under an HTTP proxy, so that we can compare them?
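
For the comparison, each log could be reduced to a sample count and a peak. A minimal sketch, assuming the log holds alternating "TIME:HH:MM:SS" lines and bare count lines as the watch command above would append them; summarize_fd_log is a hypothetical helper name, not part of Pulp or Satellite.

```shell
# Reduce a log written by the watch command to "samples=N peak=M at=HH:MM:SS".
# Assumes alternating "TIME:HH:MM:SS" lines and bare count lines.
summarize_fd_log() {
    awk '
        # Remember the most recent timestamp (text after the "TIME:" prefix).
        /^TIME:/ { t = substr($0, 6); next }
        # A line holding only a number is one lsof line count.
        /^[[:space:]]*[0-9]+[[:space:]]*$/ {
            n++
            if ($1 + 0 > max) { max = $1 + 0; at = t }
        }
        END { printf "samples=%d peak=%d at=%s\n", n, max, at }
    ' "$1"
}
```

Running summarize_fd_log on the proxy log and on the no-proxy log gives two one-line summaries that can be compared directly.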

Comment 26 Daniel Alley 2022-07-08 18:44:55 UTC
Created attachment 1895482
open fd count - el8baseos sync pulp 3.14 without proxy

Comment 28 pulp-infra@redhat.com 2022-07-08 19:18:08 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 30 pulp-infra@redhat.com 2022-07-15 12:49:18 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 38 pulp-infra@redhat.com 2022-08-05 13:48:33 UTC
Requesting needsinfo from upstream developer dkliban, ggainey because the 'FailedQA' flag is set.