Bug 1336819
| Summary: | [RHEL-7-2] Cannot access/write repodata files: [Errno 24] Too many open files | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | PaulB <pbunyan> |
| Component: | createrepo | Assignee: | Packaging Maintenance Team <packaging-team-maint> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | emrakova, james.antill, mdomonko, pbunyan, prarit |
| Target Milestone: | rc | Flags: | pbunyan: needinfo- |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-03 14:13:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
PaulB
2016-05-17 13:50:57 UTC
All,
While configuring a large system (24 TB RAM, 768 CPUs) for remote testing, I found that the createrepo command failed with the following error:
---<-snip->---
Spawning worker 492 with 0 pkgs
Spawning worker 493 with 0 pkgs
Spawning worker 494 with 0 pkgs
Spawning worker 495 with 0 pkgs
Spawning worker 496 with 0 pkgs
Spawning worker 497 with 0 pkgs
Spawning worker 498 with 0 pkgs
Spawning worker 499 with 0 pkgs
Spawning worker 500 with 0 pkgs
Spawning worker 501 with 0 pkgs
Spawning worker 502 with 0 pkgs
Spawning worker 503 with 0 pkgs
Spawning worker 504 with 0 pkgs
Spawning worker 505 with 0 pkgs
Spawning worker 506 with 0 pkgs
Cannot access/write repodata files: [Errno 24] Too many open files
---<-snip->---
Looking at the current (default) limits, I see this:
[root@ ]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 99079946
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 99079946
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@ ]#
In an attempt to tweak the "open files (-n) 1024" limit,
we tried the following:
[root@ ]# ulimit -n 2000
[root@ ]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 99079946
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 2000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 99079946
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@ ]#
Rerunning the createrepo command with "open files (-n) 2000" set,
we get the following issue:
Spawning worker 762 with 0 pkgs
Spawning worker 763 with 0 pkgs
Spawning worker 764 with 0 pkgs
Spawning worker 765 with 0 pkgs
Spawning worker 766 with 0 pkgs
Spawning worker 767 with 0 pkgs
Traceback (most recent call last):
  File "/usr/share/createrepo/genpkgmetadata.py", line 294, in <module>
    main(sys.argv[1:])
  File "/usr/share/createrepo/genpkgmetadata.py", line 268, in main
    mdgen.doPkgMetadata()
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 421, in doPkgMetadata
    self.writeMetadataDocs(packages)
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 704, in writeMetadataDocs
    log_messages(num)
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 676, in log_messages
    for stream in select((job.stdout, job.stderr), (), ())[0]:
ValueError: filedescriptor out of range in select()
I reset the value:
[root@ ]# ulimit -n 1024
You have new mail in /var/spool/mail/root
[root@ ]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 99079946
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 99079946
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@ ]#
I was then able to use the following as a workaround for this issue:
[root@ ]# createrepo --workers 1 .
Spawning worker 0 with 65 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
[root@ ]#
Best,
-pbunyan
So I think there are two issues. The first is the
Cannot access/write repodata files: [Errno 24] Too many open files
which occurs because the OS has a lot of open files when createrepo is run. As pbunyan has pointed out, this can be resolved by increasing the open file limit.
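For reference, here is a minimal sketch of how a Python process could inspect and raise its own soft open-file limit before spawning workers, using the standard `resource` module. The helper name `raise_nofile_limit` is hypothetical and this is not something createrepo is known to do; it only illustrates the limit that `ulimit -n` reports.

```python
import resource

def raise_nofile_limit(wanted):
    """Raise the soft RLIMIT_NOFILE to `wanted`, capped at the hard limit.

    Hypothetical helper for illustration. Never lowers the limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no cap"; otherwise never ask for more than hard.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Note that, as the run with `ulimit -n 2000` in the description shows, raising this limit alone only moves the failure to the `select()` call.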
The second problem pbunyan hits is more of a coding problem AFAICT:
  File "/usr/share/createrepo/genpkgmetadata.py", line 294, in <module>
    main(sys.argv[1:])
  File "/usr/share/createrepo/genpkgmetadata.py", line 268, in main
    mdgen.doPkgMetadata()
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 421, in doPkgMetadata
    self.writeMetadataDocs(packages)
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 704, in writeMetadataDocs
    log_messages(num)
  File "/usr/lib/python2.7/site-packages/createrepo/__init__.py", line 676, in log_messages
    for stream in select((job.stdout, job.stderr), (), ())[0]:
ValueError: filedescriptor out of range in select()
which implies some declaration or casting issue with the way the filedescriptors are stored.
P.
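For context on the `ValueError`: Python's `select.select()` wraps the C `select()` call, which can only track descriptors numbered below `FD_SETSIZE` (1024 on typical Linux builds). Once `ulimit -n` is raised past 1024, the process can obtain descriptors that `select()` then rejects with exactly this message. Below is a minimal sketch of a `poll()`-based alternative, which has no such ceiling. `readable_streams` is a hypothetical helper, not createrepo code, and whether it would fit into `log_messages` as-is is an assumption.

```python
import select

def readable_streams(streams, timeout_ms=None):
    """Return the streams that have data (or EOF) pending, using poll().

    `streams` is any iterable of objects with a fileno() method, e.g. the
    stdout/stderr pipes of a subprocess.Popen job. Hypothetical helper.
    """
    poller = select.poll()
    by_fd = {}
    for stream in streams:
        fd = stream.fileno()
        by_fd[fd] = stream
        poller.register(fd, select.POLLIN | select.POLLHUP)
    # poll() returns (fd, eventmask) pairs; map them back to stream objects.
    return [by_fd[fd] for fd, _event in poller.poll(timeout_ms)]
```

A call such as `readable_streams((job.stdout, job.stderr))` would play the role of `select((job.stdout, job.stderr), (), ())[0]` from the traceback, assuming `job` is a `subprocess.Popen` object with piped output.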
Can you still reproduce this issue? Looking into the code, I cannot really grasp why it would spawn so many workers without being explicitly asked to with the --workers switch. The way it works is that, if the switch isn't specified, it spawns as many workers as there are CPUs (or as many as there are packages being processed, if the number of CPUs is higher than that). Eh, I was looking at the upstream version; in the RHEL version, the logic always spawns as many workers as there are CPUs (so it's possible the way we detect the number of CPUs is flawed). But we would still greatly benefit from having a reproducer.
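The two worker-count rules described in this comment can be sketched as follows. This only illustrates the behavior as described here and is not the actual createrepo source; the function and parameter names are hypothetical.

```python
def upstream_worker_count(requested, cpu_count, pkg_count):
    """Upstream rule as described: honor --workers, else min(CPUs, packages)."""
    if requested is not None:
        return requested
    return min(cpu_count, pkg_count)

def rhel_worker_count(requested, cpu_count):
    """RHEL rule as described: honor --workers, else one worker per CPU."""
    if requested is not None:
        return requested
    return cpu_count

# The reporter's machine: 768 CPUs, 65 packages, no --workers switch.
print(upstream_worker_count(None, 768, 65))  # 65
print(rhel_worker_count(None, 768))          # 768
```

Under the RHEL rule, the reporter's 768-CPU machine would get 768 workers for 65 packages, which is consistent with the "Spawning worker 767 with 0 pkgs" line in the description.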