This commit: https://github.com/rpm-software-management/rpm/commit/7a7c31f551ff167f8718aea6d5048f6288d60205 is causing a serious performance regression when ulimit -n is set significantly higher than the typical desktop default of 1024. For instance, Docker configurations can set it to 1048576, which causes a 3x or greater slowdown when hundreds of packages are installed.
The bug that produced the fix in question is this: https://bugzilla.redhat.com/show_bug.cgi?id=919801
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle. Changing version to '28'.
This bug just bit me, and I spent a full day scratching my head over why my jenkins/docker deployment was so slow. I thought I was hitting some strange problem with my hard drive or something similar, because strace was showing an installation rate of one file per second. The weirdest thing was that my builds outside Docker did not have this problem.

At the end of the day I finally found a post on github [1] explaining that it is the combination of the rpm package manager and Docker that causes the massive slowdown. The solution offered there is simply to use any package manager other than rpm.

In the meantime, OpenEmbedded has submitted a revert patch. Please consider it until this is properly fixed.

[1] https://github.com/moby/moby/issues/23137
(In reply to Paulo Neves from comment #3)
> This bug just bit me, and I spent a full day scratching my head over why my
> jenkins/docker deployment was so slow.
>
> I thought I was hitting some strange problem with my hard drive or
> something similar, because strace was showing an installation rate of one
> file per second. The weirdest thing was that my builds outside Docker did
> not have this problem.
>
> At the end of the day I finally found a post on github [1] explaining that
> it is the combination of the rpm package manager and Docker that causes
> the massive slowdown. The solution offered there is simply to use any
> package manager other than rpm.
>
> In the meantime, OpenEmbedded has submitted a revert patch. Please consider
> it until this is properly fixed.
> [1] https://github.com/moby/moby/issues/23137

Same issue, same workaround as Paulo.
Folks, I can see how looping fcntl() over a huge number of fd's would be slower than not doing so... but -p <lua> scriptlets calling posix.exec() are something that's expected to occur maybe a few times in a transaction, certainly not once per installed file. Smells fishy. Paulo/Mirza, please attach a full strace of a "one file per second" install (timings of course differ under strace).
TBH, seems fishy to me too. I am simply not seeing the fcntl bomb here, on a standard setup, building a standard image. Strace does not show fcntl being executed 'ulimit -n' (or so) times in a loop. What I said in the bug description was second-hand info that I'm not able to reproduce (no docker here). It's entirely possible folks seeing the problem are misusing Lua, or we have misdiagnosed it otherwise. Strace, please.
Nevermind, I found it. This has nothing to do with Lua; the problematic code is here: https://github.com/rpm-software-management/rpm/blob/master/lib/rpmscript.c#L181 It works okay when it counts up to 1024, but I can imagine how counting up to 1024*1024 is indeed 1024 times slower. And that happens once per package scriptlet, I suppose.
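For anyone who wants to confirm the scaling locally, here is a throwaway standalone check (my own code, not rpm's) that times the same F_GETFD sweep up to sysconf(_SC_OPEN_MAX):

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        long open_max = sysconf(_SC_OPEN_MAX);
        struct timespec t0, t1;

        if (open_max == -1)
            open_max = 1024;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long fd = 3; fd < open_max; fd++)
            (void) fcntl((int) fd, F_GETFD);  /* same syscall rpm issues per fd */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("_SC_OPEN_MAX = %ld, F_GETFD sweep took %.3f s\n", open_max, secs);
        return 0;
    }

Run it under different 'ulimit -n' settings and the sweep time should scale roughly linearly with the limit.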
Well, but you just said you're not seeing an fcntl() bomb. Whether Lua or not, you'd be seeing those fcntl()'s in a strace (although you need to follow forks to see them). A strace, please.

This all rings another bell, whether related or not: just recently I noticed the rpm build-time test-suite running *dreadfully* slow under mock builds. I haven't had time to properly investigate, but it had to do with mock recently switching its default to systemd-nspawn instead of chroot. And even with systemd-nspawn, things got back to normal when enabling network access. Pray tell why; it's not as if rpm does any networking.
I was doing a wrong grep. :) With the right grep I am indeed seeing it, although it only counts up to 1024, because I am not using Docker. I'll attach a sample strace of a postinst execution in a second (I ran strace -ff to better separate which process is doing what).
Created attachment 1433043 [details] fcntl bomb via strace
Well, obviously those fcntl()'s are going to be there, and going through a multitude more of them is going to take more time; that's not the question. There's still zero connection between those fcntl()'s and seeing a "file per second" install rate. A strace from an install that actually exhibits the problem, please. Preferably a full strace at that, and with -tt so the timing can be seen.
The full strace of making an image is over a gigabyte; I don't want to attach that, and you probably don't want to look at it. Would you be okay with excerpts?

1. ulimit -n 1024 scenario

Making an image under strace:

10723 12:08:13.423430 fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
...
10723 12:08:13.499483 fcntl(1023, F_GETFD) = -1 EBADF (Bad file descriptor)

So 0.07 seconds or so. Making an image without strace takes about 22 seconds.

2. ulimit -n 1000000 scenario

17249 12:17:20.625723 fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
...
17249 12:18:29.382538 fcntl(999999, F_GETFD) = -1 EBADF (Bad file descriptor)

This jumps to 70 seconds, which is almost exactly 1000 * 0.07. Making an image without strace takes 43 seconds.

Does this convince you? If not, I can patch various hardcoded values for maxfd into rpm's doScriptExec() and tell you how that affects running time for us. No need to run strace for it.
When using 'ulimit -n 1000000' the strace size is already pushing 10 gigabytes here, and it's still not finished. Not surprising, considering each fcntl bomb adds hundreds of megabytes of output to it. You can't seriously expect us to provide that file :)
Right, I don't need proof of those fcntl()'s taking place; of course they do. And no, I don't want to look at 10 GB worth of fcntl()'s :) What I *do* want is the timed strace of the "one file per second" install (if a full one is not reasonable, then at least a couple of packages' worth of it), because that sounds potentially like a different issue.
I suspect 'one file per second' means 'one package containing a postinst script per second', which is almost certainly the same issue.
For my tests, the created image has about 30 packages with postinsts in them, and the jump in image creation time (22 sec -> 43 sec) is consistent with that. Some people may have 10x the number of such packages in their images, and that's where it starts to hurt seriously.
This data doesn't quite add up now. You're speaking of 'ulimit -n 1000000' causing one script to take 70 s, yet a 30-package image with about as many scripts takes 43 s. I have to assume those are with different file limit settings, but you know what they say about assuming.
That's the overhead of strace. It has to process and write things out to disk, and of course that adds massively to the timing, considering each fcntl bomb adds 300 megabytes to the log.
Oh, of course. Said so myself in an earlier comment. Stupid me :)
Anyway, what should be done about this bit in rpmscript.c?

    /* XXX Force FD_CLOEXEC on all inherited fdno's. */
    open_max = sysconf(_SC_OPEN_MAX);
    if (open_max == -1) {
        open_max = 1024;
    }
    for (fdno = 3; fdno < open_max; fdno++) {
        flag = fcntl(fdno, F_GETFD);
        if (flag == -1 || (flag & FD_CLOEXEC))
            continue;
        xx = fcntl(fdno, F_SETFD, FD_CLOEXEC);
    }

We can patch it out in Yocto and carry the patch forward, but if you can find a better way out, that'd be a lot better.
Since we have been affected by this problem in our builds in Docker containers, where the time it takes to generate a rootfs using dnf has increased from about five minutes to 30-45 minutes, I have invested some time in examining it. Here is a small table showing the times I measured when building core-image-minimal from Poky (it is a lot smaller image than our regular ones):

Max files   Time
---------   --------------------
1024        30 seconds
1048576     50 seconds
10000000    3 minutes 40 seconds
100000000   31 minutes

As can be seen, the time it takes to generate the rootfs is quite severely affected when the maximum number of file descriptors is increased.

It is possible to reproduce the problem without Docker by increasing the maximum number of open files. This can be done by editing /etc/security/limits.conf and adding:

* - nofile 100000000

where 100000000 is the maximum number of open files. It is also very likely that you need to increase the global limit by running:

sudo sysctl -w fs.nr_open=100000000

After that, ssh to localhost to get a new session where the new settings are active. Then you can use `ulimit -n <number>` to set the max you want to try.

My suggestion for how to fix this problem is to make rpm explicitly set the maximum number of open file descriptors to a sane value when it starts, using setrlimit(). That way the application is not adversely affected by unusual configurations for no reason. I do not know the rpm code well enough to suggest an implementation, or how a user can configure the actual limit used if that is deemed necessary.
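Purely to illustrate the setrlimit() pattern I mean (a minimal standalone sketch, not rpm code; the function name and the 1024 cap are my assumptions):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Sketch only: clamp the soft RLIMIT_NOFILE at startup so any later
     * "sweep all possible fds" loop stays bounded. The cap is assumed. */
    static void clamp_nofile(rlim_t cap)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return;
        if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > cap) {
            rl.rlim_cur = cap;  /* lower the soft limit only; hard limit stays */
            if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                perror("setrlimit");
        }
    }

    int main(void)
    {
        clamp_nofile(1024);  /* assumed sane default, mirroring the old behaviour */
        /* ... run scriptlets etc. ... */
        return 0;
    }

Lowering only the soft limit keeps the change reversible for child processes that genuinely need more descriptors.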
One other approach would be to do readdir(/proc/self/fd) and iterate over fds found in there. This can also be done conditionally, say if open_max > 1024.
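A rough standalone sketch of that approach (untested; the helper name is made up, and real code would want a fallback to the sysconf() loop when /proc is not mounted):

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdlib.h>

    /* Sketch: set FD_CLOEXEC only on fds that actually exist, by listing
     * /proc/self/fd instead of sweeping every possible fd up to open_max. */
    static void cloexec_open_fds(void)
    {
        DIR *dir = opendir("/proc/self/fd");
        struct dirent *de;

        if (dir == NULL)
            return;  /* no /proc? caller should fall back to the old loop */

        while ((de = readdir(dir)) != NULL) {
            int fd = atoi(de->d_name);  /* "." and ".." parse to 0, skipped below */

            if (fd < 3 || fd == dirfd(dir))
                continue;
            int flags = fcntl(fd, F_GETFD);
            if (flags == -1 || (flags & FD_CLOEXEC))
                continue;
            (void) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
        }
        closedir(dir);
    }

    int main(void)
    {
        cloexec_open_fds();
        return 0;
    }

This makes the cost proportional to the number of open fds rather than to the configured limit.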
I went ahead to implement my suggestion, here's the patch: https://github.com/rpm-software-management/rpm/pull/444
(In reply to Kir Kolyshkin from comment #22)
> One other approach would be to do readdir(/proc/self/fd) and iterate over
> fds found in there. This can also be done conditionally, say if open_max >
> 1024.

Ooh, I like that idea. Much better than mine. I gave some more comments in the GitHub review.
This should be fixed in rawhide now (rpm >= 4.14.2-rc1)
rpm-4.14.2-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-16c78b3d92
rpm-4.14.2-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-16c78b3d92
rpm-4.14.2-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.