Hide Forgot
Description of problem: 'fedpkg build' is in top five most CPU eager processes on my machine (sometimes top one). Why does it use aggressive polling? I think the design should be improved, there is no need to utilize the CPU two much. Version-Release number of selected component (if applicable): fedpkg-1.21-2.fc23.noarch How reproducible: Always Steps to Reproduce: 1. fedpkb build 2. Check top 3. Actual results: fedpkg process on top places Expected results: fedpkg process on bottom places Additional info:
(In reply to Jaroslav Škarvada from comment #0) > Why does it use aggressive polling? Typo, the right question: Why? Does it use aggressive polling?
Can you try running the build with koji directly to see if it has the same symptoms? koji build f24 --scratch $(fedpkg giturl) When watching tasks, it polls Koji for the status once per second. As far as I know, there is no other way to get the same information. I guess all we could do here is change the interval.
I guess the code is coming from the rpkg package which, regarding the watch method, seems to be just copy&paste of the code from the koji client. Koji client itself seems to consume a bit less CPU, but it still seems too much for me. It seems koji uses 5 seconds polling interval, but rpkg uses 1 second polling interval. I think it shouldn't matter much, 1 second is pretty long, but I think the following is the problem: http://fpaste.org/344262/ Notice the number of select calls in less than 100 milliseconds - this is what's utilizing the CPU.
Increasing the sleep delay to 5 seconds didn't help much: 18:28:53.190394 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 585421}) 18:28:53.605273 read(3, "\27\3\3\1,", 5) = 5 18:28:53.605495 read(3, "e\210\v8\3101+yQR\234p\343[\366\311\213\276b\227\217\226V1z\356\371\34/\16r\7"..., 300) = 300 18:28:53.605877 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994}) 18:28:53.606229 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999995}) ..... 18:28:53.630567 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 598114}) 18:28:54.032746 read(3, "\27\3\3\1+", 5) = 5 18:28:54.032944 read(3, "e\210\v8\3101+{\274\177\251%Ze8\347l\255\201\231\34]\267\242\33-\360\276].4\352"..., 299) = 299 18:28:54.033262 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994}) 18:28:54.033516 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999996}) ... You can see it never sleeps longer than 0.4 seconds.
The following hack of _watch_koji_tasks seems to improve the situation: - time.sleep(1) + clk = time.time() + while time.time() < clk + 5: + time.sleep(5)
Or maybe better hack: clk = time.time() clkn = clk while clkn < clk + 5: time.sleep(clk + 5 - clkn) clkn = time.time()
I retried on another machine and it seems that just: time.sleep(5) should be enough.
(In reply to Jaroslav Škarvada from comment #7) > I retried on another machine and it seems that just: > time.sleep(5) > should be enough. and consistent with koji cli
Created attachment 1139677 [details] Proposed fix Lower CPU utilization by increasing the sleep interval and make it consistent with koji-cli defaults.
There may be more to optimize (e.g. the number of select calls), but this should be a good start.
I just tried running a scratch-build (didn't have anything to build), and fedpkg barely uses any CPU here. Those select calls you mention, do they happen during the "time.sleep(1)" ? Or do they happen in between sleeps, when fedpkg is actually doing something?
(In reply to Mathieu Bridon from comment #11) > I just tried running a scratch-build (didn't have anything to build), and > fedpkg barely uses any CPU here. > > Those select calls you mention, do they happen during the "time.sleep(1)" ? > Or do they happen in between sleeps, when fedpkg is actually doing something? It seems it happen between sleeps. By looking on the strace, it's full of: 17:08:55.278088 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994}) 17:08:55.278412 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994}) i.e. waiting for one FD #3, waiting for input, waiting for 43200 seconds, returned after 43199.999994 seconds. Is the timing so critical? Cannot it wait longer and read the FD in batches?
As of fedpkg-1.24-3.fc24.noarch, this does not happen. Thanks for your effort to dig into this issue. To close this bug at this moment. Feel free to reopen if this performance issue happens again.