Bug 1320276 - 'fedpkg build' high CPU consumption
Summary: 'fedpkg build' high CPU consumption
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: rpkg
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Dennis Gilmore
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-22 17:35 UTC by Jaroslav Škarvada
Modified: 2016-08-22 08:33 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-22 08:33:48 UTC
Type: Bug


Attachments (Terms of Use)
Proposed fix (441 bytes, patch)
2016-03-23 18:10 UTC, Jaroslav Škarvada
no flags Details | Diff

Description Jaroslav Škarvada 2016-03-22 17:35:51 UTC
Description of problem:
'fedpkg build' is in top five most CPU eager processes on my machine (sometimes top one). Why does it use aggressive polling? I think the design should be improved, there is no need to utilize the CPU two much.

Version-Release number of selected component (if applicable):
fedpkg-1.21-2.fc23.noarch

How reproducible:
Always

Steps to Reproduce:
1. fedpkb build
2. Check top
3.

Actual results:
fedpkg process on top places

Expected results:
fedpkg process on bottom places

Additional info:

Comment 1 Jaroslav Škarvada 2016-03-22 17:37:52 UTC
(In reply to Jaroslav Škarvada from comment #0)
> Why does it use aggressive polling?

Typo, the right question:

Why? Does it use aggressive polling?

Comment 2 Lubomír Sedlář 2016-03-23 07:30:22 UTC
Can you try running the build with koji directly to see if it has the same symptoms?

koji build f24 --scratch $(fedpkg giturl)

When watching tasks, it polls Koji for the status once per second. As far as I know, there is no other way to get the same information. I guess all we could do here is change the interval.

Comment 3 Jaroslav Škarvada 2016-03-23 16:55:11 UTC
I guess the code is coming from the rpkg package which, regarding the watch method, seems to be just copy&paste of the code from the koji client.

Koji client itself seems to consume a bit less CPU, but it still seems too much for me. It seems koji uses 5 seconds polling interval, but rpkg uses 1 second polling interval. I think it shouldn't matter much, 1 second is pretty long, but I think the following is the problem:

http://fpaste.org/344262/

Notice the number of select calls in less than 100 milliseconds - this is what's utilizing the CPU.

Comment 4 Jaroslav Škarvada 2016-03-23 17:32:32 UTC
Increasing the sleep delay to 5 seconds didn't help much:

18:28:53.190394 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 585421})
18:28:53.605273 read(3, "\27\3\3\1,", 5) = 5
18:28:53.605495 read(3, "e\210\v8\3101+yQR\234p\343[\366\311\213\276b\227\217\226V1z\356\371\34/\16r\7"..., 300) = 300
18:28:53.605877 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994})
18:28:53.606229 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999995})
.....
18:28:53.630567 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 598114})
18:28:54.032746 read(3, "\27\3\3\1+", 5) = 5
18:28:54.032944 read(3, "e\210\v8\3101+{\274\177\251%Ze8\347l\255\201\231\34]\267\242\33-\360\276].4\352"..., 299) = 299
18:28:54.033262 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994})
18:28:54.033516 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999996})
...

You can see it never sleeps longer than 0.4 seconds.

Comment 5 Jaroslav Škarvada 2016-03-23 17:46:37 UTC
The following hack of _watch_koji_tasks seems to improve the situation:

-           time.sleep(1)
+               clk = time.time()
+                while time.time() < clk + 5:
+                   time.sleep(5)

Comment 6 Jaroslav Škarvada 2016-03-23 17:52:51 UTC
Or maybe better hack:
                clk = time.time()
                clkn = clk
                while clkn < clk + 5:
                   time.sleep(clk + 5 - clkn)
                   clkn = time.time()

Comment 7 Jaroslav Škarvada 2016-03-23 18:00:12 UTC
I retried on another machine and it seems that just:
  time.sleep(5)
should be enough.

Comment 8 Jaroslav Škarvada 2016-03-23 18:00:39 UTC
(In reply to Jaroslav Škarvada from comment #7)
> I retried on another machine and it seems that just:
>   time.sleep(5)
> should be enough.

and consistent with koji cli

Comment 9 Jaroslav Škarvada 2016-03-23 18:10:44 UTC
Created attachment 1139677 [details]
Proposed fix

Lower CPU utilization by increasing the sleep interval and make it consistent with koji-cli defaults.

Comment 10 Jaroslav Škarvada 2016-03-23 18:12:02 UTC
There may be more to optimize (e.g. the number of select calls), but this should be a good start.

Comment 11 Mathieu Bridon 2016-03-23 18:29:05 UTC
I just tried running a scratch-build (didn't have anything to build), and fedpkg barely uses any CPU here.

Those select calls you mention, do they happen during the "time.sleep(1)" ? Or do they happen in between sleeps, when fedpkg is actually doing something?

Comment 12 Jaroslav Škarvada 2016-03-23 18:52:37 UTC
(In reply to Mathieu Bridon from comment #11)
> I just tried running a scratch-build (didn't have anything to build), and
> fedpkg barely uses any CPU here.
> 
> Those select calls you mention, do they happen during the "time.sleep(1)" ?
> Or do they happen in between sleeps, when fedpkg is actually doing something?

It seems it happen between sleeps. By looking on the strace, it's full of:

17:08:55.278088 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994})
17:08:55.278412 select(4, [3], [], [], {43200, 0}) = 1 (in [3], left {43199, 999994})

i.e. waiting for one FD #3, waiting for input, waiting for 43200 seconds, returned after 43199.999994 seconds. Is the timing so critical? Cannot it wait longer and read the FD in batches?

Comment 13 cqi 2016-08-22 08:33:48 UTC
As of fedpkg-1.24-3.fc24.noarch, this does not happen. Thanks for your effort to dig into this issue. To close this bug at this moment. Feel free to reopen if this performance issue happens again.


Note You need to log in before you can comment on or make changes to this bug.