Bug 1691856

Summary: Downloading packages consume 100% of one CPU
Product: [Fedora] Fedora Reporter: Yann Droneaud <yann>
Component: librepoAssignee: Jaroslav Rohel <jrohel>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 29CC: aspotashev, dmach, ego.cordatus, jmracek, jrohel, lhrazky, mblaha, mhatina, packaging-team-maint, pkratoch, rpm-software-management, tmlcoch, vitaly, vmukhame, yaneti
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: librepo-1.9.6-1.fc30 librepo-1.9.6-1.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-31 00:04:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Yann Droneaud 2019-03-22 16:36:06 UTC
Downloading packages makes DNF consume 100% of one CPU core, even for a modest download speed of 1.7MBytes/s:

(2/2): kernel-debug-debuginfo-5.0.3  1% [     ] 1.7 MB/s |  15 MB     11:24 ETA

Top reports:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                
   5559 root      20   0  501004 226224  31184 R 100,0   1,9   0:20.93 dnf                                                                                                                    
    944 root     -51   0       0      0      0 S   2,0   0,0   0:24.03 irq/127-iwlwifi   

Download should not be CPU bound, unless having a speedy network connection.

I've noticed the issue because the laptop's fan doesn't stop during the whole downloads, which seems to be a new thing, not happening before ... 

When using perf:

Samples: 123K of event 'cycles:ppp', Event count (approx.): 79317414716
Overhead  Command  Shared Object                             Symbol
   2,83%  dnf      [kernel.kallsyms]                         [k] syscall_return_via_sysret
   2,79%  dnf      [kernel.kallsyms]                         [k] entry_SYSCALL_64
   2,39%  dnf      libpython3.7m.so.1.0                      [.] lookdict_unicode_nodummy
   1,61%  dnf      libc-2.29.so                              [.] __strlen_avx2
   1,48%  dnf      libpython3.7m.so.1.0                      [.] lookdict_split
   1,45%  dnf      libpython3.7m.so.1.0                      [.] call_function
   1,44%  dnf      libc-2.29.so                              [.] __vfprintf_internal
   1,06%  dnf      libsolv.so.1                              [.] dirpool_add_dir
...

Comment 1 Yann Droneaud 2019-03-22 16:41:41 UTC
strace output is probably more useful:

poll([{fd=35, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=37, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=35, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=37, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=35, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=37, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
poll([{fd=35, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f221fca3180}, NULL, 8) = 0

It seems dnf is busylooping on the sockets (fd 35 and and fd 37) connected to the HTTP servers where the files are downloaded.
 
There's a bug somewhere.

Comment 2 Yann Droneaud 2019-03-22 16:56:59 UTC
gdb:

Breakpoint 1, __GI___poll (fds=fds@entry=0x7ffcfc020190, nfds=nfds@entry=1, timeout=timeout@entry=0) at ../sysdeps/unix/sysv/linux/poll.c:27
27	{
(gdb) bt
#0  0x00007f46a4a0f7f0 in __GI___poll (fds=fds@entry=0x7ffcfc020190, nfds=nfds@entry=1, timeout=timeout@entry=0) at ../sysdeps/unix/sysv/linux/poll.c:27
#1  0x00007f46a4a2b0e2 in __poll_chk (fds=fds@entry=0x7ffcfc020190, nfds=nfds@entry=1, timeout=timeout@entry=0, fdslen=fdslen@entry=24) at poll_chk.c:27
#2  0x00007f469c260d96 in poll (__timeout=0, __nfds=<optimized out>, __fds=0x7ffcfc020190) at /usr/include/bits/poll2.h:41
#3  0x00007f469c260d96 in Curl_socket_check (readfd0=37, readfd1=readfd1@entry=-1, writefd=-1, timeout_ms=timeout_ms@entry=0) at ../../lib/select.c:217
#4  0x00007f469c2522c4 in Curl_readwrite (conn=0x5575f846c4b0, data=data@entry=0x5575f6aabc30, done=done@entry=0x7ffcfc0202aa, comeback=comeback@entry=0x7ffcfc0202ab)
    at ../../lib/transfer.c:1281
#5  0x00007f469c25bf8c in multi_runsingle (multi=0x5575f6154c70, now=..., data=0x5575f6aabc30) at ../../lib/multi.c:1932
#6  0x00007f469c25d4d1 in curl_multi_perform (multi=0x5575f6154c70, running_handles=running_handles@entry=0x7ffcfc020428) at ../../lib/multi.c:2217
#7  0x00007f469cb2cede in lr_perform (err=0x7ffcfc020430, dd=0x7ffcfc020450) at /usr/src/debug/librepo-1.9.5-1.fc30.x86_64/librepo/downloader.c:2505
#8  0x00007f469cb2cede in lr_download (targets=targets@entry=0x5575f34e9260 = {...}, failfast=failfast@entry=1, err=err@entry=0x7ffcfc020828)
    at /usr/src/debug/librepo-1.9.5-1.fc30.x86_64/librepo/downloader.c:2620
#9  0x00007f469cb35eae in lr_download_packages (targets=targets@entry=0x5575f33c3a30 = {...}, flags=flags@entry=LR_PACKAGEDOWNLOAD_FAILFAST, err=err@entry=0x7ffcfc020828)
    at /usr/src/debug/librepo-1.9.5-1.fc30.x86_64/librepo/package_downloader.c:416
#10 0x00007f469cc82eaa in libdnf::PackageTarget::downloadPackages(std::vector<libdnf::PackageTarget*, std::allocator<libdnf::PackageTarget*> >&, bool)
    (targets=std::vector of length 2, capacity 2 = {...}, failFast=<optimized out>) at /usr/src/debug/libdnf-0.28.0-1.fc30.x86_64/libdnf/repo/Repo.cpp:1668
...
(gdb) frame 3
#3  Curl_socket_check (readfd0=37, readfd1=readfd1@entry=-1, writefd=-1, timeout_ms=timeout_ms@entry=0) at ../../lib/select.c:217
217	    r = poll(pfd, num, pending_ms);
(gdb) frame 4
#4  0x00007f469c2522c4 in Curl_readwrite (conn=0x5575f846c4b0, data=data@entry=0x5575f6aabc30, done=done@entry=0x7ffcfc0202aa, comeback=comeback@entry=0x7ffcfc0202ab)
    at ../../lib/transfer.c:1281
1281	    select_res = Curl_socket_check(fd_read, CURL_SOCKET_BAD, fd_write, 0);

Comment 3 Yann Droneaud 2019-03-22 17:48:56 UTC
It's likely an issue with librepo:

# rpm -q --changelog librepo
* lun. mars 11 2019 Pavla Kratochvilova <pkratoch> - 1.9.5-1
- Update to 1.9.5
- Reduce download delays

Comment 4 Yann Droneaud 2019-03-22 22:49:53 UTC
(In reply to Yann Droneaud from comment #3)
> It's likely an issue with librepo:
> 
> # rpm -q --changelog librepo
> * lun. mars 11 2019 Pavla Kratochvilova <pkratoch> - 1.9.5-1
> - Update to 1.9.5
> - Reduce download delays

It's there, and I've proposed a fix as https://github.com/rpm-software-management/librepo/pull/144

Once rebuilt with this patch, the CPU consumption is down to less than 20%, which is more acceptable.

Comment 6 Jaroslav Rohel 2019-03-26 07:29:36 UTC
The PR https://github.com/rpm-software-management/librepo/pull/144 from Yann Droneaud fixes it.

Comment 7 Fedora Update System 2019-03-27 13:37:26 UTC
dnf-4.2.2-1.fc30 librepo-1.9.6-1.fc30 libdnf-0.28.1-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-b504d6ad8f

Comment 8 Fedora Update System 2019-03-27 13:37:50 UTC
dnf-4.2.2-1.fc29 librepo-1.9.6-1.fc29 libdnf-0.28.1-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7fbfa37585

Comment 9 Fedora Update System 2019-03-27 18:18:31 UTC
dnf-4.2.2-1.fc30, libdnf-0.28.1-1.fc30, librepo-1.9.6-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b504d6ad8f

Comment 10 Fedora Update System 2019-03-27 20:02:01 UTC
dnf-4.2.2-1.fc29, libdnf-0.28.1-1.fc29, librepo-1.9.6-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-7fbfa37585

Comment 11 Vitaly Zaitsev 2019-03-28 11:39:54 UTC
*** Bug 1692423 has been marked as a duplicate of this bug. ***

Comment 12 Fedora Update System 2019-03-31 00:04:16 UTC
dnf-4.2.2-1.fc30, libdnf-0.28.1-1.fc30, librepo-1.9.6-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Fedora Update System 2019-03-31 03:00:50 UTC
dnf-4.2.2-1.fc29, libdnf-0.28.1-1.fc29, librepo-1.9.6-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.