Bug 1212320 - [metadata] dnf makecache never timeouts
[metadata] dnf makecache never timeouts
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dnf (Show other bugs)
21
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Michal Luscon
Fedora Extras Quality Assurance
: Reopened
: 1206878 1214538 1268364 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-04-16 03:23 EDT by Leszek Matok
Modified: 2015-10-21 09:08 EDT (History)
15 users (show)

See Also:
Fixed In Version: dnf-1.0.2-3.fc22 dnf-0.6.4-7.fc21
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-13 02:52:32 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
gdb's bt (23.29 KB, text/plain)
2015-04-20 13:29 EDT, Leszek Matok
no flags Details
better bt (24.18 KB, text/plain)
2015-04-21 09:09 EDT, Leszek Matok
no flags Details

  None (edit)
Description Leszek Matok 2015-04-16 03:23:58 EDT
Hi, I've tried running dnf and it said:

metadata already locked by 6259
  The application with PID 6259 is: dnf
    Memory :  68 M RSS (528 MB VSZ)
    Started: Tue Apr 14 22:48:43 2015 - 1 day(s) 10:15:16 ago
    State  : Sleeping

The process is:

root      6259     1  0 kwi14 ?        00:00:03 /usr/bin/python -OO /usr/bin/dnf -v makecache timer


Let's see what it does:

# strace -f -p 6259
Process 6259 attached
select(27, [26], [], [], {0, 435164})   = 0 (Timeout)
poll([{fd=26, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
select(27, [26], [], [], {1, 0})        = 0 (Timeout)
poll([{fd=26, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
select(27, [26], [], [], {1, 0})        = 0 (Timeout)
poll([{fd=26, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)


So what's fd 26?

# lsof -n -p 6259
(...)
dnf     6259 root   26u  IPv4            8448464       0t0      TCP 83.9.186.222:44556->209.132.181.16:https (ESTABLISHED)
(...)


209.132.181.16 is wildcard.fedoraproject.org.
83.9.186.222 is an IP which my home computer had until 6:00 the next day, changed twice by now.

I think it's worth noting that dnf not only failed to notice that connection doesn't work, it even failed to notice the local network interface (ppp1) went down few hours later (actually I believe it gets removed and recreated every pppd reconnect).

This is dnf-0.6.4-1.fc21.noarch

I'm letting this process live in case you want me to get some more info from it.
Comment 1 Leszek Matok 2015-04-19 08:14:51 EDT
It's still around.

metadata already locked by 6259
  The application with PID 6259 is: dnf
    Memory :  62 M RSS (528 MB VSZ)
    Started: Tue Apr 14 22:48:43 2015 - 4 day(s) 15:24:31 ago
    State  : Sleeping
Comment 2 Honza Silhan 2015-04-20 04:39:01 EDT
Thanks for the report. Did you configured `timeout` option in /etc/dnf/dnf.conf? If you kill the process does it happen again?
Comment 3 Michal Luscon 2015-04-20 05:19:28 EDT
Hi,

are you able to attach gdb to running process and check where dnf got stuck?
Comment 4 Leszek Matok 2015-04-20 13:29:18 EDT
Created attachment 1016465 [details]
gdb's bt

Michal,

I've attached output of bt, please let me know if you need me to dive deeper, I'll keep the process around :)

I've only installed python-debuginfo and imagine it's enough - we can see .py files and lines and even the repo which it was trying to get metadata for.


Jan,

No, "timeout" doesn't appear anywhere in /etc/dnf.

I don't want to kill it yet (in case Michal needs more info), but it probably wouldn't be trivial to reproduce losing connection to Fedora servers again. I'd prefer not to build reproducible cases with iptables for something this simple :)
Comment 5 Michal Luscon 2015-04-21 08:09:42 EDT
Would it be possible to install librepo debug package and upload the bt again?
Comment 6 Leszek Matok 2015-04-21 09:09:34 EDT
Created attachment 1016854 [details]
better bt

In the meantime I have foolishly upgraded python with yum, so I had to downgrade that and accompanying debuginfo - hope this didn't break anything!

(Hopefully) better backtrace attached.
Comment 7 Michal Luscon 2015-04-28 03:34:04 EDT
It looks like librepo does not set a proper timeout value for select() operation => reassigning to librepo.
Comment 8 Tomas Mlcoch 2015-05-25 07:55:59 EDT
Librepo always uses 1 sec at most for the select().
The problem is not inside the select, rather this is a problem with timeout configuration.

By default, librepo uses:
LRO_LOWSPEEDTIME = 120 second
LRO_LOWSPEEDLIMIT = 1000 bytes per second

These configuration options ensure that if a transfer speed is below 1000 B/s for 120sec then librepo consider this transfer too slow and abort.

I've tried to reproduce the reported issue with librepo:
I've opened a connection and started a download of an ISO, then I disabled my network adapter and waited. After 120sec I got this (expected) error:
Timeout was reached for http://mirror.karneval.cz/pub/linux/fedora/linux/releases/21/Workstation/i386/iso/Fedora-Live-Workstation-i686-21-5.iso [Operation too slow. Less than 1000 bytes/sec transferred the last 120 seconds]

Michal, are you sure that DNF doesn't set these values to different values which could led to the reported issue?
Comment 9 Michal Luscon 2015-05-25 08:24:49 EDT
Thank you Tomas. It looks like dnf sets LRO_LOWSPEEDLIMIT to zero. Is there any reasonable default value for LRO_LOWSPEEDLIMIT?
Comment 10 Tomas Mlcoch 2015-06-01 02:09:30 EDT
The default values could work but feel free tweak them to better fit your requirements - maybe you could find useful to use a shorter time period (e.g. 60sec).
Comment 11 Kashyap Chamarthy 2015-06-09 11:53:59 EDT
Related pull request (thanks to paragan on #yum, for pointing me to this) and the associated patch list:


  https://github.com/rpm-software-management/dnf/pull/291
  https://github.com/mluscon/dnf/commit/1b17488b2b31b5116ccff18051aa71b9c23b65e8
Comment 12 Fedora Update System 2015-06-12 05:05:29 EDT
dnf-1.0.1-2.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/dnf-1.0.1-2.fc22
Comment 13 Michal Luscon 2015-06-12 08:18:07 EDT
*** Bug 1206878 has been marked as a duplicate of this bug. ***
Comment 14 Michal Luscon 2015-06-12 08:18:34 EDT
*** Bug 1214538 has been marked as a duplicate of this bug. ***
Comment 15 Fedora Update System 2015-06-14 13:33:26 EDT
Package dnf-1.0.1-2.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dnf-1.0.1-2.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-10004/dnf-1.0.1-2.fc22
then log in and leave karma (feedback).
Comment 16 Fedora Update System 2015-06-20 20:16:08 EDT
dnf-1.0.1-2.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 17 Fedora Update System 2015-07-22 04:25:33 EDT
dnf-1.0.2-2.fc22,hawkey-0.5.9-2.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/dnf-1.0.2-2.fc22,hawkey-0.5.9-2.fc22
Comment 18 Fedora Update System 2015-08-10 22:08:51 EDT
dnf-1.0.2-3.fc22, hawkey-0.5.9-3.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 19 Steevithak 2015-09-24 12:32:02 EDT
I seem to be experiencing this bug too. I was doing a "dnf update" today and the process hung after the downloads were complete, about three packages into the "[DRPM]... done" sequence. After about 10 minutes I gave up on and killed the process with a Ctrl-C. Then tried to do a "dnf clean all" but that hung as well and cannot be killed by Ctrl-C. Had to open a second terminal, su to root and do a kill -9 on it. Subsequent attempts to do a clean also hang. How does one recover from this?
Comment 20 Steevithak 2015-09-24 13:53:55 EDT
Update: after an hour of playing with dnf, it was still hanging immediately after every attempt to execute it. I finally gave up, shutdown the system, power cycled it, and restarted. After it came back up I was able to execute "dnf -v clean all" successfully on the first try. After that I was also successful at doing the update with "dnf -v update". Towards the end of the update it looked like dnf had hung but running top in another terminal revealed it had just stopped producing output while a lot of "dracut" processes were running in the background (probably needs some user feedback to say "not dead, please wait while x happens" or "x % done with y" so users won't wonder what's up - this was a *long* delay, like 4 or 5 minutes with no output from dnf. Seems like Yum always let us know what it was doing in these sorts of cases. It was also a lot faster than dnf on large updates like this (or maybe it just seemed that way because I wasn't worried that it had crashed because of the lack of user feedback?)

It's kind of scary that dnf got itself so messed up that the entire system has be restarted before it would run again! Anyway, for the record, here's the version involved: dnf-1.1.1-2.fc2
Comment 21 Fedora Update System 2015-09-29 07:18:31 EDT
dnf-0.6.4-7.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-9bcdcdc46b
Comment 22 Fedora Update System 2015-10-02 14:51:52 EDT
dnf-0.6.4-7.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update dnf'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-9bcdcdc46b
Comment 23 Andrew Wippler 2015-10-06 20:40:23 EDT
I am encountering the same "freeze" with dnf on my F22 laptop and F23 desktop. When I added timeout=1 to /etc/dnf/dnf.conf the freeze immediately went away. When I removed that line, the "freeze" started happening again; however, it was timing out at 30 seconds rather than being indefinite.

As a side note, ftp://ftp.uci.edu/mirrors/fedora seems to be having an issue and not working when my dnf attempts to download from them.
Comment 24 Fedora Update System 2015-10-13 02:51:52 EDT
dnf-0.6.4-7.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
Comment 25 Michal Luscon 2015-10-21 09:08:18 EDT
*** Bug 1268364 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.