Description of problem:
dnf hangs for a very long time banging on non-responsive mirrors.
[SKIPPED] PackageKit-1.0.3-4.fc21.x86_64.rpm: Already downloaded
[SKIPPED] PackageKit-glib-1.0.3-4.fc21.x86_64.rpm: Already downloaded
[SKIPPED] PackageKit-cached-metadata-1.0.3-4.fc21.x86_64.rpm: Already downloaded
[MIRROR] PackageKit-gtk3-module-1.0.3-4.fc21.x86_64.rpm: Curl error: Timeout was reached for ftp://mirror.cs.pitt.edu/fedora/linux/updates/testing/21/x86_64/p/PackageKit-gtk3-module-1.0.3-4.fc21.x86_64.rpm [Connection timed out after 120002 milliseconds]
[MIRROR] PackageKit-gstreamer-plugin-1.0.3-4.fc21.x86_64.rpm: Curl error: Timeout was reached for ftp://mirror.cs.pitt.edu/fedora/linux/updates/testing/21/x86_64/p/PackageKit-gstreamer-plugin-1.0.3-4.fc21.x86_64.rpm [Connection timed out after 120002 milliseconds]
[MIRROR] PackageKit-command-not-found-1.0.3-4.fc21.x86_64.rpm: Curl error: Timeout was reached for ftp://mirror.cs.pitt.edu/fedora/linux/updates/testing/21/x86_64/p/PackageKit-command-not-found-1.0.3-4.fc21.x86_64.rpm [Connection timed out after 120001 milliseconds]
(4-6/59): PackageKit-gtk3-module-1.0.3-4.fc21.x86_64.rpm 57% [=====================================================================================- ] --- B/s | 66 MB --:-- ETA
Version-Release number of selected component (if applicable):
dnf.noarch 0.6.3-2.fc21 @System
Steps to Reproduce:
1. dnf upgrade -y
Actual results:
dnf hangs for a very long time, eventually moves on to another mirror for the first 3 downloads, and then hangs again on the 4th to 6th downloads as it returns to the dead mirror.
Expected results:
1) dnf has more reasonable timeouts. 10 seconds should do it. If a mirror takes longer than that to respond, we probably shouldn't be using it.
2) Past failures should be remembered and those mirrors blacklisted for a certain length of time, certainly at least for this session, perhaps with the same timeout as the metadata.
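For what it's worth, dnf's main configuration does expose knobs for this kind of tuning. The option names below are from dnf.conf(5); the values are purely illustrative, and availability depends on the dnf version:

```ini
# /etc/dnf/dnf.conf -- example values, not recommendations
[main]
timeout=10     # seconds before a stalled connection/transfer is dropped
minrate=1000   # bytes/sec below which a transfer counts as stalled
retries=10     # download attempts per package before giving up
```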
Thanks for the report.
1) Setting constants for non-responsiveness is always subjective. I personally think the current behavior is good enough. Maybe failed downloads could instead be re-queued at the end of the download queue.
2) Is this possible, Tomas? Or better could it be marked as the least preferred mirror?
(In reply to Jan Silhan from comment #1)
> 2) Is this possible, Tomas? Or better could it be marked as the least
> preferred mirror?
Personally I would be in favor of moving a timing-out mirror to the last position in the priority list. It could be just an intermittent failure, or just one package missing on that particular mirror.
Maybe DNF could be clever and remove a mirror completely after a large number of failures (like 50)?
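The demote-then-drop policy discussed in this comment could be sketched roughly as follows. This is an illustrative Python sketch only, not librepo or dnf code; MAX_FAILURES is a made-up threshold standing in for the "large number of failures":

```python
# Illustrative sketch: on failure, a mirror is demoted to the end of the
# list; after MAX_FAILURES failures it is dropped entirely. Not real
# librepo/dnf logic -- just the policy idea from the comment above.

MAX_FAILURES = 50  # hypothetical threshold

class MirrorList:
    def __init__(self, urls):
        self.mirrors = list(urls)
        self.failures = {url: 0 for url in urls}

    def report_failure(self, url):
        """Demote the mirror; remove it for good after MAX_FAILURES."""
        self.failures[url] += 1
        self.mirrors.remove(url)
        if self.failures[url] < MAX_FAILURES:
            self.mirrors.append(url)  # retry it last, not first

    def current(self):
        """Next mirror to try, or None if all have been dropped."""
        return self.mirrors[0] if self.mirrors else None

ml = MirrorList(["ftp://slow.example", "http://fast.example"])
ml.report_failure("ftp://slow.example")
print(ml.current())  # -> http://fast.example (responsive mirror tried first)
```

One timeout demotes the mirror without throwing away its packages, so an intermittent failure or a single missing file costs only priority, not availability.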
Librepo has several options that can be used for fine-tuning this behavior:
LRO_CONNECTTIMEOUT - Max time in seconds for the connection phase. (Default: 300 seconds)
LRO_LOWSPEEDLIMIT - The transfer speed in bytes per second that the transfer must stay below for LRO_LOWSPEEDTIME seconds for the library to consider it too slow and abort. (Default: 0)
LRO_LOWSPEEDTIME - The time in seconds that the transfer must stay below LRO_LOWSPEEDLIMIT for the library to consider it too slow and abort. (Default: 120 seconds)
LRO_ALLOWEDMIRRORFAILURES - Max number of allowed failures per mirror. If a mirror reaches this number without a single successful download, it is ignored for the rest of the session. (Default: 4)
LRO_ADAPTIVEMIRRORSORTING - After each finished transfer, the mirrors are re-sorted. A mirror is moved forward or backward by one position depending on its rank (calculated as the ratio between successful and failed downloads) and the ranks of its neighbors. (Default: True)
JFYI, as you can see, in Wolfgang's case it's the combination of LRO_LOWSPEEDLIMIT and LRO_LOWSPEEDTIME that kills the transfer after 120 s (because the default connection timeout is far higher - 300 s). So maybe it could be useful to also expose these two options in the repo config.
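A rough model of how the low-speed pair behaves (this follows the low-speed-limit/low-speed-time semantics described in this comment; the function is an illustration, not librepo code):

```python
def should_abort(speed_samples, low_speed_limit, low_speed_time):
    """Abort when the speed has stayed below low_speed_limit (bytes/sec)
    for low_speed_time consecutive 1-second samples.
    A low_speed_limit of 0 disables the check entirely.
    Illustrative model only -- not librepo's actual implementation.
    """
    if low_speed_limit == 0:
        return False
    seconds_below = 0
    for speed in speed_samples:
        seconds_below = seconds_below + 1 if speed < low_speed_limit else 0
        if seconds_below >= low_speed_time:
            return True
    return False

# A completely stalled mirror (0 B/s) trips the check only after
# low_speed_time seconds -- with the 120 s default, that is exactly
# the two-minute hang seen in the curl errors in the report.
stalled = [0] * 120
print(should_abort(stalled, 1000, 120))        # -> True
print(should_abort(stalled[:119], 1000, 120))  # -> False, one second short
```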
Moving a non-responsive mirror to the end of the queue, as suggested by Petr, is possible and it could work.
Petr or Jan, could one of you open an RFE in bugzilla to get this tracked? Thanks
*** Bug 1185553 has been marked as a duplicate of this bug. ***
Making it configurable is a nice step. But c'mon, a 300 s timeout (or 120 s, as it currently seems to be)? This should be changed to some value that "just works" for the most common cases, rather than leaving people to discover this configuration option on their own. Long connection timeouts make sense for random pages on the web, but not for accessing mirrors, which are supposed to be fast.
It depends. Yes, mirrors are supposed to be fast, but they are also supposed to be available most of the time.
The world is not perfect and there are still people on dial-up, GPRS and similar types of connection. Such connections are slow, lossy and have high latency. We need to use values that work for the majority of people, and 120 s looks like such a value. It works for them (people with slow, lossy, high-latency connections) but also for others with reliable high-speed connections. The only drawback is that the second group can sometimes hit a two-minute delay. But I guess we could make some changes and use a shorter timeout as the default (maybe something like 30 s).
> can sometimes hit two minutes delay
That is if only one mirror is non-responsive; sometimes more than one fails.
People on "bad" connections usually have slow transfers and/or unreliable packet delivery, but they usually do not have extreme latency. Even for countries connected through satellite networks, round-trip latencies are usually below half a second. Let's say that determining whether a connection is up or down might take 10 round trips, so 10 s should be enough.
Still rather high, but certainly better than 120 s.
I believe that I am just unlucky :-)
[root@localhost ~]# ping 220.127.116.11
PING 18.104.22.168 (22.214.171.124) 56(84) bytes of data.
64 bytes from 126.96.36.199: icmp_seq=13 ttl=42 time=947 ms
64 bytes from 188.8.131.52: icmp_seq=14 ttl=41 time=1606 ms
64 bytes from 184.108.40.206: icmp_seq=15 ttl=40 time=745 ms
64 bytes from 220.127.116.11: icmp_seq=16 ttl=41 time=7849 ms
64 bytes from 18.104.22.168: icmp_seq=17 ttl=41 time=6849 ms
64 bytes from 22.214.171.124: icmp_seq=18 ttl=41 time=6027 ms
64 bytes from 126.96.36.199: icmp_seq=19 ttl=41 time=7206 ms
64 bytes from 188.8.131.52: icmp_seq=20 ttl=41 time=6386 ms
64 bytes from 184.108.40.206: icmp_seq=21 ttl=41 time=6087 ms
64 bytes from 220.127.116.11: icmp_seq=22 ttl=42 time=5105 ms
64 bytes from 18.104.22.168: icmp_seq=23 ttl=42 time=4926 ms
64 bytes from 22.214.171.124: icmp_seq=24 ttl=41 time=5506 ms
64 bytes from 126.96.36.199: icmp_seq=25 ttl=41 time=5705 ms
64 bytes from 188.8.131.52: icmp_seq=26 ttl=40 time=5466 ms
64 bytes from 184.108.40.206: icmp_seq=27 ttl=41 time=10063 ms
64 bytes from 220.127.116.11: icmp_seq=28 ttl=41 time=9105 ms
64 bytes from 18.104.22.168: icmp_seq=29 ttl=41 time=8766 ms
--- 22.214.171.124 ping statistics ---
37 packets transmitted, 17 received, 54% packet loss, time 36001ms
rtt min/avg/max/mdev = 745.763/5785.517/10063.948/2587.143 ms, pipe 11
I live in Brazil. I am using a 3G connection...
English is not my native language, sorry...
Fixed upstream. The default timeout is now 30 s - the same as in yum.
dnf-plugins-core-0.1.5-1.fc21,hawkey-0.5.3-2.fc21,dnf-0.6.4-1.fc21 has been submitted as an update for Fedora 21.
Package hawkey-0.5.3-2.fc21, dnf-plugins-core-0.1.5-1.fc21, dnf-0.6.4-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing hawkey-0.5.3-2.fc21 dnf-plugins-core-0.1.5-1.fc21 dnf-0.6.4-1.fc21'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
hawkey-0.5.3-2.fc21, dnf-plugins-core-0.1.5-1.fc21, dnf-0.6.4-1.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.