Bug 1272977

Summary: dnf waits 10 minutes for a single repo (for unresponsive ftp server?)
Product: [Fedora] Fedora Reporter: Kamil Páral <kparal>
Component: dnfAssignee: Packaging Maintenance Team <packaging-team-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23CC: jsilhan, mluscon, packaging-team-maint, pnemade, tmlcoch, vmukhame
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: dnf-1.1.8-1.fc24 dnf-1.1.8-1.fc23 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-13 07:22:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
dnf.iibrepo.log none

Description Kamil Páral 2015-10-19 10:38:55 UTC
Description of problem:
I have just waited 20 minutes to see output of `dnf update`. See the duration on the right side:

$ sudo dnf update --refresh
RPM Fusion for Fedora 23 - Free                                                                             7.8 MB/s | 738 kB     00:00    
wikitcms repository                                                                                         8.4 kB/s | 7.5 kB     00:00    
RPM Fusion for Fedora 23 - Nonfree                                                                          3.6 MB/s | 218 kB     00:00    
Failed to synchronize cache for repo 'Java_esmska' from 'http://download.opensuse.org/repositories/Java:/esmska/common/': Yum repo downloading error: Downloading error(s): repodata/c416477a40cc4e89d9d636cc3657508c757b50743413faca9cdf3965b2d6fe1c-filelists.xml.gz - Cannot download, all mirrors were already tried without success; repodata/5975ff53b08dcf92ba13145afc939e1e4dc4c4d64439f455edfe9c10e85d1410-primary.xml.gz - Cannot download, all mirrors were already tried without success, disabling.
Blue Jeans Network, Inc. - x86_64 software and updates                                                       29 kB/s |  19 kB     00:00    
Copr repo for taskotron owned by tflink                                                                      18 kB/s |  13 kB     00:00    
RPM Fusion for Fedora 23 - Nonfree - Updates                                                                8.0 kB/s | 421  B     00:00    
firmware for qemu, built by jenkins, fresh from git repos                                                    36 kB/s |  13 kB     00:00    
Phabricator Repsitory                                                                                        89 kB/s | 123 kB     00:01    
Fedora 23 - x86_64 - Test Updates                                                                            33 kB/s |  20 MB     10:12    
RPM Fusion for Fedora 23 - Nonfree - Test Updates                                                           8.5 kB/s | 421  B     00:00    
RPM Fusion for Fedora 23 - Free - Updates                                                                   8.3 kB/s | 415  B     00:00    
Fedora 23 - x86_64                                                                                           63 kB/s |  43 MB     11:34    
RPM Fusion for Fedora 23 - Free - Test Updates                                                              6.8 kB/s | 415  B     00:00    
Last metadata expiration check performed 0:00:00 ago on Mon Oct 19 11:33:57 2015.
Dependencies resolved.


Fedora 23 Test Updates took 10 minutes 12 seconds to refresh, Fedora 23 base repo took 11 minutes 34 seconds to refresh. I can reproduce this consistently, and Jiri Eischmann sitting next to me as well. We're currently at FIT VUT university. Looking into dnf.librepo.log (which is very hard to navigate, see bug 1272963), this seems important:

select_next_target: Selecting mirror for: repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 31): Operation not supported
select_next_target: Selecting mirror for: repodata/62b181b19d390ad566c4a24e50f6ba02fae5274de954dcd85969a8728225ee42-prestodelta.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/62b181b19d390ad566c4a24e50f6ba02fae5274de954dcd85969a8728225ee42-prestodelta.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 32): Operation not supported
select_next_target: Selecting mirror for: repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 33): Operation not supported
lr_download: Downloading started
check_transfer_statuses: Transfer finished: repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz (Effective url: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz)
check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz []
check_transfer_statuses: Error during transfer: Curl error (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz []
sort_mirrors: Mirror ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/ was moved at the end


prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 32): Operation not supported
select_next_target: Selecting mirror for: repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 33): Operation not supported
select_next_target: Selecting mirror for: repodata/9ffb24b929aa28c5b7ea706a83ebe2f4e68d08771861f887752550ea96857726-primary.xml.gz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/9ffb24b929aa28c5b7ea706a83ebe2f4e68d08771861f887752550ea96857726-primary.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 34): Operation not supported
lr_download: Downloading started
check_transfer_statuses: Transfer finished: repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz (Effective url: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz)
check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz []
check_transfer_statuses: Error during transfer: Curl error (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz []
sort_mirrors: Mirror ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/ was moved at the end
check_transfer_statuses: Ignore error - Try another mirror


I have tried opening ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/ and ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/ manually in the browser, and those addresses are unresponsive - they're in the "connecting" state for a very long time. However, a broken server should not make dnf unusable (yum works fine).


Please consider this:
1. 10 minutes timeout for ftp protocol is way too long
2. dnf is silent and does not inform the user what's going on. Nobody is going to wait 10 minutes in a blinking terminal with no info on a screen. It should be obvious why the user is waiting.
3. It should either be possible to switch to a different server manually (ctrl+c once?), or better yet dnf should be more intelligent when waiting for a timeout. It serves no purpose to wait several minutes for a single repo, then try a different one, and possible wait another several minutes again. When a repo is completely unresponsive for 5-10 seconds, start another connection to a different repo. Have max N such connections in parallel and once one of them works reasonably fast, use it and close the others.

Thank you.

Version-Release number of selected component (if applicable):
dnf-1.1.2-4.fc23.noarch
librepo-1.7.16-2.fc23.x86_64
hawkey-0.6.1-2.fc23.x86_64
libsolv-0.6.11-3.git1f9abfb.fc23.x86_64


How reproducible:
it seems always, from this particular place. probably quite hard from a different place.

Comment 1 Kamil Páral 2015-10-19 10:41:42 UTC
Created attachment 1084330 [details]
dnf.iibrepo.log

Don't forget to unwrap comments when reading comment 0.

Here is dnf.librepo.log output, but since there are no timestamps, I have no idea which parts are relevant. I tried to pick a part I think is relevant.

Comment 2 Michal Luscon 2015-10-26 12:28:22 UTC
Hi Tomas,

is there any librepo timeout option which DNF should use to prevent this?

Comment 3 Tomas Mlcoch 2015-10-26 15:24:51 UTC
Hi Michal,

there are few. The most important one is LRO_CONNECTTIMEOUT [1]. Another ones are LRO_LOWSPEEDTIME [2] and LRO_LOWSPEEDLIMIT [3].

Default timeout value for connection phase (LRO_CONNECTTIMEOUT) is currently set to 120sec. 10min timeout is a really big number.

From what I see, dnf is using:

current_timeout = h.getinfo(librepo.LRO_CONNECTTIMEOUT)
h.connecttimeout = max(self.timeout, current_timeout)

Maybe the self.timeout in dnf contains a higher value than librepo and thus the higher timeout is used?

An idea, is it possible to set the connection timeout value to dnf manually via dnf.conf? Could be the higher timeout hard-coded in Kamil's and Jiri's configs? (I know this is very unlikely, but worth to check)

Also, if the value could be set by user, it may worth to try to set there manually some small value (e.g. 10sec) and then try to reproduce the issue. Would you be willing to try that once you are at VUT FIT again, Kamil?

[1] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_CONNECTTIMEOUT
[2] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_LOWSPEEDTIME
[3] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_LOWSPEEDLIMIT

Comment 4 Michal Luscon 2015-10-26 15:55:41 UTC
The default timeout values currently used by DNF are:

h.lowspeedlimit = 1000 
h.lowspeedtime = 120
h.connecttimeout = max(120, librepo.LRO_CONNECTTIMEOUT) 

librepo.LRO_CONNECTTIMEOUT seems to be set to 18 by default, any idea why?


If I add timeout=1 into my dnf conf, I get:

Error: Failed to synchronize cache for repo 'updates' from 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f22&arch=x86_64': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f22&arch=x86_64 [Connection timed out after 1001 milliseconds]

Comment 5 Kamil Páral 2015-10-26 17:33:44 UTC
(In reply to Tomas Mlcoch from comment #3)
> Could be the higher timeout hard-coded in Kamil's and Jiri's
> configs? (I know this is very unlikely, but worth to check)

Nope :)

> Also, if the value could be set by user, it may worth to try to set there
> manually some small value (e.g. 10sec) and then try to reproduce the issue.
> Would you be willing to try that once you are at VUT FIT again, Kamil?

That's a great suggestion. Unfortunately I don't go there very often, and it seems they already fixed their ftp server (now they work fine). I wonder if there's a way to emulate this? (an ftp server which doesn't respond).

Comment 6 Tomas Mlcoch 2015-10-27 08:55:27 UTC
librepo.LRO_CONNECTTIMEOUT == 18 is ok as it is just a constant for usage in Handle.setopt() method [1].

But there is a problem. Dnf is using:
current_timeout = h.getinfo(librepo.LRO_CONNECTTIMEOUT)

which is not right, "LRO_" prefixed constants should be used only in Handle.setopt(), for Result.getinfo() there are "LRI_" prefixed constants [2].

LRO_ and LRI_ lists weren't designed 1:1 because Result object can contain extra info (from the downloading etc.), for example LRI_MIRRORS which returns list of mirrors built from urls specified by LRO_URLS and urls obtained from metalink and/or mirrorlists.

librepo.LRO_CONNECTTIMEOUT == 18 == librepo.LRI_FASTESTMIRROR so the line in dnf doesn't obtain a timeout set in librepo but value 0 or 1 whether fastest mirror build-in functionality is enabled or disabled (it is disabled default == 0). Currently there is no way in librepo how to get connecttimeout.

Moreover, I think that intended construction:

h.connecttimeout = max(self.timeout, current_timeout)

is incorrect. It means that when a user specifies lower timeout than handle already has, user's timeout is going to be ignored and handle's value is going to be preserved. IMHO, the correct approach should be:

if self.timeout:
    h.connecttimeout = self.timeout

(This probably isn't the case of the 10min timeout, but definitely a bug)

The fail you see when you set timeout to 1sec just means that librepo wasn't able to finish connect phase in the 1sec (1000ms) you have specified, which looks sane, 1sec for connection phase is low value.

[1] http://rpm-software-management.github.io/librepo/lib.html#handle-options
[2] http://rpm-software-management.github.io/librepo/lib.html#handle-info-options

Comment 7 Tomas Mlcoch 2015-10-27 08:57:27 UTC
Kamil, I'll try to emulate a FTP server that accepts SYN but hesitate to return SYN-ACK packets :)

Comment 8 Tomas Mlcoch 2015-11-02 08:47:05 UTC
Today, I had connection issues with source of one repo. I had to wait for connection timeout which occurred, exactly as expected, after 120sec.

Log:
----
08:35:11 Librepo version: 1.7.17 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.40.0 NSS/3.20 Basic ECC zlib/1.2.8 libidn/1.32 libssh2/1.5.0)
...SNIP...
08:38:23 prepare_next_transfer: URL: http://XXX/fedora/22/repodata/repomd.xml
...SNIP...
08:40:23 check_transfer_statuses: Transfer finished: repodata/repomd.xml (Effective url: http://XXX/fedora/22/repodata/repomd.xml)
08:40:23 check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for http://XXX/fedora/22/repodata/repomd.xml [Connection timed out after 120004 milliseconds]
...SNIP...

Comment 9 Kamil Páral 2015-11-02 14:29:11 UTC
(In reply to Tomas Mlcoch from comment #8)
> Today, I had connection issues with source of one repo. I had to wait for
> connection timeout which occurred, exactly as expected, after 120sec.

I've seen the 2 minute timeout many times, it's working as expected. So that's why I was very much surprised of a 10 minute timeout in certain cases, and why I reported this bug.

So yeah, I see the 2 minute timeout in most cases too.

Comment 10 Fedora Update System 2016-04-06 14:38:54 UTC
dnf-plugins-core-0.1.20-1.fc24 dnf-1.1.8-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-5dae5d2add

Comment 11 Fedora Update System 2016-04-07 10:02:17 UTC
dnf-1.1.8-1.fc23 dnf-plugins-core-0.1.20-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-ddeabfcfe6

Comment 12 Fedora Update System 2016-04-07 16:54:29 UTC
dnf-1.1.8-1.fc24, dnf-plugins-core-0.1.20-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-5dae5d2add

Comment 13 Fedora Update System 2016-04-08 16:59:19 UTC
dnf-1.1.8-1.fc24, dnf-plugins-core-0.1.20-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2016-04-08 20:51:38 UTC
dnf-1.1.8-1.fc23, dnf-plugins-core-0.1.20-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-ddeabfcfe6

Comment 15 Fedora Update System 2016-04-13 07:21:45 UTC
dnf-1.1.8-1.fc23, dnf-plugins-core-0.1.20-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.