Bug 1272977 - dnf waits 10 minutes for a single repo (for unresponsive ftp server?)
dnf waits 10 minutes for a single repo (for unresponsive ftp server?)
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dnf (Show other bugs)
23
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: packaging-team-maint
Fedora Extras Quality Assurance
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-10-19 06:38 EDT by Kamil Páral
Modified: 2016-04-13 03:22 EDT (History)
6 users (show)

See Also:
Fixed In Version: dnf-1.1.8-1.fc24 dnf-1.1.8-1.fc23
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-13 03:22:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
dnf.iibrepo.log (595.49 KB, text/plain)
2015-10-19 06:41 EDT, Kamil Páral
no flags Details

  None (edit)
Description Kamil Páral 2015-10-19 06:38:55 EDT
Description of problem:
I have just waited 20 minutes to see output of `dnf update`. See the duration on the right side:

$ sudo dnf update --refresh
RPM Fusion for Fedora 23 - Free                                                                             7.8 MB/s | 738 kB     00:00    
wikitcms repository                                                                                         8.4 kB/s | 7.5 kB     00:00    
RPM Fusion for Fedora 23 - Nonfree                                                                          3.6 MB/s | 218 kB     00:00    
Failed to synchronize cache for repo 'Java_esmska' from 'http://download.opensuse.org/repositories/Java:/esmska/common/': Yum repo downloading error: Downloading error(s): repodata/c416477a40cc4e89d9d636cc3657508c757b50743413faca9cdf3965b2d6fe1c-filelists.xml.gz - Cannot download, all mirrors were already tried without success; repodata/5975ff53b08dcf92ba13145afc939e1e4dc4c4d64439f455edfe9c10e85d1410-primary.xml.gz - Cannot download, all mirrors were already tried without success, disabling.
Blue Jeans Network, Inc. - x86_64 software and updates                                                       29 kB/s |  19 kB     00:00    
Copr repo for taskotron owned by tflink                                                                      18 kB/s |  13 kB     00:00    
RPM Fusion for Fedora 23 - Nonfree - Updates                                                                8.0 kB/s | 421  B     00:00    
firmware for qemu, built by jenkins, fresh from git repos                                                    36 kB/s |  13 kB     00:00    
Phabricator Repsitory                                                                                        89 kB/s | 123 kB     00:01    
Fedora 23 - x86_64 - Test Updates                                                                            33 kB/s |  20 MB     10:12    
RPM Fusion for Fedora 23 - Nonfree - Test Updates                                                           8.5 kB/s | 421  B     00:00    
RPM Fusion for Fedora 23 - Free - Updates                                                                   8.3 kB/s | 415  B     00:00    
Fedora 23 - x86_64                                                                                           63 kB/s |  43 MB     11:34    
RPM Fusion for Fedora 23 - Free - Test Updates                                                              6.8 kB/s | 415  B     00:00    
Last metadata expiration check performed 0:00:00 ago on Mon Oct 19 11:33:57 2015.
Dependencies resolved.


Fedora 23 Test Updates took 10 minutes 12 seconds to refresh, Fedora 23 base repo took 11 minutes 34 seconds to refresh. I can reproduce this consistently, and Jiri Eischmann sitting next to me as well. We're currently at FIT VUT university. Looking into dnf.librepo.log (which is very hard to navigate, see bug 1272963), this seems important:

select_next_target: Selecting mirror for: repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 31): Operation not supported
select_next_target: Selecting mirror for: repodata/62b181b19d390ad566c4a24e50f6ba02fae5274de954dcd85969a8728225ee42-prestodelta.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/62b181b19d390ad566c4a24e50f6ba02fae5274de954dcd85969a8728225ee42-prestodelta.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 32): Operation not supported
select_next_target: Selecting mirror for: repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 33): Operation not supported
lr_download: Downloading started
check_transfer_statuses: Transfer finished: repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz (Effective url: ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz)
check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz []
check_transfer_statuses: Error during transfer: Curl error (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/repodata/c347f34ede9966a309afa15415f61aeb286c7dd8d9721e78db8851d0f7dcac8a-filelists.xml.gz []
sort_mirrors: Mirror ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/ was moved at the end


prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 32): Operation not supported
select_next_target: Selecting mirror for: repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/9e929a44494d45eb1cafdc789c67d293774267fce0b75920070f46e0f88860a1-comps-f23.xml.xz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 33): Operation not supported
select_next_target: Selecting mirror for: repodata/9ffb24b929aa28c5b7ea706a83ebe2f4e68d08771861f887752550ea96857726-primary.xml.gz
prepare_next_transfer: URL: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/9ffb24b929aa28c5b7ea706a83ebe2f4e68d08771861f887752550ea96857726-primary.xml.gz
add_librepo_xattr: Cannot set xattr user.Librepo.DownloadInProgress (fd: 34): Operation not supported
lr_download: Downloading started
check_transfer_statuses: Transfer finished: repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz (Effective url: ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz)
check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz []
check_transfer_statuses: Error during transfer: Curl error (28): Timeout was reached for ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/repodata/971e291172e69ef3aec64a8be68b815a863e159bbea51c3cefbfe22aa6a21536-filelists.xml.gz []
sort_mirrors: Mirror ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/ was moved at the end
check_transfer_statuses: Ignore error - Try another mirror


I have tried opening ftp://mirror.vutbr.cz/fedora/updates/testing/23/x86_64/ and ftp://mirror.vutbr.cz/fedora/development/23/x86_64/os/ manually in the browser, and those addresses are unresponsive - they're in the "connecting" state for a very long time. However, a broken server should not make dnf unusable (yum works fine).


Please consider this:
1. 10 minutes timeout for ftp protocol is way too long
2. dnf is silent and does not inform the user what's going on. Nobody is going to wait 10 minutes in a blinking terminal with no info on a screen. It should be obvious why the user is waiting.
3. It should either be possible to switch to a different server manually (ctrl+c once?), or better yet dnf should be more intelligent when waiting for a timeout. It serves no purpose to wait several minutes for a single repo, then try a different one, and possible wait another several minutes again. When a repo is completely unresponsive for 5-10 seconds, start another connection to a different repo. Have max N such connections in parallel and once one of them works reasonably fast, use it and close the others.

Thank you.

Version-Release number of selected component (if applicable):
dnf-1.1.2-4.fc23.noarch
librepo-1.7.16-2.fc23.x86_64
hawkey-0.6.1-2.fc23.x86_64
libsolv-0.6.11-3.git1f9abfb.fc23.x86_64


How reproducible:
it seems always, from this particular place. probably quite hard from a different place.
Comment 1 Kamil Páral 2015-10-19 06:41 EDT
Created attachment 1084330 [details]
dnf.iibrepo.log

Don't forget to unwrap comments when reading comment 0.

Here is dnf.librepo.log output, but since there are no timestamps, I have no idea which parts are relevant. I tried to pick a part I think is relevant.
Comment 2 Michal Luscon 2015-10-26 08:28:22 EDT
Hi Tomas,

is there any librepo timeout option which DNF should use to prevent this?
Comment 3 Tomas Mlcoch 2015-10-26 11:24:51 EDT
Hi Michal,

there are few. The most important one is LRO_CONNECTTIMEOUT [1]. Another ones are LRO_LOWSPEEDTIME [2] and LRO_LOWSPEEDLIMIT [3].

Default timeout value for connection phase (LRO_CONNECTTIMEOUT) is currently set to 120sec. 10min timeout is a really big number.

From what I see, dnf is using:

current_timeout = h.getinfo(librepo.LRO_CONNECTTIMEOUT)
h.connecttimeout = max(self.timeout, current_timeout)

Maybe the self.timeout in dnf contains a higher value than librepo and thus the higher timeout is used?

An idea, is it possible to set the connection timeout value to dnf manually via dnf.conf? Could be the higher timeout hard-coded in Kamil's and Jiri's configs? (I know this is very unlikely, but worth to check)

Also, if the value could be set by user, it may worth to try to set there manually some small value (e.g. 10sec) and then try to reproduce the issue. Would you be willing to try that once you are at VUT FIT again, Kamil?

[1] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_CONNECTTIMEOUT
[2] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_LOWSPEEDTIME
[3] http://rpm-software-management.github.io/librepo/lib.html#librepo.LRO_LOWSPEEDLIMIT
Comment 4 Michal Luscon 2015-10-26 11:55:41 EDT
The default timeout values currently used by DNF are:

h.lowspeedlimit = 1000 
h.lowspeedtime = 120
h.connecttimeout = max(120, librepo.LRO_CONNECTTIMEOUT) 

librepo.LRO_CONNECTTIMEOUT seems to be set to 18 by default, any idea why?


If I add timeout=1 into my dnf conf, I get:

Error: Failed to synchronize cache for repo 'updates' from 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f22&arch=x86_64': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f22&arch=x86_64 [Connection timed out after 1001 milliseconds]
Comment 5 Kamil Páral 2015-10-26 13:33:44 EDT
(In reply to Tomas Mlcoch from comment #3)
> Could be the higher timeout hard-coded in Kamil's and Jiri's
> configs? (I know this is very unlikely, but worth to check)

Nope :)

> Also, if the value could be set by user, it may worth to try to set there
> manually some small value (e.g. 10sec) and then try to reproduce the issue.
> Would you be willing to try that once you are at VUT FIT again, Kamil?

That's a great suggestion. Unfortunately I don't go there very often, and it seems they already fixed their ftp server (now they work fine). I wonder if there's a way to emulate this? (an ftp server which doesn't respond).
Comment 6 Tomas Mlcoch 2015-10-27 04:55:27 EDT
librepo.LRO_CONNECTTIMEOUT == 18 is ok as it is just a constant for usage in Handle.setopt() method [1].

But there is a problem. Dnf is using:
current_timeout = h.getinfo(librepo.LRO_CONNECTTIMEOUT)

which is not right, "LRO_" prefixed constants should be used only in Handle.setopt(), for Result.getinfo() there are "LRI_" prefixed constants [2].

LRO_ and LRI_ lists weren't designed 1:1 because Result object can contain extra info (from the downloading etc.), for example LRI_MIRRORS which returns list of mirrors built from urls specified by LRO_URLS and urls obtained from metalink and/or mirrorlists.

librepo.LRO_CONNECTTIMEOUT == 18 == librepo.LRI_FASTESTMIRROR so the line in dnf doesn't obtain a timeout set in librepo but value 0 or 1 whether fastest mirror build-in functionality is enabled or disabled (it is disabled default == 0). Currently there is no way in librepo how to get connecttimeout.

Moreover, I think that intended construction:

h.connecttimeout = max(self.timeout, current_timeout)

is incorrect. It means that when a user specifies lower timeout than handle already has, user's timeout is going to be ignored and handle's value is going to be preserved. IMHO, the correct approach should be:

if self.timeout:
    h.connecttimeout = self.timeout

(This probably isn't the case of the 10min timeout, but definitely a bug)

The fail you see when you set timeout to 1sec just means that librepo wasn't able to finish connect phase in the 1sec (1000ms) you have specified, which looks sane, 1sec for connection phase is low value.

[1] http://rpm-software-management.github.io/librepo/lib.html#handle-options
[2] http://rpm-software-management.github.io/librepo/lib.html#handle-info-options
Comment 7 Tomas Mlcoch 2015-10-27 04:57:27 EDT
Kamil, I'll try to emulate a FTP server that accepts SYN but hesitate to return SYN-ACK packets :)
Comment 8 Tomas Mlcoch 2015-11-02 03:47:05 EST
Today, I had connection issues with source of one repo. I had to wait for connection timeout which occurred, exactly as expected, after 120sec.

Log:
----
08:35:11 Librepo version: 1.7.17 with CURL_GLOBAL_ACK_EINTR support (libcurl/7.40.0 NSS/3.20 Basic ECC zlib/1.2.8 libidn/1.32 libssh2/1.5.0)
...SNIP...
08:38:23 prepare_next_transfer: URL: http://XXX/fedora/22/repodata/repomd.xml
...SNIP...
08:40:23 check_transfer_statuses: Transfer finished: repodata/repomd.xml (Effective url: http://XXX/fedora/22/repodata/repomd.xml)
08:40:23 check_finished_transfer_status: Serious error - Curl code (28): Timeout was reached for http://XXX/fedora/22/repodata/repomd.xml [Connection timed out after 120004 milliseconds]
...SNIP...
Comment 9 Kamil Páral 2015-11-02 09:29:11 EST
(In reply to Tomas Mlcoch from comment #8)
> Today, I had connection issues with source of one repo. I had to wait for
> connection timeout which occurred, exactly as expected, after 120sec.

I've seen the 2 minute timeout many times, it's working as expected. So that's why I was very much surprised of a 10 minute timeout in certain cases, and why I reported this bug.

So yeah, I see the 2 minute timeout in most cases too.
Comment 10 Fedora Update System 2016-04-06 10:38:54 EDT
dnf-plugins-core-0.1.20-1.fc24 dnf-1.1.8-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-5dae5d2add
Comment 11 Fedora Update System 2016-04-07 06:02:17 EDT
dnf-1.1.8-1.fc23 dnf-plugins-core-0.1.20-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-ddeabfcfe6
Comment 12 Fedora Update System 2016-04-07 12:54:29 EDT
dnf-1.1.8-1.fc24, dnf-plugins-core-0.1.20-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-5dae5d2add
Comment 13 Fedora Update System 2016-04-08 12:59:19 EDT
dnf-1.1.8-1.fc24, dnf-plugins-core-0.1.20-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
Comment 14 Fedora Update System 2016-04-08 16:51:38 EDT
dnf-1.1.8-1.fc23, dnf-plugins-core-0.1.20-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-ddeabfcfe6
Comment 15 Fedora Update System 2016-04-13 03:21:45 EDT
dnf-1.1.8-1.fc23, dnf-plugins-core-0.1.20-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.