Bug 1294953 - needs better graceful failures when 'Curl error: weird server reply'
Summary: needs better graceful failures when 'Curl error: weird server reply'
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: dnf
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-12-31 09:20 UTC by Chris Murphy
Modified: 2016-11-30 09:54 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-30 09:54:20 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
strace of dnf -y downgrade with hang (6.54 MB, text/plain)
2015-12-31 09:20 UTC, Chris Murphy
no flags Details

Description Chris Murphy 2015-12-31 09:20:13 UTC
Created attachment 1110722 [details]
strace of dnf -y downgrade with hang

Description of problem:

Frequently I find dnf commands just stall out without any error message or reason what the wait is about. I'll cancel, and reissue the command and more often than not it will just proceed normally.


Version-Release number of selected component (if applicable):
dnf-1.1.5-1.fc23.noarch
hawkey-0.6.2-3.fc23.x86_64


How reproducible:
Often.


Steps to Reproduce:
1. [chris@f23m /]$ sudo dnf -vy downgrade https://kojipkgs.fedoraproject.org//packages/firefox/42.0/5.fc23/x86_64/firefox-42.0-5.fc23.x86_64.rpm


Actual results:

cachedir: /var/cache/dnf
Loaded plugins: builddep, copr, download, playground, needs-restarting, Query, noroot, protected_packages, debuginfo-install, config-manager, generate_completion_cache, reposync
DNF version: 1.1.5
repo: using cache for: updates-testing
reviving: 'updates' can be revived.
repo: using cache for: fedora
not found deltainfo for: Fedora 23 - x86_64
not found updateinfo for: Fedora 23 - x86_64
repo: using cache for: hadrons123-thermald
not found deltainfo for: Copr repo for thermald owned by hadrons123
not found updateinfo for: Copr repo for thermald owned by hadrons123
updates-testing: using metadata from Wed Dec 30 11:56:11 2015.
updates: using metadata from Wed Dec 30 17:19:20 2015.
fedora: using metadata from Sat Oct 31 11:34:41 2015.
hadrons123-thermald: using metadata from Fri Dec  4 21:13:01 2015.
Last metadata expiration check performed 0:00:00 ago on Thu Dec 31 02:10:09 2015.
Terminated.
Terminated.

Terminated message appears over 5 minutes after the silent hang starts, but dnf hasn't actually terminated. There's still no prompt, and ps -ef still shows it's running.

Except it hasn't actually terminated, it's still hanging.




Expected results:

If it wants something that isn't available pretty much right away, it needs to state what it's hanging on rather than silently doing nothing; and then eventually it just needs to fail. Even a crash is better than this.


Additional info:

Comment 1 Jaroslav Mracek 2016-01-11 12:33:20 UTC
Thanks for report. We will try to change that dnf will have same behavior independently of source of packages.

*** This bug has been marked as a duplicate of bug 1161950 ***

Comment 2 Chris Murphy 2016-01-12 23:47:31 UTC
I think this is marked duplicate incorrectly. This is some kind of problem where dnf/curl keeps waiting for a misbehaving mirror much longer than it should. If I leave it long enough I usually get messages like this:

[MIRROR] glibc-devel-2.22-7.fc23.x86_64.rpm: Curl error (8): FTP: weird server reply for ftp://mirrors.syringanetworks.net/fedora/updates/23/x86_64/g/glibc-devel-2.22-7.fc23.x86_64.rpm [Connection time-out]    
(7/8): glibc-devel-2.22-7.fc23.x86_64.rpm                                                                                                                                          7.4 kB/s | 910 kB     02:02    
[MIRROR] glibc-headers-2.22-7.fc23.x86_64.rpm: Curl error (8): FTP: weird server reply for ftp://mirrors.syringanetworks.net/fedora/updates/23/x86_64/g/glibc-headers-2.22-7.fc23.x86_64.rpm [Connection time-out]
(8/8): glibc-headers-2.22-7.fc23.x86_64.rpm   

But that's at least 2 minutes hang time before it recovers. I think if it's not getting a response from a mirror in like 5 seconds it should give up on that mirror and try another one. I'm getting these hangs more than 1/2 the time I use dnf these days, it didn't use to be this bad. And it's a different mirror every time.

Comment 3 Michal Luscon 2016-01-19 13:54:31 UTC
Hi, did you try to tweak your minrate and timeout conf. options?

Comment 4 Chris Murphy 2016-01-19 19:50:39 UTC
No, the /etc/dnf/dnf.conf is default in one case, and for the other case the only change is installonly_limit=10.

Comment 5 Michal Luscon 2016-01-25 12:17:26 UTC
Could you please do so and inform us about the result?

Comment 6 Chris Murphy 2016-01-26 05:00:30 UTC
minrate=500k
timeout=8

This produces sane results so far. This sort of tweaking isn't something I've had to do with dnf prior to Fedora 23, I've been testing it for at least a couple of years; and definitely no tweaking needed on any other OS platform to get similar sane results. So the defaults need to more aggressively fail or it needs to be dynamic and look at the "typical" rate and disqualify rates that are, e.g. 20% of that rate or less. The bandwidth I have is ~6Mbps, and what's apparently happening is the default minrate of 1000 bytes per second is just ridiculously too low in comparison.

Comment 7 Chris Murphy 2016-01-26 20:15:55 UTC
From today's journal, same "weird server reply" messages. This is not from user initiated dnf, but makecache timer so it could be a different problem.

Jan 26 13:00:52 f23m.localdomain systemd[1]: Started dnf makecache timer.
Jan 26 13:00:52 f23m.localdomain systemd[1]: Starting dnf makecache timer.
Jan 26 13:10:46 f23m.localdomain systemd[1]: Starting dnf makecache...
Jan 26 13:10:47 f23m.localdomain dnf[3965]: cachedir: /var/cache/dnf
Jan 26 13:10:47 f23m.localdomain dnf[3965]: Loaded plugins: protected_packages, reposync, download, builddep, playground, config-manager, copr, Query, debuginfo-install, generate_completion_cache, noroot, needs-restarting
Jan 26 13:10:47 f23m.localdomain dnf[3965]: DNF version: 1.1.5
Jan 26 13:10:47 f23m.localdomain dnf[3965]: Making cache files for all metadata files.
Jan 26 13:10:47 f23m.localdomain dnf[3965]: updates: has expired and will be refreshed.
Jan 26 13:10:47 f23m.localdomain dnf[3965]: fedora: will expire after 1656750 seconds.
Jan 26 13:10:47 f23m.localdomain dnf[3965]: hadrons123-thermald: will expire after 78429 seconds.
Jan 26 13:10:48 f23m.localdomain dnf[3965]: reviving: failed for 'updates', mismatched sha256 sum.
Jan 26 13:10:56 f23m.localdomain dnf[3965]: error: Curl error (8): FTP: weird server reply for ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml [Connection time-out] (ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml).
Jan 26 13:10:56 f23m.localdomain dnf[3965]: Error: Failed to synchronize cache for repo 'updates' from 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f23&arch=x86_64': Cannot download repomd.xml: Curl error (8): FTP: weird server reply for ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml [Connection time-out]
Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Main process exited, code=exited, status=1/FAILURE
Jan 26 13:10:56 f23m.localdomain systemd[1]: Failed to start dnf makecache.
Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Unit entered failed state.
Jan 26 13:10:56 f23m.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Failed with result 'exit-code'.


Really it seems like possibly not directly dnf's fault, but rather there just needs to be a better way to fallback to other mirrors when one is misbehaving or slow, rather than just failing, then waiting until the next makecache timer is hit.

Comment 8 Honza Silhan 2016-02-01 12:10:26 UTC
We will try to improve error messages from server in this bug report. The progress bar for remote url download should be there too (bug 1161950).

Comment 9 Fedora Admin XMLRPC Client 2016-07-08 09:35:18 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Fedora End Of Life 2016-11-24 14:37:14 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Michael Mráka 2016-11-30 09:54:20 UTC
It seems to be connected to FTP EPSV mode (extended passive mode) which has been fixed in librepo.

We can't reproduce it and assume this bug has been fixed.
Fell free to reopen if you still hit the issue.


Note You need to log in before you can comment on or make changes to this bug.