Created attachment 1110722 [details] strace of dnf -y downgrade with hang Description of problem: Frequently I find dnf commands just stall out without any error message or reason what the wait is about. I'll cancel, and reissue the command and more often than not it will just proceed normally. Version-Release number of selected component (if applicable): dnf-1.1.5-1.fc23.noarch hawkey-0.6.2-3.fc23.x86_64 How reproducible: Often. Steps to Reproduce: 1. [chris@f23m /]$ sudo dnf -vy downgrade https://kojipkgs.fedoraproject.org//packages/firefox/42.0/5.fc23/x86_64/firefox-42.0-5.fc23.x86_64.rpm Actual results: cachedir: /var/cache/dnf Loaded plugins: builddep, copr, download, playground, needs-restarting, Query, noroot, protected_packages, debuginfo-install, config-manager, generate_completion_cache, reposync DNF version: 1.1.5 repo: using cache for: updates-testing reviving: 'updates' can be revived. repo: using cache for: fedora not found deltainfo for: Fedora 23 - x86_64 not found updateinfo for: Fedora 23 - x86_64 repo: using cache for: hadrons123-thermald not found deltainfo for: Copr repo for thermald owned by hadrons123 not found updateinfo for: Copr repo for thermald owned by hadrons123 updates-testing: using metadata from Wed Dec 30 11:56:11 2015. updates: using metadata from Wed Dec 30 17:19:20 2015. fedora: using metadata from Sat Oct 31 11:34:41 2015. hadrons123-thermald: using metadata from Fri Dec 4 21:13:01 2015. Last metadata expiration check performed 0:00:00 ago on Thu Dec 31 02:10:09 2015. Terminated. Terminated. Terminated message appears over 5 minutes after the silent hang starts, but dnf hasn't actually terminated. There's still no prompt, and ps -ef still shows it's running. Except it hasn't actually terminated, it's still hanging. Expected results: If it wants something that isn't available pretty much right away, it needs to state what it's hanging on rather than silently doing nothing; and then eventually it just needs to fail. Even a crash is better than this. Additional info:
Thanks for report. We will try to change that dnf will have same behavior independently of source of packages. *** This bug has been marked as a duplicate of bug 1161950 ***
I think this is marked duplicate incorrectly. This is some kind of problem where dnf/curl keeps waiting for a misbehaving mirror much longer than it should. If I leave it long enough I usually get messages like this: [MIRROR] glibc-devel-2.22-7.fc23.x86_64.rpm: Curl error (8): FTP: weird server reply for ftp://mirrors.syringanetworks.net/fedora/updates/23/x86_64/g/glibc-devel-2.22-7.fc23.x86_64.rpm [Connection time-out] (7/8): glibc-devel-2.22-7.fc23.x86_64.rpm 7.4 kB/s | 910 kB 02:02 [MIRROR] glibc-headers-2.22-7.fc23.x86_64.rpm: Curl error (8): FTP: weird server reply for ftp://mirrors.syringanetworks.net/fedora/updates/23/x86_64/g/glibc-headers-2.22-7.fc23.x86_64.rpm [Connection time-out] (8/8): glibc-headers-2.22-7.fc23.x86_64.rpm But that's at least 2 minutes hang time before it recovers. I think if it's not getting a response from a mirror in like 5 seconds it should give up on that mirror and try another one. I'm getting these hangs more than 1/2 the time I use dnf these days, it didn't use to be this bad. And it's a different mirror every time.
Hi, did you try to tweak your minrate and timeout conf. options?
No, the /etc/dnf/dnf.conf is default in one case, and for the other case the only change is installonly_limit=10.
Could you please do so and inform us about the result?
minrate=500k timeout=8 This produces sane results so far. This sort of tweaking isn't something I've had to do with dnf prior to Fedora 23, I've been testing it for at least a couple of years; and definitely no tweaking needed on any other OS platform to get similar sane results. So the defaults need to more aggressively fail or it needs to be dynamic and look at the "typical" rate and disqualify rates that are, e.g. 20% of that rate or less. The bandwidth I have is ~6Mbps, and what's apparently happening is the default minrate of 1000 bytes per second is just ridiculously too low in comparison.
From today's journal, same "weird server reply" messages. This is not from user initiated dnf, but makecache timer so it could be a different problem. Jan 26 13:00:52 f23m.localdomain systemd[1]: Started dnf makecache timer. Jan 26 13:00:52 f23m.localdomain systemd[1]: Starting dnf makecache timer. Jan 26 13:10:46 f23m.localdomain systemd[1]: Starting dnf makecache... Jan 26 13:10:47 f23m.localdomain dnf[3965]: cachedir: /var/cache/dnf Jan 26 13:10:47 f23m.localdomain dnf[3965]: Loaded plugins: protected_packages, reposync, download, builddep, playground, config-manager, copr, Query, debuginfo-install, generate_completion_cache, noroot, needs-restarting Jan 26 13:10:47 f23m.localdomain dnf[3965]: DNF version: 1.1.5 Jan 26 13:10:47 f23m.localdomain dnf[3965]: Making cache files for all metadata files. Jan 26 13:10:47 f23m.localdomain dnf[3965]: updates: has expired and will be refreshed. Jan 26 13:10:47 f23m.localdomain dnf[3965]: fedora: will expire after 1656750 seconds. Jan 26 13:10:47 f23m.localdomain dnf[3965]: hadrons123-thermald: will expire after 78429 seconds. Jan 26 13:10:48 f23m.localdomain dnf[3965]: reviving: failed for 'updates', mismatched sha256 sum. Jan 26 13:10:56 f23m.localdomain dnf[3965]: error: Curl error (8): FTP: weird server reply for ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml [Connection time-out] (ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml). Jan 26 13:10:56 f23m.localdomain dnf[3965]: Error: Failed to synchronize cache for repo 'updates' from 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f23&arch=x86_64': Cannot download repomd.xml: Curl error (8): FTP: weird server reply for ftp://linux.mirrors.es.net/fedora/linux/updates/23/x86_64/repodata/repomd.xml [Connection time-out] Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Main process exited, code=exited, status=1/FAILURE Jan 26 13:10:56 f23m.localdomain systemd[1]: Failed to start dnf makecache. Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Unit entered failed state. Jan 26 13:10:56 f23m.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Jan 26 13:10:56 f23m.localdomain systemd[1]: dnf-makecache.service: Failed with result 'exit-code'. Really it seems like possibly not directly dnf's fault, but rather there just needs to be a better way to fallback to other mirrors when one is misbehaving or slow, rather than just failing, then waiting until the next makecache timer is hit.
We will try to improve error messages from server in this bug report. The progress bar for remote url download should be there too (bug 1161950).
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
It seems to be connected to FTP EPSV mode (extended passive mode) which has been fixed in librepo. We can't reproduce it and assume this bug has been fixed. Fell free to reopen if you still hit the issue.