Our rawhide and fedora30 pre-beta composes have been failing off and on with "Curl error (16): Error in the HTTP2 framing layer". Now also a user has seen this against mirrors.fedoraproject.org: [07:50:15] <bowlofeggs> just saw this from dnf: [07:50:22] <bowlofeggs> Error: Error downloading packages: [07:50:24] <bowlofeggs> Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=updates-testing-f30&arch=x86_64 [] [07:50:45] <bowlofeggs> that was on fedora 30 On composes it always seems to happen on armv7 builds when it happens: https://koji.fedoraproject.org/koji/taskinfo?taskID=33629664 ... DEBUG util.py:439: repo: downloading from remote: koji-override-0 DEBUG util.py:439: error: Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck [] (https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck). DEBUG util.py:439: koji-override-0 335 kB/s | 15 MB 00:45 DEBUG util.py:439: Cannot download 'https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os': Yum repo downloading error: Downloading error(s): repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck - Cannot download, all mirrors were already tried without success. DEBUG util.py:439: Failed to synchronize cache for repo 'koji-override-0' ... I'm not sure what additional info I can get from composes, but if we can duplicate it against mirrors perhaps we can figure out whats going on? I wonder if it's related to GOAWAY after 1000 connections, but that patch was merged and should be in the version in both f30 and rawhide. Also, the curl version in both rawhide and f30 is the same currently. Happy to try and gather more info...
Do we have output of `curl --verbose` in the failing case? Does the server use nghttp2? Which version?
Alas, we do not. ;( Perhaps we can try hitting mirrors in a loop with --verbose until we get a failure? mirrors is our proxy network, which is all fedora 29 hosts running httpd-2.4.38-2.fc29.x86_64 with h2 enabled. kojipkgs (what the composes hit) is those same proxy servers with varnish behind the httpd.
There's some more verbose logs in https://pagure.io/dusty/failed-composes/issue/1703
I think that the "Could not resolve host: kojipkgs.fedoraproject.org" messages appeared there by mistake. I have attempted to fix it in curl-7.64.0-5.fc31 with the following commit: https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=902ddefe
ok. It sounds like jdieter has tracked this to a librepo/zchunk issue now... https://pagure.io/dusty/failed-composes/issue/1703#comment-561822 "I've done some more probing and this is definitely zchunk-related. It seems to be an issue with reusing curl handles when downloading the second part of the file (the data), which should work, but seems to crap out occasionally. There are also some (possibly related) speed issues, so I'll see if I can get a patch ready for librepo within the next day or two." So, perhaps we can just close this? Adding jdieter here for comment...
Ok, I've tracked this down, and it seems to be a combination of two bugs, one identified by Kamil and the other in zchunk. First, while zchunk-1.0.4 is downloading missing chunks, it finds the current chunk as it comes in and checks its digest against the full list of chunk digests in the target file. Needless to say this O(n log n) algorithm is less than efficient and makes the write process slow down the more of the file that gets downloaded. On armv7, this delay in the write callback seems to be hitting a timeout, causing the connection to be aborted. I've just released and built zchunk-1.1.0 which stores the target digest in the internal struct, reducing the lookup time to O(1), drastically reducing CPU usage, and avoiding the timeout. In testing, this dropped the likelihood of hitting the HTTP/2 framing error from roughly 50% to 15%. Further investigation showed that libcurl was complaining about being unable to lookup the hostname, even though it had already successfully done so (see Kamil's message in comment 4). This is a known bug (https://github.com/curl/curl/issues/3629) in libcurl-7.64 that supposedly has no effect on the actual download, but, due to some mix of architecture, timing and the aurora borealis, it seems to be killing the downloads roughly 1 in 24 times (which, when multiplied by three for a compose, means 1 in 8 would fail). When both zchunk-1.1.0 and Kamil's libcurl-7.64.0-5 are installed on the test system, I haven't hit any framing errors in between 50 and 100 attempts. To test, I'd like to request that we re-enable zchunk for Rawhide composes only. The latest libcurl is already there, and zchunk-1.1.0 should be in the next compose, so it should just work.
Thank you for tracking it down! I realized that all the "Expire in ..." messages appear in the verbose output also by mistake. The following commit will fix it: https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=7594f15b
curl-7.64.0-6.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43
zchunk-1.1.0-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8
zchunk-1.1.0-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
Sadly, we are seeing this again... I don't know if the cause is the same or something new. rawhide-20190713.n.2 failed due to kde live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36235157 ... 20:28:40,671 INF packaging: Downloading packages to /var/tmp/dnf.package.cache. 20:29:57,766 WRN packaging: Failed to download 'krb5-libs-1.17-34.fc31.x86_64.rpm': 3 - Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190713.n.2/compose/Everything/x86_64/os/Packages/k/krb5-libs-1.17-34.fc31.x86_64.rpm [] 20:29:57,767 WRN packaging: Failed to download 'krb5-libs-1.17-34.fc31.x86_64.rpm': 1 - No more mirrors to try - All mirrors were already tried without success 20:32:57,947 INF packaging: setting DNF platform id to: platform:f31 ... rawhide-20190713.n.1 failed due to workstation live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36223053 ... 10:05:47,603 INF packaging: Downloading packages to /var/tmp/dnf.package.cache. 10:06:12,960 WRN packaging: Failed to download 'gnome-color-manager-3.32.0-1.fc30.x86_64.rpm': 3 - Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190713.n.1/compose/Workstation/x86_64/os/Packages/g/gnome-color-manager-3.32.0-1.fc30.x86_64.rpm [] 10:06:12,960 WRN packaging: Failed to download 'gnome-color-manager-3.32.0-1.fc30.x86_64.rpm': 1 - No more mirrors to try - All mirrors were already tried without success ... rawhide-20190711.n.0 failed due to kde live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36191909 ... Happy to open a new bug if you prefer...
(In reply to Kevin Fenzi from comment #17) > Happy to open a new bug if you prefer... That is not necessary but I will need some steps to reproduce this locally.
Sadly I don't have a simple reproducer in hand. ;( I disabled h2 for now on kojipkgs.fedoraproject.org so we could get composes out. Out setup is somewhat complex: dnf runs in a vm/chroot downloading from kojipkgs.fedoraproject.org kojipkgs.fedoraproject.org resolves to 2 proxy ips. Each of those proxies is running apache proxied to haproxy haproxy has 2 backends: kojipkgs01/kojipkgs02 each backend has varnish listening on port 80 and using apache on port 8080 as it's upstream to fetch the files locally from an nfs mount. So a request would go: dnf -> proxy server apache -> haproxy -> varnish -> (possibly apache if not already cached) then back. Is there any way curl could provide more info for this error? More than 'error in framing layer'? Or even a hacked up version we could put on our builders to dump more info when it hits this?
Created attachment 1591614 [details] Patch to turn on verbose logging in librepo I'm heading to bed, otherwise I'd give you more than this, but when I was trying to track down the framing error in the metadata download, I used something like this patch to librepo (untested) to enable verbose curl output.
If you install librepo on the builders with this patch enabled, you should get far more verbose logging.
Kevin, could you please check whether packages from the following copr work any better? https://copr.fedorainfracloud.org/coprs/kdudka/curl-next/ I am thinking about rebase of curl in f30 to the latest upstream release because of bug #1697566.
(In reply to Jonathan Dieter from comment #21) > If you install librepo on the builders with this patch enabled, you should > get far more verbose logging. ok. I might try that this weekend. Perhaps if the new curl doesn't solve it... (In reply to Kamil Dudka from comment #22) > Kevin, could you please check whether packages from the following copr work > any better? > > https://copr.fedorainfracloud.org/coprs/kdudka/curl-next/ > > I am thinking about rebase of curl in f30 to the latest upstream release > because of bug #1697566. Well, these failures are running the rawhide curl, so if you update rawhide, we can try turning h2 back on and see if it works/still has the issue? I see you already have. I can try it next compose...
So, with the new curl so far I have seen none of these failures. ;) It must have been one of the h2 fixes they landed in that version. :) Lets go ahead and close this out for now...will reopen or file new if we ever see it again.
Perfect. Thank you for testing it!
This upstream commit seems related: https://github.com/curl/curl/commit/c1b6a384
Interesting. We did hit more issues and disabled h2 again a while back... Can you let me know when thats in a rawhide build and i can try re-enabling h2...
Could you please try it with curl-7.65.3-4.fc32 ?
We on the Pulp upstream project appear to be randomly running into the same issue when attempting to run `dnf install ...` in `docker build` on top of the fedora 30 container image on our Travis CI: Error: Error downloading packages: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f30&arch=x86_64 [] We'd appreciate curl-7.65.3-4 in updates-testing for stable Fedora releases, so we can easily workaround it.
dist-git commits: https://src.fedoraproject.org/rpms/curl/c/91c50ee6 https://src.fedoraproject.org/rpms/curl/c/03d0b645
FEDORA-2019-3e417d3ec7 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3e417d3ec7
FEDORA-2019-ae52a20ff6 has been submitted as an update to Fedora 30: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ae52a20ff6
curl-7.65.3-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ae52a20ff6
curl-7.65.3-4.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3e417d3ec7
So far no issues on last nights rawhide compose with this version. We did see: https://koji.fedoraproject.org/koji/taskinfo?taskID=37385248 Ostree installer: error: While fetching https://kojipkgs.fedoraproject.org/compose/ostree/repo/objects/ef/2689ea93ad29b992c09b65e25a180168f14911e741932517cdae433752eb31.filez: [18] Transferred a partial file But that could be unrelated...
(In reply to Kevin Fenzi from comment #35) > So far no issues on last nights rawhide compose with this version. Thanks for checking! > We did see: > > https://koji.fedoraproject.org/koji/taskinfo?taskID=37385248 Ostree > installer: > > error: While fetching > https://kojipkgs.fedoraproject.org/compose/ostree/repo/objects/ef/ > 2689ea93ad29b992c09b65e25a180168f14911e741932517cdae433752eb31.filez: [18] > Transferred a partial file > > But that could be unrelated... I am not sure whether it is related to the original issue but it is definitely _not_ related to the fix mentioned in comment #26.
curl-7.65.3-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
FEDORA-2019-6d7f6fa2c8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-6d7f6fa2c8
curl-7.66.0-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-6d7f6fa2c8
Happened again on copr (staging) builder: # rpm -q libcurl curl librepo rpm dnf libcurl-7.65.3-4.fc30.x86_64 curl-7.65.3-2.fc30.x86_64 librepo-1.10.5-1.fc30.x86_64 rpm-4.14.2.1-5.fc30.x86_64 dnf-4.2.8-2.fc30.noarch builder-live.log: Installed size: 265 M Downloading Packages: fedora 0% [ ] 9.5 MB/s | 0 B 00:06 ETA Error: Error downloading packages: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] dnf.log: 2019-09-18T08:01:20Z INFO Installed size: 265 M 2019-09-18T08:01:20Z INFO Downloading Packages: 2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64). 2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64). 2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64). 2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64). 2019-09-18T08:01:20Z DDEBUG Cleaning up. 2019-09-18T08:01:20Z SUBDEBUG
Created attachment 1616116 [details] Backed up copr result directory The build info is here: https://copr-be-dev.cloud.fedoraproject.org/results/praiskup/ping/fedora-rawhide-x86_64/00846335-test-package/ It will be deleted soon, so I'm rather attaching it here.
Thanks for sharing! Unfortunately, it is not clear what happened in this case. We would need at least libcurl's verbose output to begin with.
curl-7.66.0-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.