Bug 1690971 - Curl error (16): Error in the HTTP2 framing layer against fedora servers
Summary: Curl error (16): Error in the HTTP2 framing layer against fedora servers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: curl
Version: 30
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kamil Dudka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-20 15:23 UTC by Kevin Fenzi
Modified: 2019-09-21 00:01 UTC (History)
11 users (show)

Fixed In Version: curl-7.65.3-4.fc32 curl-7.65.3-3.fc30 curl-7.66.0-1.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-05 11:48:10 UTC


Attachments (Terms of Use)
Patch to turn on verbose logging in librepo (423 bytes, patch)
2019-07-17 22:17 UTC, Jonathan Dieter
no flags Details | Diff
Backed up copr result directory (15.13 KB, application/x-xz)
2019-09-18 08:09 UTC, Pavel Raiskup
no flags Details

Description Kevin Fenzi 2019-03-20 15:23:38 UTC
Our rawhide and fedora30 pre-beta composes have been failing off and on with "Curl error (16): Error in the HTTP2 framing layer". Now also a user has seen this against mirrors.fedoraproject.org: 

[07:50:15] <bowlofeggs> just saw this from dnf:
[07:50:22] <bowlofeggs> Error: Error downloading packages:
[07:50:24] <bowlofeggs>   Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=updates-testing-f30&arch=x86_64 []
[07:50:45] <bowlofeggs> that was on fedora 30

On composes it always seems to happen on armv7 builds when it happens:

https://koji.fedoraproject.org/koji/taskinfo?taskID=33629664
...
DEBUG util.py:439:  repo: downloading from remote: koji-override-0
DEBUG util.py:439:  error: Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck [] (https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os/repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck).
DEBUG util.py:439:  koji-override-0                                 335 kB/s |  15 MB     00:45    
DEBUG util.py:439:  Cannot download 'https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190319.n.0/compose/Everything/armhfp/os': Yum repo downloading error: Downloading error(s): repodata/6d2f4c287cfe95c8352e4f196b73775aa4adbd972be26b770363cb674305f03f-filelists.xml.zck - Cannot download, all mirrors were already tried without success.
DEBUG util.py:439:  Failed to synchronize cache for repo 'koji-override-0'
...


I'm not sure what additional info I can get from composes, but if we can duplicate it against mirrors perhaps we can figure out whats going on?

I wonder if it's related to GOAWAY after 1000 connections, but that patch was merged and should be in the version in both f30 and rawhide. Also, the curl version in both rawhide and f30 is the same currently. 

Happy to try and gather more info...

Comment 1 Kamil Dudka 2019-03-20 16:53:53 UTC
Do we have output of `curl --verbose` in the failing case?

Does the server use nghttp2?  Which version?

Comment 2 Kevin Fenzi 2019-03-20 17:12:11 UTC
Alas, we do not. ;( Perhaps we can try hitting mirrors in a loop with --verbose until we get a failure?

mirrors is our proxy network, which is all fedora 29 hosts running httpd-2.4.38-2.fc29.x86_64 with h2 enabled. 
kojipkgs (what the composes hit) is those same proxy servers with varnish behind the httpd.

Comment 3 Kevin Fenzi 2019-03-20 21:41:39 UTC
There's some more verbose logs in https://pagure.io/dusty/failed-composes/issue/1703

Comment 4 Kamil Dudka 2019-03-21 08:49:59 UTC
I think that the "Could not resolve host: kojipkgs.fedoraproject.org" messages appeared there by mistake.  I have attempted to fix it in curl-7.64.0-5.fc31 with the following commit:

https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=902ddefe

Comment 5 Kevin Fenzi 2019-03-21 16:29:09 UTC
ok. It sounds like jdieter has tracked this to a librepo/zchunk issue now... 

https://pagure.io/dusty/failed-composes/issue/1703#comment-561822

"I've done some more probing and this is definitely zchunk-related. It seems to be an issue with reusing curl handles when downloading the second part of the file (the data), which should work, but seems to crap out occasionally. There are also some (possibly related) speed issues, so I'll see if I can get a patch ready for librepo within the next day or two."

So, perhaps we can just close this? Adding jdieter here for comment...

Comment 6 Jonathan Dieter 2019-03-23 23:13:17 UTC
Ok, I've tracked this down, and it seems to be a combination of two bugs, one identified by Kamil and the other in zchunk.

First, while zchunk-1.0.4 is downloading missing chunks, it finds the current chunk as it comes in and checks its digest against the full list of chunk digests in the target file.  Needless to say this O(n log n) algorithm is less than efficient and makes the write process slow down the more of the file that gets downloaded.  On armv7, this delay in the write callback seems to be hitting a timeout, causing the connection to be aborted.  I've just released and built zchunk-1.1.0 which stores the target digest in the internal struct, reducing the lookup time to O(1), drastically reducing CPU usage, and avoiding the timeout.

In testing, this dropped the likelihood of hitting the HTTP/2 framing error from roughly 50% to 15%.  Further investigation showed that libcurl was complaining about being unable to lookup the hostname, even though it had already successfully done so (see Kamil's message in comment 4).  This is a known bug (https://github.com/curl/curl/issues/3629) in libcurl-7.64 that supposedly has no effect on the actual download, but, due to some mix of architecture, timing and the aurora borealis, it seems to be killing the downloads roughly 1 in 24 times (which, when multiplied by three for a compose, means 1 in 8 would fail).

When both zchunk-1.1.0 and Kamil's libcurl-7.64.0-5 are installed on the test system, I haven't hit any framing errors in between 50 and 100 attempts.

To test, I'd like to request that we re-enable zchunk for Rawhide composes only.  The latest libcurl is already there, and zchunk-1.1.0 should be in the next compose, so it should just work.

Comment 7 Kamil Dudka 2019-03-25 11:41:45 UTC
Thank you for tracking it down!  I realized that all the "Expire in ..." messages appear in the verbose output also by mistake.  The following commit will fix it:

https://src.fedoraproject.org/cgit/rpms/curl.git/commit/?id=7594f15b

Comment 8 Fedora Update System 2019-03-25 12:53:18 UTC
curl-7.64.0-6.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43

Comment 9 Fedora Update System 2019-03-25 18:54:37 UTC
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-530dc1ff43

Comment 10 Fedora Update System 2019-03-25 21:28:45 UTC
zchunk-1.1.0-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8

Comment 11 Fedora Update System 2019-03-25 21:28:54 UTC
zchunk-1.1.0-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8

Comment 12 Fedora Update System 2019-03-27 00:45:07 UTC
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-c83e45fbc8

Comment 13 Fedora Update System 2019-03-27 04:34:23 UTC
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-291b0ea9e8

Comment 14 Fedora Update System 2019-03-29 19:20:32 UTC
curl-7.64.0-6.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2019-03-29 19:20:41 UTC
zchunk-1.1.0-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 16 Fedora Update System 2019-04-03 03:31:10 UTC
zchunk-1.1.0-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Kevin Fenzi 2019-07-13 22:49:05 UTC
Sadly, we are seeing this again... I don't know if the cause is the same or something new. 

rawhide-20190713.n.2 failed due to kde live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36235157

...

20:28:40,671 INF packaging: Downloading packages to /var/tmp/dnf.package.cache.
20:29:57,766 WRN packaging: Failed to download 'krb5-libs-1.17-34.fc31.x86_64.rpm': 3 - Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190713.n.2/compose/Everything/x86_64/os/Packages/k/krb5-libs-1.17-34.fc31.x86_64.rpm []
20:29:57,767 WRN packaging: Failed to download 'krb5-libs-1.17-34.fc31.x86_64.rpm': 1 - No more mirrors to try - All mirrors were already tried without success
20:32:57,947 INF packaging: setting DNF platform id to: platform:f31
...

rawhide-20190713.n.1 failed due to workstation live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36223053

...
10:05:47,603 INF packaging: Downloading packages to /var/tmp/dnf.package.cache.
10:06:12,960 WRN packaging: Failed to download 'gnome-color-manager-3.32.0-1.fc30.x86_64.rpm': 3 - Curl error (16): Error in the HTTP2 framing layer for https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190713.n.1/compose/Workstation/x86_64/os/Packages/g/gnome-color-manager-3.32.0-1.fc30.x86_64.rpm []
10:06:12,960 WRN packaging: Failed to download 'gnome-color-manager-3.32.0-1.fc30.x86_64.rpm': 1 - No more mirrors to try - All mirrors were already tried without success
...

rawhide-20190711.n.0 failed due to kde live failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=36191909

...

Happy to open a new bug if you prefer...

Comment 18 Kamil Dudka 2019-07-15 07:47:56 UTC
(In reply to Kevin Fenzi from comment #17)
> Happy to open a new bug if you prefer...

That is not necessary but I will need some steps to reproduce this locally.

Comment 19 Kevin Fenzi 2019-07-17 22:07:59 UTC
Sadly I don't have a simple reproducer in hand. ;( 

I disabled h2 for now on kojipkgs.fedoraproject.org so we could get composes out. 

Out setup is somewhat complex: 

dnf runs in a vm/chroot downloading from kojipkgs.fedoraproject.org
kojipkgs.fedoraproject.org resolves to 2 proxy ips. 
Each of those proxies is running apache proxied to haproxy 
haproxy has 2 backends: kojipkgs01/kojipkgs02
each backend has varnish listening on port 80 and using apache on port 8080 as it's upstream to fetch the files locally from an nfs mount.

So a request would go: 
dnf -> proxy server apache -> haproxy -> varnish -> (possibly apache if not already cached)
then back.

Is there any way curl could provide more info for this error? More than 'error in framing layer'? Or even a hacked up version we could put on our builders to dump more info when it hits this?

Comment 20 Jonathan Dieter 2019-07-17 22:17:22 UTC
Created attachment 1591614 [details]
Patch to turn on verbose logging in librepo

I'm heading to bed, otherwise I'd give you more than this, but when I was trying to track down the framing error in the metadata download, I used something like this patch to librepo (untested) to enable verbose curl output.

Comment 21 Jonathan Dieter 2019-07-17 22:19:26 UTC
If you install librepo on the builders with this patch enabled, you should get far more verbose logging.

Comment 22 Kamil Dudka 2019-07-18 07:57:12 UTC
Kevin, could you please check whether packages from the following copr work any better?

    https://copr.fedorainfracloud.org/coprs/kdudka/curl-next/

I am thinking about rebase of curl in f30 to the latest upstream release because of bug #1697566.

Comment 23 Kevin Fenzi 2019-07-18 16:00:25 UTC
(In reply to Jonathan Dieter from comment #21)
> If you install librepo on the builders with this patch enabled, you should
> get far more verbose logging.

ok. I might try that this weekend. Perhaps if the new curl doesn't solve it...

(In reply to Kamil Dudka from comment #22)
> Kevin, could you please check whether packages from the following copr work
> any better?
> 
>     https://copr.fedorainfracloud.org/coprs/kdudka/curl-next/
> 
> I am thinking about rebase of curl in f30 to the latest upstream release
> because of bug #1697566.

Well, these failures are running the rawhide curl, so if you update rawhide, we can try turning h2 back on and see if it works/still has the issue?
I see you already have. I can try it next compose...

Comment 24 Kevin Fenzi 2019-07-20 01:14:33 UTC
So, with the new curl so far I have seen none of these failures. ;) It must have been one of the h2 fixes they landed in that version. :) 

Lets go ahead and close this out for now...will reopen or file new if we ever see it again.

Comment 25 Kamil Dudka 2019-07-21 16:46:45 UTC
Perfect.  Thank you for testing it!

Comment 26 Kamil Dudka 2019-08-27 07:27:07 UTC
This upstream commit seems related:

https://github.com/curl/curl/commit/c1b6a384

Comment 27 Kevin Fenzi 2019-08-27 13:42:42 UTC
Interesting. 

We did hit more issues and disabled h2 again a while back... 

Can you let me know when thats in a rawhide build and i can try re-enabling h2...

Comment 28 Kamil Dudka 2019-08-27 16:49:03 UTC
Could you please try it with curl-7.65.3-4.fc32 ?

Comment 29 Michael DePaulo 2019-08-29 21:48:50 UTC
We on the Pulp upstream project appear to be randomly running into the same issue when attempting to run `dnf install ...` in `docker build` on top of the fedora 30 container image on our Travis CI:

Error: Error downloading packages:
      Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f30&arch=x86_64 []

We'd appreciate curl-7.65.3-4 in updates-testing for stable Fedora releases, so we can easily workaround it.

Comment 31 Fedora Update System 2019-08-30 12:52:33 UTC
FEDORA-2019-3e417d3ec7 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3e417d3ec7

Comment 32 Kamil Dudka 2019-08-30 12:55:18 UTC
FEDORA-2019-ae52a20ff6 has been submitted as an update to Fedora 30: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ae52a20ff6

Comment 33 Fedora Update System 2019-08-30 15:28:13 UTC
curl-7.65.3-3.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ae52a20ff6

Comment 34 Fedora Update System 2019-08-30 20:33:01 UTC
curl-7.65.3-4.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3e417d3ec7

Comment 35 Kevin Fenzi 2019-08-31 18:17:42 UTC
So far no issues on last nights rawhide compose with this version. 

We did see: 

https://koji.fedoraproject.org/koji/taskinfo?taskID=37385248 Ostree installer:

error: While fetching https://kojipkgs.fedoraproject.org/compose/ostree/repo/objects/ef/2689ea93ad29b992c09b65e25a180168f14911e741932517cdae433752eb31.filez: [18] Transferred a partial file

But that could be unrelated...

Comment 36 Kamil Dudka 2019-09-02 10:26:14 UTC
(In reply to Kevin Fenzi from comment #35)
> So far no issues on last nights rawhide compose with this version. 

Thanks for checking!

> We did see: 
> 
> https://koji.fedoraproject.org/koji/taskinfo?taskID=37385248 Ostree
> installer:
> 
> error: While fetching
> https://kojipkgs.fedoraproject.org/compose/ostree/repo/objects/ef/
> 2689ea93ad29b992c09b65e25a180168f14911e741932517cdae433752eb31.filez: [18]
> Transferred a partial file
> 
> But that could be unrelated...

I am not sure whether it is related to the original issue but it is definitely _not_ related to the fix mentioned in comment #26.

Comment 37 Fedora Update System 2019-09-05 11:48:10 UTC
curl-7.65.3-3.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 38 Fedora Update System 2019-09-13 09:16:26 UTC
FEDORA-2019-6d7f6fa2c8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-6d7f6fa2c8

Comment 39 Fedora Update System 2019-09-14 01:40:23 UTC
curl-7.66.0-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-6d7f6fa2c8

Comment 40 Pavel Raiskup 2019-09-18 08:05:15 UTC
Happened again on copr (staging) builder:

# rpm -q libcurl curl librepo rpm dnf
libcurl-7.65.3-4.fc30.x86_64
curl-7.65.3-2.fc30.x86_64
librepo-1.10.5-1.fc30.x86_64
rpm-4.14.2.1-5.fc30.x86_64
dnf-4.2.8-2.fc30.noarch

builder-live.log:
Installed size: 265 M
Downloading Packages:
fedora                0% [                    ] 9.5 MB/s |   0  B     00:06 ETA
Error: Error downloading packages:
  Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 []

dnf.log:
2019-09-18T08:01:20Z INFO Installed size: 265 M
2019-09-18T08:01:20Z INFO Downloading Packages:
2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64).
2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64).
2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64).
2019-09-18T08:01:20Z DEBUG error: Curl error (16): Error in the HTTP2 framing layer for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [] (https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64).
2019-09-18T08:01:20Z DDEBUG Cleaning up.
2019-09-18T08:01:20Z SUBDEBUG

Comment 41 Pavel Raiskup 2019-09-18 08:09:19 UTC
Created attachment 1616116 [details]
Backed up copr result directory

The build info is here:
https://copr-be-dev.cloud.fedoraproject.org/results/praiskup/ping/fedora-rawhide-x86_64/00846335-test-package/

It will be deleted soon, so I'm rather attaching it here.

Comment 42 Kamil Dudka 2019-09-18 08:37:58 UTC
Thanks for sharing!  Unfortunately, it is not clear what happened in this case.  We would need at least libcurl's verbose output to begin with.

Comment 43 Fedora Update System 2019-09-21 00:01:48 UTC
curl-7.66.0-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.