Bug 2166254

Summary: curl fails large file downloads for some http2 server
Product: Red Hat Enterprise Linux 8 Reporter: Riccardo Piccoli <rpiccoli>
Component: curl Assignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA QA Contact: Daniel Rusek <drusek>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.6 CC: brault, kdudka, oourfali
Target Milestone: rc Keywords: Patch, Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: curl-7.61.1-29.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2167825 2167826 Environment:
Last Closed: 2023-05-16 09:03:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2167825, 2167826    

Description Riccardo Piccoli 2023-02-01 10:11:37 UTC
Description of problem:

curl versions older than 7.69.0 seem to be affected by a strange behaviour when downloading large files over HTTP/2 from api.openshift.com: the download stops at exactly 1024*1024*1024 bytes. None of the other clients we tried (wget, browsers, etc.) is affected.

The 7.69 changelog shows that some changes to the handling of HTTP flow were introduced, which might have affected this behaviour:
https://github.com/curl/curl/issues/4779
https://github.com/curl/curl/pull/4961
https://github.com/curl/curl/issues/4939

Forcing HTTP/1.1 for the same download retrieves the whole file successfully, and so does using curl 7.69 or later.
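As a rough sketch of the HTTP/1.1 workaround, the reproduction command from this report can be run with curl's `--http1.1` option, which makes curl use HTTP/1.1 instead of negotiating HTTP/2. The function name `fetch_rootfs_http11` and the variable `ROOTFS_URL` are illustrations only; the image tag, URL and rate limit are copied from the steps to reproduce.

```shell
# Hypothetical wrapper: the function and variable names are assumptions,
# not part of curl, docker or dracut.
ROOTFS_URL='https://api.openshift.com/api/assisted-images/boot-artifacts/rootfs?arch=x86_64&version=4.12'

fetch_rootfs_http11() {
    # --http1.1 asks curl to use HTTP/1.1 rather than HTTP/2.
    docker run --rm curlimages/curl:7.68.0 \
        --http1.1 --output /dev/null --limit-rate 10M "$ROOTFS_URL"
}

# The function is only defined here; call fetch_rootfs_http11 to start
# the (long-running) download.
```

This does not help where the curl command line cannot be changed, which is the situation in the dracut module.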

Unfortunately we cannot control curl usage in dracut modules, so we would like to upgrade to a version that is not affected by this behaviour.

https://github.com/coreos/fedora-coreos-config/blob/0a84651cb79e79c4ea0a846eb32865c430ac2011/overlay.d/05core/usr/lib/dracut/modules.d/35coreos-live/coreos-livepxe-rootfs.sh#L23-L47


Version-Release number of selected component (if applicable):



How reproducible:
100% of the time

Steps to Reproduce:
1. Try the download with curl 7.68.0: docker run --rm curlimages/curl:7.68.0 --output /dev/null --limit-rate 10M 'https://api.openshift.com/api/assisted-images/boot-artifacts/rootfs?arch=x86_64&version=4.12'

Actual results:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 97 1052M   97 1024M    0     0  3001k      0  0:05:59  0:05:49  0:00:10 2693k
curl: (18) transfer closed with 29821952 bytes remaining to read
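The numbers in this output can be cross-checked with shell arithmetic. The sketch below assumes that the "1024M" in the progress meter is exactly 1024*1024*1024 bytes, as stated in the description; the variable names are only for illustration.

```shell
# Figures taken from the transfer output above.
remaining=29821952        # bytes curl reported as remaining to read
received=$((1 << 30))     # assumed: "1024M" is exactly 1024*1024*1024 bytes
total=$((received + remaining))

echo "received: $received bytes"
echo "total:    $total bytes"
echo "total:    $((total / 1024 / 1024)) MiB (rounded down)"
```

Under that assumption the total comes to 1103563776 bytes, which rounds down to 1052 MiB and matches the "1052M" shown in the Total column.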



Expected results:

1. Run: docker run --rm curlimages/curl:7.69.0 --output /dev/null --limit-rate 10M 'https://api.openshift.com/api/assisted-images/boot-artifacts/rootfs?arch=x86_64&version=4.12'

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1052M  100 1052M    0     0  2811k      0  0:06:23  0:06:23 --:--:-- 5886k


Additional info:
This issue seems to happen only when all of the following conditions are met:
* curl < 7.69
* low bandwidth (10 MB/s or lower)
* http2 protocol
* api.openshift.com server (for example, it works with mirror.openshift.com)
* file bigger than 1024*1024*1024 bytes
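The first condition could be checked with a small version comparison such as the sketch below. `version_lt` is a hypothetical helper that assumes plain X.Y.Z version strings. It only compares upstream version numbers, so it cannot tell whether a distribution package carries a backported fix (this bug's "Fixed In Version" is curl-7.61.1-29.el8).

```shell
# Hypothetical helper: succeeds if dotted version $1 is lower than $2.
# Runs in a subshell so that changing IFS does not leak to the caller.
version_lt() (
    IFS=.
    set -- $1 $2          # split both versions into $1..$3 and $4..$6
    if [ "$1" -ne "$4" ]; then
        [ "$1" -lt "$4" ]
    elif [ "$2" -ne "$5" ]; then
        [ "$2" -lt "$5" ]
    else
        [ "$3" -lt "$6" ]
    fi
)

# Example with the two versions used in this report.
if version_lt 7.68.0 7.69.0; then
    echo "7.68.0 is older than 7.69.0"
fi
```

The installed version could be passed in from the first line of `curl --version`, for example `version_lt "$(curl --version | head -n1 | awk '{print $2}')" 7.69.0`.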

OCP customers are affected by this bug: when installing OpenShift with the Assisted Installer using the "minimal-iso" option, dracut tries to download the remaining part of the image with curl, and the download fails when the above conditions are met.

On-premises installations might also be affected, but we have no way to monitor that.

Comment 1 Kamil Dudka 2023-02-03 16:47:59 UTC
Thank you for reporting it!  git-bisect points to the following commit:
https://github.com/curl/curl/commit/15f51474c837679c0b79825c23356ac681ffabde

Although the commit message says that nghttp2-1.41.0 or newer is required to fix https://github.com/curl/curl/issues/4939, that does not seem to be needed to fix this bug.  The problem went away for me even though I was still using libnghttp2-1.33.0-3.el8_2.1.x86_64 from the system.  So I guess this bug is fixed just by coincidence, and there is a chance that the upstream fix could be minimized so that it applies better to RHEL-8 and lowers the risk of breaking something else.

Comment 2 Oved Ourfali 2023-02-06 15:16:46 UTC
Changing the severity due to the service impact here.

Comment 3 Kamil Dudka 2023-02-06 17:03:46 UTC
The important part of the upstream commit regarding the bug in question seems to be this one-liner:

--- a/lib/http2.c
+++ b/lib/http2.c
@@ -63,7 +63,7 @@
 #define NGHTTP2_HAS_SET_LOCAL_WINDOW_SIZE 1
 #endif
 
-#define HTTP2_HUGE_WINDOW_SIZE (1 << 30)
+#define HTTP2_HUGE_WINDOW_SIZE (32 * 1024 * 1024) /* 32 MB */
 
 #ifdef DEBUG_HTTP2
 #define H2BUGF(x) x
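For reference, the two values of HTTP2_HUGE_WINDOW_SIZE in this hunk can be evaluated with shell arithmetic. This is only a sketch of the constants with illustrative variable names; it does not show how curl or libnghttp2 use the value.

```shell
old_window=$((1 << 30))            # value removed by the patch
new_window=$((32 * 1024 * 1024))   # value added by the patch

echo "old: $old_window bytes ($((old_window / 1024 / 1024)) MiB)"
echo "new: $new_window bytes ($((new_window / 1024 / 1024)) MiB)"
echo "ratio: $((old_window / new_window))"
```

The old value is 1073741824 bytes, the same 1024*1024*1024 figure at which the description says the download stops, and the new value is 32 times smaller.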

Comment 4 Riccardo Piccoli 2023-02-06 17:32:08 UTC
Amazing, indeed the transfer would always stop at 1073741824 bytes, which is 1 << 30. I guess the stream was stalling and they were not negotiating a new window? Just out of curiosity, do you have any more insights on what this window size was triggering?

Comment 5 Kamil Dudka 2023-02-07 09:17:39 UTC
Not really.  The value is passed to libnghttp2.  I have never debugged the protocol at this level.

Comment 12 errata-xmlrpc 2023-05-16 09:03:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: curl security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2963