Bug 2088539

Summary: Openshift route URLs starting with double slashes stopped working after update to 4.8.33 - curl version problems
Product: OpenShift Container Platform Reporter: Immanuvel <imm>
Component: Networking Assignee: Andrew McDermott <amcdermo>
Networking sub component: router QA Contact: Melvin Joseph <mjoseph>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: amcdermo, gspence, hongli, mjoseph, mmasters, molasaga, openshift-bugzilla-robot, pescorza
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-09 14:00:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2103729    
Bug Blocks: 2103731, 2103738    

Description Immanuvel 2022-05-19 16:20:55 UTC
Description of problem:

Due to some issues in software we use (which we cannot easily solve right now), the URL of our Jenkins instance hosted on ocp-c1.prod sometimes contains double slashes.

Example: https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar

It had worked for months but stopped working a week or two ago. I cannot tell exactly when, but it definitely worked on March 8, and the cluster was updated to 4.8.33 on March 12.

```
$ curl -k -v https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar --output /tmp/remoting.jar
...
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```

Also, it looks like only HTTP/2 is affected; HTTP/1.1 works fine.

I didn't find any recent commit in the relevant repos that could cause this on the ingress side:

https://github.com/openshift/cluster-ingress-operator/commits/release-4.8
https://github.com/openshift/router/commits/release-4.8

Another observation:

When HTTP/2 is forced for the default route, which appears to accept HTTP/1.1, it fails with the same error:
```
$ curl --http2-prior-knowledge -k -v https://master-jenkins-csb-eap-qe.apps.ocp-c1.prod.psi.redhat.com//jnlpJars/remoting.jar --output /tmp/remoting.jar
...
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
...
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
```
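To compare the two protocol versions side by side, a small helper along these lines may be useful. It is only a sketch: `probe_url` and the `CURL_BIN` override are hypothetical names introduced here, not something from the cluster or the report, and it relies on curl's `--http1.1` and `--http2` options and the `%{http_code}` write-out variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): request one URL
# with a given curl protocol flag and print "<flag> <status> <exit code>".
# CURL_BIN is an assumed override so another curl build can be swapped in.
probe_url() {
    flag=$1
    url=$2
    # -k: skip certificate verification, -s: silent, -o /dev/null: drop body.
    # %{http_code} is the HTTP status curl saw (000 if no response arrived).
    if status=$("${CURL_BIN:-curl}" -k -s -o /dev/null -w '%{http_code}' "$flag" "$url"); then
        rc=0
    else
        rc=$?
    fi
    printf '%s %s %s\n' "$flag" "${status:-000}" "$rc"
}

# Example invocation (URL taken from the report above):
#   for flag in --http1.1 --http2; do
#       probe_url "$flag" https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar
#   done
```

A non-zero exit code in the last column for `--http2` only would match the behaviour described above, where curl exits with error 92.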

We also have a similar setup on ocp4.prod, running 4.8.24 (jenkins-csb-eapcpqe project), where everything works fine (uses HTTP/2).


OpenShift release version:
4.8.33 

Cluster Platform:
OCP (any platform)

How reproducible:
Always reproducible and it's simple 

Steps to Reproduce (in detail):
1. As the URL is under redhat.com, try to curl it directly from a laptop or desktop.
2. Log in to the OCP bastion node and try to curl; you will see it works.
3. Log in to the OCP nodes and try to curl; it won't work.


Actual results:
$ curl -k -v https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar --output /tmp/remoting.jar
...
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

Expected results:
< HTTP/1.1 200 OK

Impact of the problem:

URL with double slash doesn't work with 

Additional info:

Recent curl versions, in which HTTP/2 is officially included, throw the error; with older versions of curl, URLs containing a double slash (//) work.


Comment 1 Immanuvel 2022-05-19 16:23:36 UTC
Hi,


Is there something we can do about this?

Below are additional details from my analysis.

From the bastion node --> it works; the curl version is 7.29

From my laptop --> it doesn't work; the curl version is greater than 7.61

From one of the OCP nodes --> it doesn't work; the curl version is greater than 7.61

From the ingress router pod --> it doesn't work; the curl version is 7.61

Below are the curl version and the result of requesting the URL from the bastion node:
[quicklab@upi-0 ~]$ curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.44 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp 
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets     --> HTTP2 is not listed as a supported feature in this earlier curl version


You can see that HTTP/2 is not involved when I curl from the bastion host, as it is using the older curl version: the request is sent as HTTP/1.1.

++++++++++++
[quicklab@upi-0 ~]$ curl -i -kv  https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar --output /tmp/remoting.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to jenkins.eapqe.psi.redhat.com port 443 (#0)
*   Trying 10.0.180.88...
* Connected to jenkins.eapqe.psi.redhat.com (10.0.180.88) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* 	subject: CN=jenkins.eapqe.psi.redhat.com,E=eap-qe-cci-maintenance,OU=EAP QE Team,O=Red Hat,L=Raleigh,ST=North Carolina,C=US
* 	start date: Jun 10 14:27:54 2021 GMT
* 	expire date: Jun 05 14:27:54 2022 GMT
* 	common name: jenkins.eapqe.psi.redhat.com
* 	issuer: CN=Certificate Authority,OU=prod,O=Red Hat
> GET //jnlpJars/remoting.jar HTTP/1.1
> User-Agent: curl/7.29.0
> Host: jenkins.eapqe.psi.redhat.com
> Accept: */*
> 
< HTTP/1.1 200 OK

+++++++++++

On the OCP node, the curl package lists additional features such as HTTP2 and HTTPS-proxy. Since HTTP2 is supported, we can expect stricter security checks to apply to HTTP/2 requests.

####################
sh-4.4# curl --version
curl 7.61.1 (x86_64-redhat-linux-gnu) libcurl/7.61.1 OpenSSL/1.1.1g zlib/1.2.11 brotli/1.0.6 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.2.0) libssh/0.9.4/openssl/zlib nghttp2/1.33.0
Release-Date: 2018-09-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz brotli TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL   --> HTTP2  included as a feature 


sh-4.4# curl -kv https://jenkins.eapqe.psi.redhat.com//jnlpJars/remoting.jar --output /tmp/remoting.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.0.180.90...

OUTPUTS TRIMMED ########


* TLSv1.3 (OUT), TLS app data, [no content] (0):
} [1 bytes data]
> GET //jnlpJars/remoting.jar HTTP/2
> Host: jenkins.eapqe.psi.redhat.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
{ [5 bytes data]


{ [1 bytes data]
* HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
* stopped the pause stream!
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
* Connection #0 to host jenkins.eapqe.psi.redhat.com left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
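
The difference between the two curl builds above comes down to whether `HTTP2` appears on the `Features:` line of `curl --version`. As a rough sketch, that check could be scripted as follows; `curl_has_http2` is a hypothetical helper name introduced here, and it only inspects text passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `curl --version` output on stdin and succeed
# only if the "Features:" line lists HTTP2 as a separate word.
curl_has_http2() {
    awk '/^Features:/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "HTTP2") found = 1
         }
         END { exit !found }'
}

# Example (requires a local curl binary):
#   curl --version | curl_has_http2 && echo "this curl can negotiate HTTP/2"
```

Run against the unannotated output of the two `curl --version` calls above, this would be expected to succeed for the 7.61.1 build and fail for the 7.29.0 build.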

Comment 2 Miciah Dashiel Butler Masters 2022-05-19 16:50:51 UTC
It could be that the requests are failing because of stricter URI parsing in HTTP/2 or because of some recent CVE fixes to mitigate request smuggling attacks.  

Can you get the output of `rpm -q haproxy22` from inside a router pod?  

What type of route is jenkins.eapqe.psi.redhat.com?  Passthrough, edge-terminated, or reencrypt?  

Is there any reason you expect or need a double slash to work?

Comment 4 Miciah Dashiel Butler Masters 2022-05-19 17:14:04 UTC
Most likely, you are being affected by this change in HAProxy:
https://github.com/haproxy/haproxy/commit/09c026a1696b49da6d11b680772c2b8f624288a0#diff-ee7072d6347f8e9756c63e59b94dd5f8be353a7b16305545f4b522b7ced001bb

This patch was merged in OpenShift 4.8.25 (see bug 2002703, comment 30).  

The change mitigates a security vulnerability, so we are not going to revert it.

Comment 5 Andrew McDermott 2022-06-21 16:24:11 UTC
*** Bug 2099744 has been marked as a duplicate of this bug. ***

Comment 14 Andrew McDermott 2022-07-13 10:06:24 UTC
The fix for 4.9 should now be available in 4.9.42.

https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.9.42

Moving this to ON_QA. Could we please verify that the new haproxy RPM is 2.2.15-5:

$ oc get pods -n openshift-ingress
$ oc rsh -n openshift-ingress router-default-c8ff897b-sjw49
sh-4.4$ rpm -qa| grep haproxy22
haproxy22-2.2.15-5.el8.x86_64
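
If this check needs to be repeated across several router pods, the comparison could be scripted roughly as below. This is a sketch only: `haproxy_at_least` is a hypothetical helper, and it uses `sort -V` (GNU coreutils version sort) as an approximation rather than RPM's own version comparison.

```shell
#!/bin/sh
# Hypothetical helper: take a package string such as
# "haproxy22-2.2.15-5.el8.x86_64" and a minimum "<version>-<release>",
# and succeed if the package is at least that version.
haproxy_at_least() {
    pkg=$1
    min=$2
    # Strip the "haproxy22-" prefix, then everything from ".el" onward,
    # leaving e.g. "2.2.15-5".
    ver=${pkg#haproxy22-}
    ver=${ver%%.el*}
    # With version sort, the minimum sorts first when ver >= min.
    lowest=$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Example, using the package string shown above:
#   haproxy_at_least "haproxy22-2.2.15-5.el8.x86_64" "2.2.15-5" && echo "fixed RPM present"
```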

Comment 15 Andrew McDermott 2022-07-13 10:07:31 UTC
*** Bug 2103730 has been marked as a duplicate of this bug. ***

Comment 17 Melvin Joseph 2022-07-14 03:09:12 UTC
melvinjoseph@mjoseph-mac BZ2088539 % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.42    True        False         48m     Cluster version is 4.9.42

melvinjoseph@mjoseph-mac BZ2088539 % oc rsh -n openshift-ingress router-default-5fc599d598-64lgr
Defaulted container "router" out of: router, logs
sh-4.4$ rpm -qa| grep haproxy22
haproxy22-2.2.15-5.el8.x86_64
sh-4.4$ exit

melvinjoseph@mjoseph-mac BZ2088539 % oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch='{"spec":{"logging":{"access":{"destination":{"type":"Container"}}}}}'
ingresscontroller.operator.openshift.io/default patched
melvinjoseph@mjoseph-mac BZ2088539 % oc -n openshift-ingress-operator scale --replicas=1 ingresscontroller/default
ingresscontroller.operator.openshift.io/default scaled
melvinjoseph@mjoseph-mac BZ2088539 % oc apply -f deployment.yaml
deployment.apps/hello-app created
service/hello-app created

melvinjoseph@mjoseph-mac BZ2088539 % bash -x  ./create-routes.sh
+ set -eu
+ set -o pipefail
++ oc get -n openshift-ingress-operator ingresscontroller/default -o yaml
++ yq .status.domain
+ domain='"apps.mjoseph-20885392.qe.azure.devcluster.openshift.com"'
+ [[ 0 -ne 0 ]]
++ mktemp -d
+ tmpdir=/var/folders/7f/yh5cb4n953jf7bjbkr2vy0840000gn/T/tmp.KQgE0S1r
+ trap 'rm -rf -- "$tmpdir"' EXIT
+ go run certgen/certgen.go
+ . /var/folders/7f/yh5cb4n953jf7bjbkr2vy0840000gn/T/tmp.KQgE0S1r/env
++ TLS_KEY='-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIG8mE02VAknqocKftNnRDpcHSINkR/yM8UJlLzuhyGU2oAoGCCqGSM49
AwEHoUQDQgAENSIy9lCsILcWCKpii7kfHulUHwQ6FRVAUzZ/4lL/TRhBxEtqBz6s
3zlntnpqYKsUTedjJw2cwJ1MyB1+G5FY8g==
-----END EC PRIVATE KEY-----'
++ TLS_CRT='-----BEGIN CERTIFICATE-----
MIIBxDCCAWmgAwIBAgIRALb64BIXyoPjmuf8zaOioJ0wCgYIKoZIzj0EAwIwKDEU
MBIGA1UEChMLQ2VydCBHZW4gQ28xEDAOBgNVBAMTB1Jvb3QgQ0EwIBcNMjIwNzE0
MDI1MjQyWhgPMjEyMjA2MjAwMjUyNDJaMEIxGTAXBgNVBAoTEENlcnQgR2VuIENv
bXBhbnkxJTAjBgNVBAMTHENlcnQgR2VuIENvbXBhbnkgQ29tbW9uIE5hbWUwWTAT
BgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ1IjL2UKwgtxYIqmKLuR8e6VQfBDoVFUBT
Nn/iUv9NGEHES2oHPqzfOWe2empgqxRN52MnDZzAnUzIHX4bkVjyo1gwVjAOBgNV
HQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAh
BgNVHREEGjAYhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMCA0kA
MEYCIQDNUgmGF95w+ILvriyV3qMoPS0iz+vsLbufJc95Q7aqhAIhANsXxq+KDZww
NYnpwIxJKd7Aw9d80nCYlhM4Ed7zh7yM
-----END CERTIFICATE-----'
++ TLS_CACRT='-----BEGIN CERTIFICATE-----
MIIBpjCCAU2gAwIBAgIQXj/aPG3TbpDi2uT99J2dQDAKBggqhkjOPQQDAjAoMRQw
EgYDVQQKEwtDZXJ0IEdlbiBDbzEQMA4GA1UEAxMHUm9vdCBDQTAgFw0yMjA3MTQw
MjUyNDJaGA8yMTIyMDYyMDAyNTI0MlowKDEUMBIGA1UEChMLQ2VydCBHZW4gQ28x
EDAOBgNVBAMTB1Jvb3QgQ0EwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAATRfQ9j
DdsxZdH/e93iZ5Qw9e0kEzLAkHt1+2aRleojDDRkU2hWBBOZwPMGXHlPmF/ipRxq
JLOanK7mxEnM+Q9/o1cwVTAOBgNVHQ8BAf8EBAMCAgQwEwYDVR0lBAwwCgYIKwYB
BQUHAwEwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUsLWhUlQH8wu+qdxsIC50
+rLQfnUwCgYIKoZIzj0EAwIDRwAwRAIgH/Zlad3OCgvOD5mMcuVSalocn1E1LYmO
tPOiZaSkMv0CIEhggiTA9IbgFFe9QPFmjHF4zXv5NGCqixULj6IXmFqw
-----END CERTIFICATE-----'
+ oc process -p 'TLS_CRT=-----BEGIN CERTIFICATE-----
MIIBxDCCAWmgAwIBAgIRALb64BIXyoPjmuf8zaOioJ0wCgYIKoZIzj0EAwIwKDEU
MBIGA1UEChMLQ2VydCBHZW4gQ28xEDAOBgNVBAMTB1Jvb3QgQ0EwIBcNMjIwNzE0
MDI1MjQyWhgPMjEyMjA2MjAwMjUyNDJaMEIxGTAXBgNVBAoTEENlcnQgR2VuIENv
bXBhbnkxJTAjBgNVBAMTHENlcnQgR2VuIENvbXBhbnkgQ29tbW9uIE5hbWUwWTAT
BgcqhkjOPQIBBggqhkjOPQMBBwNCAAQ1IjL2UKwgtxYIqmKLuR8e6VQfBDoVFUBT
Nn/iUv9NGEHES2oHPqzfOWe2empgqxRN52MnDZzAnUzIHX4bkVjyo1gwVjAOBgNV
HQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAh
BgNVHREEGjAYhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMCA0kA
MEYCIQDNUgmGF95w+ILvriyV3qMoPS0iz+vsLbufJc95Q7aqhAIhANsXxq+KDZww
NYnpwIxJKd7Aw9d80nCYlhM4Ed7zh7yM
-----END CERTIFICATE-----' -p 'TLS_KEY=-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIG8mE02VAknqocKftNnRDpcHSINkR/yM8UJlLzuhyGU2oAoGCCqGSM49
AwEHoUQDQgAENSIy9lCsILcWCKpii7kfHulUHwQ6FRVAUzZ/4lL/TRhBxEtqBz6s
3zlntnpqYKsUTedjJw2cwJ1MyB1+G5FY8g==
-----END EC PRIVATE KEY-----' -p 'DOMAIN="apps.mjoseph-20885392.qe.azure.devcluster.openshift.com"' -f routes.yaml
+ oc apply -f -
route.route.openshift.io/hello-edge created
+ rm -rf -- /var/folders/7f/yh5cb4n953jf7bjbkr2vy0840000gn/T/tmp.KQgE0S1r
melvinjoseph@mjoseph-mac BZ2088539 % oc get route
NAME         HOST/PORT                                                            PATH   SERVICES    PORT   TERMINATION     WILDCARD
hello-edge   hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com          hello-app   8080   edge/Redirect   None

Disable HTTP/2 on the default ingresscontroller (this is the default)
melvinjoseph@mjoseph-mac BZ2088539 % oc -n openshift-ingress-operator annotate --overwrite ingresscontrollers/default ingress.operator.openshift.io/default-enable-http2=false
ingresscontroller.operator.openshift.io/default annotated
melvinjoseph@mjoseph-mac BZ2088539 % oc rollout status deployment -n openshift-ingress router-default --timeout=120s
deployment "router-default" successfully rolled out

Prove the app works without HTTP/2 enabled on the default ingresscontroller
melvinjoseph@mjoseph-mac BZ2088539 % curl -k -L hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com
Hello OpenShift!

Prove the app works with double-slash in the URL with HTTP/1.1 and HTTP/2
melvinjoseph@mjoseph-mac BZ2088539 % curl --http1.1 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz
Hello OpenShift!
melvinjoseph@mjoseph-mac BZ2088539 % curl --http2 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz
Hello OpenShift!

Enable HTTP/2 on the default ingresscontroller
melvinjoseph@mjoseph-mac BZ2088539 % oc -n openshift-ingress-operator annotate --overwrite ingresscontrollers/default ingress.operator.openshift.io/default-enable-http2=true
ingresscontroller.operator.openshift.io/default annotated
melvinjoseph@mjoseph-mac BZ2088539 % oc rollout status deployment -n openshift-ingress router-default --timeout=120s
Waiting for deployment "router-default" rollout to finish: 1 old replicas are pending termination...
melvinjoseph@mjoseph-mac BZ2088539 % oc get pods -n openshift-ingress -w
NAME                              READY   STATUS        RESTARTS   AGE
router-default-5fc599d598-64lgr   2/2     Running       0          36s
router-default-67b79d8dc9-wtcjg   2/2     Terminating   0          3m42s
router-default-67b79d8dc9-wtcjg   0/2     Terminating   0          4m58s
router-default-67b79d8dc9-wtcjg   0/2     Terminating   0          4m58s
router-default-67b79d8dc9-wtcjg   0/2     Terminating   0          4m58s
^C
melvinjoseph@mjoseph-mac BZ2088539 % oc get pods -n openshift-ingress -w
NAME                              READY   STATUS    RESTARTS   AGE
router-default-5fc599d598-64lgr   2/2     Running   0          2m20s

Prove the app works when there are no double-slashes in the URL
melvinjoseph@mjoseph-mac BZ2088539 % curl --http1.1 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com
Hello OpenShift!
melvinjoseph@mjoseph-mac BZ2088539 % curl --http2 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com                                         
Hello OpenShift!

Prove the app works when there are double-slashes in the URL
melvinjoseph@mjoseph-mac BZ2088539 % curl --http1.1 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz                           
Hello OpenShift!
melvinjoseph@mjoseph-mac BZ2088539 % curl --http2 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz                              
Hello OpenShift!
melvinjoseph@mjoseph-mac BZ2088539 % curl --http2-prior-knowledge -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz
Hello OpenShift!
melvinjoseph@mjoseph-mac BZ2088539 % curl --http2 -k -L https://hello-edge.apps.mjoseph-20885392.qe.azure.devcluster.openshift.com//foo/bar/baz                
Hello OpenShift!

Hence marking as verified

Comment 26 errata-xmlrpc 2022-08-09 14:00:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.9.45 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5879

Comment 27 Melvin Joseph 2023-04-25 16:17:06 UTC
The fix is only for 4.8 and 4.9 builds, both of which are EOL now.
The patch mentioned in #c6 is not applicable from 4.10 onwards, as each of those releases uses a version of haproxy that already includes the fix,
so no test cases are created for this.