Bug 1853711 - HTTP/2 backend support breaks websocket
Summary: HTTP/2 backend support breaks websocket
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Andrew McDermott
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks: 1854195
Reported: 2020-07-03 15:33 UTC by Erik Lalancette
Modified: 2021-04-05 17:47 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1854195 1854814
Environment:
Last Closed: 2020-10-27 16:12:04 UTC
Target Upstream Version:
jboxman: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-ingress-operator pull 422 0 None closed Bug 1853711: Invert http/2 kill switch logic 2021-02-03 21:42:14 UTC
Github openshift origin pull 25243 0 None closed Bug 1853711: Disable HTTP/2 tests 2021-02-03 21:42:14 UTC
Red Hat Bugzilla 1826990 0 high CLOSED HTTP/2 frontend support breaks oauth flow 2021-04-05 17:24:33 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:12:26 UTC

Description Erik Lalancette 2020-07-03 15:33:56 UTC
Description of problem:

My customer upgraded from 4.3.0 to 4.4.9, and the haproxy.config changed, adding this parameter to the backends:

alpn h2

server pod:xxxx-8-vx7ft:xxxx:10.xx.7.75:8444 10.xx.7.75:8444 cookie f209111d3d62592d80e7ac9f49943327 weight 256 alpn h2,http/1.1 ssl verifyhost mock-xxx.xxx-xxx-dev.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt

After the upgrade from 4.3.0 to 4.4.9, their WebSocket Java application no longer works via the Route, which is blocking them.
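A plausible reading of the breakage: WebSocket depends on the HTTP/1.1 Upgrade handshake, and running WebSocket over HTTP/2 instead requires the RFC 8441 extended CONNECT mechanism, which is not in play here; so when haproxy negotiates `alpn h2` toward the backend, the upgrade can no longer complete. As a minimal, hypothetical helper (not part of any OpenShift tooling; the function name is my own), one could scan a dumped haproxy.config for server lines that advertise h2 to the backend:

```python
import re

def servers_with_h2(haproxy_config: str):
    """Return server lines that advertise h2 via ALPN to the backend.

    Illustrative sketch only: such backends may break WebSocket, which
    relies on the HTTP/1.1 Upgrade handshake.
    """
    hits = []
    for line in haproxy_config.splitlines():
        line = line.strip()
        # Match e.g. "... weight 256 alpn h2,http/1.1 ssl ..."
        if line.startswith("server ") and re.search(r"\balpn\s+h2\b", line):
            hits.append(line)
    return hits
```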

Version-Release number of selected component (if applicable):


How reproducible:

Upgrade from 4.3.0 to 4.4.9 
compare haproxy.config 


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Please refer to this PR:
https://github.com/openshift/router/pull/123

Comment 1 Andrew McDermott 2020-07-03 15:36:59 UTC
The fix here should only be for 4.4 as in 4.5 we have enabled HTTP/2 on the front- and backend.

Comment 5 Andrew McDermott 2020-07-06 08:45:56 UTC
I need to test but this is likely to break in 4.5 too. Moving to 4.6 for further investigation and will backport.

Comment 11 Lalatendu Mohanty 2020-07-06 14:51:00 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted?  If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact?  Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90seconds of API downtime
  example: etcd loses quorum and you have to restore from backup
How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non standard admin activities
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it’s always been like this we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Comment 15 W. Trevor King 2020-07-06 17:44:27 UTC
router#123 landed for bug 1826990, so I'm dropping the formal reference here to make it clear that that PR isn't a fix for this bug (maybe it was a workaround for this bug?)

Comment 16 W. Trevor King 2020-07-06 17:48:00 UTC
Moving to target 4.6.0.  We'll clone this back for earlier releases.

Comment 20 Arvind iyengar 2020-07-07 04:02:47 UTC
The PR was merged and made it into "4.6.0-0.nightly-2020-07-07-013418". It is noted that "alpn h2" is not applied by default for the backend pods:
----
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-013418   True        False         56m     Cluster version is 4.6.0-0.nightly-2020-07-07-013418

backend be_http:test-1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn

  timeout check 5000ms
  http-request set-header X-Forwarded-Host %[req.hdr(host)]
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie a382534009153924326c326fce9c3d37 insert indirect nocache httponly
  server pod:caddy-rc-h8wt4:service-unsecure:10.129.2.7:8080 10.129.2.7:8080 cookie a33879e218582bde703e9271d8c94abf weight 256 check inter 5000ms   <---
  server pod:caddy-rc-6mkr7:service-unsecure:10.129.2.8:8080 10.129.2.8:8080 cookie 2916b1d4f97a96ec3a7a4b5ad1859808 weight 256 check inter 5000ms   <---
----

Comment 21 Aleksandar Lazic (RHAc) 2020-07-07 06:57:40 UTC
I haven't seen any `alpn` keyword in a frontend section in this template https://github.com/openshift/router/blob/release-4.6/images/router/haproxy/conf/haproxy-config.template

The documentation says "ALPN is required to enable HTTP/2 on an HTTP frontend."
http://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.1-alpn

This implies to me that the OCP Router does not offer HTTP/2 on frontends, right?

Any plans to make HTTP/2 on frontends available again?

Comment 23 Arvind iyengar 2020-07-07 11:04:28 UTC
The PR actually made it into the "4.6.0-0.nightly-2020-07-07-061013" version. With this latest payload we have verified that the "ROUTER_DISABLE_HTTP2" parameter, which governs the activation of http2 on the router, is applied properly as intended:
------
* With the parameter set to "true" (default):

$ oc get clusterversion
 NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
 version   4.6.0-0.nightly-2020-07-07-061013   True        False         16m     Cluster version is 4.6.0-0.nightly-2020-07-07-061013

 $ oc -n openshift-ingress describe pod/router-default-5f85554c76-7kmbj | grep -i http2
   ROUTER_DISABLE_HTTP2:                      true <---

$ oc -n openshift-ingress rsh router-internalapps-5f6d568f7c-8jjlh          
sh-4.2$ env  | grep -i http2
ROUTER_DISABLE_HTTP2=true

sh-4.2$ cat haproxy.config  | grep "h2,"
sh-4.2$

route addition with "http2" disabled:

 # Plain http backend or backend with TLS terminated at the edge or a
 # secure backend with re-encryption.
 backend be_http:aiyengar:service-unsecure
   mode http
   option redispatch
   option forwardfor
   balance leastconn
 
   timeout check 5000ms
   http-request set-header X-Forwarded-Host %[req.hdr(host)]
   http-request set-header X-Forwarded-Port %[dst_port]
   http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
   http-request set-header X-Forwarded-Proto https if { ssl_fc }
   http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
   http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
   cookie c7be100ce6ab44354f6344453431eecc insert indirect nocache httponly
   server pod:caddy-rc-nv7xg:service-unsecure:10.129.2.7:8080 10.129.2.7:8080 cookie 2310f549b9342b2084c2d16195303795 weight 256 check inter 5000ms <---
   server pod:caddy-rc-fq76k:service-unsecure:10.129.2.8:8080 10.129.2.8:8080 cookie 38d24d0266f75facc4405fdfdb56b884 weight 256 check inter 5000ms <---

* With the parameter set to "false":
  
$ oc -n openshift-ingress rsh router-internalapps-5f6d568f7c-8jjlh 
$ env | grep -i http2
ROUTER_DISABLE_HTTP2=false

http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie 4718f5ce7595a0e8ee322a050ac1c53e insert indirect nocache httponly secure
server pod:caddy-docker:service-secure:10.131.0.21:8443 10.131.0.21:8443 cookie 53bdb2a7b6e0bb1da4644154c6d1b3e1 weight 256 ssl alpn h2,http/1.1 verify required ca-file /var/lib/haproxy/router/cacerts/aiyengar:route-reen-path.pem <----
------

Comment 24 Andrew McDermott 2020-07-08 08:22:12 UTC
(In reply to Aleksandar Lazic (RHAc) from comment #21)
> I haven't seen any `alpn` keyword in a frontend section in this template
> https://github.com/openshift/router/blob/release-4.6/images/router/haproxy/
> conf/haproxy-config.template
> 
> The documentation says "ALPN is required to enable HTTP/2 on an HTTP
> frontend."
> http://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.1-alpn
> 
> This implies to me that the OCP Router does not offer HTTP/2 on frontends,
> right?
> 
> Any plans to make HTTP/2 on frontends available again?

It was enabled by default, but we had to revert that because the default
kept breaking edge cases. HTTP/2 is still available in 4.5+ and can be
enabled by setting the following annotation on either the ingress config
or the ingress controller:

  ingress.operator.openshift.io/default-enable-http2="true"

For example:

  oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/default-enable-http2="true"
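For reference, the same annotation rendered in the IngressController resource might look like the fragment below (a sketch; only the annotation key/value comes from this bug, the surrounding field layout is the standard operator.openshift.io/v1 shape):

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
  annotations:
    ingress.operator.openshift.io/default-enable-http2: "true"
```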

Comment 26 errata-xmlrpc 2020-10-27 16:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 27 W. Trevor King 2021-04-05 17:47:02 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

