Description of problem:
A customer upgraded from 4.3.0 to 4.4.9, after which haproxy.config changed: the "alpn h2" parameter is now added on the backend server lines, e.g.:

server pod:xxxx-8-vx7ft:xxxx:10.xx.7.75:8444 10.xx.7.75:8444 cookie f209111d3d62592d80e7ac9f49943327 weight 256 alpn h2,http/1.1 ssl verifyhost mock-xxx.xxx-xxx-dev.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt

Since the upgrade from 4.3.0 to 4.4.9, their WebSocket Java application no longer works via the Route, which is blocking them.

Version-Release number of selected component (if applicable):

How reproducible:
Upgrade from 4.3.0 to 4.4.9 and compare haproxy.config.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Please refer to this PR: https://github.com/openshift/router/pull/123
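For illustration, a minimal before/after sketch of the server line change: the 4.4.9 line is the one quoted above, while the 4.3.0 line is inferred by removing the ALPN option, so treat it as an assumption. With "alpn h2,http/1.1", HAProxy may negotiate HTTP/2 to the backend, where the plain HTTP/1.1 WebSocket Upgrade handshake is not available, which would explain the breakage:

- server pod:xxxx-8-vx7ft:xxxx:10.xx.7.75:8444 10.xx.7.75:8444 cookie f209111d3d62592d80e7ac9f49943327 weight 256 ssl verifyhost mock-xxx.xxx-xxx-dev.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
+ server pod:xxxx-8-vx7ft:xxxx:10.xx.7.75:8444 10.xx.7.75:8444 cookie f209111d3d62592d80e7ac9f49943327 weight 256 alpn h2,http/1.1 ssl verifyhost mock-xxx.xxx-xxx-dev.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt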
The fix here should apply only to 4.4, as in 4.5 we have enabled HTTP/2 on both the frontend and backend.
I need to test, but this is likely to break in 4.5 too. Moving to 4.6 for further investigation; we will backport.
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context, and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90 seconds of API downtime
  example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it's always been like this, we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1
router#123 landed for bug 1826990, so I'm dropping the formal reference here to make it clear that that PR isn't a fix for this bug (maybe it was a workaround for this bug?)
Moving to target 4.6.0. We'll clone this back for earlier releases.
The PR was merged and landed in "4.6.0-0.nightly-2020-07-07-013418". Note that "alpn h2" is not applied by default to the backend pods:
----
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-013418   True        False         56m     Cluster version is 4.6.0-0.nightly-2020-07-07-013418

backend be_http:test-1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout check 5000ms
  http-request set-header X-Forwarded-Host %[req.hdr(host)]
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie a382534009153924326c326fce9c3d37 insert indirect nocache httponly
  server pod:caddy-rc-h8wt4:service-unsecure:10.129.2.7:8080 10.129.2.7:8080 cookie a33879e218582bde703e9271d8c94abf weight 256 check inter 5000ms  <---
  server pod:caddy-rc-6mkr7:service-unsecure:10.129.2.8:8080 10.129.2.8:8080 cookie 2916b1d4f97a96ec3a7a4b5ad1859808 weight 256 check inter 5000ms  <---
----
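For anyone re-running this check on a live cluster, a minimal sketch (assuming the default router deployment in openshift-ingress; pod and deployment names will differ):

$ oc -n openshift-ingress rsh deploy/router-default
sh-4.2$ grep 'alpn' haproxy.config
sh-4.2$                              # no output expected while HTTP/2 is disabled (the default)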
I haven't seen any `alpn` keyword in a frontend section in this template:
https://github.com/openshift/router/blob/release-4.6/images/router/haproxy/conf/haproxy-config.template

The documentation says "ALPN is required to enable HTTP/2 on an HTTP frontend."
http://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.1-alpn

This implies to me that the OCP Router does not offer HTTP/2 on frontends, right?

Any plans to make HTTP/2 on frontends available again?
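For context, HTTP/2 on an HAProxy frontend is normally enabled by adding an alpn option to the bind line; an illustrative sketch only, not taken from the router template (the certificate path is a placeholder):

frontend public_ssl
  bind :443 ssl crt /etc/haproxy/certs/default.pem alpn h2,http/1.1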
Looks like the `crt-list` on line https://github.com/openshift/router/blob/release-4.6/images/router/haproxy/conf/haproxy-config.template#L235, which uses this function https://github.com/openshift/router/blob/release-4.6/pkg/router/template/template_helper.go#L165, solves this issue. Is my assumption right?

The doc link for crt-list: http://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.1-crt-list
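For reference, each entry in a crt-list file can carry per-certificate ssl options in square brackets (such as alpn) plus optional SNI filters, which is what allows ALPN to be set per route; an illustrative sketch (paths and hostnames are made up):

/var/lib/haproxy/router/certs/app.pem [alpn h2,http/1.1] app.example.com
/var/lib/haproxy/router/certs/legacy.pem [alpn http/1.1] legacy.example.com

The frontend bind then references the list, e.g.:

  bind :443 ssl crt-list /var/lib/haproxy/router/certs/cert_config.map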
The PR actually landed in the "4.6.0-0.nightly-2020-07-07-061013" version. With this latest payload we have verified that the "ROUTER_DISABLE_HTTP2" parameter, which governs the activation of HTTP/2 on the router, is applied properly as intended:
------
* With the parameter set to "true" [default]:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-061013   True        False         16m     Cluster version is 4.6.0-0.nightly-2020-07-07-061013

$ oc -n openshift-ingress describe pod/router-default-5f85554c76-7kmbj | grep -i http2
      ROUTER_DISABLE_HTTP2:  true   <---

$ oc -n openshift-ingress rsh router-internalapps-5f6d568f7c-8jjlh
sh-4.2$ env | grep -i http2
ROUTER_DISABLE_HTTP2=true
sh-4.2$ cat haproxy.config | grep "h2,"
sh-4.2$

Route addition with "http2" disabled:

# Plain http backend or backend with TLS terminated at the edge or a
# secure backend with re-encryption.
backend be_http:aiyengar:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout check 5000ms
  http-request set-header X-Forwarded-Host %[req.hdr(host)]
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie c7be100ce6ab44354f6344453431eecc insert indirect nocache httponly
  server pod:caddy-rc-nv7xg:service-unsecure:10.129.2.7:8080 10.129.2.7:8080 cookie 2310f549b9342b2084c2d16195303795 weight 256 check inter 5000ms  <---
  server pod:caddy-rc-fq76k:service-unsecure:10.129.2.8:8080 10.129.2.8:8080 cookie 38d24d0266f75facc4405fdfdb56b884 weight 256 check inter 5000ms  <---

* With the parameter set to "false":

$ oc -n openshift-ingress rsh router-internalapps-5f6d568f7c-8jjlh
$ env | grep -i http2
ROUTER_DISABLE_HTTP2=false

  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie 4718f5ce7595a0e8ee322a050ac1c53e insert indirect nocache httponly secure
  server pod:caddy-docker:service-secure:10.131.0.21:8443 10.131.0.21:8443 cookie 53bdb2a7b6e0bb1da4644154c6d1b3e1 weight 256 ssl alpn h2,http/1.1 verify required ca-file /var/lib/haproxy/router/cacerts/aiyengar:route-reen-path.pem  <----
------
(In reply to Aleksandar Lazic (RHAc) from comment #21)
> I haven't seen any `alpn` keyword in a frontend section in this template:
> https://github.com/openshift/router/blob/release-4.6/images/router/haproxy/conf/haproxy-config.template
>
> The documentation says "ALPN is required to enable HTTP/2 on an HTTP frontend."
> http://cbonte.github.io/haproxy-dconv/2.1/configuration.html#5.1-alpn
>
> This implies to me that the OCP Router does not offer HTTP/2 on frontends, right?
>
> Any plans to make HTTP/2 on frontends available again?

It was enabled by default, but we had to revert that because the default keeps breaking edge cases. HTTP/2 is still available in 4.5+ and can be enabled by setting the following annotation on either the ingress config or the ingress controller:

ingress.operator.openshift.io/default-enable-http2="true"

For example:

oc -n openshift-ingress-operator annotate ingresscontrollers/default ingress.operator.openshift.io/default-enable-http2="true"
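To enable it cluster-wide via the ingress config instead, a sketch assuming the standard cluster-scoped ingress config object named "cluster":

oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true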
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475