Bug 1932401

Summary: Cluster Ingress Operator degrades if external LB redirects http to https because of new "canary" route
Product: OpenShift Container Platform Reporter: Josef Meier <josef.meier>
Component: Networking Assignee: Stephen Greene <sgreene>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: antgarci, anusaxen, aos-bugs, asallay, ausov, dcaros, lsantill, mjoseph, scuppett, sgreene
Version: 4.7 Keywords: Upgrades
Target Milestone: --- Flags: sgreene: needinfo?
Target Release: 4.8.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Exposing the default ingress controller via an external load balancer that redirects all HTTP traffic to HTTPS.
Consequence: Ingress canary endpoint checks performed by the ingress operator would fail, which would ultimately cause the ingress cluster operator to become degraded.
Fix: Convert the cleartext canary route to an edge-encrypted route.
Result: The canary route works via HTTPS-only load balancers when insecure traffic is redirected by the load balancer.
Story Points: ---
Clone Of:
: 1932649 Environment:
Last Closed: 2021-07-27 22:48:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1932649    

Description Josef Meier 2021-02-24 15:08:17 UTC
Hi,

In my company, we use an external load balancer that redirects HTTP traffic to HTTPS.

During an upgrade from 4.6 to 4.7 the cluster-ingress-operator degraded because it couldn't reach the new canary route in openshift-ingress-canary.

I saw that this canary route is an HTTP route. This won't work in our setup.

I manually added edge termination to this route and immediately the upgrade proceeded.
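For readers who want to reproduce the manual change, here is a minimal sketch of the kind of merge patch involved. It is an illustration only, not the exact command the reporter ran. The values "edge" and "Redirect" are assumptions chosen to match the "edge/Redirect" termination that the fixed route shows later in this report, and the helper names canary_edge_patch and oc_patch_command are hypothetical. The script only builds and prints a command line; it does not contact a cluster.

```python
import json
import shlex


def canary_edge_patch():
    """Return a merge patch that adds edge TLS termination to a Route.

    Assumed values: edge termination with insecure traffic redirected,
    mirroring the "edge/Redirect" termination seen on the fixed route.
    """
    return {
        "spec": {
            "tls": {
                "termination": "edge",
                "insecureEdgeTerminationPolicy": "Redirect",
            }
        }
    }


def oc_patch_command(namespace="openshift-ingress-canary", route="canary"):
    """Build (but do not run) an `oc patch` command line for the route."""
    patch = json.dumps(canary_edge_patch(), separators=(",", ":"))
    args = ["oc", "-n", namespace, "patch", "route", route,
            "--type=merge", "-p", patch]
    # shlex.quote keeps the JSON intact when the line is pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(oc_patch_command())
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; review the output before pasting it into a shell that is logged in to a cluster.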

This is a PR that should add 'edge' termination to the canary route:
https://github.com/openshift/cluster-ingress-operator/pull/555

Thanks and regards,

Josef

Comment 1 Hongan Li 2021-02-25 03:12:15 UTC
Verified with a cluster launched by cluster-bot (launch openshift/cluster-ingress-operator#556); the checks passed.

$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci.test-2021-02-25-014749-ci-ln-lvfqbrt   True        False         33m     Cluster version is 4.8.0-0.ci.test-2021-02-25-014749-ci-ln-lvfqbrt

$ oc -n openshift-ingress-canary get route
NAME     HOST/PORT                                                                                      PATH   SERVICES         PORT   TERMINATION     WILDCARD
canary   canary-openshift-ingress-canary.apps.ci-ln-lvfqbrt-f76d1.origin-ci-int-gce.dev.openshift.com          ingress-canary   8080   edge/Redirect   None

$ curl -kL http://canary-openshift-ingress-canary.apps.ci-ln-lvfqbrt-f76d1.origin-ci-int-gce.dev.openshift.com
Hello OpenShift!

$ curl -k https://canary-openshift-ingress-canary.apps.ci-ln-lvfqbrt-f76d1.origin-ci-int-gce.dev.openshift.com
Hello OpenShift!

Comment 4 Louis Santillan 2021-03-02 01:09:39 UTC
IHAC that is also hitting this issue since their F5 ELB is configured to drop all HTTP/80 traffic.  So this bug is related but may require another workaround.  Also, could I request an appropriate docs update (Release Notes and Install pages)?  It seems now that HTTP/80 traffic is fully required in order to upgrade to/install 4.7.

Comment 5 Stephen Greene 2021-03-02 14:13:23 UTC
(In reply to Louis Santillan from comment #4)
> IHAC that is also hitting this issue since their F5 ELB is configured to
> drop all HTTP/80 traffic.  So this bug is related but may require another
> workaround.  Also, could I request an appropriate docs update (Release Notes
> and Install pages)?  It seems now that HTTP/80 traffic is fully required in
> order to upgrade to/install 4.7.

There is a workaround mentioned here
https://github.com/openshift/openshift-docs/pull/29807

Comment 6 Louis Santillan 2021-03-03 18:11:36 UTC
I don't think the TLS termination matters if the packets on port 80 get dropped.

Comment 7 Stephen Greene 2021-03-03 18:21:46 UTC
(In reply to Louis Santillan from comment #6)
> I don't think the TLS termination matters if the packets on port 80 get
> dropped.

Using an edge terminated route means requests for the canary route will come into the cluster on port 443.

Comment 8 Stephen Greene 2021-03-03 18:24:00 UTC
(In reply to Stephen Greene from comment #7)
> (In reply to Louis Santillan from comment #6)
> > I don't think the TLS termination matters if the packets on port 80 get
> > dropped.
> 
> Using an edge terminated route means requests for the canary route will come
> into the cluster on port 443.

Well, I should be more specific. Requests for the edge-terminated canary route will come into the external load balancer on port 443 (which will forward them to the ingress controller's node port).

Comment 9 Stephen Greene 2021-03-03 18:41:53 UTC
Ah, but if traffic to port 80 is dropped, the canary requests won't be able to redirect to https. Can the customer just use an external load balancer that redirects http traffic to https? Do we officially support using an external load balancer for ingress that drops traffic on port 80?

Sorry for churn with prior comments.

Comment 10 Stephen Greene 2021-03-03 18:46:36 UTC
Would it be sufficient to have the canary controller make requests over https (rather than over http and relying on the route redirect)?

If so, could you open a new BZ to address that issue (and attach the customer case)? Thanks!
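The reasoning in the last few comments can be summarised with a toy model. This is a sketch under stated assumptions, not the ingress operator's actual probe code: the function name probe and the labels "forward", "redirect" and "drop" are hypothetical, port 443 is assumed to be open on the load balancer, and a cleartext route is assumed not to answer over HTTPS.

```python
def probe(start_scheme, lb_port80, route_termination):
    """Return True if a redirect-following canary probe would succeed.

    start_scheme:      "http" or "https" (scheme of the first request)
    lb_port80:         "forward", "redirect" or "drop" (LB behaviour on port 80)
    route_termination: "none" (cleartext route) or "edge" (edge/Redirect route)
    """
    scheme = start_scheme
    for _ in range(5):  # bound the number of redirects followed
        if scheme == "https":
            # Assumption: port 443 is open and only a TLS-terminated
            # route answers there.
            return route_termination == "edge"
        # scheme == "http": the request reaches the load balancer on port 80.
        if lb_port80 == "drop":
            return False  # no response at all, so no redirect can be followed
        if lb_port80 == "redirect":
            scheme = "https"
            continue
        # lb_port80 == "forward": the router itself handles the request.
        if route_termination == "edge":
            scheme = "https"  # an edge/Redirect route redirects insecure traffic
            continue
        return True  # a cleartext route answers directly
    return False


if __name__ == "__main__":
    for start in ("http", "https"):
        for lb in ("forward", "redirect", "drop"):
            for term in ("none", "edge"):
                print(start, lb, term, probe(start, lb, term))
```

In this model the edge-terminated route fixes the redirecting load balancer case, while a load balancer that drops port 80 only works if the probe starts over https, which is the question raised in comment 10.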

Comment 11 Stephen Greene 2021-03-03 19:58:34 UTC
(In reply to Stephen Greene from comment #10)
> Would it be sufficient to have the canary controller make requests over
> https (rather than over http and relying on the route redirect)?
> 
> If so, could you open a new BZ to address that issue (and attach the
> customer case)? Thanks!

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1934773

Comment 12 Aleksey Usov 2021-03-12 08:12:30 UTC
Adding a DNS entry for the route (wildcards are not allowed by my customer's policy) and edge termination worked, but then I started seeing "x509 certificate signed by unknown authority" errors. I fixed this by adding the CA to the proxy, as described here: https://docs.openshift.com/container-platform/4.7/networking/enable-cluster-wide-proxy.html#nw-proxy-configure-object_config-cluster-wide-proxy.
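As a rough illustration of what "adding CA to the proxy" involves, the sketch below builds the two objects that the linked procedure revolves around: a ConfigMap in the openshift-config namespace holding the CA under the ca-bundle.crt key, and a trustedCA reference on the cluster Proxy object. The ConfigMap name user-ca-bundle, the placeholder certificate text, and the helper name trusted_ca_manifests are assumptions for illustration. How the objects are applied is deliberately left out, and the commenter's exact steps are not shown in this report.

```python
import json


def trusted_ca_manifests(ca_pem, configmap_name="user-ca-bundle"):
    """Return (ConfigMap, Proxy) manifests that reference a custom CA bundle."""
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": configmap_name, "namespace": "openshift-config"},
        # The CA certificate(s) go under the "ca-bundle.crt" key.
        "data": {"ca-bundle.crt": ca_pem},
    }
    proxy = {
        "apiVersion": "config.openshift.io/v1",
        "kind": "Proxy",
        "metadata": {"name": "cluster"},
        # Only the trustedCA reference is shown; any existing proxy
        # settings would need to be preserved when applying this.
        "spec": {"trustedCA": {"name": configmap_name}},
    }
    return configmap, proxy


if __name__ == "__main__":
    # Placeholder text, not a real certificate.
    for manifest in trusted_ca_manifests("<PEM-encoded CA certificate>"):
        print(json.dumps(manifest, indent=2))
```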

Comment 16 errata-xmlrpc 2021-07-27 22:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438