Description of problem:

authentication-operator condition Degraded=True with UPI install. All other operators report Degraded=False.

Version-Release number of selected component (if applicable):

4.2.0-0.okd-2019-09-19-161304

How reproducible:

Always

Steps to Reproduce:

1. Install a proxy-enabled cluster using UPI: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md

2. Verify the auth operator's status:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

3. Check the operator's log to see why:

$ oc logs authentication-operator-69c88b46cf-h2hzp -n openshift-authentication-operator
<SNIP>
E0919 18:41:24.662489       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: EOF

4. The route exists:

$ oc get route/oauth-openshift -n openshift-authentication
NAME              HOST/PORT                                                  PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.proxy-upi.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None

Actual results:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

Expected results:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             True        False         False      61m

Additional info:
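To confirm the failure outside the operator, the route can be probed directly from a host with cluster access. A minimal sketch; the /healthz path is an assumption about what the operator's health check GETs, and the hostname is the one from step 4:

$ # An immediate EOF here mirrors the operator's "failed to GET route: EOF"
$ curl -kv https://oauth-openshift.apps.proxy-upi.devcluster.openshift.com/healthz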
Please add the full logs from the operator and the state of the openshift-authentication namespace. A must-gather would be nice.
Created attachment 1617017 [details] auth-operator pod logs
Created attachment 1617018 [details] auth-operator must-gather
$ oc get all -n openshift-authentication
NAME                                   READY   STATUS    RESTARTS   AGE
pod/oauth-openshift-6cd87fc9df-5zpqs   1/1     Running   0          5h19m
pod/oauth-openshift-6cd87fc9df-gdsct   1/1     Running   0          5h19m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/oauth-openshift   ClusterIP   172.30.201.126   <none>        443/TCP   5h19m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oauth-openshift   2/2     2            2           5h19m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/oauth-openshift-6cd87fc9df   2         2         2       5h19m

NAME                                       HOST/PORT                                                  PATH   SERVICES          PORT   TERMINATION            WILDCARD
route.route.openshift.io/oauth-openshift   oauth-openshift.apps.upi-proxy.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
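Since the OAuth pods are Running and Ready, it is worth confirming that the service actually has ready endpoints; if it does, the EOF points at the route/ingress path rather than the pods. A standard check, with no assumptions beyond the namespace above:

$ oc get endpoints oauth-openshift -n openshift-authentication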
It seems that the authn operator is stomping on its own route, but that does not seem to affect its function in normal cluster deployments. Nevertheless, that should be fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1753886. Other than that, I did not notice anything weird that the authn operator could be causing itself. I'd like the routing team to have a look as well; I'm hoping that the route updates, which don't usually cause this behavior, aren't causing it in UPI either. I expect they will need some more logs.
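For the routing team, the router logs and pod placement are usually the first things to collect. A sketch using standard oc invocations; router-default is the default ingress controller's deployment name and may differ in this cluster:

$ # Where the router pods landed matters for LB target registration
$ oc get pods -n openshift-ingress -o wide
$ oc logs -n openshift-ingress deployment/router-default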
Created attachment 1617326 [details] must-gather
Turns out the masters were labeled as workers, causing routers to be scheduled to masters, which are excluded from LB target pools in k8s by design. This is not an ingress bug. Do not label masters as workers when using ingress published with LoadBalancer services unless you actively manage ingress node placement to stay off masters.
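To check whether a cluster is in this state, inspect the role labels on the nodes. A minimal sketch assuming the standard node-role.kubernetes.io/* labels; a master row with a value in both columns is the mislabeling described above:

$ # -L adds a column per label; masters should not also carry the worker role
$ oc get nodes -L node-role.kubernetes.io/master -L node-role.kubernetes.io/worker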
*** This bug has been marked as a duplicate of bug 1755073 ***
(In reply to Dan Mace from comment #7)
> Turns out the masters were labeled as workers, causing routers to be
> scheduled to masters, which are excluded from LB target pools in k8s by
> design. This is not an ingress bug. Do not label masters as workers when
> using ingress published with LoadBalancer services unless you actively
> manage ingress node placement to stay off masters.

Thanks. I hit this problem as well during an update in my lab. As a temporary workaround, I added the master nodes to the LB backend servers for ingress traffic, and that let me move forward.
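For reference, if the cluster runs on AWS with a classic ELB fronting the router service, the workaround above amounts to registering the master instances with that load balancer. A hedged sketch; the load balancer name and instance IDs are hypothetical placeholders:

$ # Substitute the ELB created for the router service and your master instance IDs
$ aws elb register-instances-with-load-balancer \
    --load-balancer-name <router-elb-name> \
    --instances i-0aaaaaaaaaaaaaaa1 i-0bbbbbbbbbbbbbbb2 i-0ccccccccccccccc3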