Bug 1753761

Summary: [Proxy] authentication-operator condition Degraded=True with UPI install
Product: OpenShift Container Platform
Reporter: Daneyon Hansen <dhansen>
Component: Networking
Assignee: Daneyon Hansen <dhansen>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: aos-bugs, bbennett, dmace, jeff.li, mfojtik, slaznick, wking
Version: 4.2.0
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-24 11:30:53 UTC
Type: Bug
Attachments:
auth-operator pod logs
auth-operator must-gather
must-gather

Description Daneyon Hansen 2019-09-19 18:51:20 UTC
Description of problem:
authentication-operator condition Degraded=True with UPI install. All other operators report Degraded=False.

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-09-19-161304

How reproducible:
Always

Steps to Reproduce:
1. Install a proxy-enabled cluster using UPI: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md
2. Verify the auth operator's status:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

3. Check the operator's log to see why:
$ oc logs authentication-operator-69c88b46cf-h2hzp -n openshift-authentication-operator
<SNIP>
E0919 18:41:24.662489       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: EOF

4. The route exists:
$ oc get route/oauth-openshift -n openshift-authentication
NAME              HOST/PORT                                                 PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.proxy-upi.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
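The operator's "failed to GET route: EOF" error can be reproduced by hand by probing the route the operator health-checks. A minimal sketch, assuming cluster access; the /healthz path is an assumption inferred from the operator's route-health error, and the fallback host is a placeholder:

```shell
# Sketch: manually probe the oauth route the operator health-checks.
# Falls back to a placeholder host if oc is unavailable.
host=$(oc get route/oauth-openshift -n openshift-authentication \
        -o jsonpath='{.spec.host}' 2>/dev/null || true)
host=${host:-oauth-openshift.apps.example.com}
echo "probing https://${host}/healthz"
# -k skips TLS verification, matching the passthrough route's self-signed cert
curl -sk --max-time 5 "https://${host}/healthz" \
  || echo "probe failed (an EOF here matches the operator's error)"
```

If the probe fails from outside the cluster but succeeds from a node, the problem is load-balancer or router placement rather than the auth operator itself.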

Actual results:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

Expected results:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             True        False         False      61m

Additional info:

Comment 1 Standa Laznicka 2019-09-19 19:22:06 UTC
Please add full logs from the operator and state of the openshift-authentication namespace. Must-gather would be nice.

Comment 2 Daneyon Hansen 2019-09-20 04:14:07 UTC
Created attachment 1617017 [details]
auth-operator pod logs

Comment 3 Daneyon Hansen 2019-09-20 04:17:51 UTC
Created attachment 1617018 [details]
auth-operator must-gather

Comment 4 Daneyon Hansen 2019-09-20 04:19:17 UTC
$ oc get all -n openshift-authentication
NAME                                   READY   STATUS    RESTARTS   AGE
pod/oauth-openshift-6cd87fc9df-5zpqs   1/1     Running   0          5h19m
pod/oauth-openshift-6cd87fc9df-gdsct   1/1     Running   0          5h19m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/oauth-openshift   ClusterIP   172.30.201.126   <none>        443/TCP   5h19m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oauth-openshift   2/2     2            2           5h19m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/oauth-openshift-6cd87fc9df   2         2         2       5h19m

NAME                                       HOST/PORT                                                 PATH   SERVICES          PORT   TERMINATION            WILDCARD
route.route.openshift.io/oauth-openshift   oauth-openshift.apps.upi-proxy.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None

Comment 5 Standa Laznicka 2019-09-20 08:00:10 UTC
It seems the authn operator is stomping on its own route, but that does not seem to affect its function in normal cluster deployments. Nevertheless, that should be fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1753886. Other than that, I did not notice anything really weird that the authn operator could cause itself.

I'd like the routing team to have a look as well; I'm hoping the route updates, which don't usually cause this behavior, aren't causing it in UPI either. I expect they will need some more logs.

Comment 6 Daneyon Hansen 2019-09-20 22:03:40 UTC
Created attachment 1617326 [details]
must-gather

Comment 7 Dan Mace 2019-09-24 11:30:53 UTC
Turns out the masters were labeled as workers, causing routers to be scheduled to masters, which are excluded from LB target pools in k8s by design. This is not an ingress bug. Do not label masters as workers when using ingress published with LoadBalancer services unless you actively manage ingress node placement to stay off masters.
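If the masters must keep the worker label, router placement can instead be pinned to dedicated worker nodes through the IngressController's nodePlacement field. A hedged sketch (the label selector shown assumes the standard node-role labels; adjust to the cluster):

```yaml
# Sketch: pin the default router to worker nodes so it stays off masters,
# which are excluded from LB target pools by design.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
```

With this in place, routers land only on nodes that are in the LoadBalancer target pools, avoiding the dropped-traffic symptom described above.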

Comment 8 W. Trevor King 2019-09-24 20:33:49 UTC

*** This bug has been marked as a duplicate of bug 1755073 ***

Comment 9 Jeff Li 2019-12-03 17:02:58 UTC
(In reply to Dan Mace from comment #7)
> Turns out the masters were labeled as workers, causing routers to be
> scheduled to masters, which are excluded from LB target pools in k8s by
> design. This is not an ingress bug. Do not label masters as workers when
> using ingress published with LoadBalancer services unless you actively
> manage ingress node placement to stay off masters.

Thanks.
I hit this problem as well during an update in my lab. As a temporary workaround, I added the master nodes to the LB backend servers for ingress traffic and was able to move forward.
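For a UPI cluster fronted by an external HAProxy load balancer, the workaround above roughly corresponds to listing the masters alongside the workers in the ingress backend. A hypothetical fragment (backend name, hostnames, and IPs are placeholders, not from this cluster):

```
# haproxy.cfg sketch: masters temporarily added to the HTTPS ingress backend
backend ingress-https
    mode tcp
    balance source
    server worker-0 192.0.2.10:443 check
    server worker-1 192.0.2.11:443 check
    # stopgap while routers are scheduled on masters; remove once
    # ingress placement is fixed
    server master-0 192.0.2.20:443 check
    server master-1 192.0.2.21:443 check
```

This only papers over the scheduling problem; the durable fix is keeping routers off masters as described in comment 7.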