Description of problem:

authentication-operator condition Degraded=True with UPI install. All other operators report Degraded=False.

Version-Release number of selected component (if applicable):

4.2.0-0.okd-2019-09-19-161304

How reproducible:

Always

Steps to Reproduce:

1. Install a proxy-enabled cluster using UPI: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md

2. Verify the auth operator's status:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

3. Check the operator's log to see why:

$ oc logs authentication-operator-69c88b46cf-h2hzp -n openshift-authentication-operator
<SNIP>
E0919 18:41:24.662489       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: EOF

4. The route exists:

$ oc get route/oauth-openshift -n openshift-authentication
NAME              HOST/PORT                                                  PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.proxy-upi.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None

Actual results:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

Expected results:

$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             True        False         False      61m

Additional info:
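To confirm the failure outside the operator, the route can be probed directly from a host with cluster access. A minimal sketch; the /healthz path is an assumption about what the operator's health check GETs, and the hostname is the one from step 4:

$ # An immediate EOF here mirrors the operator's "failed to GET route: EOF"
$ curl -kv https://oauth-openshift.apps.proxy-upi.devcluster.openshift.com/healthz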
Please add the full logs from the operator and the state of the openshift-authentication namespace. A must-gather would be nice.
Created attachment 1617017 [details] auth-operator pod logs
Created attachment 1617018 [details] auth-operator must-gather
$ oc get all -n openshift-authentication
NAME                                   READY   STATUS    RESTARTS   AGE
pod/oauth-openshift-6cd87fc9df-5zpqs   1/1     Running   0          5h19m
pod/oauth-openshift-6cd87fc9df-gdsct   1/1     Running   0          5h19m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/oauth-openshift   ClusterIP   172.30.201.126   <none>        443/TCP   5h19m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oauth-openshift   2/2     2            2           5h19m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/oauth-openshift-6cd87fc9df   2         2         2       5h19m

NAME                                       HOST/PORT                                                  PATH   SERVICES          PORT   TERMINATION            WILDCARD
route.route.openshift.io/oauth-openshift   oauth-openshift.apps.upi-proxy.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
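Since the OAuth pods are Running and Ready, it is worth confirming that the service actually has ready endpoints; if it does, the EOF points at the route/ingress path rather than the pods. A standard check, with no assumptions beyond the namespace above:

$ oc get endpoints oauth-openshift -n openshift-authentication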
It seems that the authn operator is stomping on its own route, but that does not seem to affect its function in normal cluster deployments. Nevertheless, that should be fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1753886. Other than that, I did not notice anything weird that the authn operator could be causing itself. I'd like the routing team to have a look as well; I'm hoping that the route updates, which don't usually cause this behavior, aren't causing it in UPI either. I expect they will need some more logs.
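For the routing team, the router logs and pod placement are usually the first things to collect. A sketch using standard oc invocations; router-default is the default ingress controller's deployment name and may differ in this cluster:

$ # Where the router pods landed matters for LB target registration
$ oc get pods -n openshift-ingress -o wide
$ oc logs -n openshift-ingress deployment/router-default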
Created attachment 1617326 [details] must-gather
Turns out the masters were labeled as workers, causing routers to be scheduled to masters, which are excluded from LB target pools in k8s by design. This is not an ingress bug. Do not label masters as workers when using ingress published with LoadBalancer services unless you actively manage ingress node placement to stay off masters.
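To check whether a cluster is in this state, inspect the role labels on the nodes. A minimal sketch assuming the standard node-role.kubernetes.io/* labels; a master row with a value in both columns is the mislabeling described above:

$ # -L adds a column per label; masters should not also carry the worker role
$ oc get nodes -L node-role.kubernetes.io/master -L node-role.kubernetes.io/worker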
*** This bug has been marked as a duplicate of bug 1755073 ***
(In reply to Dan Mace from comment #7)
> Turns out the masters were labeled as workers, causing routers to be
> scheduled to masters, which are excluded from LB target pools in k8s by
> design. This is not an ingress bug. Do not label masters as workers when
> using ingress published with LoadBalancer services unless you actively
> manage ingress node placement to stay off masters.

Thanks. I hit this problem as well during an update in my lab. As a temporary workaround, I added the master nodes to the LB backend servers for ingress traffic, and that let me move forward.
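For reference, if the cluster runs on AWS with a classic ELB fronting the router service, the workaround above amounts to registering the master instances with that load balancer. A hedged sketch; the load balancer name and instance IDs are hypothetical placeholders:

$ # Substitute the ELB created for the router service and your master instance IDs
$ aws elb register-instances-with-load-balancer \
    --load-balancer-name <router-elb-name> \
    --instances i-0aaaaaaaaaaaaaaa1 i-0bbbbbbbbbbbbbbb2 i-0ccccccccccccccc3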