Bug 1753761

Summary: [Proxy] authentication-operator condition Degraded=True with UPI install
Product: OpenShift Container Platform
Reporter: Daneyon Hansen <dhansen>
Component: Networking
Assignee: Daneyon Hansen <dhansen>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: aos-bugs, bbennett, dmace, jeff.li, mfojtik, slaznick, wking
Version: 4.2.0
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-24 11:30:53 UTC
Type: Bug
Attachments:
auth-operator pod logs
auth-operator must-gather
must-gather

Description Daneyon Hansen 2019-09-19 18:51:20 UTC
Description of problem:
authentication-operator condition Degraded=True with UPI install. All other operators report Degraded=False.

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-09-19-161304

How reproducible:
Always

Steps to Reproduce:
1. Install a proxy-enabled cluster using UPI: https://github.com/openshift/installer/blob/master/docs/user/aws/install_upi.md
2. Verify the auth operator's status:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

3. Check the operator's log to see why:
$ oc logs authentication-operator-69c88b46cf-h2hzp -n openshift-authentication-operator
<SNIP>
E0919 18:41:24.662489       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: EOF

4. The route exists:
$ oc get route/oauth-openshift -n openshift-authentication
NAME              HOST/PORT                                                 PATH   SERVICES          PORT   TERMINATION            WILDCARD
oauth-openshift   oauth-openshift.apps.proxy-upi.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None
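The operator's "failed to GET route: EOF" error can be reproduced by hand by probing the route the operator health-checks. A minimal sketch, assuming cluster access; the /healthz path is an assumption inferred from the operator's route-health error, and the fallback host is a placeholder:

```shell
# Sketch: manually probe the oauth route the operator health-checks.
# Falls back to a placeholder host if oc is unavailable.
host=$(oc get route/oauth-openshift -n openshift-authentication \
        -o jsonpath='{.spec.host}' 2>/dev/null || true)
host=${host:-oauth-openshift.apps.example.com}
echo "probing https://${host}/healthz"
# -k skips TLS verification, matching the passthrough route's self-signed cert
curl -sk --max-time 5 "https://${host}/healthz" \
  || echo "probe failed (an EOF here matches the operator's error)"
```

If the probe fails from outside the cluster but succeeds from a node, the problem is load-balancer or router placement rather than the auth operator itself.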

Actual results:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             Unknown     Unknown       True       61m

Expected results:
$ oc get clusteroperator/authentication
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication             True        False         False      61m

Additional info:

Comment 1 Standa Laznicka 2019-09-19 19:22:06 UTC
Please add full logs from the operator and state of the openshift-authentication namespace. Must-gather would be nice.

Comment 2 Daneyon Hansen 2019-09-20 04:14:07 UTC
Created attachment 1617017 [details]
auth-operator pod logs

Comment 3 Daneyon Hansen 2019-09-20 04:17:51 UTC
Created attachment 1617018 [details]
auth-operator must-gather

Comment 4 Daneyon Hansen 2019-09-20 04:19:17 UTC
$ oc get all -n openshift-authentication
NAME                                   READY   STATUS    RESTARTS   AGE
pod/oauth-openshift-6cd87fc9df-5zpqs   1/1     Running   0          5h19m
pod/oauth-openshift-6cd87fc9df-gdsct   1/1     Running   0          5h19m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/oauth-openshift   ClusterIP   172.30.201.126   <none>        443/TCP   5h19m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oauth-openshift   2/2     2            2           5h19m

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/oauth-openshift-6cd87fc9df   2         2         2       5h19m

NAME                                       HOST/PORT                                                 PATH   SERVICES          PORT   TERMINATION            WILDCARD
route.route.openshift.io/oauth-openshift   oauth-openshift.apps.upi-proxy.devcluster.openshift.com          oauth-openshift   6443   passthrough/Redirect   None

Comment 5 Standa Laznicka 2019-09-20 08:00:10 UTC
It seems the authn operator is stomping on its own route, but that does not seem to affect its function in normal cluster deployments. Nevertheless, that should be fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1753886. Other than that, I did not notice anything really weird that the authn operator could cause itself.

I'd like the routing team to have a look as well; I'm hoping the route updates, which don't usually cause this behavior, aren't causing it in UPI either. I expect they will need some more logs.

Comment 6 Daneyon Hansen 2019-09-20 22:03:40 UTC
Created attachment 1617326 [details]
must-gather

Comment 7 Dan Mace 2019-09-24 11:30:53 UTC
Turns out the masters were labeled as workers, causing routers to be scheduled to masters, which are excluded from LB target pools in k8s by design. This is not an ingress bug. Do not label masters as workers when using ingress published with LoadBalancer services unless you actively manage ingress node placement to stay off masters.
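If the masters must keep the worker label, router placement can instead be pinned to dedicated worker nodes through the IngressController's nodePlacement field. A hedged sketch (the label selector shown assumes the standard node-role labels; adjust to the cluster):

```yaml
# Sketch: pin the default router to worker nodes so it stays off masters,
# which are excluded from LB target pools by design.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
```

With this in place, routers land only on nodes that are in the LoadBalancer target pools, avoiding the dropped-traffic symptom described above.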

Comment 8 W. Trevor King 2019-09-24 20:33:49 UTC

*** This bug has been marked as a duplicate of bug 1755073 ***

Comment 9 Jeff Li 2019-12-03 17:02:58 UTC
(In reply to Dan Mace from comment #7)
> Turns out the masters were labeled as workers, causing routers to be
> scheduled to masters, which are excluded from LB target pools in k8s by
> design. This is not an ingress bug. Do not label masters as workers when
> using ingress published with LoadBalancer services unless you actively
> manage ingress node placement to stay off masters.

Thanks.
I hit this problem as well during an update in my lab. As a temporary workaround, I added the master nodes to the LB backend servers for ingress traffic and was able to move forward.
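For a UPI cluster fronted by an external HAProxy load balancer, the workaround above roughly corresponds to listing the masters alongside the workers in the ingress backend. A hypothetical fragment (backend name, hostnames, and IPs are placeholders, not from this cluster):

```
# haproxy.cfg sketch: masters temporarily added to the HTTPS ingress backend
backend ingress-https
    mode tcp
    balance source
    server worker-0 192.0.2.10:443 check
    server worker-1 192.0.2.11:443 check
    # stopgap while routers are scheduled on masters; remove once
    # ingress placement is fixed
    server master-0 192.0.2.20:443 check
    server master-1 192.0.2.21:443 check
```

This only papers over the scheduling problem; the durable fix is keeping routers off masters as described in comment 7.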