Description of problem:
Looks like on 4.5.7, after adding an additional identity provider, the console goes degraded and is unable to contact the auth provider.

Version-Release number of selected component (if applicable):
4.5.7

How reproducible:
100%? We have seen issues with the console on several OSD 4.5.7 clusters.

Steps to Reproduce:
1. Install a new 4.5.7 cluster.
2. Verify the console works.
3. Add an identity provider.
4. Check console functionality.

Actual results:
The console doesn't work. One console pod is crashlooping; two older pods are still online.

Expected results:
Console works.

Additional info:
Will provide must-gather. Am working on a simple reproducer in stage to verify how to reproduce and the frequency of occurrence.
I reproduced this on a stage OSD cluster by:
1. installing 4.5.7
2. verifying oauth and the console worked
3. adding a github identity provider in OCM
4. verifying the console is no longer working

Note that on OSD in stage we use LDAP for authn, so the initial verification is with one valid identity provider.

Since this didn't work, I tried to revert by taking the new github identity provider out of oauth and then adding it back in. After doing this, authn works. I was not able to fix the production cluster with this issue, though. It shows authentication degraded, so it may be exhibiting an additional problem.
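For reference, a minimal sketch of what the cluster OAuth config might look like after the github provider is added alongside the existing LDAP provider. All values here (URLs, org name, secret name) are illustrative, not taken from the affected cluster:

```yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: ldap                  # pre-existing OSD stage provider (illustrative)
    mappingMethod: claim
    type: LDAP
    ldap:
      url: "ldaps://ldap.example.com/ou=users,dc=example,dc=com?uid"
  - name: github                # the provider added in step 3 (illustrative)
    mappingMethod: claim
    type: GitHub
    github:
      clientID: "<client-id>"
      clientSecret:
        name: github-client-secret   # secret in openshift-config
      organizations:
      - example-org
```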
Tagging with 4.7 while we investigate.
Trying to see if I can figure out something concrete from the OSD perspective to reproduce. Over lunch I created a new 4.5.7 cluster and let it bake a while. Then I added a github identity provider and let that roll out, coming back maybe 30 minutes later. Everything was working fine.

My next test is to add a second identity provider right after the cluster is installed by hive, without waiting for OSD things such as certificates to land completely. This is how customers with automation (including the one that is the cause of this alert) are provisioning: via the API, setting things up as soon as permitted.
The following error, logged repeatedly by the auth operator (as per #c12), is log spam:

  E0909 19:49:58.876033 1 base_controller.go:180] "ConfigObserver" controller failed to sync "key", err: .servingInfo.namedCertificates accessor error: <nil> is of the type <nil>, expected []interface{}

This was fixed for 4.6 in https://github.com/openshift/cluster-authentication-operator/pull/288, and I'll make sure it gets backported to 4.5.

The other conditions reported by the auth operator indicate a problem with reading the openshift-config-managed/router-certs secret. The error messages are admittedly obscure, and the auth team is working on ensuring we report more informative conditions:

  message: |-
    ConfigObservationDegraded: .servingInfo.namedCertificates accessor error: <nil> is of the type <nil>, expected []interface{}
    RouterCertsDegraded: secret/v4-0-config-system-router-certs.spec.data[apps.fastt02.i8v0.p1.openshiftapps.com] -n openshift-authentication: not found
    RouteHealthDegraded: failed to GET route: dial tcp: lookup oauth-openshift.apps.fastt02.i8v0.p1.openshiftapps.com on 172.30.0.10:53: no such host
  reason: ConfigObservation_Error::RouteHealth_FailedGet::RouterCerts_MissingRouterCertsPEM

I looked at the router-certs secret on a cluster provided by nmalik, and it was as follows:

  kind: Secret
  metadata:
    creationTimestamp: "2020-09-09T17:03:37Z"
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:type: {}
      manager: ingress-operator
      operation: Update
      time: "2020-09-09T17:24:14Z"
    name: router-certs
    namespace: openshift-config-managed
    resourceVersion: "37523"
    selfLink: /api/v1/namespaces/openshift-config-managed/secrets/router-certs
    uid: 87ac619b-9de7-4c26-99b2-02b2e6e0990d
  type: Opaque

There were no certs present in the secret. This is an unusual condition, and one best investigated by the ingress team.
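For context on the accessor error above: in Go, a config field that is present but explicitly null ends up as a nil interface{} value, which fails a type assertion to []interface{}. A minimal sketch of that failure mode (the nestedSlice helper here is hypothetical, written only to illustrate the error message; the real operator uses the apimachinery unstructured accessors):

```go
package main

import "fmt"

// nestedSlice mimics the accessor behavior behind the reported log line:
// it expects the value at a key to be []interface{}, and errors when the
// field is present but null, i.e. a nil interface value.
// Hypothetical helper for illustration only.
func nestedSlice(obj map[string]interface{}, key string) ([]interface{}, error) {
	val, found := obj[key]
	if !found {
		return nil, nil // an absent field is fine
	}
	s, ok := val.([]interface{})
	if !ok {
		// A nil interface prints as <nil> for both %v and %T,
		// producing the obscure message seen in the operator logs.
		return nil, fmt.Errorf(".servingInfo.%s accessor error: %v is of the type %T, expected []interface{}", key, val, val)
	}
	return s, nil
}

func main() {
	// servingInfo with namedCertificates explicitly set to null.
	servingInfo := map[string]interface{}{"namedCertificates": nil}
	if _, err := nestedSlice(servingInfo, "namedCertificates"); err != nil {
		fmt.Println(err)
	}
}
```

This is why the message reads "<nil> is of the type <nil>": both the value and its dynamic type format as <nil> when the interface itself is nil.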
Adding link to backport of fix for auth operator log spam.
*** This bug has been marked as a duplicate of bug 1878148 ***