Bug 1856316

Summary:	Installer fails because openshift-authentication never gets Available
Product:	OpenShift Container Platform	Reporter:	David Sanz <dsanzmor>
Component:	apiserver-auth	Assignee:	Standa Laznicka <slaznick>
Status:	CLOSED ERRATA	QA Contact:	pmali
Severity:	high	Docs Contact:
Priority:	high
Version:	4.6	CC:	anusaxen, aos-bugs, mfojtik, pasik, pmali, slaznick, wjiang, wsun, xxia, yunjiang
Target Milestone:	---	Keywords:	Reopened
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-27 16:13:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description David Sanz 2020-07-13 11:53:52 UTC

Description of problem:
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 False       True          True       10m

Events:
  Type     Reason       Age                From                                   Message
  ----     ------       ----               ----                                   -------
  Normal   Scheduled    91s                default-scheduler                      Successfully assigned openshift-authentication/oauth-openshift-6656c4cc8c-c9cw8 to mrnd-ocp22320-db49h-master-1
  Warning  FailedMount  27s (x8 over 91s)  kubelet, mrnd-ocp22320-db49h-master-1  MountVolume.SetUp failed for volume "v4-0-config-system-cliconfig" : configmap "v4-0-config-system-cliconfig" not found

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.Install IPI on OSP
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Standa Laznicka 2020-07-13 12:28:01 UTC

fixed by https://github.com/openshift/cluster-authentication-operator/pull/302

Comment 2 Maru Newby 2020-07-13 16:05:03 UTC

*** Bug 1856425 has been marked as a duplicate of this bug. ***

Comment 3 Wei Sun 2020-07-14 04:02:15 UTC

Change the bug to QE status, it should be verified by QE before closing.

Comment 5 Maru Newby 2020-07-14 04:26:15 UTC

How did this pass CI if it prevents installation?

Comment 7 Standa Laznicka 2020-07-14 07:46:38 UTC

*** Bug 1856475 has been marked as a duplicate of this bug. ***

Comment 9 Standa Laznicka 2020-07-20 07:15:21 UTC

> How did this pass CI if it prevents installation?

it was a race condition, and I think we got very lucky in the CI when testing the PR

Comment 10 Yunfei Jiang 2020-07-21 07:21:16 UTC

reopen this bug, since met this problem again. (not sure if it is the same problem, let me know and I could open a new bug if this is a new issue.)


the frequency of reproducing the problem is not `always`, I reproduced problem 3 times out of 4.

LAST SEEN   TYPE      REASON                      OBJECT                                          MESSAGE
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    no nodes available to schedule pods
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    no nodes available to schedule pods
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
<unknown>   Normal    Scheduled                   pod/authentication-operator-7566665ccc-jv5c9    Successfully assigned openshift-authentication-operator/authentication-operator-7566665ccc-jv5c9 to ip-10-0-55-8.us-east-2.compute.internal
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "service-ca-bundle" : failed to sync configmap cache: timed out waiting for the condition
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "serving-cert" : failed to sync secret cache: timed out waiting for the condition
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "trusted-ca-bundle" : failed to sync configmap cache: timed out waiting for the condition

installation output:

level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=error msg="Cluster operator authentication Degraded is True with ConfigObservation_Error::IngressStateEndpoints_MissingSubsets::RouterCerts_NoRouterCertSecret: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret \"v4-0-config-system-router-certs\" not found\nIngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nConfigObservationDegraded: secret \"v4-0-config-system-router-certs\" not found"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=error msg="Cluster operator cloud-credential Degraded is True with CredentialsFailing: 3 of 3 credentials requests are failing to sync."
level=info msg="Cluster operator cloud-credential Progressing is True with Reconciling: 0 of 3 credentials requests provisioned, 3 reporting errors."
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstaller_InstallerPodFailed::StaticPods_Error: StaticPodsDegraded: pod/kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is not ready: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal_openshift-kube-apiserver(4524d59004962035e5c196e1396bc8f1)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal_openshift-kube-apiserver(4524d59004962035e5c196e1396bc8f1)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is not ready: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal_openshift-kube-apiserver(6cbbceab8ec9e144f33a7b1c41343b1c)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal_openshift-kube-apiserver(6cbbceab8ec9e144f33a7b1c41343b1c)\nStaticPodsDegraded: pods \"kube-apiserver-ip-10-0-55-8.us-east-2.compute.internal\" not found\nNodeInstallerDegraded: 1 nodes are failing on revision 2:\nNodeInstallerDegraded: static pod of revision 2 has been installed, but is not ready while new revision 3 is pending"
level=info msg="Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 3"
level=info msg="Cluster operator kube-apiserver Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 3"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstaller_InstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: static pod of revision 3 has been installed, but is not ready while new revision 4 is pending; 1 nodes are failing on revision 4:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-controller-manager Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 1 nodes are at revision 4; 0 nodes have achieved new revision 5"
level=info msg="Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available"
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/network-metrics-daemon\" is waiting for other operators to become ready"
level=info msg="Cluster operator openshift-apiserver Available is False with APIServices_PreconditionNotReady: APIServicesAvailable: PreconditionNotReady"
level=info msg="Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 0, desired generation is 10.\nProgressing: daemonset/controller-manager: number available is 0, desired number available > 1"
level=info msg="Cluster operator openshift-controller-manager Available is False with _NoPodsAvailable: Available: no daemon pods available on any node."
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="failed to wait for bootstrapping to complete: timed out waiting for the condition"

Comment 11 Standa Laznicka 2020-07-21 07:25:26 UTC

that looks like a different problem, more related to ingress than authentication

Comment 12 Yunfei Jiang 2020-07-21 09:20:58 UTC

Change back to VERIFIED status since https://bugzilla.redhat.com/show_bug.cgi?id=1856316#c10 is a different issue.

Comment 14 errata-xmlrpc 2020-10-27 16:13:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 15 egarcia 2020-12-07 17:21:37 UTC

*** Bug 1892187 has been marked as a duplicate of this bug. ***