Bug 1856316 - Installer fails because openshift-authentication never gets Available
Summary: Installer fails because openshift-authentication never gets Available
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Standa Laznicka
QA Contact: pmali
URL:
Whiteboard:
Duplicates: 1856425 1856475
Depends On:
Blocks:
 
Reported: 2020-07-13 11:53 UTC by David Sanz
Modified: 2020-12-07 17:21 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:13:56 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:14:27 UTC

Description David Sanz 2020-07-13 11:53:52 UTC
Description of problem:
# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 False       True          True       10m

Events:
  Type     Reason       Age                From                                   Message
  ----     ------       ----               ----                                   -------
  Normal   Scheduled    91s                default-scheduler                      Successfully assigned openshift-authentication/oauth-openshift-6656c4cc8c-c9cw8 to mrnd-ocp22320-db49h-master-1
  Warning  FailedMount  27s (x8 over 91s)  kubelet, mrnd-ocp22320-db49h-master-1  MountVolume.SetUp failed for volume "v4-0-config-system-cliconfig" : configmap "v4-0-config-system-cliconfig" not found
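
As a minimal diagnostic sketch (assuming cluster-admin access and the standard oc CLI; the configmap and namespace names are taken from the event above), the missing cliconfig and the operator's reported conditions can be checked with:

# oc -n openshift-authentication get configmap v4-0-config-system-cliconfig
# oc -n openshift-authentication get pods
# oc get clusteroperator authentication -o yaml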

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.Install IPI on OSP
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Standa Laznicka 2020-07-13 12:28:01 UTC
fixed by https://github.com/openshift/cluster-authentication-operator/pull/302

Comment 2 Maru Newby 2020-07-13 16:05:03 UTC
*** Bug 1856425 has been marked as a duplicate of this bug. ***

Comment 3 Wei Sun 2020-07-14 04:02:15 UTC
Changing the bug to QE status; it should be verified by QE before closing.

Comment 5 Maru Newby 2020-07-14 04:26:15 UTC
How did this pass CI if it prevents installation?

Comment 7 Standa Laznicka 2020-07-14 07:46:38 UTC
*** Bug 1856475 has been marked as a duplicate of this bug. ***

Comment 9 Standa Laznicka 2020-07-20 07:15:21 UTC
> How did this pass CI if it prevents installation?

it was a race condition, and I think we got very lucky in the CI when testing the PR

Comment 10 Yunfei Jiang 2020-07-21 07:21:16 UTC
Reopening this bug, since I hit this problem again. (Not sure if it is the same problem; let me know and I can open a new bug if this is a new issue.)


The frequency of reproducing the problem is not `always`; I reproduced it 3 times out of 4.

LAST SEEN   TYPE      REASON                      OBJECT                                          MESSAGE
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    no nodes available to schedule pods
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    no nodes available to schedule pods
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
<unknown>   Warning   FailedScheduling            pod/authentication-operator-7566665ccc-jv5c9    0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
<unknown>   Normal    Scheduled                   pod/authentication-operator-7566665ccc-jv5c9    Successfully assigned openshift-authentication-operator/authentication-operator-7566665ccc-jv5c9 to ip-10-0-55-8.us-east-2.compute.internal
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "service-ca-bundle" : failed to sync configmap cache: timed out waiting for the condition
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "serving-cert" : failed to sync secret cache: timed out waiting for the condition
3h17m       Warning   FailedMount                 pod/authentication-operator-7566665ccc-jv5c9    MountVolume.SetUp failed for volume "trusted-ca-bundle" : failed to sync configmap cache: timed out waiting for the condition

installation output:

level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=error msg="Cluster operator authentication Degraded is True with ConfigObservation_Error::IngressStateEndpoints_MissingSubsets::RouterCerts_NoRouterCertSecret: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret \"v4-0-config-system-router-certs\" not found\nIngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nConfigObservationDegraded: secret \"v4-0-config-system-router-certs\" not found"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=error msg="Cluster operator cloud-credential Degraded is True with CredentialsFailing: 3 of 3 credentials requests are failing to sync."
level=info msg="Cluster operator cloud-credential Progressing is True with Reconciling: 0 of 3 credentials requests provisioned, 3 reporting errors."
level=error msg="Cluster operator kube-apiserver Degraded is True with NodeInstaller_InstallerPodFailed::StaticPods_Error: StaticPodsDegraded: pod/kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is not ready: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal_openshift-kube-apiserver(4524d59004962035e5c196e1396bc8f1)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-76-192.us-east-2.compute.internal_openshift-kube-apiserver(4524d59004962035e5c196e1396bc8f1)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is not ready: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal_openshift-kube-apiserver(6cbbceab8ec9e144f33a7b1c41343b1c)\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal container \"kube-apiserver-check-endpoints\" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-ip-10-0-58-64.us-east-2.compute.internal_openshift-kube-apiserver(6cbbceab8ec9e144f33a7b1c41343b1c)\nStaticPodsDegraded: pods \"kube-apiserver-ip-10-0-55-8.us-east-2.compute.internal\" not found\nNodeInstallerDegraded: 1 nodes are failing on revision 2:\nNodeInstallerDegraded: static pod of revision 2 has been installed, but is not ready while new revision 3 is pending"
level=info msg="Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 3"
level=info msg="Cluster operator kube-apiserver Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 3"
level=error msg="Cluster operator kube-controller-manager Degraded is True with NodeInstaller_InstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 3:\nNodeInstallerDegraded: static pod of revision 3 has been installed, but is not ready while new revision 4 is pending; 1 nodes are failing on revision 4:\nNodeInstallerDegraded: "
level=info msg="Cluster operator kube-controller-manager Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 1 nodes are at revision 4; 0 nodes have achieved new revision 5"
level=info msg="Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available"
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/network-metrics-daemon\" is waiting for other operators to become ready"
level=info msg="Cluster operator openshift-apiserver Available is False with APIServices_PreconditionNotReady: APIServicesAvailable: PreconditionNotReady"
level=info msg="Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: daemonset/controller-manager: observed generation is 0, desired generation is 10.\nProgressing: daemonset/controller-manager: number available is 0, desired number available > 1"
level=info msg="Cluster operator openshift-controller-manager Available is False with _NoPodsAvailable: Available: no daemon pods available on any node."
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="failed to wait for bootstrapping to complete: timed out waiting for the condition"
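
As a rough sketch of follow-up checks for the router-certs/ingress errors reported above (assuming cluster-admin access and the standard oc CLI; the secret name is taken from the error message, the other resource names are assumptions):

# oc -n openshift-authentication get secret v4-0-config-system-router-certs
# oc -n openshift-authentication get endpoints
# oc -n openshift-ingress get pods
# oc get clusteroperator ingress authentication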

Comment 11 Standa Laznicka 2020-07-21 07:25:26 UTC
that looks like a different problem, more related to ingress than authentication

Comment 12 Yunfei Jiang 2020-07-21 09:20:58 UTC
Changing back to VERIFIED status, since https://bugzilla.redhat.com/show_bug.cgi?id=1856316#c10 describes a different issue.

Comment 14 errata-xmlrpc 2020-10-27 16:13:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 15 egarcia 2020-12-07 17:21:37 UTC
*** Bug 1892187 has been marked as a duplicate of this bug. ***

