1779196 – vSphere UPI: failed to initialize the cluster: Some cluster operators are still updating: authentication, console

Status:	CLOSED DUPLICATE of bug 1770658
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Image Registry
Version:	4.2.z
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Oleg Bulatov
QA Contact:	Wenjing Zheng

Reported:	2019-12-03 13:49 UTC by Anoel Yakoubov
Modified:	2019-12-31 08:42 UTC
CC List:	3 users
Last Closed:	2019-12-19 10:48:42 UTC

Description Anoel Yakoubov 2019-12-03 13:49:52 UTC

Description of problem:
After ./openshift-install --dir=/root/cloudlet1/ wait-for bootstrap-complete --log-level debug completes successfully, I cannot continue deploying the cluster because not all cluster operators reach the Available=True state.




Version-Release number of selected component (if applicable): 4.2.4


How reproducible:
Reproduced more than once.

Steps to Reproduce:
1. Follow the guide.
2. Run ./openshift-install --dir=/root/cloudlet1/ wait-for bootstrap-complete --log-level debug
3. export KUBECONFIG
4. oc whoami
5. oc get nodes

[root@ocp41-installer cloudlet1]# oc get nodes
NAME                                             STATUS                     ROLES    AGE   VERSION
control-plane-0.ocp41.sales.lab.tlv.redhat.com   Ready                      master   73m   v1.14.6+c7d2111b9
control-plane-1.ocp41.sales.lab.tlv.redhat.com   Ready                      master   73m   v1.14.6+c7d2111b9
control-plane-2.ocp41.sales.lab.tlv.redhat.com   Ready                      master   73m   v1.14.6+c7d2111b9
localhost                                        Ready,SchedulingDisabled   master   78m   v1.14.6+c7d2111b9

6. oc get csr 

NAME        AGE   REQUESTOR                                                                   CONDITION
csr-249jl   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-2nhr9   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-7x6cv   67m   system:node:localhost                                                       Approved,Issued
csr-94bmj   71m   system:node:localhost                                                       Approved,Issued
csr-95r9b   66m   system:node:control-plane-1.ocp41.sales.lab.tlv.redhat.com                  Approved,Issued
csr-9nzjh   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bk674   70m   system:node:localhost                                                       Approved,Issued
csr-d8vpc   71m   system:node:localhost                                                       Approved,Issued
csr-dtqsk   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-fmqvp   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-grcbb   66m   system:node:control-plane-0.ocp41.sales.lab.tlv.redhat.com                  Approved,Issued
csr-jfcvg   71m   system:node:localhost                                                       Approved,Issued
csr-k6dsz   67m   system:node:localhost                                                       Approved,Issued
csr-nq86v   71m   system:node:localhost                                                       Approved,Issued
csr-shz78   71m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-t4tsq   71m   system:node:localhost                                                       Approved,Issued
csr-w6fz6   66m   system:node:control-plane-2.ocp41.sales.lab.tlv.redhat.com                  Approved,Issued
csr-wv5wm   67m   system:node:localhost                                                       Approved,Issued

7. oc get clusteroperators

[root@ocp41-installer cloudlet1]# oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                       Unknown     Unknown       True       59m
cloud-credential                           4.2.4     True        False         False      78m
cluster-autoscaler                         4.2.4     True        False         False      57m
console                                    4.2.4     Unknown     True          False      57m
dns                                        4.2.4     True        False         False      74m
image-registry                                       False       True          False      56m
ingress                                    unknown   False       True          True       56m
insights                                   4.2.4     True        False         False      78m
kube-apiserver                             4.2.4     True        False         False      70m
kube-controller-manager                    4.2.4     True        False         False      66m
kube-scheduler                             4.2.4     True        False         False      70m
machine-api                                4.2.4     True        False         False      78m
machine-config                             4.2.4     False       False         True       78m
marketplace                                4.2.4     True        False         False      58m
monitoring                                           False       True          True       54m
network                                    4.2.4     True        False         False      72m
node-tuning                                4.2.4     True        False         False      58m
openshift-apiserver                        4.2.4     True        False         False      58s
openshift-controller-manager               4.2.4     True        False         False      71m
openshift-samples                          4.2.4     True        False         False      35m
operator-lifecycle-manager                 4.2.4     True        False         False      73m
operator-lifecycle-manager-catalog         4.2.4     True        False         False      73m
operator-lifecycle-manager-packageserver   4.2.4     True        False         False      35m
service-ca                                 4.2.4     True        False         False      78m
service-catalog-apiserver                  4.2.4     True        False         False      59m
service-catalog-controller-manager         4.2.4     True        False         False      59m
storage                                    4.2.4     True        False         False      56m
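
For triage, the operators that are not yet available can be pulled out of the table above with a short script. This is only a minimal sketch: it assumes the table is pasted as plain text without the shell prompt line, and the names SAMPLE and unavailable_operators are hypothetical, not part of oc or the installer. The sample uses a subset of the rows above.

```python
from typing import List

# Rows copied from the `oc get clusteroperators` output above (subset).
SAMPLE = """\
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                       Unknown     Unknown       True       59m
cloud-credential                           4.2.4     True        False         False      78m
console                                    4.2.4     Unknown     True          False      57m
dns                                        4.2.4     True        False         False      74m
image-registry                                       False       True          False      56m
ingress                                    unknown   False       True          True       56m
machine-config                             4.2.4     False       False         True       78m
"""


def unavailable_operators(table: str) -> List[str]:
    """Return the names of operators whose AVAILABLE column is not "True"."""
    names = []
    for line in table.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to be a data row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        # VERSION may be empty, so read the status columns from the right:
        # AVAILABLE, PROGRESSING, DEGRADED, SINCE are always the last four fields.
        available = fields[-4]
        if available != "True":
            names.append(fields[0])
    return names


print(unavailable_operators(SAMPLE))
# ['authentication', 'console', 'image-registry', 'ingress', 'machine-config']
```

Reading the columns from the right is a deliberate choice here, because several rows in this report have an empty VERSION column and a left-to-right split would shift the fields.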

Actual results:

Not all cluster operators reach the Available=True state, and even after applying the patch for image-registry, its state does not change over time. As a result, the cluster deployment cannot complete.
It fails with the following error:

 ./openshift-install --dir=/root/cloudlet1/ wait-for install-complete --log-level debug
DEBUG OpenShift Installer v4.2.4
DEBUG Built from commit 425e4ff0037487e32571258640b39f56d5ee5572
INFO Waiting up to 30m0s for the cluster at https://api.ocp41.sales.lab.tlv.redhat.com:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.4: 98% complete
DEBUG Still waiting for the cluster to initialize: Cluster operator machine-config is reporting a failure: Failed to resync 4.2.4 because: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 4, ready 0, updated: 0, unavailable: 1)


Expected results:

All cluster operators are in the Available=True state, and wait-for install-complete finishes successfully.


Additional info:

I am using 4.2.4 because there was a different issue with version 4.2.8: openshift-install --dir=/root/cloudlet1/ wait-for bootstrap-complete failed with an error that the connection to api.domainname.com:6443 was refused.

Comment 1 Abhinav Dahiya 2019-12-12 04:47:32 UTC

Moving to machine-config, as it is one of the failing operators.

Comment 3 Anoel Yakoubov 2019-12-17 08:16:25 UTC

Hi, I don't quite understand which information you need me to provide.
Please elaborate a little more.

Comment 4 Kirsten Garrison 2019-12-17 18:52:31 UTC

@Anoel: please attach the logs using this tool: https://github.com/openshift/must-gather

Comment 5 Anoel Yakoubov 2019-12-18 07:59:57 UTC

Hi, I uploaded the tar file with all the logs to Google Drive.
You can find it at the following link:

https://drive.google.com/file/d/1nCmomYoInn0SWY0dcsmaOV0Te-_UykNz/view?usp=sharing

Comment 6 Kirsten Garrison 2019-12-18 18:01:56 UTC

Looking at the machine config pools, neither master nor worker is degraded:
  degradedMachineCount: 0
  machineCount: 3
  observedGeneration: 2
  readyMachineCount: 3
  unavailableMachineCount: 0
  updatedMachineCount: 3

  degradedMachineCount: 0
  machineCount: 2
  observedGeneration: 2
  readyMachineCount: 2
  unavailableMachineCount: 0
  updatedMachineCount: 2

MCO is also fine:
  conditions:
  - lastTransitionTime: 2019-12-18T07:01:48Z
    message: Cluster has deployed 4.2.4
    status: "True"
    type: Available
  - lastTransitionTime: 2019-12-18T07:01:48Z
    message: Cluster version is 4.2.4
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-12-18T07:00:35Z
    status: "False"
    type: Degraded
  - lastTransitionTime: 2019-12-18T07:01:48Z
    reason: AsExpected
    status: "True"
    type: Upgradeable
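
The pool check above can be written down as a small predicate. This is a sketch under stated assumptions: the rule used here (no degraded or unavailable machines, and every machine both ready and updated) is one reasonable reading of the counts, not necessarily the exact criterion the machine-config operator applies, and pool_is_settled and the keys "first" and "second" are hypothetical names. The counts are the ones quoted in this comment.

```python
def pool_is_settled(status: dict) -> bool:
    """True when every machine in the pool is ready, updated, and not degraded."""
    total = status["machineCount"]
    return (
        status["degradedMachineCount"] == 0
        and status["unavailableMachineCount"] == 0
        and status["readyMachineCount"] == total
        and status["updatedMachineCount"] == total
    )


# Status counts copied from the two pool dumps in this comment.
pools = {
    "first": {
        "degradedMachineCount": 0,
        "machineCount": 3,
        "readyMachineCount": 3,
        "unavailableMachineCount": 0,
        "updatedMachineCount": 3,
    },
    "second": {
        "degradedMachineCount": 0,
        "machineCount": 2,
        "readyMachineCount": 2,
        "unavailableMachineCount": 0,
        "updatedMachineCount": 2,
    },
}

for name, status in pools.items():
    print(name, pool_is_settled(status))
```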

Comment 7 Kirsten Garrison 2019-12-18 18:03:11 UTC

The issue that I do see is:

conditions:
  - lastTransitionTime: 2019-12-18T06:58:59Z
    status: "False"
    type: Available
  - lastTransitionTime: 2019-12-18T07:19:24Z
    message: Cluster operator image-registry is still updating
    reason: ClusterOperatorNotAvailable
    status: "True"
    type: Failing
  - lastTransitionTime: 2019-12-18T06:58:59Z
    message: 'Unable to apply 4.2.4: the cluster operator image-registry has not yet
      successfully rolled out'
    reason: ClusterOperatorNotAvailable
    status: "True"
    type: Progressing


Reassigning: Image-Registry, PTAL
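
The same reading of the conditions can be sketched in code. This is a minimal illustration, assuming the conditions have already been loaded into a list of dictionaries; the helper name blocking_conditions and the rule it applies (Available must be "True", Failing and Degraded must not be "True") are assumptions made for this example, not an official definition. Timestamps are omitted from the sample for brevity.

```python
def blocking_conditions(conditions):
    """Return the conditions that indicate the rollout is blocked."""
    blocking = []
    for cond in conditions:
        ctype, status = cond["type"], cond["status"]
        if ctype == "Available" and status != "True":
            blocking.append(cond)
        elif ctype in ("Failing", "Degraded") and status == "True":
            blocking.append(cond)
    return blocking


# Conditions copied from the dump in this comment.
conditions = [
    {"type": "Available", "status": "False"},
    {
        "type": "Failing",
        "status": "True",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Cluster operator image-registry is still updating",
    },
    {
        "type": "Progressing",
        "status": "True",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Unable to apply 4.2.4: the cluster operator image-registry "
                   "has not yet successfully rolled out",
    },
]

for cond in blocking_conditions(conditions):
    print(cond["type"], cond.get("message", ""))
```

Progressing=True on its own is not treated as blocking in this sketch, since it is also the normal state during a healthy rollout.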

Comment 8 Oleg Bulatov 2019-12-19 10:48:42 UTC


*** This bug has been marked as a duplicate of bug 1770658 ***

Comment 9 Anoel Yakoubov 2019-12-31 08:42:36 UTC

Why am I still receiving an email every day about this outstanding request?

[Red Hat Bugzilla] Your Outstanding Requests
bugzilla
4:24 AM (6 hours ago)
to me

The following is a list of bugs or attachments to bugs in which a user has been
waiting more than 3 days for a response from you. Please take
action on these requests as quickly as possible. (Note that some of these bugs
might already be closed, but a user is still waiting for your response.)

We'll remind you again tomorrow if these requests are still outstanding, or if
there are any new requests where users have been waiting more than 3
days for your response.

needinfo
--------

  Bug 1779196: vSphere UPI: failed to initialize the cluster: Some cluster operators are still updating: authentication, console (19 days old)
    https://bugzilla.redhat.com/show_bug.cgi?id=1779196
