Bug 1729091 - Cluster does not come up when networkType set to OVNKubernetes
Summary: Cluster does not come up when networkType set to OVNKubernetes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Dan Winship
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 1737122
Depends On:
Blocks:
 
Reported: 2019-07-11 11:51 UTC by Anurag saxena
Modified: 2019-10-16 06:33 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:33:27 UTC
Target Upstream Version:
Embargoed:


Attachments
Install logs collected on nightly build containing ovn-kubernetes pull #14 (2.19 MB, application/gzip)
2019-08-01 19:42 UTC, Anurag saxena


Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 14 0 'None' closed Merge upstream from 2019-07-29 2021-02-05 06:24:06 UTC
Github openshift ovn-kubernetes pull 17 0 'None' closed Revert "gateway/local: stop double-NAT-ing" 2021-02-05 06:24:06 UTC
Github ovn-org ovn-kubernetes pull 771 0 'None' closed master: don't block waiting for gateway load balancer creation 2021-02-05 06:24:06 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:33:39 UTC

Description Anurag saxena 2019-07-11 11:51:07 UTC
Description of problem: I am not sure how to collect more logs on this problem, as the install fails midway. The following is a --log-level=debug excerpt:

INFO Waiting up to 30m0s for the cluster at https://api.anusaxen-ovntest.qe.devcluster.openshift.com:6443 to initialize... 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-07-11-023129: 98% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-07-11-023129: 99% complete 
DEBUG Still waiting for the cluster to initialize: Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available 
DEBUG Still waiting for the cluster to initialize: Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available 
FATAL failed to initialize the cluster: Cluster operator dns is reporting a failure: Not all desired DNS DaemonSets available 


Version-Release number of selected component (if applicable): 4.2.0-0.nightly-2019-07-11-023129


How reproducible: Always


Steps to Reproduce:
1. Extract the installer from the build
2. ./openshift-install create install-config
3. Change networkType to OVNKubernetes in install-config.yaml
4. ./openshift-install create cluster (see the sketch below for the full flow)
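
For reference, a minimal sketch of the full flow (standard openshift-install invocations; the --dir path here is just an example):

# ./openshift-install create install-config --dir=ovn-test
(edit ovn-test/install-config.yaml and set networking.networkType to OVNKubernetes)
# ./openshift-install create cluster --dir=ovn-test --log-level=debug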

Actual results: Cluster fails to come up

Expected results: The cluster comes up when networkType is set to OVNKubernetes


Additional info:
# cat install-config.yaml 
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: anusaxen-ovn
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
.
.
.

Comment 1 Anurag saxena 2019-07-12 03:38:32 UTC
Just to confirm: the cluster installs successfully when networkType is reverted back to OpenShiftSDN. Thanks!

Comment 2 Anurag saxena 2019-07-16 08:43:03 UTC
Apparently we can't test OVN until this bug is fixed, so I think the priority should be at least medium. Thanks! Let us know if dev thinks otherwise.

Comment 3 Wei Sun 2019-07-16 08:55:27 UTC
Per comment 2, adding the TestBlocker keyword since this is blocking OVN-related testing.

Comment 4 Anurag saxena 2019-07-16 08:57:51 UTC
(In reply to Wei Sun from comment #3)
> Per comment 2, adding the TestBlocker keyword since this is blocking
> OVN-related testing.

Right, thanks Wei

Comment 5 Dan Winship 2019-07-16 14:45:36 UTC
This is similar to https://github.com/ovn-org/ovn-kubernetes/issues/531.

The DNS pods are marked:

      tolerations:
      # tolerate all taints so that DNS is always present on all nodes
      - operator: Exists

Thus, the pod may be scheduled to the node before the node is Ready:

Jul 16 14:14:45 ip-10-0-143-236 hyperkube[975]: E0716 14:14:45.155912     975 pod_workers.go:190] Error syncing pod 0fc8b3de-a7d4-11e9-82e6-067f47fb93b4 ("dns-default-fzrs6_openshift-dns(0fc8b3de-a7d4-11e9-82e6-067f47fb93b4)"), skipping: network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network

The ovn-master will get the Pod ADD event and call waitForNodeLogicalSwitch(). But that will only wait so long:

time="2019-07-16T14:15:15Z" level=error msg="timed out waiting for node \"ip-10-0-143-236.us-east-2.compute.internal\" logical switch: timed out waiting for the condition"

Then addLogicalPort() returns without creating the logical port, and (here's the problem) the master will never attempt to call addLogicalPort() again, because it only ever does that when the Pod was first created. So while the node will continue to try to create the pod, it will always fail:

time="2019-07-16T14:43:16Z" level=info msg="Waiting for ADD result for pod openshift-dns/dns-default-fzrs6"
time="2019-07-16T14:43:16Z" level=info msg="Dispatching pod network request &{ADD openshift-dns dns-default-fzrs6 6033ece8aa63d234a8e96f5bf48f83e6ca515aeac1c2115387d1f8938a2b4463 /proc/22043/ns/net eth0 0xc0005a2a00 0xc0001c6420}"
time="2019-07-16T14:43:38Z" level=error msg="failed to get pod annotation - timed out waiting for the condition"

Comment 6 Dan Winship 2019-07-16 17:44:27 UTC
More specifically, the problem is that the node comes up, and the dns and ovn pods are created for it:

  metadata:
    creationTimestamp: "2019-07-16T17:15:55Z"
    name: ovnkube-node-t4jh8

  metadata:
    creationTimestamp: "2019-07-16T17:15:55Z"
    name: dns-default-ltz79

which the kubelet observes and starts dealing with:

  Jul 16 17:15:55 ip-10-0-135-93 hyperkube[973]: I0716 17:15:55.712689     973 kubelet.go:1894] SyncLoop (ADD, "api"): "ovnkube-node-t4jh8_openshift-ovn-kubernetes(5f1d707e-a7ed-11e9-81c9-06eed9bfdac4)"
  Jul 16 17:15:55 ip-10-0-135-93 hyperkube[973]: I0716 17:15:55.746507     973 kubelet.go:1894] SyncLoop (ADD, "api"): "dns-default-ltz79_openshift-dns(5f263e31-a7ed-11e9-81c9-06eed9bfdac4)"

and the ovn master also observes the Pod creation, but does not log anything about it right away. But 30 seconds later it times out waiting for the pod's node's logical switch to have been created:

  time="2019-07-16T17:16:25Z" level=error msg="timed out waiting for node \"ip-10-0-135-93.us-east-2.compute.internal\" logical switch: timed out waiting for the condition"

meanwhile, back on the node, 16 seconds later we see:

  Jul 16 17:16:41 ip-10-0-135-93 hyperkube[973]: I0716 17:16:41.542098     973 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-ovn-kubernetes", Name:"ovnkube-node-t4jh8", UID:"5f1d707e-a7ed-11e9-81c9-06eed9bfdac4", APIVersion:"v1", ResourceVersion:"14775", FieldPath:"spec.containers{ovs-daemons}"}): type: 'Normal' reason: 'Pulled' Successfully pulled image "registry.svc.ci.openshift.org/ocp/4.2-2019-07-15-105756@sha256:94088ffad840bc113c6504c8e2cd09d7353266542f6bf622c4ca4f2856c41ffa"

So the master gave up waiting for ovnkube-node to start before the node even managed to finish pulling the ovn image.


You could argue that the DNS pod is lying and should not actually claim to tolerate all taints, but there's no way for it to say "tolerates all taints except network not ready".

My first thought on fixing this was that the ovnkube master could retroactively process all existing pods on a node after creating the node's logical switch, but this currently turns out to be hard due to the split between "master" and "controller" in the master.

Comment 7 Casey Callendrello 2019-07-17 13:23:59 UTC
Arguably, OVN should not, itself, contain any sort of timeout or failure. I would expect it to keep trying forever, until the pod finally enters the CreateContainerFailed state. There shouldn't be competing timeouts.
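
For illustration, a minimal Go sketch of that "keep retrying until the pod goes away" approach (not the actual ovn-kubernetes fix; names, backoff values, and signatures are assumptions made for this sketch):

  package main

  import (
  	"fmt"
  	"time"
  )

  // Closed once the node's logical switch exists.
  var switchCreated = make(chan struct{})

  func tryAddLogicalPort(pod, node string) error {
  	select {
  	case <-switchCreated:
  		fmt.Printf("created logical port for pod %q on node %q\n", pod, node)
  		return nil
  	default:
  		return fmt.Errorf("logical switch for node %q not created yet", node)
  	}
  }

  // handlePodAdd retries with capped exponential backoff instead of giving up
  // after a fixed timeout; it only stops when the port is created or the pod
  // itself is deleted.
  func handlePodAdd(pod, node string, podDeleted <-chan struct{}) {
  	backoff := time.Second
  	for {
  		if err := tryAddLogicalPort(pod, node); err == nil {
  			return
  		}
  		select {
  		case <-podDeleted:
  			return
  		case <-time.After(backoff):
  		}
  		if backoff < 30*time.Second {
  			backoff *= 2
  		}
  	}
  }

  func main() {
  	podDeleted := make(chan struct{})
  	go func() {
  		time.Sleep(45 * time.Second) // e.g. the ovn image pull finally finishes
  		close(switchCreated)
  	}()
  	handlePodAdd("openshift-dns/dns-default-ltz79", "ip-10-0-135-93", podDeleted)
  }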

Comment 8 Dan Winship 2019-07-17 13:30:02 UTC
(The timeout was based on the assumption that a non-hostNetwork pod can't have been scheduled to the node until after ovnkube-node has declared the node ready. If that assumption were correct, there would be only about one second's worth of race condition, and a 30-second timeout would be pretty reasonable. But the assumption wasn't correct.)

Comment 9 Anurag saxena 2019-07-30 17:37:43 UTC
Apparently this works on the latest nightlies, e.g. 4.2.0-0.nightly-2019-07-30-073644. OVNKubernetes is picked up successfully, and I noticed the openshift-ovn-kubernetes project created with ovnkube pods. However, CDO is stuck with "Error while reconciling 4.2.0-0.nightly-2019-07-30-073644: the cluster operator dns is degraded"

[root@localhost ocp]# oc get pods -n openshift-dns
NAME                READY   STATUS              RESTARTS   AGE
dns-default-5ngp9   0/2     ContainerCreating   0          80m   <<<<<<<<<<<<< Stuck on worker node
dns-default-8vmk4   2/2     Running             0          83m
dns-default-8x6b7   2/2     Running             0          83m
dns-default-b2x5z   2/2     Running             0          80m
dns-default-d6gzc   0/2     ContainerCreating   0          79m   <<<<<<<<<<<<< Stuck on worker node
dns-default-qsbds   2/2     Running             0          83m

The oc describe output below complains about NetworkNotReady issues:

 Type     Reason                  Age                 From                                   Message
  ----     ------                  ----                ----                                   -------
  Normal   Scheduled               81m                 default-scheduler                      Successfully assigned openshift-dns/dns-default-5ngp9 to ip-10-0-129-179.ec2.internal
  Warning  NetworkNotReady         81m (x25 over 81m)  kubelet, ip-10-0-129-179.ec2.internal  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
  Warning  FailedCreatePodSandBox  76m                 kubelet, ip-10-0-129-179.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_dns-default-5ngp9_openshift-dns_f662a953-b2e4-11e9-a51b-0ec73087e0cc_0(c6591ceadfab51e6e021035758fc6e52c870b9c0d0560a2569ab0a0a0d156ab5): CNI request failed with status 400: 'Nil response to CNI request
'
  Warning  FailedCreatePodSandBox  92s (x125 over 74m)  kubelet, ip-10-0-129-179.ec2.internal  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_dns-default-5ngp9_openshift-dns_f662a953-b2e4-11e9-a51b-0ec73087e0cc_0(9a841388203784188b96526e719b5a29f594530008d3b08187082d77b98aba8e): CNI request failed with status 400: 'Nil response to CNI request
'

Digging more into this; I'll probably file a separate bug for it, but I'm pasting the update here as well.

Comment 10 Dan Winship 2019-07-30 20:19:39 UTC
There haven't been any updates to the ovn-kubernetes images since this was filed. The bug is fixed upstream but we haven't merged the fix into our fork of the repo yet. I'll update this bug when we do.

Comment 11 Anurag saxena 2019-07-30 20:55:11 UTC
Sure. Thanks, Dan.

Comment 12 Dan Winship 2019-07-31 20:30:58 UTC
Fixed in git. Should make it into builds by tomorrow.

Comment 14 Anurag saxena 2019-07-31 21:53:45 UTC
(In reply to Dan Winship from comment #12)
> Fixed in git. Should make it into builds by tomorrow.

Thanks, Dan for the update.

Comment 15 Anurag saxena 2019-08-01 19:42:13 UTC
Created attachment 1599251 [details]
Install logs collected on nightly build containing ovn-kubernetes pull #14

Comment 16 Anurag saxena 2019-08-01 19:42:32 UTC
The latest nightly, 4.2.0-0.nightly-2019-08-01-113533, which contains pull #14, is consistently timing out with "FATAL failed to wait for bootstrapping to complete: timed out waiting for the condition" on OVNKubernetes, while it works okay with OpenShiftSDN. I tried it 3 times to make sure.

I am attaching Install_logs.tar.gz, collected during the install; it contains logs pertaining to the control plane, bootstrap, auth, etc. for reference. Moving the bug back to ASSIGNED for now. Thanks

Comment 19 Dan Winship 2019-08-13 11:56:22 UTC
OK, merged a fix. However, it is possible that the bug in bug 1732598 will also prevent ovn-kubernetes from working. (If so, the symptom would be that the network never becomes ready, with the clusteroperator status reporting that multus-admission-controller hasn't started.)
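
For reference, a quick way to check for that symptom would be standard oc queries (the openshift-multus namespace is my assumption about where multus-admission-controller runs):

# oc get clusteroperator network
# oc get pods -n openshift-multus | grep multus-admission-controller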

Comment 21 Anurag saxena 2019-08-13 13:12:28 UTC
(In reply to Dan Winship from comment #19)
> OK, merged a fix. However, it is possible that the bug in bug 1732598 will
> also prevent ovn-kubernetes from working. (If so, the symptom would be that
> the network never becomes ready, with the clusteroperator status reporting
> that multus-admission-controller hasn't started.)

Thanks Dan. Will take a look, assuming it's pull #17.

Comment 22 Dan Winship 2019-08-13 14:36:07 UTC
*** Bug 1737122 has been marked as a duplicate of this bug. ***

Comment 23 Glenn West 2019-08-13 14:53:25 UTC
Had a successful install using openshift-install-mac-4.2.0-0.ci-2019-08-13-132059.tar.gz

3 masters, 3 workers, and OVN.

https://pastebin.com/NFwc6uK3

Bare metal install.

Minor flake on authentication, unrelated to the OVN issue.

DEBUG OpenShift Installer unreleased-master-1578-gfee8c84ba56ecfda0ceca2db90c6b44f03e7512f-dirty 
DEBUG Built from commit fee8c84ba56ecfda0ceca2db90c6b44f03e7512f 
INFO Waiting up to 30m0s for the Kubernetes API at https://api.gw.lo:6443... 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.201:6443: i/o timeout 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.202:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.200:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.201:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.202:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.203:6443: i/o timeout 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.201:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.200:6443: connect: connection refused 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource 
DEBUG Still waiting for the Kubernetes API: Get https://api.gw.lo:6443/version?timeout=32s: dial tcp 192.168.1.200:6443: connect: connection refused 
INFO API v1.14.0+e116320 up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
DEBUG Bootstrap status: complete                   
INFO It is now safe to remove the bootstrap resources 
X11 forwarding request failed on channel 0
Powering off VM:
DEBUG OpenShift Installer unreleased-master-1578-gfee8c84ba56ecfda0ceca2db90c6b44f03e7512f-dirty 
DEBUG Built from commit fee8c84ba56ecfda0ceca2db90c6b44f03e7512f 
INFO Waiting up to 30m0s for the cluster at https://api.gw.lo:6443 to initialize... 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 81% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: downloading update 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 0% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 1% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 17% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 66% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 80% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 81% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 82% complete 
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Could not update oauthclient "console" (255 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (401 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (366 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (405 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (372 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (382 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (386 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (390 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (392 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (395 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (375 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (378 of 406): the server does not recognize this resource, check extension API servers 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 87% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 88% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 89% complete 
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (401 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (366 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (405 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-image-registry/image-registry" (372 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (382 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (386 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (390 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (392 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (395 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (375 of 406): the server does not recognize this resource, check extension API servers
* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (378 of 406): the server does not recognize this resource, check extension API servers 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 95% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 96% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 96% complete, waiting on authentication, console, marketplace, monitoring, node-tuning, openshift-samples, operator-lifecycle-manager-packageserver, service-catalog-apiserver, service-catalog-controller-manager 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 96% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 97% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 97% complete, waiting on authentication, console, node-tuning, openshift-samples, operator-lifecycle-manager-packageserver, service-catalog-apiserver, service-catalog-controller-manager 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 97% complete, waiting on authentication, console, node-tuning, openshift-samples, operator-lifecycle-manager-packageserver, service-catalog-apiserver, service-catalog-controller-manager 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 99% complete 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 100% complete, waiting on authentication 
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.ci-2019-08-13-132059: 100% complete, waiting on authentication 
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is still updating 
FATAL failed to initialize the cluster: Cluster operator authentication is still updating

Comment 24 Anurag saxena 2019-08-14 20:27:52 UTC
Yep, it looks good to me on the latest green nightly, 4.2.0-0.nightly-2019-08-14-112500. Verifying this bug based on the following observations.

[core@ip-10-0-133-187 ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-14-112500   True        False         False      17m
cloud-credential                           4.2.0-0.nightly-2019-08-14-112500   True        False         False      30m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-14-112500   True        False         False      22m
console                                    4.2.0-0.nightly-2019-08-14-112500   True        False         False      20m
dns                                        4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
image-registry                             4.2.0-0.nightly-2019-08-14-112500   True        False         False      22m
ingress                                    4.2.0-0.nightly-2019-08-14-112500   True        False         False      23m
insights                                   4.2.0-0.nightly-2019-08-14-112500   True        False         False      30m
kube-apiserver                             4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
kube-controller-manager                    4.2.0-0.nightly-2019-08-14-112500   True        False         False      27m
kube-scheduler                             4.2.0-0.nightly-2019-08-14-112500   True        False         False      27m
machine-api                                4.2.0-0.nightly-2019-08-14-112500   True        False         False      30m
machine-config                             4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
marketplace                                4.2.0-0.nightly-2019-08-14-112500   True        False         False      24m
monitoring                                 4.2.0-0.nightly-2019-08-14-112500   True        False         False      21m
network                                    4.2.0-0.nightly-2019-08-14-112500   True        False         False      28m
node-tuning                                4.2.0-0.nightly-2019-08-14-112500   True        False         False      26m
openshift-apiserver                        4.2.0-0.nightly-2019-08-14-112500   True        False         False      26m
openshift-controller-manager               4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
openshift-samples                          4.2.0-0.nightly-2019-08-14-112500   True        False         False      18m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-14-112500   True        False         False      28m
service-ca                                 4.2.0-0.nightly-2019-08-14-112500   True        False         False      29m
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-14-112500   True        False         False      26m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-14-112500   True        False         False      26m
storage                                    4.2.0-0.nightly-2019-08-14-112500   True        False         False      25m

[core@ip-10-0-133-187 ~]$ oc get pods -n openshift-ovn-kubernetes 
NAME                              READY   STATUS    RESTARTS   AGE
ovnkube-master-7459b7bd9b-h5p7d   4/4     Running   0          31m
ovnkube-node-4lnwf                3/3     Running   0          31m
ovnkube-node-dnchj                3/3     Running   0          25m
ovnkube-node-mc74l                3/3     Running   0          25m
ovnkube-node-spzvp                3/3     Running   0          25m
ovnkube-node-v5vpn                3/3     Running   0          31m
ovnkube-node-vxgj6                3/3     Running   0          31m

[core@ip-10-0-133-187 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-14-112500   True        False         18m     Cluster version is 4.2.0-0.nightly-2019-08-14-112500

Comment 25 errata-xmlrpc 2019-10-16 06:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

