Description of problem:
Set up a cluster with the OVN network type; the cluster does not work. Checking the ovn pods:

# oc get pod -n openshift-ovn-kubernetes
NAME                              READY   STATUS             RESTARTS   AGE
ovnkube-master-785b7b768d-mhbhq   0/4     Pending            0          33m
ovnkube-node-bq4d5                1/3     CrashLoopBackOff   9          35m
ovnkube-node-jvv5p                1/3     CrashLoopBackOff   9          35m

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-02-172410

How reproducible:
Always

Steps to Reproduce:
1. Install a cluster on vSphere with the OVN network type.
2. Check the pods in openshift-ovn-kubernetes; the ovnkube-master pod cannot be scheduled.

# oc describe pod ovnkube-master-785b7b768d-mhbhq -n openshift-ovn-kubernetes
Name:               ovnkube-master-785b7b768d-mhbhq
Namespace:          openshift-ovn-kubernetes
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               <none>
Labels:             component=network
                    kubernetes.io/os=linux
                    name=ovnkube-master
                    openshift.io/component=network
                    pod-template-hash=785b7b768d
                    type=infra
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/ovnkube-master-785b7b768d
Containers:
  run-ovn-northd:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c1d6e6c987fda3b4ba12af083c0a40f10eef92b1f2711cbf02f677fa61848f8
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/ovnkube.sh
      run-ovn-northd
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:
      OVN_DAEMONSET_VERSION:     3
      OVN_LOG_NORTHD:            -vconsole:info
      OVN_NET_CIDR:              <set to the key 'net_cidr' of config map 'ovn-config'>  Optional: false
      OVN_SVC_CIDR:              <set to the key 'svc_cidr' of config map 'ovn-config'>  Optional: false
      K8S_NODE:                   (v1:spec.nodeName)
      K8S_APISERVER:             <set to the key 'k8s_apiserver' of config map 'ovn-config'>  Optional: false
      OVN_KUBERNETES_NAMESPACE:  openshift-ovn-kubernetes (v1:metadata.namespace)
    Mounts:
      /etc/openvswitch/ from host-var-lib-ovs (rw)
      /var/lib/openvswitch/ from host-var-lib-ovs (rw)
      /var/run/openvswitch/ from host-var-run-ovs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-controller-token-8dm2j (ro)
  nb-ovsdb:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c1d6e6c987fda3b4ba12af083c0a40f10eef92b1f2711cbf02f677fa61848f8
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/ovnkube.sh
      nb-ovsdb
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:
      OVN_DAEMONSET_VERSION:     3
      OVN_LOG_NB:                -vconsole:info -vfile:info
      OVN_NET_CIDR:              <set to the key 'net_cidr' of config map 'ovn-config'>  Optional: false
      OVN_SVC_CIDR:              <set to the key 'svc_cidr' of config map 'ovn-config'>  Optional: false
      K8S_NODE:                   (v1:spec.nodeName)
      K8S_APISERVER:             <set to the key 'k8s_apiserver' of config map 'ovn-config'>  Optional: false
      OVN_KUBERNETES_NAMESPACE:  openshift-ovn-kubernetes (v1:metadata.namespace)
    Mounts:
      /etc/openvswitch/ from host-var-lib-ovs (rw)
      /var/lib/openvswitch/ from host-var-lib-ovs (rw)
      /var/run/openvswitch/ from host-var-run-ovs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-controller-token-8dm2j (ro)
  sb-ovsdb:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c1d6e6c987fda3b4ba12af083c0a40f10eef92b1f2711cbf02f677fa61848f8
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/ovnkube.sh
      sb-ovsdb
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:
      OVN_DAEMONSET_VERSION:     3
      OVN_LOG_SB:                -vconsole:info -vfile:info
      OVN_NET_CIDR:              <set to the key 'net_cidr' of config map 'ovn-config'>  Optional: false
      OVN_SVC_CIDR:              <set to the key 'svc_cidr' of config map 'ovn-config'>  Optional: false
      K8S_NODE:                   (v1:spec.nodeName)
      K8S_APISERVER:             <set to the key 'k8s_apiserver' of config map 'ovn-config'>  Optional: false
      OVN_KUBERNETES_NAMESPACE:  openshift-ovn-kubernetes (v1:metadata.namespace)
    Mounts:
      /etc/openvswitch/ from host-var-lib-ovs (rw)
      /var/lib/openvswitch/ from host-var-lib-ovs (rw)
      /var/run/kubernetes/ from host-var-run-kubernetes (rw)
      /var/run/openvswitch/ from host-var-run-ovs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-controller-token-8dm2j (ro)
  ovnkube-master:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9c1d6e6c987fda3b4ba12af083c0a40f10eef92b1f2711cbf02f677fa61848f8
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/ovnkube.sh
      ovn-master
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:
      OVN_DAEMONSET_VERSION:     3
      OVN_MASTER:                true
      OVNKUBE_LOGLEVEL:          4
      OVN_NET_CIDR:              <set to the key 'net_cidr' of config map 'ovn-config'>  Optional: false
      OVN_SVC_CIDR:              <set to the key 'svc_cidr' of config map 'ovn-config'>  Optional: false
      K8S_NODE:                   (v1:spec.nodeName)
      K8S_APISERVER:             <set to the key 'k8s_apiserver' of config map 'ovn-config'>  Optional: false
      OVN_KUBERNETES_NAMESPACE:  openshift-ovn-kubernetes (v1:metadata.namespace)
    Mounts:
      /etc/openvswitch/ from host-var-lib-ovs (rw)
      /var/lib/openvswitch/ from host-var-lib-ovs (rw)
      /var/run/kubernetes/ from host-var-run-kubernetes (rw)
      /var/run/openvswitch/ from host-var-run-ovs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-controller-token-8dm2j (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  host-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  host-var-lib-ovs:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch
    HostPathType:
  host-var-run-ovs:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/openvswitch
    HostPathType:
  host-var-run-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubernetes
    HostPathType:
  ovn-kubernetes-controller-token-8dm2j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-kubernetes-controller-token-8dm2j
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
                 kubernetes.io/os=linux
                 node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/not-ready:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  46m (x6 over 46m)   default-scheduler  0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling  37m (x6 over 37m)   default-scheduler  0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling  34m (x2 over 34m)   default-scheduler  0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling  13s (x24 over 34m)  default-scheduler  0/2 nodes are available: 2 node(s) were unschedulable.

3. oc get node --show-labels

NAME              STATUS                        ROLES    AGE   VERSION             LABELS
compute-0         NotReady,SchedulingDisabled   worker   41m   v1.14.0+2b7562925   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
control-plane-0   NotReady,SchedulingDisabled   master   41m   v1.14.0+2b7562925   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos

Actual results:
The cluster cannot be set up with OVN.

Expected results:

Additional info:
This issue can be reproduced on both vSphere and GCP clusters.
I have no idea why the nodes were marked SchedulingDisabled. When I manually make them schedulable again with 'oc adm uncordon', the ovn pods can be scheduled and run.
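The manual workaround can be scripted as a sketch: filter 'oc get node'-style output down to the cordoned nodes and uncordon each one. The sample lines below stand in for real 'oc get node --no-headers' output (node names taken from this report), and 'echo' keeps it a dry run:

```shell
# Find nodes whose STATUS contains SchedulingDisabled and uncordon
# them. The printf block simulates "oc get node --no-headers" output;
# drop "echo" to run the real command against a live cluster.
cordoned=$(printf '%s\n' \
  'compute-0        NotReady,SchedulingDisabled  worker  47m' \
  'control-plane-0  NotReady,SchedulingDisabled  master  47m' |
  awk '$2 ~ /SchedulingDisabled/ {print $1}')
for node in $cordoned; do
  echo oc adm uncordon "$node"
done
```

Note this only clears the symptom; whatever cordoned the nodes in the first place is still unexplained.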
Bringing this back into the tech preview for ovn-kubernetes in 4.2. We should support:
- IPI AWS
- IPI Azure
- UPI

If only IPI vSphere and IPI GCP do not work, that is okay as a limitation of the tech preview.
Note that this means we need to make vSphere UPI work for 4.2.
Whatever cloud is used needs to have the right ports open between machines. We have this for AWS in the installer right now. How does this work for vSphere? Also, can we get ovnkube master pod/container logs from a failed vSphere install?
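One quick way to answer the ports question from a node is to probe the OVN database ports directly. A sketch with stated assumptions: 10.0.0.6 is an illustrative master IP, 9641 is the northbound DB port that appears in logs later in this bug, and 9642 is assumed for the southbound DB:

```shell
# Probe a TCP port on a remote host using bash's /dev/tcp redirection;
# reports whether the port answered within 2 seconds.
check_port() {
  host=$1 port=$2
  if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp/$port open"
  else
    echo "tcp/$port closed or filtered"
  fi
}

# Example: check the OVN NB/SB database ports on one master (assumed IP).
for p in 9641 9642; do check_port 10.0.0.6 "$p"; done
```

If these report closed/filtered between masters, the cloud's firewall rules (security groups on AWS/GCP, the vSphere network configuration) are the first thing to check.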
Hi, Dan.

When I use OVN as the network type to install a cluster on vSphere, the ovn master pod is pending because all master and worker nodes were marked 'SchedulingDisabled'. I could not find out why, or which step caused it. I tried many times and got the same result.

[root@dhcp-140-66 ~]# oc get node
NAME              STATUS                        ROLES    AGE   VERSION
compute-0         NotReady,SchedulingDisabled   worker   47m   v1.14.6+82219910a
control-plane-0   NotReady,SchedulingDisabled   master   47m   v1.14.6+82219910a

When I make the nodes schedulable again with 'oc adm uncordon control-plane-0', the OVN pods can run:

# oc get pod -n openshift-ovn-kubernetes
NAME                              READY   STATUS    RESTARTS   AGE
ovnkube-master-78c6798568-x9sbv   4/4     Running   2          56m
ovnkube-node-ggpxc                3/3     Running   13         58m
ovnkube-node-h5bt6                3/3     Running   13         58m

but it seems the other components of the cluster cannot start up:

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                           4.2.0-0.nightly-2019-09-08-180038   True        False         False      56m
dns                                        4.2.0-0.nightly-2019-09-08-180038   True        False         False      55m
insights                                   4.2.0-0.nightly-2019-09-08-180038   True        True          False      56m
kube-apiserver                             4.2.0-0.nightly-2019-09-08-180038   True        False         False      55m
kube-controller-manager                    4.2.0-0.nightly-2019-09-08-180038   False       True          False      56m
kube-scheduler                             4.2.0-0.nightly-2019-09-08-180038   False       True          False      56m
machine-api                                4.2.0-0.nightly-2019-09-08-180038   True        False         False      56m
machine-config                             4.2.0-0.nightly-2019-09-08-180038   False       True          False      56m
network                                    4.2.0-0.nightly-2019-09-08-180038   True        False         False      7m58s
openshift-apiserver                        4.2.0-0.nightly-2019-09-08-180038   False       False         False      55m
openshift-controller-manager                                                   False       True          False      56m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-09-08-180038   True        True          False      55m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-09-08-180038   True        True          False      55m
operator-lifecycle-manager-packageserver                                       False       True          False      55m
service-ca                                 4.2.0-0.nightly-2019-09-08-180038   True        True          False      56m

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          58m     Unable to apply 4.2.0-0.nightly-2019-09-08-180038: an unknown error has occurred

[root@dhcp-140-66 ~]# oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-09-09T02:01:50Z"
    generation: 1
    name: version
    resourceVersion: "11439"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: c96109c3-d2a5-11e9-86b6-0050568b99b8
  spec:
    channel: stable-4.2
    clusterID: df5120e1-96f0-408a-b518-8af75a89aa5b
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-09-09T02:02:06Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-09-09T02:58:08Z"
      message: |-
        Multiple errors are preventing progress:
        * Could not update oauthclient "console" (263 of 416): the server does not recognize this resource, check extension API servers
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (214 of 416): resource may have been deleted
        * Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (411 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (376 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (8 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (415 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-image-registry/image-registry" (382 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (393 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (397 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (401 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (152 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-machine-api/machine-api-operator" (403 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (405 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (385 of 416): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (388 of 416): the server does not recognize this resource, check extension API servers
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-09-09T02:02:06Z"
      message: 'Unable to apply 4.2.0-0.nightly-2019-09-08-180038: an unknown error
        has occurred'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-09-09T02:02:06Z"
      message: 'Unable to retrieve available updates: currently installed version
        4.2.0-0.nightly-2019-09-08-180038 not found in the "stable-4.2" channel'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      force: false
      image: registry.svc.ci.openshift.org/ocp/release@sha256:7862f9777e846c23fefeac77dc58c8107616acd65707c8437d06d29d2e4990ad
      version: 4.2.0-0.nightly-2019-09-08-180038
    history:
    - completionTime: null
      image: registry.svc.ci.openshift.org/ocp/release@sha256:7862f9777e846c23fefeac77dc58c8107616acd65707c8437d06d29d2e4990ad
      startedTime: "2019-09-09T02:02:06Z"
      state: Partial
      verified: false
      version: 4.2.0-0.nightly-2019-09-08-180038
    observedGeneration: 1
    versionHash: MBWmxYuYaYQ=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
I've filed the PR to open the ports for GCP and AWS IPI. This is not the same bug as whatever is wrong with vSphere - vSphere doesn't have an intra-cluster firewall. Zhanqi, could you please re-test on vSphere and open a separate bug? (And keep the cluster up - I don't have a vSphere cluster handy...)
Thanks, Casey. I will file another bug for vSphere.
Assigning to Phil, who is actually working on this.
Created attachment 1616172 [details] OVN_logs_GCP
GCP is not a supported platform for OVN on 4.2. Bumping.
@zhaozhanqi comment #25: oc get all --all-namespaces. Everything that targets a master is pending. There is something preventing pods from starting on the masters. I can't access the cluster any more.

kube-system                                  pod/gcp-routes-controller-zhaoov-9jl59-m-0.c.openshift-qe.internal   1/1   Running   0    9h
kube-system                                  pod/gcp-routes-controller-zhaoov-9jl59-m-1.c.openshift-qe.internal   1/1   Running   0    9h
kube-system                                  pod/gcp-routes-controller-zhaoov-9jl59-m-2.c.openshift-qe.internal   1/1   Running   0    9h
openshift-apiserver-operator                 pod/openshift-apiserver-operator-6f45554457-b576t                    0/1   Pending   0    9h
openshift-cloud-credential-operator          pod/cloud-credential-operator-7b4c65dbd5-kmrjf                       0/1   Pending   0    9h
openshift-cluster-machine-approver           pod/machine-approver-7bf6885dff-rp9hm                                0/1   Pending   0    9h
openshift-cluster-version                    pod/cluster-version-operator-bf9c75cc4-4djcs                         0/1   Pending   0    9h
openshift-controller-manager-operator        pod/openshift-controller-manager-operator-7c474d6cfc-wcvxf           0/1   Pending   0    9h
openshift-dns-operator                       pod/dns-operator-79dbd8d86f-v67xv                                    0/1   Pending   0    9h
openshift-etcd                               pod/etcd-member-zhaoov-9jl59-m-0.c.openshift-qe.internal             2/2   Running   0    9h
openshift-etcd                               pod/etcd-member-zhaoov-9jl59-m-1.c.openshift-qe.internal             2/2   Running   0    9h
openshift-etcd                               pod/etcd-member-zhaoov-9jl59-m-2.c.openshift-qe.internal             2/2   Running   0    9h
openshift-insights                           pod/insights-operator-646489b44d-jcdp8                               0/1   Pending   0    9h
openshift-kube-apiserver-operator            pod/kube-apiserver-operator-65fc497c9-pbwp6                          0/1   Pending   0    9h
openshift-kube-controller-manager-operator   pod/kube-controller-manager-operator-7f65ffd9b9-66rth                0/1   Pending   0    9h
openshift-kube-scheduler-operator            pod/openshift-kube-scheduler-operator-75bd9d6b59-v5rcp               0/1   Pending   0    9h
openshift-machine-api                        pod/machine-api-operator-7f496594d4-mcxh2                            0/1   Pending   0    9h
openshift-machine-config-operator            pod/machine-config-operator-55f5c9d548-m9qrw                         0/1   Pending   0    9h
openshift-multus                             pod/multus-58wt5                                                     1/1   Running   59   9h
openshift-multus                             pod/multus-snp52                                                     1/1   Running   59   9h
openshift-multus                             pod/multus-zxlcj                                                     1/1   Running   59   9h
openshift-network-operator                   pod/network-operator-74b8d64fc5-gncfc                                1/1   Running   1    9h
openshift-operator-lifecycle-manager         pod/catalog-operator-57b6884cd6-gqvg5                                0/1   Pending   0    9h
openshift-operator-lifecycle-manager         pod/olm-operator-7554464b74-kstts                                    0/1   Pending   0    9h
openshift-ovn-kubernetes                     pod/ovnkube-master-644d65f44-zwmpt                                   0/4   Pending   0    9h
openshift-ovn-kubernetes                     pod/ovnkube-node-2gbxg                                               2/3   Running   94   9h
openshift-ovn-kubernetes                     pod/ovnkube-node-6qg4r                                               2/3   Running   94   9h
openshift-ovn-kubernetes                     pod/ovnkube-node-wfld8                                               2/3   Running   94   9h
openshift-service-ca-operator                pod/service-ca-operator-674ccdc57d-55cq7                             0/1   Pending   0    9h
@Phil, I am afraid it also failed on 4.3.0-0.nightly-2019-11-01-215341, for the same reason as in comment 37 and comment 38. Did it work in your local env?
*** Bug 1769136 has been marked as a duplicate of this bug. ***
Moving this to the ASSIGNED state per the ongoing comments.
*** Bug 1774594 has been marked as a duplicate of this bug. ***
https://github.com/openshift/cluster-network-operator/pull/396/ changes 'getent hosts' to 'getent ahostsv4'. The previous version was picking up IPv6 addresses and mishandling them; this version is IPv4-only.
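The behavioral difference the PR relies on can be seen directly with getent on any glibc system (localhost used here as a stand-in for the node hostname):

```shell
# "getent hosts" may resolve a name to its IPv6 (AAAA/::1) entry,
# which the ovnkube scripts then mishandled; "getent ahostsv4"
# forces IPv4-only resolution.
getent hosts localhost       # may print an IPv6 line such as "::1 localhost"
getent ahostsv4 localhost    # prints 127.0.0.1 entries only
```
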
Problem: oc commands don't work on 4.3 from my laptop, but do work after ssh-ing to the bootstrap node. openshift-install-linux-4.4.0-0.ci-2019-11-25-114444.tar.gz works. OVN still doesn't come up, but we can once again work with debug images.
ovnkube-node fails with:

time="2019-11-25T16:08:05Z" level=error msg="Error while obtaining addresses for k8s-pcamer-qkfnm-m-0.c.openshift-gce-devel.internal on node pcamer-qkfnm-m-0.c.openshift-gce-devel.internal - Error while obtaining dynamic addresses for k8s-pcamer-qkfnm-m-0.c.openshift-gce-devel.internal: OVN command '/usr/bin/ovn-nbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:10.0.0.6:9641,ssl:10.0.0.5:9641,ssl:10.0.0.3:9641 --timeout=15 get logical_switch_port k8s-pcamer-qkfnm-m-0.c.openshift-gce-devel.internal dynamic_addresses' failed: exit status 1"

If I rsh into the container, the same command works. If I oc delete the pod, it comes back up correctly. This looks like a race. Continuing to debug...
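Since the same ovn-nbctl command succeeds on a manual retry, a bounded retry loop around the call would mask this kind of startup race. A generic sketch, not the actual ovnkube.sh code; the ovn-nbctl invocation in the usage comment is illustrative:

```shell
# Retry a command up to $1 times, sleeping 1s between attempts, to
# tolerate a dependency (here: the OVN NB database) that is not ready
# yet. Returns non-zero only after all attempts fail.
retry() {
  max=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge "$max" ] && return 1
    sleep 1
  done
}

# Usage (illustrative):
#   retry 5 ovn-nbctl --db=ssl:... get logical_switch_port "$port" dynamic_addresses
```
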
Thank you, Phil for the update.
Verified this bug on 4.4.0-0.nightly-2020-01-16-113546. OVN can be installed on a GCP cluster.
*** Bug 1745546 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581