Description of problem:
OCP 4.6 latest build install is failing during the bootstrap process due to an issue with the openshift-multus/network-metrics-daemon startup.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Noticed the same error with the two builds below:
4.6.0-0.nightly-ppc64le-2020-09-14-031547
4.6.0-0.nightly-ppc64le-2020-09-13-000045

Steps to Reproduce:
1.
2.
3.

Actual results:
Installer stderr (reported through the Terraform remote-exec provisioner, module.install.null_resource.install):

level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.test-arc46.redhat.com:6443..."
level=info msg="API v1.19.0+35ab7c5 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
W0914 02:15:58.095544 14636 reflector.go:326] k8s.io/client-go/tools/watch/informerwatcher.go:146: watch of *v1.ConfigMap ended with: very short watch: k8s.io/client-go/tools/watch/informerwatcher.go:146: Unexpected watch close - watch lasted less than a second and no items received
E0914 02:15:59.097186 14636 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list *v1.ConfigMap: Get "https://api.test-arc46.redhat.com:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0": EOF
[the same "Failed to list *v1.ConfigMap ... EOF" error repeats once per second from 02:16:00 through 02:16:08]
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/network-metrics-daemon\" is not available (awaiting 3 nodes)\nDaemonSet \"openshift-multus/multus-admission-controller\" is not available (awaiting 3 nodes)"
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="failed to wait for bootstrapping to complete: timed out waiting for the condition"

Expected results:
The openshift-multus/network-metrics-daemon should start up successfully.
Additional info:

oc get co
NAME                                       VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication
cloud-credential                                                                       True        False         False      4h1m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network                                    4.6.0-0.nightly-ppc64le-2020-09-14-031547   True        True          False      3h46m
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage

oc get nodes
NAME       STATUS   ROLES    AGE     VERSION
master-0   Ready    master   3h48m   v1.19.0+35ab7c5
master-1   Ready    master   3h47m   v1.19.0+35ab7c5
master-2   Ready    master   3h48m   v1.19.0+35ab7c5

oc get pods --all-namespaces | grep -vi completed | grep -vi running
NAMESPACE                                          NAME                                                       READY   STATUS              RESTARTS   AGE
openshift-apiserver-operator                       openshift-apiserver-operator-844999d7f6-jm4qb              0/1     CrashLoopBackOff    40         3h42m
openshift-authentication-operator                  authentication-operator-777dcb644d-tdkk7                   0/1     CrashLoopBackOff    40         3h42m
openshift-cluster-machine-approver                 machine-approver-548d59dc7-jvsj8                           0/2     ContainerCreating   0          3h42m
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-6f9659cc44-6smkh              0/1     CrashLoopBackOff    44         3h42m
openshift-cluster-storage-operator                 cluster-storage-operator-98795794c-46xxd                   0/1     Error               40         3h42m
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-6c94d485cd-sx79b          0/1     CrashLoopBackOff    40         3h42m
openshift-cluster-version                          cluster-version-operator-56588d8f8d-zs5x4                  0/1     ContainerCreating   0          3h42m
openshift-controller-manager-operator              openshift-controller-manager-operator-74fd7c5476-scxz8     0/1     CrashLoopBackOff    40         3h42m
openshift-dns-operator                             dns-operator-76f898d794-22q8h                              0/2     ContainerCreating   0          3h42m
openshift-etcd-operator                            etcd-operator-655c8f58c7-4d69m                             0/1     CrashLoopBackOff    40         3h42m
openshift-kube-apiserver-operator                  kube-apiserver-operator-5c776b8cf6-kflvc                   0/1     CrashLoopBackOff    40         3h42m
openshift-kube-controller-manager-operator         kube-controller-manager-operator-84b8676ff7-2hwdz          0/1     CrashLoopBackOff    40         3h42m
openshift-kube-scheduler-operator                  openshift-kube-scheduler-operator-859ccd48db-8z9mq         0/1     CrashLoopBackOff    40         3h42m
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-5995764d6b-xrvdj    0/1     CrashLoopBackOff    40         3h42m
openshift-multus                                   multus-admission-controller-dj7mk                          0/2     ContainerCreating   0          3h25m
openshift-multus                                   multus-admission-controller-pw9rj                          0/2     ContainerCreating   0          3h25m
openshift-multus                                   multus-admission-controller-ztwrp                          0/2     ContainerCreating   0          3h26m
openshift-multus                                   network-metrics-daemon-4b5g4                               0/2     ContainerCreating   0          3h27m
openshift-multus                                   network-metrics-daemon-cgrhz                               0/2     ContainerCreating   0          3h26m
openshift-multus                                   network-metrics-daemon-xgbjd                               0/2     ContainerCreating   0          3h27m
openshift-operator-lifecycle-manager               catalog-operator-7876fb8df9-2dv6x                          0/1     ContainerCreating   0          3h42m
openshift-operator-lifecycle-manager               olm-operator-66cbb8c4c9-p7zxm                              0/1     ContainerCreating   0          3h42m
openshift-service-ca-operator                      service-ca-operator-745cb9db56-ndf69                       0/1     Error               40         3h42m

oc describe pod multus-admission-controller-dj7mk -n openshift-multus
Name:                 multus-admission-controller-dj7mk
Namespace:            openshift-multus
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master-1/9.114.98.125
Start Time:           Mon, 14 Sep 2020 02:12:12 -0400
Labels:               app=multus-admission-controller
                      component=network
                      controller-revision-hash=764b48c4f5
                      namespace=openshift-multus
                      openshift.io/component=network
                      pod-template-generation=1
                      type=infra
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        DaemonSet/multus-admission-controller
Containers:
  multus-admission-controller:
    Container ID:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5c1e97d5fccb61d21128deb7fea015485530e778fbf32f5547078f1b7f640f51
    Image ID:
    Port:          9091/TCP
    Host Port:     0/TCP
    Command:       /usr/bin/webhook
    Args:
      -bind-address=0.0.0.0
      -port=6443
      -tls-private-key-file=/etc/webhook/tls.key
      -tls-cert-file=/etc/webhook/tls.crt
      -alsologtostderr=true
      -metrics-listen-address=127.0.0.1:9091
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
    Environment:  <none>
    Mounts:
      /etc/webhook from webhook-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from multus-token-sl9w7 (ro)
  kube-rbac-proxy:
    Container ID:
    Image:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f947b498107b67350f1725e1044b4997ba47c1bf7788aca64d46077e99df6dc
    Image ID:
    Port:        8443/TCP
    Host Port:   0/TCP
    Args:
      --logtostderr
      --secure-listen-address=:8443
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
      --upstream=http://127.0.0.1:9091/
      --tls-private-key-file=/etc/webhook/tls.key
      --tls-cert-file=/etc/webhook/tls.crt
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/webhook from webhook-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from multus-token-sl9w7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  multus-admission-controller-secret
    Optional:    false
  multus-token-sl9w7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  multus-token-sl9w7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason       Age                      From     Message
  ----     ------       ----                     ----     -------
  Warning  FailedMount  39m (x61 over 3h27m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[webhook-certs], unattached volumes=[webhook-certs multus-token-sl9w7]: timed out waiting for the condition
  Warning  FailedMount  18m (x16 over 3h22m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[webhook-certs], unattached volumes=[multus-token-sl9w7 webhook-certs]: timed out waiting for the condition
  Warning  FailedMount  3m43s (x109 over 3h29m)  kubelet  MountVolume.SetUp failed for volume "webhook-certs" : secret "multus-admission-controller-secret" not found

oc get secrets -n openshift-multus
NAME                            TYPE                                  DATA   AGE
default-token-7bzbf             kubernetes.io/service-account-token   3      3h31m
metrics-daemon-sa-token-sfnmd   kubernetes.io/service-account-token   3      3h31m
multus-token-sl9w7              kubernetes.io/service-account-token   3      3h31m
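Worth checking whether the secret the kubelet is waiting on ever gets created, and what the network operator is logging while these pods sit in ContainerCreating. Something along these lines should show it (names taken from the events above; adjust as needed):

# oc -n openshift-multus get secret multus-admission-controller-secret
# oc -n openshift-network-operator logs deployment/network-operator | tail -50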
This issue is noticed on the PowerVM environment.
Created attachment 1714830 [details] bootstrap gather logs
Created attachment 1714831 [details] Rendered assets logs
Created attachment 1714832 [details] Resources json
Created attachment 1714834 [details] master-1 logs
Created attachment 1714835 [details] master-2 logs
Created attachment 1714836 [details] master 3 logs
Looks like the cluster-policy-controller on the bootstrap is crashing with this exception:

goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00083bc01, 0xc0001da1c0, 0x7b, 0xd1)
	k8s.io/klog/v2.0/klog.go:996 +0xac
k8s.io/klog/v2.(*loggingT).output(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0xc000a137a0, 0x12f03a66, 0x6, 0x37, 0x0)
	k8s.io/klog/v2.0/klog.go:945 +0x1b4
k8s.io/klog/v2.(*loggingT).printDepth(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0x1, 0xc0008dbb30, 0x1, 0x1)
	k8s.io/klog/v2.0/klog.go:718 +0x128
k8s.io/klog/v2.(*loggingT).print(...)
	k8s.io/klog/v2.0/klog.go:703
k8s.io/klog/v2.Fatal(...)
	k8s.io/klog/v2.0/klog.go:1443
github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller.NewClusterPolicyControllerCommand.func1(0xc0001c5080, 0xc00087b4a0, 0x0, 0x5)
	github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller/cmd.go:55 +0x31c
github.com/spf13/cobra.(*Command).execute(0xc0001c5080, 0xc00087b450, 0x5, 0x5, 0xc0001c5080, 0xc00087b450)
	github.com/spf13/cobra.0/command.go:846 +0x208
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001c4840, 0xc0001c4840, 0x0, 0x0)
	github.com/spf13/cobra.0/command.go:950 +0x294
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.0/command.go:887
main.main()
	github.com/openshift/cluster-policy-controller/cmd/cluster-policy-controller/main.go:67 +0x298

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x12fe49e0)
	k8s.io/klog/v2.0/klog.go:1131 +0x78
created by k8s.io/klog/v2.init.0
	k8s.io/klog/v2.0/klog.go:416 +0xe0

goroutine 92 [syscall]:
os/signal.signal_recv(0x0)
	runtime/sigqueue.go:147 +0xf8
os/signal.loop()
	os/signal/signal_unix.go:23 +0x24
created by os/signal.Notify.func1
	os/signal/signal.go:127 +0x4c

goroutine 94 [chan receive]:
k8s.io/apiserver/pkg/server.SetupSignalContext.func1(0xc00096bd40)
	k8s.io/apiserver.0-rc.2/pkg/server/signal.go:48 +0x38
created by k8s.io/apiserver/pkg/server.SetupSignalContext
	k8s.io/apiserver.0-rc.2/pkg/server/signal.go:47 +0xf0

goroutine 95 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x11c185a0, 0x11e8e920, 0xc000802030, 0x1, 0xc00053e060)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:167 +0x120
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x11c185a0, 0x12a05f200, 0x0, 0x1, 0xc00053e060)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:133 +0x90
k8s.io/apimachinery/pkg/util/wait.Until(...)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x11c185a0, 0x12a05f200)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:81 +0x50
created by k8s.io/component-base/logs.InitLogs
	k8s.io/component-base.0-rc.2/logs/logs.go:58 +0x88
Ah, it seems that was a side effect - the fatal exception was caused by:

16:14:22.617615 1 cmd.go:55] open /etc/kubernetes/config/cluster-policy-config.yaml: no such file or directory
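For anyone reproducing this, a quick way to confirm whether that config got rendered anywhere on the bootstrap node (plain find; /opt/openshift is where bootkube keeps rendered assets, if present):

# find /etc/kubernetes /opt/openshift -name 'cluster-policy-config*' 2>/dev/null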
On the bootstrap node:

[root@bootstrap core]# ls -l /etc/kubernetes/bootstrap-configs
total 16
-rw-r--r--. 1 root root 6635 Sep 14 05:54 kube-apiserver-config.yaml
-rw-r--r--. 1 root root 1341 Sep 14 05:54 kube-controller-manager-config.yaml
-rw-r--r--. 1 root root   81 Sep 14 05:54 kube-scheduler-config.yaml
[root@bootstrap core]# pwd
/var/home/core
[root@bootstrap core]# ls -l /etc/kubernetes/
total 16
drwxr-xr-x. 2 root root  117 Sep 14 05:54 bootstrap-configs
drwxr-xr-x. 2 root root    6 Sep 14 05:53 bootstrap-manifests
drwx------. 2 root root 4096 Sep 15 13:59 bootstrap-secrets
drwxr-xr-x. 2 root root    6 Sep 14 05:54 cloud
drwxr-xr-x. 3 root root   19 Sep  1 03:51 cni
-rw-------. 1 root root 5795 Sep 14 05:54 kubeconfig
-rw-r--r--. 1 root root  146 Sep 14 05:53 kubelet-pause-image-override
drwxr-xr-x. 3 root root   20 Sep 14 05:53 kubelet-plugins
drwxr-xr-x. 2 root root  252 Sep 15 13:59 manifests
drwxr-xr-x. 3 root root   25 Sep 14 05:54 static-pod-resources
Ignore the previous analysis - that error occurs but the installation still proceeds. What I do see on the master node's sdn pods, though, is concerning:

I0914 06:12:20.799579    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:20.925805    1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:20.926133    1996 proxier.go:349] userspace syncProxyRules took 49.44885ms
I0914 06:12:21.081209    1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:21.082842    1996 proxier.go:349] userspace syncProxyRules took 50.442527ms
I0914 06:12:21.306427    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:21.937474    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:22.724906    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:23.707811    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:24.935827    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:26.469006    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:28.382423    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:30.772801    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:33.759059    1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
F0914 06:12:33.759135    1996 healthcheck.go:99] SDN healthcheck detected unhealthy OVS server, restarting: plugin is not setup
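A quick way to see what the SDN healthcheck is complaining about is to query the Interface table for that row directly on the node (assuming ovs-vsctl is usable on the host or inside the ovs container):

# ovs-vsctl --columns=name,ofport,error list Interface vxlan0

If the row is missing, or ofport is -1, that matches the 'no row "vxlan0" in table Interface' errors above.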
ovsdb-server fails to start:

Failed to connect to bus: No data available
openvswitch is running in container
/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db.
ovsdb-server: /var/run/openvswitch/ovsdb-server.pid: pidfile check failed (No such process), aborting
Starting ovsdb-server ... failed!
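To confirm whether the containerized ovsdb-server/ovs-vswitchd ever came up and whether the database file got created, something like this on the node should tell us (assuming the openvswitch systemd unit is what wraps the containerized daemons here):

# systemctl status openvswitch
# ls -l /etc/openvswitch/ /var/run/openvswitch/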
Added network dev from POWER team Mick and squad lead Manju.
Noticing no space errors on the interfaces. Below is the data from one of the master nodes. The RHCOS46 images had 20GB disk size. [root@master-2 core]# ovs-vsctl list interface _uuid : eea736c0-7b57-4e1e-aa18-dfc6f20900db admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth657700ea to ofproto (No space left on device)" external_ids : {ip="10.128.0.9", sandbox="0e899417fbcbd8b49c511fcbe273ccd2ec07520f94298a71024b3ee4b6b79661"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth657700ea ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 01c25001-64ca-44c4-bdaa-fcb63a7d9de1 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device vethf20e8e29 to ofproto (No space left on device)" external_ids : {ip="10.128.0.15", sandbox="130cab3a6214363bdb4994202336fd267ddd9a4a55d9a4a7fac706806f46f026"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : vethf20e8e29 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 51dae8f1-ccee-4752-af78-1861dd366e30 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth6cec1255 to ofproto (No space left on device)" external_ids : {ip="10.128.0.6", sandbox=b6bc118838dce0ca5252d8efe90172a354ebeedb4c59158b5da4b03c38f5827a} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth6cec1255 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 15a1fb82-79a2-4b10-9192-b4f09cfad6be admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 25 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1a:81:c6:da:2a:09" mtu : [] mtu_request : [] name : vxlan0 ofport : 1 ofport_request : 1 options : {dst_port="4789", key=flow, remote_ip=flow} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {} type : vxlan _uuid : 3752b713-ecab-4798-8167-810bdbf8bdfb admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : [] external_ids : {} ifindex : 27 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "02:eb:92:6e:4e:87" mtu : 1450 mtu_request : 1450 
name : tun0 ofport : 2 ofport_request : 2 options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=125108, tx_dropped=0, tx_errors=0, tx_packets=2968} status : {driver_name=openvswitch} type : internal _uuid : cf1d8504-e77a-41c4-8bdd-b265c3a2ba67 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth05b2efc4 to ofproto (No space left on device)" external_ids : {ip="10.128.0.18", sandbox=dce0bbe07172b35471abf89641c28d0ec9dcd4e26fd9f671eaaf03cd6ca62ee0} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth05b2efc4 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 1d1ecee5-7e76-4b50-aa67-a637ecdf3438 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth5f6cb8b8 to ofproto (No space left on device)" external_ids : {ip="10.128.0.11", sandbox="7bfaf063fb454194c9d02fd74d3bcbb90c6a68967ae28e6a2457b5929bd5e990"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth5f6cb8b8 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : f6bb3176-06fb-492b-9229-ffa0b901c029 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth9dd59021 to ofproto (No space left on device)" external_ids : {ip="10.128.0.10", sandbox="04af944f504e0dbfa1079b438537c655e4c63b7389266b1ea3083a55b31684cf"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth9dd59021 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : e6c5d907-3957-4d00-961c-baf4e63e2257 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth73694404 to ofproto (No space left on device)" external_ids : {ip="10.128.0.8", sandbox="197847dbee9666f86dde221c719358228109d39b8444aa3172195eb8ce10dfb1"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth73694404 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 3f5e0158-b9f6-4f2f-8cac-c30c84e96906 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device 
veth64602b83 to ofproto (No space left on device)" external_ids : {ip="10.128.0.12", sandbox="47548db0199e0a59ac7fc86397e9db8877b4318a64ef2b142e268e8153c0f2bc"} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth64602b83 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : cf23fbd8-64ec-4027-8670-4a8bf5dc42bf admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device veth94fb7f81 to ofproto (No space left on device)" external_ids : {ip="10.128.0.14", sandbox=ab7f309f6c0e65f32a1515685da90189a3d09521036600e1be0b811f5518e999} ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : [] name : veth94fb7f81 ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : "" _uuid : 528ed08d-970e-4620-9654-b9dc01752df6 admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : [] error : "could not add network device vethc71f3834 to ofproto (No space left on device)" external_ids : {ip="10.128.0.13", sandbox="4b5145dcd172516b36a35b377df2f34765841f378b6ce6d58c82fdecb14b9a0a"} ifindex : [] Since it complained about space, I retried using a bigger image of RHCOS46.82 with 120GB and reran the install with latest build. Again, I see same failures as discussed in this bug. Even the network interface attach fails with no space on device as shown above.
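One note on the error text: the ENOSPC in "could not add network device ... to ofproto (No space left on device)" is an errno returned when the datapath port is added, so it is not necessarily about filesystem space at all - which would be consistent with the 120GB image not helping. To pull just the broken rows out of that table instead of dumping everything, something like this should work (ovsdb find syntax; untested here):

# ovs-vsctl --columns=name,error,external_ids find Interface 'error!=[]'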
@psundra is there any reason you re-targeted this to 4.7? We need this resolved for 4.6, and it is impacting our ability to complete OCP 4.6 testing in sprint 1.
@lmcfadde that was a mistake; it should be 4.6. After some debugging and talking to Mick Tarsel, this looks like an OVS issue. He will update with more details.
Created attachment 1715123 [details] ovs-vswitchd logs on master-2 So with Prashanth's help we narrowed down the error to the openshift-sdn pod on master-2. Something appears to be wrong with openvswitch and openflow. We then looked at the sdn container logs via crictl which showed a lot of... I0916 17:17:46.160062 193200 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused) (I can attach these logs if needed). Looks like there is a problem executing OpenFlow commands from here: https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/ovs/ovs.go#L155 So I looked at the ovs-vswitchd logs in /var/log/openvswitch on master-2 which is attached to this comment. Things start getting funky right after br0 is created (at top of log) 2020-09-16T14:09:44.182Z|00029|bridge|INFO|bridge br0: added interface br0 on port 65534 2020-09-16T14:09:44.183Z|00030|bridge|INFO|bridge br0: using datapath ID 0000e64086f8f744 2020-09-16T14:09:44.183Z|00031|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt" 2020-09-16T14:09:44.291Z|00032|bridge|INFO|bridge br0: added interface vxlan0 on port 1 2020-09-16T14:09:44.325Z|00033|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:09:44.331Z|00034|bridge|INFO|bridge br0: added interface tun0 on port 2 And later on… 2020-09-16T14:10:44.543Z|00042|bridge|INFO|bridge br0: deleted interface tun0 on port 2 2020-09-16T14:10:44.548Z|00043|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:10:44.650Z|00044|bridge|INFO|bridge br0: added interface tun0 on port 2 2020-09-16T14:10:44.655Z|00045|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device 2020-09-16T14:10:44.661Z|00046|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device 2020-09-16T14:10:44.764Z|00047|bridge|INFO|bridge br0: deleted interface br0 on port 65534 Right after br0 is deleted, it’s created again and then it goes into this loop of adding and deleting the tun0 interface from br0 all while vxlan_sys_4789 is still not found. 
2020-09-16T14:10:44.854Z|00048|bridge|INFO|bridge br0: added interface br0 on port 65534 2020-09-16T14:10:44.855Z|00001|dpif(revalidator6)|WARN|system@ovs-system: failed to flow_get (No such file or directory) recirc_id(0),dp_hash(0),skb_priority(0),in_port(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), packets:0, bytes:0, used:never 2020-09-16T14:10:44.855Z|00002|ofproto_dpif_upcall(revalidator6)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (No such file or directory): ufid:4cd769d5-92b6-4acd-80b3-5aa16991b6d8 2020-09-16T14:10:44.858Z|00049|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device 2020-09-16T14:10:44.865Z|00050|bridge|INFO|bridge br0: deleted interface tun0 on port 2 2020-09-16T14:10:44.868Z|00051|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:10:44.959Z|00052|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on tun0 device failed: No such device 2020-09-16T14:10:44.960Z|00053|bridge|WARN|could not add network device tun0 to ofproto (No such device) 2020-09-16T14:10:44.962Z|00054|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device 2020-09-16T14:10:45.078Z|00055|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:10:45.079Z|00056|bridge|INFO|bridge br0: added interface tun0 on port 2 2020-09-16T14:10:45.086Z|00057|bridge|INFO|bridge br0: deleted interface tun0 on port 2 2020-09-16T14:10:45.089Z|00058|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:10:45.205Z|00059|bridge|INFO|bridge br0: added interface tun0 on port 2 2020-09-16T14:10:45.208Z|00060|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device 2020-09-16T14:10:45.294Z|00061|bridge|INFO|bridge br0: deleted interface tun0 on port 2 2020-09-16T14:10:45.300Z|00062|netdev|WARN|failed to set MTU for network device tun0: No such device 2020-09-16T14:10:45.300Z|00063|bridge|INFO|bridge br0: added interface tun0 on port 2 2020-09-16T14:10:45.397Z|00064|bridge|INFO|bridge br0: deleted interface tun0 on port 2 2020-09-16T14:10:45.399Z|00065|bridge|INFO|bridge br0: added interface tun0 on port 2 2020-09-16T14:10:47.905Z|00066|timeval|WARN|Unreasonably long 1048ms poll interval (5ms user, 7ms system) 2020-09-16T14:10:47.905Z|00067|timeval|WARN|context switches: 14 voluntary, 7 involuntary So looking at the master-2 interfaces, things get more weird. There are no interfaces on br0 at all. 
The veth interfaces should be there which are present on the machine: [root@master-2 core]# ovs-vsctl show def875a0-54d7-4075-9468-84ab195df9ad Bridge br0 fail_mode: secure Port br0 Interface br0 type: internal ovs_version: "2.11.4" [root@master-2 core]# ip link show | grep veth 29: veth6c5907ed@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 30: vethfd738fd5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 32: vethd2ab6e37@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 33: veth0cac84d7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 34: veth9474b203@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 35: vethad383ec8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 36: vethf2b7c335@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 37: vetheb944dc6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 38: veth95cbfe1f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 39: veth026eaffb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 40: veth6affbeee@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 41: vetha6b49861@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 42: vethc168c2be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default 43: veth68364631@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default And here are the problem interfaces which ovs said “No such device”: 26: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 link/ether 1e:d3:73:f0:0a:47 brd ff:ff:ff:ff:ff:ff inet6 fe80::1cd3:73ff:fef0:a47/64 scope link valid_lft forever preferred_lft forever 28: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether b2:d6:15:c7:f1:bd brd ff:ff:ff:ff:ff:ff inet 10.128.0.1/23 brd 10.128.1.255 scope global tun0 valid_lft forever preferred_lft forever inet6 fe80::b0d6:15ff:fec7:f1bd/64 scope link valid_lft forever preferred_lft forever Also note that tun0 is in UNKNOWN state. Dave Wilder on the LTC Networking team at IBM logged in and was not able to add a veth interface to br0. After he restarted the openvswitch service via systemctl, the interfaces were present on br0. [root@master-2 openvswitch]# ovs-vsctl show f64925b1-a493-4dee-976a-80ba5cf705a4 Bridge br0 fail_mode: secure Port vxlan0 Interface vxlan0 type: vxlan options: {dst_port="4789", key=flow, remote_ip=flow} Port br0 Interface br0 type: internal Port tun0 Interface tun0 type: internal ovs_version: "2.13.2" Looks like a problem with openvswitch.
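(For reference, the sdn container logs mentioned above were read with crictl directly on the node; something like the following should reproduce that - the container id is a placeholder:

# crictl ps --name sdn
# crictl logs <sdn-container-id>)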
> Dave Wilder on the LTC Networking team at IBM logged in and was not able to > add a veth interface to br0. After he restarted the openvswitch service via > systemctl, the interfaces were present on br0. > > [root@master-2 openvswitch]# ovs-vsctl show > f64925b1-a493-4dee-976a-80ba5cf705a4 > Bridge br0 > fail_mode: secure > Port vxlan0 > Interface vxlan0 > type: vxlan > options: {dst_port="4789", key=flow, remote_ip=flow} > Port br0 > Interface br0 > type: internal > Port tun0 > Interface tun0 > type: internal > ovs_version: "2.13.2" > > > Looks like a problem with openvswitch. Some clarification on the test I did: 1)Created a veth pair: # ip link add vethA type veth peer name vethB 2)Attempted to add vethA to br0 # ovs-vsctl add-port br0 vethA This hung..... (ctrl-c out) #ovs-vsctl show did not list vethA. Restart ovs. # systemctl restart openvswitch. [root@master-2 openvswitch]# ovs-vsctl show f64925b1-a493-4dee-976a-80ba5cf705a4 Bridge br0 fail_mode: secure Port vxlan0 Interface vxlan0 type: vxlan options: {dst_port="4789", key=flow, remote_ip=flow} Port br0 Interface br0 type: internal Port tun0 Interface tun0 type: internal ovs_version: "2.13.2" It appears ovs was not responding until after reboot but still added vethA to db. I rebooted the node and observed new interfaces were added to br0. [root@master-2 core]# ovs-vsctl show def875a0-54d7-4075-9468-84ab195df9ad Bridge br0 fail_mode: secure Port veth5f017582 Interface veth5f017582 Port vxlan0 Interface vxlan0 type: vxlan options: {dst_port="4789", key=flow, remote_ip=flow} Port br0 Interface br0 type: internal Port veth08774018 Interface veth08774018 Port vetheb89cc8b Interface vetheb89cc8b Port vethfded5e2e Interface vethfded5e2e Port veth888a39e8 Interface veth888a39e8 Port tun0 Interface tun0 type: internal Port veth8878b0bc Interface veth8878b0bc Port veth93cb26e2 Interface veth93cb26e2 Port veth31496ad7 Interface veth31496ad7 Port vethfc335352 Interface vethfc335352 Port veth8d94567f Interface veth8d94567f Port vethde565abf Interface vethde565abf Port vethf3069cbd Interface vethf3069cbd Port vethf36bb791 Interface vethf36bb791 Port veth1d8a33aa Interface veth1d8a33aa Port vethb1a5133f Interface vethb1a5133f ovs_version: "2.11.4"
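Side note on the hang: ovs-vsctl blocks waiting for ovs-vswitchd to pick up the change, so when the daemon is wedged like this the command never returns. Adding a timeout makes the same test fail fast instead of hanging, and the test pair can be cleaned up afterwards:

# ovs-vsctl --timeout=5 add-port br0 vethA
# ip link del vethA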
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1874696 Adding ovs-devel to CC as well.
I see the following results installing OCP with the docker.io/prashanths684/openshift-release:4.6-ppc64le-ovs image # oc get nodes NAME STATUS ROLES AGE VERSION master-0 NotReady master 4h3m v1.19.0+35ab7c5 master-1 NotReady master 4h4m v1.19.0+35ab7c5 master-2 NotReady master 4h3m v1.19.0+35ab7c5 # oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE authentication cloud-credential True False False 4h17m cluster-autoscaler config-operator console csi-snapshot-controller dns etcd image-registry ingress insights kube-apiserver kube-controller-manager kube-scheduler kube-storage-version-migrator machine-api machine-approver machine-config marketplace monitoring network False True True 4h4m node-tuning openshift-apiserver openshift-controller-manager openshift-samples operator-lifecycle-manager operator-lifecycle-manager-catalog operator-lifecycle-manager-packageserver service-ca storage The only pods in Running/ContainerCreating state. Rest of the pods are in Pending state. # oc get pods --all-namespaces | grep "Running\|ContainerCrea" openshift-multus multus-c9f7c 1/1 Running 14 3h56m openshift-multus multus-mwv68 1/1 Running 12 3h57m openshift-multus multus-zksjc 1/1 Running 4 3h56m openshift-multus network-metrics-daemon-bb4m2 0/2 ContainerCreating 0 3h56m openshift-multus network-metrics-daemon-cdt5b 0/2 ContainerCreating 0 3h57m openshift-multus network-metrics-daemon-rc77h 0/2 ContainerCreating 0 3h56m openshift-network-operator network-operator-5bfcfc7cb6-fhbm9 1/1 Running 0 154m openshift-sdn sdn-controller-c5hmd 1/1 Running 0 3h56m openshift-sdn sdn-controller-d6n9w 1/1 Running 0 3h56m openshift-sdn sdn-controller-gbqxf 1/1 Running 0 3h56m # oc describe pod network-metrics-daemon-bb4m2 -n openshift-multus Name: network-metrics-daemon-bb4m2 Namespace: openshift-multus Priority: 0 Node: master-0/9.114.98.137 Start Time: Thu, 17 Sep 2020 03:38:58 -0400 Labels: app=network-metrics-daemon component=network controller-revision-hash=77dd98c48b openshift.io/component=network pod-template-generation=1 type=infra Annotations: <none> Status: Pending IP: IPs: <none> Controlled By: DaemonSet/network-metrics-daemon Containers: network-metrics-daemon: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:83e81b9a1307b2c4137acb9b0fc940e42c245d5c823cbf4860f1ce41deed050d Image ID: Port: <none> Host Port: <none> Command: /usr/bin/network-metrics Args: --node-name $(NODE_NAME) State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 10m memory: 100Mi Environment: NODE_NAME: (v1:spec.nodeName) Mounts: /var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro) kube-rbac-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fdb7e386b43c9f56189e90912b33b1cba9c24a09bde6a3067ff92ef529f71dd Image ID: Port: 8443/TCP Host Port: 0/TCP Args: --logtostderr --secure-listen-address=:8443 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --upstream=http://127.0.0.1:9091/ --tls-private-key-file=/etc/metrics/tls.key --tls-cert-file=/etc/metrics/tls.crt State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 10m memory: 20Mi Environment: <none> Mounts: /etc/metrics from metrics-certs (ro) /var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro) Conditions: Type Status Initialized True Ready False 
ContainersReady False PodScheduled True Volumes: metrics-certs: Type: Secret (a volume populated by a Secret) SecretName: metrics-daemon-secret Optional: false metrics-daemon-sa-token-t95pz: Type: Secret (a volume populated by a Secret) SecretName: metrics-daemon-sa-token-t95pz Optional: false QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux Tolerations: op=Exists Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning NetworkNotReady 159m (x2552 over 4h4m) kubelet network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started? Warning FailedMount 96m (x35 over 151m) kubelet MountVolume.SetUp failed for volume "metrics-certs" : secret "metrics-daemon-secret" not found Warning NetworkNotReady 106s (x4502 over 151m) kubelet network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started? # ssh core@master-1 'ip a' 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: env32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000 link/ether fa:16:3e:f3:6e:61 brd ff:ff:ff:ff:ff:ff inet 9.114.98.247/22 brd 9.114.99.255 scope global dynamic noprefixroute env32 valid_lft 12803sec preferred_lft 12803sec inet6 fe80::f816:3eff:fef3:6e61/64 scope link noprefixroute valid_lft forever preferred_lft forever Restarted the nodes to see any change in the results but that did not help.
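In this state it is probably also worth confirming on the node that the SDN never wrote a CNI config, and checking what the sdn daemon pods look like (paths/namespaces taken from the kubelet message above):

# ls -l /etc/kubernetes/cni/net.d/
# oc -n openshift-sdn get pods -o wide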
After doing some investigation with Mick and talking to the sdn team, they suggested it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1874696. Today I built a patched ppc64le image with these two PRs:

https://github.com/openshift/machine-config-operator/pull/2102
https://github.com/openshift/cluster-network-operator/pull/785

Archana was able to test the image and the install succeeded on PowerVM. Marking it as a dup of 1874696.

*** This bug has been marked as a duplicate of bug 1874696 ***