Description of problem:
When I deprovision multus (as part of kubevirt-apb)
https://github.com/kubevirt/kubevirt-ansible/blob/master/roles/network-multus/tasks/deprovision.yml#L17
I can no longer schedule any other pods; they get stuck in the ContainerCreating state:

brew2-virtualization-depr-7hltt   bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3   0/1   ContainerCreating   0   23m

In the pod events I can read:

Warning  FailedCreatePodSandBox  3m (x79 over 21m)  kubelet, cnv-executor-ysegev-node1.example.com  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3_brew2-virtualization-depr-7hltt_a4585a2c-ecd0-11e8-b487-fa163eeeea38_0(bfdfc5584571af9e321b847573746f5dc00612d0da3d2a135eca46c7d24c733c): Multus: Err in loading K8s Delegates k8s args: Multus: Err in getting k8s network from pod: getPodNetworkAnnotation: failed to query the pod bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 in out of cluster comm: Unauthorized

Version-Release number of selected component (if applicable):
OCP-3.11.43
Image ID: registry.access.stage.redhat.com/cnv-tech-preview/multus-cni@sha256:a6a2e146251c14ee1be71dfc0c9f14bb1107c125ee0669e55ef69b34fcd34b71

How reproducible:
100%

Steps to Reproduce:
1. Deploy & delete multus
2. Spawn a new pod
3. Observe the events of your pod

Actual results:
The pod is stuck in ContainerCreating and errors appear in its events.

Expected results:
The pod should start successfully.

Additional info:
[root@cnv-executor-ysegev-master1 ~]# oc describe pod -n brew2-virtualization-depr-7hltt bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Name:               bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Namespace:          brew2-virtualization-depr-7hltt
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-ysegev-node1.example.com/172.16.0.15
Start Time:         Tue, 20 Nov 2018 09:29:08 -0500
Labels:             bundle-action=deprovision
                    bundle-fqname=brew2-virtualization
                    bundle-pod-name=bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:
Containers:
  apb:
    Container ID:
    Image:          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/cnv-tech-preview/kubevirt-apb:v3.11
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Args:
      deprovision
      --extra-vars
      {"_apb_last_requesting_user":"test_admin","_apb_plan_id":"default","_apb_provision_creds":{},"_apb_service_class_id":"d87125ed7e015c489588f56c665f9f7a","_apb_service_instance_id":"3bcf37ee-ecd0-11e8-812b-0a580a800012","admin_password":123456,"admin_user":"test_admin","cluster":"openshift","docker_tag":"v1.3.0","namespace":"kube-system","registry_namespace":"cnv-tech-preview","registry_url":"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"}
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAME:       bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 (v1:metadata.name)
      POD_NAMESPACE:  brew2-virtualization-depr-7hltt (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                  Age  From                                            Message
  ----     ------                  ---  ----                                            -------
  Normal   Scheduled               24m  default-scheduler                               Successfully assigned brew2-virtualization-depr-7hltt/bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 to cnv-executor-ysegev-node1.example.com
  Warning  FailedCreatePodSandBox  24m  kubelet, cnv-executor-ysegev-node1.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3_brew2-virtualization-depr-7hltt_a4585a2c-ecd0-11e8-b487-fa163eeeea38_0(c408791b69c0edf126123c2a0c3bd7c2a24757e5613e8fc683d0309cc94190e3): Multus: Err in loading K8s Delegates k8s args: Multus: Err in getting k8s network from pod: getPodNetworkAnnotation: failed to query the pod bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 in out of cluster comm: Unauthorized
Hi Lukas,

This does not look like a networking issue, but rather a problem with the Ansible playbook. When you deploy multus, it creates all the resources multus needs:

1. The network-attachment-definition CRD
2. The clusterrole/clusterrolebinding/serviceaccount related to permission control
3. The daemonset and configmap that make multus work

The daemonset also places the CNI-related files on each host under /etc/cni/net.d/ and /opt/cni/bin/.

When you deprovision, however, only the resources created in the k8s cluster are deleted; the files on the hosts are not cleaned up. As a result, when you try to spawn a new pod after deprovisioning, the node still has /etc/cni/net.d/00-multus.conf (which has the highest priority), so /opt/cni/bin/multus is called to set up the pod's network. But at that point the resources from steps 1 and 2 above are already gone, and the multus binary can no longer talk to the k8s cluster.

Can you check whether the high-priority multus conf still exists under /etc/cni/net.d/ on the node, and try to spawn the pod again after deleting that conf manually? (This will bring your cluster back to the default CNI.) See the commands sketched below.

I am also curious why this bug was reported against the OCP product, since it is actually a CNV bug. Thanks.
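A minimal sketch of that check and manual cleanup, to be run on the affected node (node1 in this report); it assumes the default config file name 00-multus.conf mentioned above:

  ls /etc/cni/net.d/                   # check whether 00-multus.conf is still present
  rm /etc/cni/net.d/00-multus.conf     # fall back to the default CNI configuration
  # then spawn a test pod again and watch its events
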
As Meng described, it is a bug in the kubevirt-apb deprovision. The multus pod installs configuration files on the host, but during deprovision [1] only the pod is removed; the config file (which has the highest priority in this case) remains on the host. I am not sure what the best approach to solve it would be, but one option is to explicitly remove the file in the deprovision run (a rough sketch follows). [1] https://github.com/kubevirt/kubevirt-ansible/blob/master/roles/network-multus/tasks/deprovision.yml#L17
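For illustration only, the cleanup would amount to something like the following ad-hoc command run against the compute nodes ("nodes" is a hypothetical inventory group; the APB sandbox may not have direct host access, so the real fix in deprovision.yml may need a different mechanism):

  # remove the leftover multus CNI config from each node (illustrative sketch)
  ansible nodes -b -m file -a "path=/etc/cni/net.d/00-multus.conf state=absent"
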
Thank you, Meng Bo, for the detailed description. I opened an issue for the APB to fix that: https://github.com/kubevirt/kubevirt-ansible/issues/477 .
Looks like this can be closed; feel free to reopen if this is in error.