Bug 1651693 - Can not spawn new containers after removing multus pods
Summary: Can not spawn new containers after removing multus pods
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Casey Callendrello
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-20 15:27 UTC by Lukas Bednar
Modified: 2018-11-23 15:28 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-23 15:28:18 UTC
Target Upstream Version:
Embargoed:



Description Lukas Bednar 2018-11-20 15:27:11 UTC
Description of problem:

When I deprovision multus (as part of the kubevirt-apb)
https://github.com/kubevirt/kubevirt-ansible/blob/master/roles/network-multus/tasks/deprovision.yml#L17

I can no longer schedule any other pods; they get stuck in the ContainerCreating state:

brew2-virtualization-depr-7hltt     bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3                  0/1       ContainerCreating   0          23m

In the pod events I can see:
  Warning  FailedCreatePodSandBox  3m (x79 over 21m)  kubelet, cnv-executor-ysegev-node1.example.com  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3_brew2-virtualization-depr-7hltt_a4585a2c-ecd0-11e8-b487-fa163eeeea38_0(bfdfc5584571af9e321b847573746f5dc00612d0da3d2a135eca46c7d24c733c): Multus: Err in loading K8s Delegates k8s args: Multus: Err in getting k8s network from pod: getPodNetworkAnnotation: failed to query the pod bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 in out of cluster comm: Unauthorized



Version-Release number of selected component (if applicable):
OCP-3.11.43
    Image ID:      registry.access.stage.redhat.com/cnv-tech-preview/multus-cni@sha256:a6a2e146251c14ee1be71dfc0c9f14bb1107c125ee0669e55ef69b34fcd34b71



How reproducible: 100%


Steps to Reproduce:
1. Deploy and then delete multus
2. Spawn a new pod
3. Observe the events of that pod (a rough sketch of these steps follows below)
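For reference, a minimal sketch of steps 2 and 3 using oc; the pod name and image below are placeholders, not taken from this cluster:

# 2. spawn a throwaway pod (placeholder name and image)
oc run multus-repro --image=registry.access.redhat.com/rhel7 --restart=Never
# 3. the pod stays in ContainerCreating; the sandbox error shows up in its events
oc get pod multus-repro
oc describe pod multus-repro | grep -A5 Events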

Actual results: Pod is stuck in ContainerCreating and FailedCreatePodSandBox errors appear in its events


Expected results: Pod should start successfully


Additional info:

[root@cnv-executor-ysegev-master1 ~]# oc describe pod -n brew2-virtualization-depr-7hltt bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Name:               bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Namespace:          brew2-virtualization-depr-7hltt
Priority:           0
PriorityClassName:  <none>
Node:               cnv-executor-ysegev-node1.example.com/172.16.0.15
Start Time:         Tue, 20 Nov 2018 09:29:08 -0500
Labels:             bundle-action=deprovision
                    bundle-fqname=brew2-virtualization
                    bundle-pod-name=bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:                 
Containers:
  apb:
    Container ID:  
    Image:         brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/cnv-tech-preview/kubevirt-apb:v3.11
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      deprovision
      --extra-vars
      {"_apb_last_requesting_user":"test_admin","_apb_plan_id":"default","_apb_provision_creds":{},"_apb_service_class_id":"d87125ed7e015c489588f56c665f9f7a","_apb_service_instance_id":"3bcf37ee-ecd0-11e8-812b-0a580a800012","admin_password":123456,"admin_user":"test_admin","cluster":"openshift","docker_tag":"v1.3.0","namespace":"kube-system","registry_namespace":"cnv-tech-preview","registry_url":"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"}
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POD_NAME:       bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 (v1:metadata.name)
      POD_NAMESPACE:  brew2-virtualization-depr-7hltt (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3-token-pcfkm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                  Age                From                                            Message
  ----     ------                  ----               ----                                            -------
  Normal   Scheduled               24m                default-scheduler                               Successfully assigned brew2-virtualization-depr-7hltt/bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 to cnv-executor-ysegev-node1.example.com
  Warning  FailedCreatePodSandBox  24m                kubelet, cnv-executor-ysegev-node1.example.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3_brew2-virtualization-depr-7hltt_a4585a2c-ecd0-11e8-b487-fa163eeeea38_0(c408791b69c0edf126123c2a0c3bd7c2a24757e5613e8fc683d0309cc94190e3): Multus: Err in loading K8s Delegates k8s args: Multus: Err in getting k8s network from pod: getPodNetworkAnnotation: failed to query the pod bundle-f2366e95-3990-47c2-aa06-4b77829c8cc3 in out of cluster comm: Unauthorized

Comment 1 Meng Bo 2018-11-21 03:06:21 UTC
Hi Lukas,

Looks like this is not a networking issue, but a problem with the ansible playbook.

When you deploy multus, the playbook creates all the resources that multus needs:
1. The network-attachment-definition CRD
2. The clusterrole/clusterrolebinding/serviceaccount related to permission control
3. The daemonset and configmap that make multus work

The daemonset also creates the CNI-related files on each host under /etc/cni/net.d/ and /opt/cni/bin/.
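For example, the cluster-side resources from steps 1 and 2 can be listed with something like the following (the exact resource names here are assumptions; adjust them to your deployment):

oc get crd | grep -i network-attachment
oc get clusterrole,clusterrolebinding | grep -i multus
oc get sa,ds,cm -n kube-system | grep -i multus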


But when you deprovision, it only deletes the resources that were created in the k8s cluster; it does not clean up the files on the hosts.

As a result, when you try to spawn a new pod after deprovisioning, the kubelet finds /etc/cni/net.d/00-multus.conf on the node, which has the highest priority, and calls /opt/cni/bin/multus to set up the network for the pod.
But at that point the resources from steps 1 and 2 above have all been deleted, and the multus binary no longer knows how to talk to the k8s cluster.

Can you check whether the high-priority multus conf still exists under /etc/cni/net.d/ on the node?
And can you try spawning a pod again after deleting the multus conf manually? (This will bring your cluster back to the default CNI behaviour.) For example:
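A minimal sketch of that check and cleanup on the affected node (the 00-multus.conf name comes from the explanation above; anything else is an assumption):

# on the node: see which CNI configs are present and in what order
ls -l /etc/cni/net.d/
# remove the leftover multus config so the kubelet falls back to the default CNI
rm /etc/cni/net.d/00-multus.conf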

I am also curious why this bug was reported against the OCP product, since it really is a CNV bug.

Thanks.

Comment 2 Petr Horáček 2018-11-21 15:06:47 UTC
As Meng described, it is a bug in the kubevirt apb deprovision. The multus pod installs configuration files on the host; however, during deprovision [1] only the pod is removed, and the config file (which has the highest priority in this case) remains on the host. I am not sure what the best approach to solve it would be, but one option is to remove the file explicitly during the deprovision run (a rough sketch follows below).

[1] https://github.com/kubevirt/kubevirt-ansible/blob/master/roles/network-multus/tasks/deprovision.yml#L17
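A rough sketch of the per-node cleanup such a deprovision step would need to perform (paths taken from the comments above; how this is wired into the playbook is up to the APB):

# remove the leftover multus CNI config so the default CNI takes over again
rm -f /etc/cni/net.d/00-multus.conf
# optionally also remove the multus binary dropped into the CNI bin directory
rm -f /opt/cni/bin/multus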

Comment 3 Lukas Bednar 2018-11-21 15:11:38 UTC
Thank you, Meng Bo, for the detailed description. I opened an issue for the APB to fix that: https://github.com/kubevirt/kubevirt-ansible/issues/477.

Comment 4 Casey Callendrello 2018-11-23 15:28:18 UTC
Looks like this can be closed; feel free to reopen if this is in error.

