Bug 1869606 - crio's failing to delete a network namespace
Summary: crio's failing to delete a network namespace
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Michał Dulko
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-18 10:52 UTC by Michał Dulko
Modified: 2020-09-15 03:15 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
kuryr-cni logs crashing due to NetlinkError after OCP upgrade (164.78 KB, text/plain)
2020-09-07 10:51 UTC, rlobillo
output of journalctl | grep crio > crio_logs.txt in the affected worker (884.12 KB, application/gzip)
2020-09-07 10:52 UTC, rlobillo
occurrence after 24h running (3.14 MB, application/gzip)
2020-09-08 06:32 UTC, rlobillo


Links
System ID Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 330 None closed Bug 1869606: Remove the pod interface even if VIF is gone 2020-09-21 06:37:03 UTC

Description Michał Dulko 2020-08-18 10:52:27 UTC
Description of problem:
We've noticed orphaned crio network namespaces lingering on a setup, which prevented Kuryr-Kubernetes from plugging interfaces for other containers due to IP/VLAN ID conflicts.

This is what we saw in crio regarding offending interfaces:

Aug 18 10:30:04 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-18 10:30:04.946146953Z" level=error msg="Error deleting network: Multus: [openshift-marketplace/community-operators-64k69]: error reading the delegates: open /var/lib/cni/multus/904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725: no such file or directory"
Aug 18 10:30:04 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-18 10:30:04.946195165Z" level=error msg="Error while removing pod from CNI network \"multus-cni-network\": Multus: [openshift-marketplace/community-operators-64k69]: error reading the delegates: open /var/lib/cni/multus/904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725: no such file or directory"

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Seems to be related to some race condition, happens quite rarely.

Steps to Reproduce (more like how I got this):
1. Install OpenShift 4.6 on OpenStack with Kuryr.
2. Let it sit for ~24h.

Actual results:
Notice that one kuryr-cni pod is crashlooping and some pods are stuck in ContainerCreating. In the kuryr-cni logs you can see that it gets NetlinkError 17 ('File exists'), meaning that some other namespace already uses the VLAN ID that kuryr-cni tries to set. This is caused by the orphaned netnses.
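A minimal model of the conflict (all names here - `NetnsRegistry`, `plug` - are illustrative, not Kuryr's or pyroute2's real API): an orphaned netns keeps holding its VLAN ID, so the next pod handed the same ID hits the analogue of NetlinkError (17, 'File exists').

```python
# Illustrative sketch only: NetnsRegistry/plug are made-up names modelling
# why an orphaned netns makes the next plug fail with errno 17.

class NetnsRegistry:
    """Tracks which VLAN ID each network namespace holds on a node."""

    def __init__(self):
        self.vlan_by_netns = {}

    def plug(self, netns, vlan_id):
        # The kernel refuses a second VLAN sub-interface with the same ID
        # on one trunk port; model that as NetlinkError (17, 'File exists').
        if vlan_id in self.vlan_by_netns.values():
            raise OSError(17, "File exists")
        self.vlan_by_netns[netns] = vlan_id


reg = NetnsRegistry()
reg.plug("/var/run/netns/aaa", 20)  # original pod's netns
# The pod is deleted from the API, so the VLAN ID is considered free again,
# but the netns was never removed -> replugging VLAN 20 fails:
try:
    reg.plug("/var/run/netns/bbb", 20)
except OSError as err:
    print(err.errno)  # 17
```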

Expected results:
Pods should run just fine.

Additional info:
Here's full crio log (600 MBs): https://drive.google.com/file/d/13wOLbyGECka8UAuKsETyzxqIyKU8t7g3/view?usp=sharing

Comment 1 Peter Hunt 2020-08-19 20:45:24 UTC
I am seeing a whole lot of:
```
Aug 17 21:08:26 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-17 21:08:26.928756325Z" level=error msg="Error stopping network on cleanup: failed to destroy network for pod sandbox k8s_redhat-marketplace-s9s82_openshift-marketplace_8bdec5f4-b417-4374-90bc-cd365f22f2ee_0(5d0bdafa7e112977413c7092c9cac95ab78431f7c261a5e7c9570e82749c258b): delegateDel: error invoking DelegateDel - \"kuryr-cni\": error in getting result from DelNetwork: Looks like http://localhost:5036/delNetwork cannot be reached. Is kuryr-daemon running?: Post \"http://localhost:5036/delNetwork\": EOF" id=04efa482-f265-4348-a735-8516c8cb7c77 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
```

what does this mean?

Comment 2 Michał Dulko 2020-08-20 08:37:21 UTC
(In reply to Peter Hunt from comment #1)
> I am seeing a whole lot of:
> ```
> Aug 17 21:08:26 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-17
> 21:08:26.928756325Z" level=error msg="Error stopping network on cleanup:
> failed to destroy network for pod sandbox
> k8s_redhat-marketplace-s9s82_openshift-marketplace_8bdec5f4-b417-4374-90bc-
> cd365f22f2ee_0(5d0bdafa7e112977413c7092c9cac95ab78431f7c261a5e7c9570e82749c25
> 8b): delegateDel: error invoking DelegateDel - \"kuryr-cni\": error in
> getting result from DelNetwork: Looks like http://localhost:5036/delNetwork
> cannot be reached. Is kuryr-daemon running?: Post
> \"http://localhost:5036/delNetwork\": EOF"
> id=04efa482-f265-4348-a735-8516c8cb7c77
> name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
> ```
> 
> what does this mean?

Right, I could have explained that. Basically all CNI requests go to the kuryr-daemon process, which knows which VIFs map to which pods. If there's a persistent issue with plugging VIFs, the kuryr-daemon livenessProbe fails and the process is restarted in a crashloop (that has sometimes helped recover from weird issues). This log is from those moments - when kuryr-daemon is not running.

But the root issue here is the orphaned network namespace, which has the interface with conflicting VLAN ID that raises the NetlinkError in kuryr-daemon.

Now that I think about it we *may* be able to try to improve that by making sure on CNI DEL we always clean up those interfaces - right now there might be cases when we don't.

Anyway I think this wasn't happening in 4.5, so there has to be a change somewhere.
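The cleanup idea above can be sketched as follows (a minimal sketch; `on_cni_del`, `vif_store` and `iface_store` are hypothetical stand-ins for Kuryr's internal state, not its real API): on CNI DEL, drop the pod's interface even when the VIF record is already gone, so the VLAN ID is always freed.

```python
# Hypothetical sketch of "always clean up on CNI DEL":
# remove the pod interface even if the VIF bookkeeping vanished first.

def on_cni_del(pod_id, vif_store, iface_store):
    vif = vif_store.pop(pod_id, None)      # may already be None
    # Unconditionally drop the interface so its VLAN ID is released
    # even when the VIF annotation disappeared before the DEL arrived.
    iface = iface_store.pop(pod_id, None)
    return vif, iface


vifs = {}                                  # VIF record already gone
ifaces = {"pod-a": "eth0.20"}
print(on_cni_del("pod-a", vifs, ifaces))   # (None, 'eth0.20')
```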

Comment 3 Peter Hunt 2020-08-20 14:04:55 UTC
to begin finding the issue, I've opened https://github.com/cri-o/cri-o/pull/4102. That will tell us whether we're leaking a namespace in cases where the pod failed to create and we're cleaning up. I am not sure that's the case we're seeing, but it's a nice-to-have nonetheless.

It is unclear where you're seeing that we have orphaned network namespaces. I can't find any report that we're failing to clean up a namespace.

One thing I just thought about is: in 4.6 we've moved the managed namespace removal from StopPodSandbox to RemovePodSandbox requests. However, the kubelet calls RemovePodSandbox much later than StopPodSandbox. Is it possible kuryr is trying to reuse VLAN IDs after a pod is stopped but before it's removed?
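That ordering can be illustrated with a toy timeline (purely a sketch; the event names mirror the CRI calls, while the VLAN bookkeeping is an assumption about Kuryr's side): the managed netns survives until RemovePodSandbox, but the VLAN ID becomes reusable as soon as the pod leaves the API.

```python
# Toy simulation of the suspected race: the netns lives until
# RemovePodSandbox, but the VLAN ID is freed at API-deletion time.

events = [
    ("StopPodSandbox", "pod-a"),    # pod stopped, managed netns kept
    ("APIDelete", "pod-a"),         # Kuryr frees pod-a's VLAN ID here
    ("RunPodSandbox", "pod-b"),     # may pick the freed VLAN ID
    ("RemovePodSandbox", "pod-a"),  # only now is pod-a's netns removed
]

netns_alive, free_vlans = set(), set()
conflict = False
for event, pod in events:
    if event == "StopPodSandbox":
        netns_alive.add(pod)
    elif event == "APIDelete":
        free_vlans.add(20)
    elif event == "RunPodSandbox":
        # conflict: VLAN 20 gets reused while pod-a's netns still holds it
        conflict = 20 in free_vlans and "pod-a" in netns_alive
    elif event == "RemovePodSandbox":
        netns_alive.discard(pod)

print(conflict)  # True
```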

Comment 4 Michał Dulko 2020-08-20 14:51:56 UTC
(In reply to Peter Hunt from comment #3)
> to begin finding the issue, I've opened
> https://github.com/cri-o/cri-o/pull/4102. That will tell us if we're leaking
> a namespace in cases where the pod failed to create and we're cleaning up. I
> am not sure that's the case we're seeing, but it's a nice to have
> nonetheless.
> 
> It is unclear where you're seeing that we have orphaned network namespaces.
> I can't find any reporting that we're failing to clean up a namespace.

How about those logs?

Aug 18 10:30:04 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-18 10:30:04.946146953Z" level=error msg="Error deleting network: Multus: [openshift-marketplace/community-operators-64k69]: error reading the delegates: open /var/lib/cni/multus/904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725: no such file or directory"
Aug 18 10:30:04 ostest-wlfr4-worker-lc6sp crio[1505]: time="2020-08-18 10:30:04.946195165Z" level=error msg="Error while removing pod from CNI network \"multus-cni-network\": Multus: [openshift-marketplace/community-operators-64k69]: error reading the delegates: open /var/lib/cni/multus/904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725: no such file or directory"

When debugging this my impression was that this happened for the conflicting namespaces - and those namespaces stayed there.

> One thing I just thought about is: in 4.6 we've moved the managed namespace
> removal from StopPodSandbox to RemovePodSandbox requests. However, the
> kubelet calls RemovePodSandbox much later than StopPodSandbox. Is it
> possible kuryr is trying to reuse VLAN IDs after a pod is stopped but before
> it's removed?

Wow, nice thinking! Basically with Kuryr that's driven by the K8s API (I know, that's bad), so once the pod resource is deleted from the K8s API, we can reuse the VLAN ID. So what you describe totally fits as the culprit.

Comment 5 Peter Hunt 2020-08-20 20:34:41 UTC
it is unlikely it will be fixed this sprint, bumping to next one

> When debugging this my impression was that this happened for the conflicting namespaces - and those namespaces stayed there.

AFAIK, `/var/lib/cni/multus/904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725` is not a namespace. The network namespaces cri-o creates are of the form `/var/run/netns/$RANDOM_ID`. I am not sure what that multus file is.
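A hedged diagnostic sketch for spotting orphans in those paths: given the entries under /var/run/netns and the netns paths still referenced by live sandboxes (e.g. gathered via `crictl inspectp`), anything unreferenced is a candidate orphan. `orphaned_netns` is a hypothetical helper; the listings are plain arguments so the logic stands alone.

```python
# Hedged diagnostic sketch; orphaned_netns is a hypothetical helper name.
# In practice netns_entries would come from listing /var/run/netns and
# live_netns from the netns paths of sandboxes cri-o still tracks.

def orphaned_netns(netns_entries, live_netns):
    """Return namespace names present on disk but referenced by no sandbox."""
    return sorted(set(netns_entries) - set(live_netns))


print(orphaned_netns(["aaa", "bbb", "ccc"], ["bbb"]))  # ['aaa', 'ccc']
```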

Comment 6 Michał Dulko 2020-08-21 08:24:41 UTC
(In reply to Peter Hunt from comment #5)
> it is unlikely it will be fixed this sprint, bumping to next one
> 
> > When debugging this my impression was that this happened for the conflicting namespaces - and those namespaces stayed there.
> 
> AFAIK,
> `/var/lib/cni/multus/
> 904b0f1e18d0b07b9f58dfb1cb8e8ddc80690edefbd87e1f0275363377a6b725` is not a
> namespace. The network namespaces cri-o creates are of the form
> `/var/run/netns/$RANDOM_ID`. I am not sure what that multus file is

I only matched that log through the pod name; nevertheless I was able to find conflicting orphaned namespaces in /var/run/netns. It might be due to Kuryr's behaviour though.

Comment 7 Michał Dulko 2020-08-27 12:54:29 UTC
Seems like we are able to work around this in Kuryr, so moving it there. It's most likely not cri-o's fault, as we probably assumed an implementation detail was API.

Comment 11 rlobillo 2020-09-07 10:50:39 UTC
Failed on 4.6.0-0.nightly-2020-09-05-015624 over OSP16.1 RHOS-16.1-RHEL-8-20200821.n.0 with OVN.

The issue is observed during upgrade from 4.5.8 to 4.6.0-0.nightly-2020-09-05-015624

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.8     True        False         69s     Cluster version is 4.5.8

$ oc get pods -n openshift-kuryr -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                        NOMINATED NODE   READINESS GATES
kuryr-cni-6sfjr                     1/1     Running   0          39m   10.196.3.94    ostest-xcbsx-master-0       <none>           <none>
kuryr-cni-g5jk6                     1/1     Running   0          39m   10.196.0.33    ostest-xcbsx-master-2       <none>           <none>
kuryr-cni-kp8cr                     1/1     Running   1          19m   10.196.1.58    ostest-xcbsx-worker-cb5kt   <none>           <none>
kuryr-cni-qgc7s                     1/1     Running   1          19m   10.196.2.90    ostest-xcbsx-worker-8sc92   <none>           <none>
kuryr-cni-tnqbg                     1/1     Running   0          19m   10.196.2.16    ostest-xcbsx-worker-ctm65   <none>           <none>
kuryr-cni-tsj6x                     1/1     Running   0          39m   10.196.3.204   ostest-xcbsx-master-1       <none>           <none>
kuryr-controller-6f999575cd-qsxdz   1/1     Running   2          39m   10.196.3.94    ostest-xcbsx-master-0       <none>           <none>


$ oc adm upgrade --to-image="registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-05-015624"  --allow-explicit-upgrade --force=true

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.8     True        True          113m    Unable to apply 4.6.0-0.nightly-2020-09-05-015624: the cluster operator monitoring is degraded


$ oc get clusteroperators
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-09-05-015624   True        False         False      33m
cloud-credential                           4.6.0-0.nightly-2020-09-05-015624   True        False         False      160m
cluster-autoscaler                         4.6.0-0.nightly-2020-09-05-015624   True        False         False      146m
config-operator                            4.6.0-0.nightly-2020-09-05-015624   True        False         False      146m
console                                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      81m
csi-snapshot-controller                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      132m
dns                                        4.5.8                               True        False         False      151m
etcd                                       4.6.0-0.nightly-2020-09-05-015624   True        False         False      151m
image-registry                             4.6.0-0.nightly-2020-09-05-015624   True        False         False      133m
ingress                                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      82m
insights                                   4.6.0-0.nightly-2020-09-05-015624   True        False         False      148m
kube-apiserver                             4.6.0-0.nightly-2020-09-05-015624   True        False         False      151m
kube-controller-manager                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      150m
kube-scheduler                             4.6.0-0.nightly-2020-09-05-015624   True        False         False      150m
kube-storage-version-migrator              4.6.0-0.nightly-2020-09-05-015624   True        False         False      133m
machine-api                                4.6.0-0.nightly-2020-09-05-015624   True        False         False      144m
machine-approver                           4.6.0-0.nightly-2020-09-05-015624   True        False         False      149m
machine-config                             4.5.8                               True        False         False      144m
marketplace                                4.6.0-0.nightly-2020-09-05-015624   True        False         False      83m
monitoring                                 4.6.0-0.nightly-2020-09-05-015624   False       True          True       63m
network                                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      154m
node-tuning                                4.6.0-0.nightly-2020-09-05-015624   True        False         False      83m
openshift-apiserver                        4.6.0-0.nightly-2020-09-05-015624   True        False         False      83m
openshift-controller-manager               4.6.0-0.nightly-2020-09-05-015624   True        False         False      148m
openshift-samples                          4.6.0-0.nightly-2020-09-05-015624   True        False         False      82m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-09-05-015624   True        False         False      151m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-09-05-015624   True        False         False      151m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-09-05-015624   True        False         False      82m
service-ca                                 4.6.0-0.nightly-2020-09-05-015624   True        False         False      153m
storage                                    4.6.0-0.nightly-2020-09-05-015624   True        False         False      74m


$ oc get pods -n openshift-kuryr -o wide
NAME                                READY   STATUS             RESTARTS   AGE   IP             NODE                        NOMINATED NODE   READINESS GATES
kuryr-cni-2sxpr                     1/1     Running            0          86m   10.196.2.16    ostest-xcbsx-worker-ctm65   <none>           <none>
kuryr-cni-5l4r6                     0/1     CrashLoopBackOff   19         82m   10.196.2.90    ostest-xcbsx-worker-8sc92   <none>           <none>
kuryr-cni-rqzhs                     1/1     Running            17         84m   10.196.3.94    ostest-xcbsx-master-0       <none>           <none>
kuryr-cni-vcqff                     1/1     Running            0          81m   10.196.3.204   ostest-xcbsx-master-1       <none>           <none>
kuryr-cni-zdfr7                     1/1     Running            0          77m   10.196.0.33    ostest-xcbsx-master-2       <none>           <none>
kuryr-cni-zq8gj                     1/1     Running            0          79m   10.196.1.58    ostest-xcbsx-worker-cb5kt   <none>           <none>
kuryr-controller-76c88885d6-jgxfv   1/1     Running            0          86m   10.196.3.94    ostest-xcbsx-master-0       <none>           <none>




kuryr-cni-5l4r6 is restarting all the time due to NetlinkError:

2020-09-07 10:07:37.272 53 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/var/run/netns/0b8e8709-f43d-4173-ad99-41fb7afcbd39', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin:/usr/libexec/cni', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '438baf2b8de5daeb7efc0fce2a20dd922a224a800d211ee9e85ac502634c0a21', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-marketplace;K8S_POD_NAME=redhat-operators-thvcs;K8S_POD_INFRA_CONTAINER_ID=438baf2b8de5daeb7efc0fce2a20dd922a224a800d211ee9e85ac502634c0a21'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')


openshift-marketplace pods are stuck in ContainerCreating:

$ oc get pods -A -o wide| grep -v -e Running -e Completed
NAMESPACE                                          NAME                                                         READY   STATUS              RESTARTS   AGE     IP               NODE                        NOMINATED NODE   READINESS GATES
openshift-kuryr                                    kuryr-cni-5l4r6                                              0/1     CrashLoopBackOff    17         70m     10.196.2.90      ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-marketplace                              certified-operators-bcgvf                                    0/1     ContainerCreating   0          8m35s   <none>           ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-marketplace                              certified-operators-frbv9                                    0/1     ContainerCreating   0          23m     <none>           ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-marketplace                              redhat-marketplace-5tlnn                                     0/1     ContainerCreating   0          23m     <none>           ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-marketplace                              redhat-marketplace-l4lsp                                     0/1     ContainerCreating   0          8m35s   <none>           ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-marketplace                              redhat-operators-qxmkr                                       0/1     ContainerCreating   0          8m35s   <none>           ostest-xcbsx-worker-8sc92   <none>           <none>
openshift-monitoring                               prometheus-operator-79b6894bf6-5wl6z                         0/2     ContainerCreating   0          69m     <none>           ostest-xcbsx-master-0       <none>           <none>


The errors below are observed in CRI-O on worker ostest-xcbsx-worker-8sc92:

Sep 07 10:14:13 ostest-xcbsx-worker-8sc92 crio[1634]: 2020-09-07T10:14:13Z [error] Multus: [openshift-marketplace/redhat-marketplace-tnbmp]: error reading the delegates: open /var/lib/cni/multus/99cbc7f33a04c3fe9a350a135fde92590da00fce0828dcdcad2f7a28ddc380c2: no such file or directory

Attaching CNI and CRIO logs.

Comment 12 rlobillo 2020-09-07 10:51:29 UTC
Created attachment 1713947 [details]
kuryr-cni logs crashing due to NetlinkError after OCP upgrade

Comment 13 rlobillo 2020-09-07 10:52:11 UTC
Created attachment 1713950 [details]
output of journalctl | grep crio > crio_logs.txt in the affected worker

Comment 14 rlobillo 2020-09-08 06:32:10 UTC
Created attachment 1714026 [details]
occurrence after 24h running

Hit the issue after keeping the cluster running for 24 hours. Attaching the logs under kuryr_cni_Crashloopback_after24h_4.6.0-0.nightly-2020-09-05-015624.tgz

