Bug 1956309 - FailedCreatePodSandBox failed to create container veth: failed to make veth pair: operation not permitted
Summary: FailedCreatePodSandBox failed to create container veth: failed to make veth p...
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.8.0
Assignee: Surya Seetharaman
QA Contact: zhaozhanqi
URL:
Whiteboard: TechnicalReleaseBlocker
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-03 12:56 UTC by Clayton Coleman
Modified: 2021-05-14 15:05 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)

Description Clayton Coleman 2021-05-03 12:56:02 UTC
: [sig-network] pods should successfully create sandboxes by other expand_less 	0s
1 failures to create the sandbox

ns/e2e-persistent-local-volumes-test-7134 pod/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 node/ip-10-0-190-119.ec2.internal - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-a5752853-17d6-40c3-b890-d25af8acbfa0_e2e-persistent-local-volumes-test-7134_4b3ff219-de7b-4336-abf2-f06313cdac06_0(fa8eeac03896cdf03c4f067c917397e030add4150d78a9629491987cbf262ee6): [e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0:openshift-sdn]: error adding container to network "openshift-sdn": failed to create container veth: failed to make veth pair: operation not permitted

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-serial/1388595670943797248

About 2% of CI runs have this failure in it in 4.8/master which is fairly high.

https://search.ci.openshift.org/?search=failed+to+make+veth+pair%3A+operation+not+permitted&maxAge=48h&context=1&type=bug%2Bjunit&name=4%5C.8%7Cmaster&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Peter Hunt 2021-05-03 19:14:37 UTC
I'm not sure yet if this is a blocker, I've set it on the wrong bug, as well as attached the wrong PR and set the wrong status, sorry for the noise

Comment 3 Peter Hunt 2021-05-04 18:51:29 UTC
I am naively guessing it's coming from https://github.com/containernetworking/plugins/blob/e27c48b391539f5536918b9c379c59ef2793cb0d/pkg/ip/link_linux.go#L36
though it's definitely coming from the sdn plugin somehow

Tossing over to networking

Comment 5 Surya Seetharaman 2021-05-14 15:05:39 UTC
Investigating the CI failure mentioned in #1: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-serial/1388595670943797248

SDN logs:

I0501 21:48:43.550057    2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0
I0501 21:48:43.603407    2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0
I0501 21:48:54.363427    2637 pod.go:508] CNI_ADD e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 got IP 10.130.2.43, ofport 44
I0501 21:48:56.173815    2637 proxier.go:824] "syncProxyRules complete" elapsed="186.024749ms"
I0501 21:48:56.244799    2637 proxier.go:854] "Syncing iptables rules"
I0501 21:48:56.417458    2637 proxier.go:824] "syncProxyRules complete" elapsed="172.746104ms"
I0501 21:48:57.431283    2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0

These suggest, the CNI_DEL was issued before the CNI_ADD request could get processed, possibly a creation timeout ? So DEL happens before ADD. and then the add fails.

Multus logs are not useful because they are not on debug mode.


Also looking at other CI failures: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.8/1393181488270282752

 [sig-network] pods should successfully create sandboxes by other expand_less	0s
1 failures to create the sandbox

ns/e2e-configmap-2364 pod/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc node/ci-op-xby2gqnn-3283a-2kc76-worker-c-s2wzc - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc_e2e-configmap-2364_d9fa8ca7-92b9-43fa-a537-65ac881283a8_0(5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1): [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] failed to configure pod interface: failed to make veth pair: operation not permitted
'

This is deff not a SDN bug.


Note You need to log in before you can comment on or make changes to this bug.