Bug 1956309
Summary: | FailedCreatePodSandBox failed to create container veth: failed to make veth pair: operation not permitted | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | Networking | Assignee: | Surya Seetharaman <surya> |
Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED WORKSFORME | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aconstan, anbhat, aos-bugs, obulatov, pehunt, pmuller |
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | TechnicalReleaseBlocker | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-05-25 06:34:10 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Clayton Coleman
2021-05-03 12:56:02 UTC
I'm not sure yet if this is a blocker, I've set it on the wrong bug, as well as attached the wrong PR and set the wrong status, sorry for the noise I am naively guessing it's coming from https://github.com/containernetworking/plugins/blob/e27c48b391539f5536918b9c379c59ef2793cb0d/pkg/ip/link_linux.go#L36 though it's definitely coming from the sdn plugin somehow Tossing over to networking Investigating the CI failure mentioned in #1: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-serial/1388595670943797248 SDN logs: I0501 21:48:43.550057 2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 I0501 21:48:43.603407 2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 I0501 21:48:54.363427 2637 pod.go:508] CNI_ADD e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 got IP 10.130.2.43, ofport 44 I0501 21:48:56.173815 2637 proxier.go:824] "syncProxyRules complete" elapsed="186.024749ms" I0501 21:48:56.244799 2637 proxier.go:854] "Syncing iptables rules" I0501 21:48:56.417458 2637 proxier.go:824] "syncProxyRules complete" elapsed="172.746104ms" I0501 21:48:57.431283 2637 pod.go:544] CNI_DEL e2e-persistent-local-volumes-test-7134/pod-a5752853-17d6-40c3-b890-d25af8acbfa0 These suggest, the CNI_DEL was issued before the CNI_ADD request could get processed, possibly a creation timeout ? So DEL happens before ADD. and then the add fails. Multus logs are not useful because they are not on debug mode. Also looking at other CI failures: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.8/1393181488270282752 [sig-network] pods should successfully create sandboxes by other expand_less 0s 1 failures to create the sandbox ns/e2e-configmap-2364 pod/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc node/ci-op-xby2gqnn-3283a-2kc76-worker-c-s2wzc - never deleted - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc_e2e-configmap-2364_d9fa8ca7-92b9-43fa-a537-65ac881283a8_0(5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1): [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] failed to configure pod interface: failed to make veth pair: operation not permitted ' This is deff not a SDN bug. Looking into the OVN-k8s logs, things look different: I0514 13:11:41.058777 3165 cni.go:213] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] ADD starting CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] I0514 13:11:41.072232 3165 cni.go:223] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] ADD finished CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1], result "", err failed to configure pod interface: failed to make veth pair: operation not permitted I0514 13:11:41.094925 3165 cni.go:213] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] DEL starting CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] I0514 13:11:41.130893 3165 healthcheck.go:142] Opening healthcheck "openshift-ingress/router-e2e-test-router-h2spec-ghwdb" on port 30317 I0514 13:11:41.137232 3165 cni.go:223] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] DEL finished CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1], result "", err <nil> I0514 13:11:41.259022 3165 cni.go:213] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] DEL starting CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] I0514 13:11:41.286788 3165 cni.go:223] [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1] DEL finished CNI request [e2e-configmap-2364/pod-configmaps-7a4d568c-8859-4cea-a14d-fab2aaf5afbc 5a670dfca585f2d4cb206655ead549c491584c1017c47c8787786f5ece3e7db1], result "", err <nil> It seems like CNI_ADD failed with that error and only then the delete is triggered ? Maybe for the SDN run, since I cannot see the initial sandbox request, maybe the logs got rotated and I missed the first failure... Seems to be similar/linked to https://github.com/containers/buildah/issues/2628? https://search.ci.openshift.org/?search=failed+to+make+veth+pair%3A+operation+not+permitted&maxAge=48h&context=1&type=bug%2Bjunit&name=4%5C.8%7Cmaster&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job The error seems to have gone away magically ? I don't know what changed. I'll keep an eye out for the next 24hrs. If no more failures come I'll close the bug. Unless I am missing something, the issue seems to have gotten resolved on its own (based on the search results)? Perhaps it was a kernel version on the CI nodes, not sure. However the failures seem to have gone away. Closing this bug for now. Feel free to (re)open a new one if it still persists. |