Bug 1414963 - SetupNetworkError "Failed to setup network for pod" after 3.4 Upgrade
Summary: SetupNetworkError "Failed to setup network for pod" after 3.4 Upgrade
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-19 20:17 UTC by Nick Schuetz
Modified: 2023-09-14 03:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-24 19:27:52 UTC
Target Upstream Version:
Embargoed:


Description Nick Schuetz 2017-01-19 20:17:17 UTC
Description of problem:

After following the manual upgrade process from a 3.3.1 installation to 3.4, my router, docker-registry and registry-console pods are unable to set up their networks when scheduled.

Version-Release number of selected component (if applicable):

openshift v3.4.0.39
kubernetes v1.4.0+776c994

How reproducible:

Always

Steps to Reproduce:
1. Follow the manual upgrade outlined in the documentation for 3.4 from 3.3.1.7

Actual results:

  9m	1s	82	{kubelet ocp-master1.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "router-9-deploy_default" with SetupNetworkError: "Failed to setup network for pod \"router-9-deploy_default(b13f4b14-de7e-11e6-968a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"

  13m	13m	1	{kubelet ocp-master1.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "docker-registry-11-3ykik_default" with SetupNetworkError: "Failed to setup network for pod \"docker-registry-11-3ykik_default(01aee4e0-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/25346/ns/net\": no such file or directory; Skipping pod"


  16m	23s	46	{kubelet ocp-node2.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "registry-console-1-kn442_default" with SetupNetworkError: "Failed to setup network for pod \"registry-console-1-kn442_default(01af2237-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"
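
All three events above share one shape: a pod name, the network plugin ("cni"), and a /proc/<pid>/ns/net path that could not be found. The following is a minimal Python sketch for pulling those fields out of such a message when triaging many events. The regular expression and the function name parse_setup_network_error are illustrative only and are tailored to the text quoted in this report. Reading a PID of 0 as "the runtime reported no running process for the pod's infrastructure container" is an assumption, not something this report states.

```python
import re

# Illustrative pattern, tailored to the event text quoted in this report.
# The messages contain literal backslash-escaped quotes (\"), hence \\" below.
EVENT_RE = re.compile(
    r'failed to "SetupNetwork" for "(?P<pod>[^"]+)"'
    r'.*?using network plugins \\"(?P<plugin>[^"\\]+)\\"'
    r'.*?failed to Statfs \\"/proc/(?P<pid>\d+)/ns/net\\"'
)


def parse_setup_network_error(message):
    """Return pod, plugin and netns PID from one event message, or None."""
    match = EVENT_RE.search(message)
    if match is None:
        return None
    return {
        "pod": match.group("pod"),
        "plugin": match.group("plugin"),
        "pid": int(match.group("pid")),
    }


# First event from this report, split across raw strings for readability.
sample = (
    r'Error syncing pod, skipping: failed to "SetupNetwork" for '
    r'"router-9-deploy_default" with SetupNetworkError: "Failed to setup '
    r'network for pod \"router-9-deploy_default'
    r'(b13f4b14-de7e-11e6-968a-52540089f0cc)\" using network plugins '
    r'\"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or '
    r'directory; Skipping pod"'
)
print(parse_setup_network_error(sample))
```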

Expected results:

The SetupNetwork routine does not fail.

Additional info:

This installation has a single master with schedule=true, plus two additional nodes.

Comment 1 Scott Dodson 2017-01-19 21:22:19 UTC
If you remove /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf and restart docker does the problem go away?

Comment 2 Nick Schuetz 2017-01-19 21:25:05 UTC
That file does not exist. I went to remove it as a part of the upgrade process and it wasn't there then either.

# stat /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf
stat: cannot stat ‘/etc/systemd/system/docker.service.d/docker-sdn-ovs.conf’: No such file or directory

Comment 3 Nick Schuetz 2017-01-19 21:29:43 UTC
The following directory does not exist either:

/etc/systemd/system/docker.service.d

Comment 5 Nick Schuetz 2017-01-19 21:56:49 UTC
# grep ovs /etc/origin/master/master-config.yaml
  networkPluginName: redhat/openshift-ovs-subnet

Comment 6 Nick Schuetz 2017-01-19 22:48:26 UTC
This seems to be SELinux-related. I put SELinux into permissive mode (setenforce 0) and all my pods were able to SetupNetwork and start, and the error went away.

Comment 7 Ben Bennett 2017-01-19 22:50:09 UTC
Can you get the selinux audit entries please?

Comment 8 Nick Schuetz 2017-01-19 22:58:06 UTC
This may be the culprit:

type=SYSCALL msg=audit(1484866535.244:2144): arch=c000003e syscall=59 success=no exit=-13 a0=c42010ca3a a1=c42010ca40 a2=c4200bc1e0 a3=0 items=0 ppid=32195 pid=32211 auid=4294967295 uid=1001 gid=0 euid=1001 suid=1001 fsuid=1001 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="exe" exe="/usr/libexec/docker/docker-runc-current" subj=system_u:system_r:unconfined_service_t:s0 key=(null)
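
The audit record above can be read field by field: arch=c000003e is x86_64, syscall 59 on that architecture is execve, and exit=-13 is the negated errno EACCES (permission denied). The following is a minimal Python sketch that decodes an abridged copy of the record. The helper names parse_audit_record and describe and the one-entry syscall table are illustrative, not part of any audit tooling. Note that the exe field names the process that made the call, not the file it tried to execute.

```python
import errno
import shlex

# Only the syscall number seen in this record; 59 is execve on x86_64.
X86_64_SYSCALLS = {59: "execve"}


def parse_audit_record(line):
    """Split an audit record into a dict of its key=value fields."""
    fields = {}
    for token in shlex.split(line):  # shlex strips the quotes around values
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def describe(fields):
    """Summarise which process made which syscall and how it ended."""
    syscall = X86_64_SYSCALLS.get(int(fields["syscall"]), "unknown")
    exit_code = int(fields["exit"])
    # A negative exit value is the negated errno of the failed call.
    result = errno.errorcode.get(-exit_code, "unknown") if exit_code < 0 else "ok"
    return f'{fields["exe"]} (uid {fields["uid"]}) called {syscall}: {result}'


# Abridged copy of the record quoted in this comment.
record = (
    'type=SYSCALL msg=audit(1484866535.244:2144): arch=c000003e syscall=59 '
    'success=no exit=-13 items=0 ppid=32195 pid=32211 uid=1001 comm="exe" '
    'exe="/usr/libexec/docker/docker-runc-current" '
    'subj=system_u:system_r:unconfined_service_t:s0 key=(null)'
)
print(describe(parse_audit_record(record)))
```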

# ls -lZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/rhel-push-plugin

# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:docker_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-storage-setup

Comment 9 Nick Schuetz 2017-01-19 23:02:15 UTC
FYI While doing the package updates I got:

  Updating   : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch    36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule:  Failed!
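
A policy load failure like the one above is easy to miss in a long package-update transcript. The following is a minimal Python sketch that scans saved update output for the two telltale lines. The pattern names and the function policy_load_problems are illustrative, and the patterns cover only the messages quoted in this report, not every way semodule can fail.

```python
import re

# Illustrative patterns, based only on the output quoted in this report.
PATTERNS = {
    "redeclared_type": re.compile(r"Re-declaration of type (\S+)"),
    "semodule_failed": re.compile(r"semodule:\s+Failed!"),
}


def policy_load_problems(output):
    """Return (line number, pattern name, line) for each suspicious line."""
    problems = []
    for number, line in enumerate(output.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                problems.append((number, name, line.strip()))
    return problems


output = """\
  Updating   : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch  36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule:  Failed!
"""
for problem in policy_load_problems(output):
    print(problem)
```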

Comment 10 Nick Schuetz 2017-01-20 00:21:01 UTC
# chcon -t container_runtime_exec_t /usr/bin/docker*
chcon: failed to change context of ‘/usr/bin/docker’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-ctr-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/dockerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-storage-setup’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument

Am I doing something wrong here?

Comment 11 Nick Schuetz 2017-01-20 22:31:25 UTC
Nothing is returned from the following command:

semanage fcontext -l | grep container_runtime_exec_t

Comment 12 Meng Bo 2017-01-22 03:23:38 UTC
Adding Dan Walsh to look at the SELinux problem.

Comment 14 Nick Schuetz 2017-01-24 15:26:00 UTC
Forcing a reinstall of the container-selinux package fixed the issue and labeled the docker components appropriately:

# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-storage-setup
# ls -laZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/rhel-push-plugin
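
The difference between the broken and the fixed state is the SELinux type on the docker binaries: bin_t or docker_exec_t before, container_runtime_exec_t after. The following is a minimal Python sketch that checks a saved `ls -lZ` listing for files that do not carry the expected type. It assumes the five-column layout shown in this report (mode, owner, group, context, path). The function names selinux_type and mislabeled are illustrative.

```python
EXPECTED_TYPE = "container_runtime_exec_t"


def selinux_type(ls_z_line):
    """Return (path, SELinux type) from one line in the layout quoted here."""
    parts = ls_z_line.split()
    # Assumed columns: mode, owner, group, context, path.
    context, path = parts[3], parts[4]
    # A context is user:role:type:level, so the type is the third field.
    return path, context.split(":")[2]


def mislabeled(listing, expected=EXPECTED_TYPE):
    """Return (path, type) for every file whose type differs from expected."""
    result = []
    for line in listing.splitlines():
        # Skip blank lines and the "# ls -lZ ..." prompt lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        path, se_type = selinux_type(line)
        if se_type != expected:
            result.append((path, se_type))
    return result


# Two lines each from the listings before and after the reinstall.
before = """\
-rwxr-xr-x. root root system_u:object_r:docker_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-current
"""
after = """\
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-current
"""
print(mislabeled(before))
print(mislabeled(after))
```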

Comment 18 Josh Baird 2017-02-09 04:02:26 UTC
I just hit the same bug after upgrading from OCP 3.3 to OCP 3.4 and performing a 'yum update' to the latest RHEL 7.3.z:

Updating   : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch    36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule:  Failed!

To fix the issue I had to perform a 'yum reinstall container-selinux' on each node and then restart/relabel.

It would appear that this is actually a problem for multiple users.

Why is this happening?

Comment 19 Ben Bennett 2017-02-09 12:27:51 UTC
You should open a bug against docker / selinux... that's not really a networking problem.

Comment 20 Steven Walter 2017-02-09 16:09:58 UTC
(In reply to Ben Bennett from comment #19)
> You should open a bug against docker / selinux... that's not really a
> networking problem.

Does this bug apply: https://bugzilla.redhat.com/show_bug.cgi?id=1413536

Comment 21 Josh Baird 2017-02-09 16:18:39 UTC
Steve - yes, it would appear to be the same problem.

Comment 23 Red Hat Bugzilla 2023-09-14 03:52:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

