Bug 1414963

Summary: SetupNetworkError "Failed to setup network for pod" after 3.4 Upgrade
Product: OpenShift Container Platform Reporter: Nick Schuetz <nschuetz>
Component: Networking Assignee: Ben Bennett <bbennett>
Status: CLOSED NOTABUG QA Contact: Meng Bo <bmeng>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.4.0 CC: aos-bugs, dwalsh, jbaird, jokerman, mmccomas, nschuetz, stwalter
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-24 19:27:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nick Schuetz 2017-01-19 20:17:17 UTC
Description of problem:

After following the manual upgrade process from a 3.3.1 installation to 3.4, my router, docker-registry, and registry-console pods are unable to set up their networks when scheduled.

Version-Release number of selected component (if applicable):

openshift v3.4.0.39
kubernetes v1.4.0+776c994

How reproducible:

Always

Steps to Reproduce:
1. Follow the manual upgrade outlined in the documentation for 3.4 from 3.3.1.7

Actual results:

  9m	1s	82	{kubelet ocp-master1.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "router-9-deploy_default" with SetupNetworkError: "Failed to setup network for pod \"router-9-deploy_default(b13f4b14-de7e-11e6-968a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"

  13m	13m	1	{kubelet ocp-master1.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "docker-registry-11-3ykik_default" with SetupNetworkError: "Failed to setup network for pod \"docker-registry-11-3ykik_default(01aee4e0-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/25346/ns/net\": no such file or directory; Skipping pod"


  16m	23s	46	{kubelet ocp-node2.thelinuxshack.com}		Warning	FailedSync	Error syncing pod, skipping: failed to "SetupNetwork" for "registry-console-1-kn442_default" with SetupNetworkError: "Failed to setup network for pod \"registry-console-1-kn442_default(01af2237-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"

Expected results:

The SetupNetwork routine does not fail.

Additional info:

This installation has a single schedulable master (schedule=true) plus two additional nodes.

Comment 1 Scott Dodson 2017-01-19 21:22:19 UTC
If you remove /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf and restart docker does the problem go away?

Comment 2 Nick Schuetz 2017-01-19 21:25:05 UTC
That file does not exist. I went to remove it as a part of the upgrade process and it wasn't there then either.

# stat /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf
stat: cannot stat ‘/etc/systemd/system/docker.service.d/docker-sdn-ovs.conf’: No such file or directory

Comment 3 Nick Schuetz 2017-01-19 21:29:43 UTC
The following directory does not exist either:

/etc/systemd/system/docker.service.d

Comment 5 Nick Schuetz 2017-01-19 21:56:49 UTC
# grep ovs /etc/origin/master/master-config.yaml
  networkPluginName: redhat/openshift-ovs-subnet

Comment 6 Nick Schuetz 2017-01-19 22:48:26 UTC
This seems to be SELinux related. I put SELinux into permissive mode (setenforce 0) and all my pods were able to SetupNetwork and start, and the error went away.

Comment 7 Ben Bennett 2017-01-19 22:50:09 UTC
Can you get the selinux audit entries please?

Comment 8 Nick Schuetz 2017-01-19 22:58:06 UTC
This may be the culprit:

type=SYSCALL msg=audit(1484866535.244:2144): arch=c000003e syscall=59 success=no exit=-13 a0=c42010ca3a a1=c42010ca40 a2=c4200bc1e0 a3=0 items=0 ppid=32195 pid=32211 auid=4294967295 uid=1001 gid=0 euid=1001 suid=1001 fsuid=1001 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="exe" exe="/usr/libexec/docker/docker-runc-current" subj=system_u:system_r:unconfined_service_t:s0 key=(null)

# ls -lZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/libexec/docker/rhel-push-plugin

# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:docker_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0       /usr/bin/docker-storage-setup
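
The audit record above can be reduced to the fields that matter for this diagnosis. The following is a minimal sketch, not a tool used in this bug: the function name failed_syscalls is hypothetical, and it assumes raw audit records (as in /var/log/audit/audit.log) on stdin. For reference, on x86_64 syscall=59 is execve and exit=-13 is EACCES, so the record shows a denied exec of docker-runc-current.

```shell
# Print the executable, SELinux subject, and exit code of every failed
# SYSCALL audit record read from stdin.
# failed_syscalls is a hypothetical helper name, not part of any package.
failed_syscalls() {
    awk '
        /type=SYSCALL/ && /success=no/ {
            exe = ""; subj = ""; code = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^exe=/)  exe  = substr($i, 5)
                if ($i ~ /^subj=/) subj = substr($i, 6)
                if ($i ~ /^exit=/) code = substr($i, 6)
            }
            gsub(/"/, "", exe)   # drop the quotes around the exe path
            print exe, subj, code
        }
    '
}
```

Typical usage would be `failed_syscalls < /var/log/audit/audit.log` as root. Where the audit tools are installed, `ausearch -m avc -ts recent` is the more usual way to list recent SELinux denials.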

Comment 9 Nick Schuetz 2017-01-19 23:02:15 UTC
FYI While doing the package updates I got:

  Updating   : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch    36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule:  Failed!

Comment 10 Nick Schuetz 2017-01-20 00:21:01 UTC
# chcon -t container_runtime_exec_t /usr/bin/docker*
chcon: failed to change context of ‘/usr/bin/docker’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-ctr-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/dockerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-storage-setup’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument

Am I doing something wrong here?

Comment 11 Nick Schuetz 2017-01-20 22:31:25 UTC
Nothing is returned from the following command:

semanage fcontext -l | grep container_runtime_exec_t

Comment 12 Meng Bo 2017-01-22 03:23:38 UTC
Adding Dan Walsh to look at the SELinux problem.

Comment 14 Nick Schuetz 2017-01-24 15:26:00 UTC
Forcing a reinstall of the container-selinux package fixed the issue and labeled the docker components appropriately:

# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-storage-setup
# ls -laZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/rhel-push-plugin
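
A listing like the one above can be checked mechanically for leftover bin_t or docker_exec_t labels. This is a rough sketch under two assumptions: the function name mislabeled is hypothetical, and the input uses the five-column `ls -lZ` format shown in this bug (mode, owner, group, context, path). Newer coreutils versions print more columns, so the field positions would need adjusting there.

```shell
# Print paths from `ls -lZ` output (read on stdin) whose SELinux type differs
# from the expected one.
# $1: expected type, defaulting to container_runtime_exec_t.
# mislabeled is a hypothetical helper name; assumes the RHEL 7 column layout.
mislabeled() {
    expected="${1:-container_runtime_exec_t}"
    awk -v want="$expected" '
        NF >= 5 {
            split($4, ctx, ":")          # context is user:role:type:level
            if (ctx[3] != want) print $NF
        }
    '
}
```

For example, `ls -lZ /usr/bin/docker* /usr/libexec/docker/* | mislabeled` should print nothing on a correctly labeled node and list every affected binary on a node in the broken state.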

Comment 18 Josh Baird 2017-02-09 04:02:26 UTC
I just hit the same bug after upgrading from OCP 3.3 to OCP 3.4 and performing a 'yum update' to the latest RHEL 7.3.z:

Updating   : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch    36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule:  Failed!

To fix the issue I had to perform a 'yum reinstall container-selinux' on each node and then restart/relabel.

It would appear that this is actually a problem for multiple users.

Why is this happening?
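
The per-node workaround described in this comment could be scripted roughly as follows. This is a sketch, not a verified procedure: the function names are hypothetical, and reading "restart/relabel" as a restorecon pass followed by a docker restart is an assumption (a full reboot or filesystem relabel may have been meant). It only prints the commands unless DRY_RUN=0 is set.

```shell
# Print a command in dry-run mode (the default), or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the workaround reported in this bug.
# Assumption: "restart/relabel" means restorecon plus a docker restart.
fix_container_labels() {
    run yum -y reinstall container-selinux            # reinstall the policy package
    run restorecon -Rv /usr/bin /usr/libexec/docker   # reapply default file labels
    run systemctl restart docker                      # restart docker afterwards
}
```

Running `fix_container_labels` prints the three commands for review; `DRY_RUN=0 fix_container_labels` as root would execute them on the node.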

Comment 19 Ben Bennett 2017-02-09 12:27:51 UTC
You should open a bug against docker / selinux... that's not really a networking problem.

Comment 20 Steven Walter 2017-02-09 16:09:58 UTC
(In reply to Ben Bennett from comment #19)
> You should open a bug against docker / selinux... that's not really a
> networking problem.

Does this bug apply: https://bugzilla.redhat.com/show_bug.cgi?id=1413536

Comment 21 Josh Baird 2017-02-09 16:18:39 UTC
Steve - yes, it would appear to be the same problem.

Comment 23 Red Hat Bugzilla 2023-09-14 03:52:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days