Description of problem:
After following the manual upgrade process from 3.3.1 to 3.4, my router, docker-registry, and registry-console pods are unable to set up their networks when scheduled.
Version-Release number of selected component (if applicable):
openshift v3.4.0.39
kubernetes v1.4.0+776c994
How reproducible:
Always
Steps to Reproduce:
1. Follow the manual upgrade procedure in the 3.4 documentation to upgrade from 3.3.1.7.
Actual results:
9m 1s 82 {kubelet ocp-master1.thelinuxshack.com} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "router-9-deploy_default" with SetupNetworkError: "Failed to setup network for pod \"router-9-deploy_default(b13f4b14-de7e-11e6-968a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"
13m 13m 1 {kubelet ocp-master1.thelinuxshack.com} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "docker-registry-11-3ykik_default" with SetupNetworkError: "Failed to setup network for pod \"docker-registry-11-3ykik_default(01aee4e0-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/25346/ns/net\": no such file or directory; Skipping pod"
16m 23s 46 {kubelet ocp-node2.thelinuxshack.com} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "registry-console-1-kn442_default" with SetupNetworkError: "Failed to setup network for pod \"registry-console-1-kn442_default(01af2237-de7c-11e6-a85a-52540089f0cc)\" using network plugins \"cni\": failed to Statfs \"/proc/0/ns/net\": no such file or directory; Skipping pod"
Expected results:
The SetupNetwork routine does not fail.
Additional info:
This installation has a single master with schedule=true plus two additional nodes.
If you remove /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf and restart docker, does the problem go away?
That file does not exist. I went to remove it as part of the upgrade process, and it wasn't there then either.
# stat /etc/systemd/system/docker.service.d/docker-sdn-ovs.conf
stat: cannot stat ‘/etc/systemd/system/docker.service.d/docker-sdn-ovs.conf’: No such file or directory
The following directory does not exist either: /etc/systemd/system/docker.service.d
# grep ovs /etc/origin/master/master-config.yaml
networkPluginName: redhat/openshift-ovs-subnet
This seems to be SELinux related. I switched SELinux to permissive mode (setenforce 0), and all my pods were able to SetupNetwork and start. The error went away.
Can you get the SELinux audit entries, please?
This may be the culprit:
type=SYSCALL msg=audit(1484866535.244:2144): arch=c000003e syscall=59 success=no exit=-13 a0=c42010ca3a a1=c42010ca40 a2=c4200bc1e0 a3=0 items=0 ppid=32195 pid=32211 auid=4294967295 uid=1001 gid=0 euid=1001 suid=1001 fsuid=1001 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="exe" exe="/usr/libexec/docker/docker-runc-current" subj=system_u:system_r:unconfined_service_t:s0 key=(null)
# ls -lZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/libexec/docker/rhel-push-plugin
# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:docker_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:bin_t:s0 /usr/bin/docker-storage-setup
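The SYSCALL record above describes a failed execve (syscall 59 on x86_64, arch=c000003e) that returned EACCES (exit=-13); exe= names the process that made the call and subj= its SELinux context. A minimal sketch for pulling such entries out of a raw audit log, assuming the RHEL 7 audit.log line format shown above; the function name failed_execs is hypothetical:

```shell
#!/bin/sh
# failed_execs (hypothetical helper): read raw audit records on stdin and print
# the exe= value of every x86_64 (arch=c000003e) execve call (syscall=59)
# that failed with EACCES (exit=-13). Assumes the RHEL 7 audit.log format.
failed_execs() {
  grep 'type=SYSCALL' |
    grep 'arch=c000003e syscall=59 ' |
    grep 'success=no exit=-13 ' |
    sed -n 's/.* exe="\([^"]*\)".*/\1/p' |
    sort -u
}
```

Usage: `failed_execs < /var/log/audit/audit.log`. Any docker binary that shows up here is worth checking with `ls -lZ`, since its label determines which SELinux domain it runs in.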
FYI, while doing the package updates I got:
Updating : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch 36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule: Failed!
# chcon -t container_runtime_exec_t /usr/bin/docker*
chcon: failed to change context of ‘/usr/bin/docker’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-containerd-shim-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-ctr-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/dockerd-current’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
chcon: failed to change context of ‘/usr/bin/docker-storage-setup’ to ‘system_u:object_r:container_runtime_exec_t:s0’: Invalid argument
Am I doing something wrong here?
Nothing is returned from the following command:
semanage fcontext -l | grep container_runtime_exec_t
Adding Dan Walsh to look at the SELinux problem.
Forcing a reinstall of the container-selinux package fixed the issue and labeled the docker components appropriately:
# ls -lZ /usr/bin/docker*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-containerd-shim-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-ctr-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/dockerd-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/bin/docker-storage-setup
# ls -laZ /usr/libexec/docker/*
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-proxy-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/docker-runc-current
-rwxr-xr-x. root root system_u:object_r:container_runtime_exec_t:s0 /usr/libexec/docker/rhel-push-plugin
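To confirm on each node that the reinstall actually relabeled everything, the listing above can be checked mechanically. This is a sketch assuming the RHEL 7 `ls -lZ` output format shown here (mode, owner, group, context, path) and that every listed docker binary should carry container_runtime_exec_t, as in the listing; find_mislabeled is a hypothetical helper name:

```shell
#!/bin/sh
# find_mislabeled (hypothetical helper): read RHEL 7 style `ls -lZ` lines on
# stdin (mode owner group context path) and print every path whose SELinux
# type is not container_runtime_exec_t.
find_mislabeled() {
  awk 'NF >= 5 {
    split($4, ctx, ":")   # context is user:role:type:level
    if (ctx[3] != "container_runtime_exec_t") print $5
  }'
}
```

Usage: `ls -lZ /usr/bin/docker* /usr/libexec/docker/* | find_mislabeled`. Empty output means the labels match the fixed state above; any printed path still carries a stale label such as bin_t.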
I just hit the same bug after upgrading from OCP 3.3 to OCP 3.4 and running 'yum update' to the latest RHEL 7.3.z:
Updating : selinux-policy-targeted-3.13.1-102.el7_3.13.noarch 36/97
Re-declaration of type docker_t
Failed to create node
Bad type declaration at /etc/selinux/targeted/tmp/modules/100/docker/cil:1
/usr/sbin/semodule: Failed!
To fix the issue, I had to run 'yum reinstall container-selinux' on each node and then restart/relabel. This appears to affect multiple users. Why is this happening?
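A sketch of the per-node workaround described here, assuming root access, a RHEL 7 node running the docker service, and that relabeling /usr/bin and /usr/libexec/docker is enough (`touch /.autorelabel` followed by a reboot would relabel the whole filesystem instead). The functions run and remediate_node are hypothetical helpers, and DRY_RUN=1 prints the commands rather than executing them:

```shell
#!/bin/sh
# run (hypothetical helper): execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# remediate_node (hypothetical helper): the workaround from this thread, per node.
remediate_node() {
  # Reinstall the package whose SELinux policy module failed to load during the update
  run yum reinstall -y container-selinux
  # Reapply default file contexts to the docker binaries (assumed paths, from the listings above)
  run restorecon -Rv /usr/bin /usr/libexec/docker
  # Restart docker so the daemon picks up the corrected labels
  run systemctl restart docker
}
```

Running `DRY_RUN=1 remediate_node` first shows exactly what would change before touching a node.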
You should open a bug against docker / selinux... that's not really a networking problem.
(In reply to Ben Bennett from comment #19)
> You should open a bug against docker / selinux... that's not really a
> networking problem.
Does this bug apply? https://bugzilla.redhat.com/show_bug.cgi?id=1413536
Steve - yes, it would appear to be the same problem.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days