Created attachment 1318123
Openshift ansible log file.

Description of problem:
Pods fail to start during a containerized install of OCP 3.7 on RHEL Atomic Host 7.4.

Version-Release number of selected component (if applicable):
oc v3.7.0-0.109.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-58-50.us-west-2.compute.internal:8443
openshift v3.7.0-0.109.0
kubernetes v1.7.0+695f48a16f

How reproducible:
Always, during advanced install.

Steps to Reproduce:
1. Use the advanced method of installation to install OCP 3.7 on Atomic Host

Actual results:
$ oc get pods
NAME              READY     STATUS              RESTARTS   AGE
router-1-deploy   0/1       ContainerCreating   0          10m

$ journalctl -xe|grep router|tail -n1
Aug 25 11:01:02 ip-172-31-58-50.us-west-2.compute.internal dockerd-current[38691]: E0825 11:01:02.053510 2841 pod_workers.go:182] Error syncing pod 2e512fea-8983-11e7-8064-026e9c4855a2 ("router-1-deploy_default(2e512fea-8983-11e7-8064-026e9c4855a2)"), skipping: failed to "KillPodSandbox" for "2e512fea-8983-11e7-8064-026e9c4855a2" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"router-1-deploy_default\" network: failed to find plugin \"openshift-sdn\" in path [/opt/openshift-sdn/bin /opt/cni/bin]"

Expected results:
Pods starting correctly.

Additional info:
~/openshift-ansible $ git describe
openshift-ansible-3.7.0-0.109.0
$ rpm -q ansible
ansible-2.3.1.0-3.el7.noarch
$ ansible --version
ansible 2.3.1.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

https://github.com/openshift/origin/issues/15953

I managed to install OCP 3.6 on the same RHEL Atomic Host image without any issues.
Can you check the output of the following?

docker exec $NODE_CONTAINER ls /opt/cni

If the files are missing there, then the change (https://github.com/openshift/origin/pull/15468) probably didn't make it into the image you are using.
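For anyone repeating this check, here is a minimal sketch of the lookup the error message describes: walk a list of directories and report the first one that contains an executable with the plugin's name. The function name find_cni_plugin is hypothetical (it is not part of OpenShift, Kubernetes, or Docker), and the sketch only assumes that the kubelet searches the directories listed in the log line in order.

```shell
#!/bin/sh
# Hypothetical helper: look for a CNI plugin binary in a list of directories.
# Usage: find_cni_plugin PLUGIN_NAME DIR [DIR...]
# Prints the full path of the first executable match and returns 0;
# prints a message on stderr and returns 1 if no directory contains it.
find_cni_plugin() {
    plugin="$1"
    shift
    for dir in "$@"; do
        if [ -x "$dir/$plugin" ]; then
            echo "$dir/$plugin"
            return 0
        fi
    done
    echo "plugin \"$plugin\" not found in: $*" >&2
    return 1
}
```

Run inside the node container, a call such as `find_cni_plugin openshift-sdn /opt/openshift-sdn/bin /opt/cni/bin` would use the two paths from the journal message above. A non-zero exit status would point at an image that lacks the plugin.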
Looks like it.

root@ip-172-31-58-50: ~ # docker ps|grep node
976e126bbae3 openshift3/node:v3.7.0-0.109.0 "/usr/local/bin/origi" 2 minutes ago Up 2 minutes atomic-openshift-node
root@ip-172-31-58-50: ~ # docker exec 976e126bbae3 ls /opt/cni
ls: cannot access /opt/cni: No such file or directory
Verified the image is bad (from inside the image):

# rpm -q origin-sdn-ovs
package origin-sdn-ovs is not installed
[root@2a68f132e017 opt]# ls /opt/
[root@2a68f132e017 opt]#
Moving to Release component.
It does look like the sdn-ovs RPM was lost in the Dockerfile reconciliation process. I've fixed the OCP version; the commit is here:

http://dist-git.host.prod.eng.bos.redhat.com/cgit/rpms/openshift-enterprise-node-docker/commit/Dockerfile?h=rhaos-3.7-rhel-7&id=51d624bbd507f304fddf1d09e0aad0b04187db23

This should be included in the next build of 3.7.

* Note that for OCP, the rpm name would be atomic-openshift-sdn-ovs (not origin-sdn-ovs).
This should be addressed as of: v3.7.0-0.117.0
Managed to get it working with the v3.7.0-0.117.0 puddle.

[root@rhel-7 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.4
[root@rhel-7 ~]# oc version
oc v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://rhel-7.4.novalocal:8443
openshift v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
[root@rhel-7 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-spd74    1/1       Running   0          2m
registry-console-1-fw2xq   1/1       Running   0          2m
router-1-8rblt             1/1       Running   0          3m

Thank you!
Verified this bug with v3.7.0-0.117.0 images, and passed.

[root@qe-wmeng37-master-etcd-1 ~]# openshift version
openshift v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
etcd 3.2.1
[root@qe-wmeng37-master-etcd-1 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-3-px6fs    1/1       Running   0          5h
registry-console-1-g5mz0   1/1       Running   0          5h
router-1-094tp             1/1       Running   0          5h
[root@qe-wmeng37-master-etcd-1 ~]# docker ps|grep node
64f27b3d8a0d openshift3/node:v3.7.0 "/usr/local/bin/origi" 5 hours ago Up 5 hours atomic-openshift-node
[root@qe-wmeng37-master-etcd-1 ~]# docker exec 64f27b3d8a0d ls /opt/cni
bin
[root@qe-wmeng37-master-etcd-1 ~]# docker exec 64f27b3d8a0d rpm -q atomic-openshift-sdn-ovs
atomic-openshift-sdn-ovs-3.7.0-0.117.0.git.0.b5a2a69.el7.x86_64
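The pod check in the verification above could be scripted roughly as follows. This is only a sketch: not_running_pods is a hypothetical helper, and it assumes the default `oc get pods` table layout shown in this report (one header row, with STATUS as the third column).

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` table output on stdin and print
# "NAME STATUS" for every pod whose STATUS column is not "Running".
# Assumes the first line is the header row and STATUS is column 3.
not_running_pods() {
    awk 'NR > 1 && $3 != "Running" { print $1 " " $3 }'
}
```

Piping `oc get pods` into `not_running_pods` should print nothing on a healthy install and list entries such as the stuck router deploy pod otherwise. Note that pods in a normal terminal state (for example Completed) would also be listed, so the output still needs a human look.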
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0636