Bug 1485312 - Pods fail to start during containerized install
Summary: Pods fail to start during containerized install
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Release
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.7.0
Assignee: Justin Pierce
QA Contact: Wei Sun
URL:
Whiteboard: aos-scalability-37
Depends On:
Blocks:
 
Reported: 2017-08-25 11:25 UTC by Jiří Mencák
Modified: 2018-04-05 09:29 UTC
CC: 11 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-05 09:28:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Openshift ansible log file. (2.09 MB, text/plain), 2017-08-25 11:25 UTC, Jiří Mencák


Links
Red Hat Product Errata RHBA-2018:0636, last updated 2018-04-05 09:29:03 UTC

Description Jiří Mencák 2017-08-25 11:25:32 UTC
Created attachment 1318123: Openshift ansible log file.

Description of problem:
Pods fail to start during a containerized install on RHEL Atomic Host 7.4 with OCP 3.7.

Version-Release number of selected component (if applicable):
oc v3.7.0-0.109.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-58-50.us-west-2.compute.internal:8443
openshift v3.7.0-0.109.0
kubernetes v1.7.0+695f48a16f

How reproducible:
Always, during the advanced install.

Steps to Reproduce:
1. Use the advanced (openshift-ansible) method of installation to install OCP 3.7 on Atomic Host (a sketch of the invocation is shown below).
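
For reference, a minimal sketch of the advanced install invocation; the inventory path is an assumption, and playbooks/byo/config.yml was the entry point in the 3.7-era openshift-ansible tree:

# Run the advanced (openshift-ansible) install; inventory path is hypothetical:
ansible-playbook -i /path/to/inventory ~/openshift-ansible/playbooks/byo/config.yml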

Actual results:
$ oc get pods
NAME              READY     STATUS              RESTARTS   AGE
router-1-deploy   0/1       ContainerCreating   0          10m

$ journalctl -xe|grep router|tail -n1
Aug 25 11:01:02 ip-172-31-58-50.us-west-2.compute.internal dockerd-current[38691]: E0825 11:01:02.053510    2841 pod_workers.go:182] Error syncing pod 2e512fea-8983-11e7-8064-026e9c4855a2 ("router-1-deploy_default(2e512fea-8983-11e7-8064-026e9c4855a2)"), skipping: failed to "KillPodSandbox" for "2e512fea-8983-11e7-8064-026e9c4855a2" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"router-1-deploy_default\" network: failed to find plugin \"openshift-sdn\" in path [/opt/openshift-sdn/bin /opt/cni/bin]"
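
For anyone triaging a similar failure, a quick check derived from the error above; the two directories are exactly the plugin search path named in the log line:

# List the CNI plugin search path from the error; warn if either directory is missing:
ls /opt/openshift-sdn/bin /opt/cni/bin 2>/dev/null || echo "CNI plugin path incomplete"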

Expected results:
Pods starting correctly.

Additional info:
~/openshift-ansible $ git describe
openshift-ansible-3.7.0-0.109.0

$ rpm -q ansible
ansible-2.3.1.0-3.el7.noarch

$ ansible --version
ansible 2.3.1.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

https://github.com/openshift/origin/issues/15953

I managed to install OCP 3.6 on the same RHEL Atomic Host image without any issues.

Comment 1 Giuseppe Scrivano 2017-08-25 13:54:23 UTC
Can you verify what the output is for:

"docker exec $NODE_CONTAINER ls /opt/cni"

If the files are missing there, then the change (https://github.com/openshift/origin/pull/15468) probably didn't make it into the image you are using.
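
(A hedged way to resolve $NODE_CONTAINER on the host; the container name below matches the transcripts in this bug:)

# Find the node container by name, then inspect /opt/cni inside it:
NODE_CONTAINER=$(docker ps -q --filter name=atomic-openshift-node)
docker exec "$NODE_CONTAINER" ls /opt/cni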

Comment 2 Jiří Mencák 2017-08-25 14:03:33 UTC
Looks like it.

root@ip-172-31-58-50: ~ # docker ps|grep node
976e126bbae3        openshift3/node:v3.7.0-0.109.0          "/usr/local/bin/origi"   2 minutes ago       Up 2 minutes                            atomic-openshift-node
root@ip-172-31-58-50: ~ # docker exec 976e126bbae3 ls /opt/cni
ls: cannot access /opt/cni: No such file or directory

Comment 3 Steve Milner 2017-08-25 16:02:59 UTC
Verified the image is bad:

(from inside the image)
# rpm -q origin-sdn-ovs
package origin-sdn-ovs is not installed
[root@2a68f132e017 opt]# ls /opt/
[root@2a68f132e017 opt]#

Comment 5 Steve Milner 2017-08-25 16:07:24 UTC
Moving to Release component.

Comment 6 Justin Pierce 2017-08-25 17:49:02 UTC
It does look like the sdn-ovs RPM was lost in the Dockerfile reconciliation process. I've fixed the OCP version; the commit can be found here: http://dist-git.host.prod.eng.bos.redhat.com/cgit/rpms/openshift-enterprise-node-docker/commit/Dockerfile?h=rhaos-3.7-rhel-7&id=51d624bbd507f304fddf1d09e0aad0b04187db23

This should be included in the next build of 3.7. 

* Note that for OCP, the rpm name would be atomic-openshift-sdn-ovs (not origin-sdn-ovs).
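
For illustration only, the restored package would be pulled into the node image by a Dockerfile line roughly like the following; the exact change is in the commit above, and everything here beyond the atomic-openshift-sdn-ovs name is an assumption:

# Hypothetical excerpt of the OCP node image Dockerfile; only the
# atomic-openshift-sdn-ovs package name is confirmed by this bug:
RUN INSTALL_PKGS="atomic-openshift-node atomic-openshift-sdn-ovs" && \
    yum install -y $INSTALL_PKGS && \
    yum clean all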

Comment 7 Justin Pierce 2017-08-28 21:18:10 UTC
This should be addressed as of: v3.7.0-0.117.0

Comment 8 Jiří Mencák 2017-08-29 06:50:52 UTC
Managed to get it working with the v3.7.0-0.117.0 puddle.

[root@rhel-7 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Atomic Host release 7.4
[root@rhel-7 ~]# oc version
oc v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://rhel-7.4.novalocal:8443
openshift v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
[root@rhel-7 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-spd74    1/1       Running   0          2m
registry-console-1-fw2xq   1/1       Running   0          2m
router-1-8rblt             1/1       Running   0          3m

Thank you!

Comment 9 Johnny Liu 2017-08-29 07:10:52 UTC
Verified this bug with the v3.7.0-0.117.0 images; the checks passed.

[root@qe-wmeng37-master-etcd-1 ~]# openshift version
openshift v3.7.0-0.117.0
kubernetes v1.7.0+695f48a16f
etcd 3.2.1
[root@qe-wmeng37-master-etcd-1 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-3-px6fs    1/1       Running   0          5h
registry-console-1-g5mz0   1/1       Running   0          5h
router-1-094tp             1/1       Running   0          5h
[root@qe-wmeng37-master-etcd-1 ~]# docker ps|grep node
64f27b3d8a0d        openshift3/node:v3.7.0                  "/usr/local/bin/origi"   5 hours ago         Up 5 hours                              atomic-openshift-node
[root@qe-wmeng37-master-etcd-1 ~]# docker exec 64f27b3d8a0d ls /opt/cni
bin
[root@qe-wmeng37-master-etcd-1 ~]# docker exec 64f27b3d8a0d rpm -q atomic-openshift-sdn-ovs
atomic-openshift-sdn-ovs-3.7.0-0.117.0.git.0.b5a2a69.el7.x86_64

Comment 13 errata-xmlrpc 2018-04-05 09:28:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636

