Bug 1565482
Summary: | SDN initialization failed for system container installation on RHEL | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gan Huang <ghuang> |
Component: | Installer | Assignee: | Vadim Rutkovsky <vrutkovs> |
Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 3.10.0 | CC: | aos-bugs, dma, jokerman, mmccomas, vrutkovs, wmeng, xtian |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | 3.10.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-07-30 19:12:35 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Gan Huang
2018-04-10 06:18:01 UTC
This is a system container installation; the node service file looks incorrect.

# systemctl cat atomic-openshift-node.service
# /usr/lib/systemd/system/atomic-openshift-node.service
[Unit]
Description=Atomic OpenShift Node
After=docker.service
After=openvswitch.service
Wants=docker.service
Documentation=https://github.com/openshift/origin

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/atomic-openshift-node
Environment=GOTRACEBACK=crash
ExecStart=/usr/bin/openshift start node --config=${CONFIG_FILE} $OPTIONS
LimitNOFILE=65536
LimitCORE=infinity
WorkingDirectory=/var/lib/origin/
SyslogIdentifier=atomic-openshift-node
Restart=always
RestartSec=5s
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/atomic-openshift-node.service.d/override.conf
[Unit]
After=cloud-init.service

The node service unit file had been corrected somehow.

# systemctl cat atomic-openshift-node.service
# /etc/systemd/system/atomic-openshift-node.service
[Unit]
After=container-engine.service
After=openvswitch.service
Wants=container-engine.service
After=atomic-openshift-node-dep.service
After=atomic-openshift-master-controllers.service
Requires=dnsmasq.service
After=dnsmasq.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/atomic-openshift-node
ExecStartPre=/bin/bash -c 'export -p > /run/atomic-openshift-node-env'
ExecStart=/usr/bin/runc --systemd-cgroup run 'atomic-openshift-node'
ExecStop=/usr/bin/runc --systemd-cgroup kill 'atomic-openshift-node'
SyslogIdentifier=atomic-openshift-node
Restart=always
RestartSec=5s
WorkingDirectory=/var/lib/containers/atomic/atomic-openshift-node.0
RuntimeDirectory=atomic-openshift-node

[Install]
WantedBy=container-engine.service

# /etc/systemd/system/atomic-openshift-node.service.d/override.conf
[Unit]
After=cloud-init.service

But hitting another issue.
The sdn pods are in CrashLoopBackOff:

# oc get pod -n openshift-sdn
NAME        READY   STATUS             RESTARTS   AGE
ovs-nwxv7   1/1     Running            0          1h
ovs-xf7h6   1/1     Running            0          1h
sdn-s65k9   0/1     CrashLoopBackOff   19         1h
sdn-t8prw   0/1     CrashLoopBackOff   21         1h

# oc logs sdn-s65k9 -n openshift-sdn
failed to open log file "/var/log/pods/ca1f0816-3ee7-11e8-8ee1-42010af0001f/sdn_19.log": open /var/log/pods/ca1f0816-3ee7-11e8-8ee1-42010af0001f/sdn_19.log: no such file or directory

# oc describe po sdn-s65k9 -n openshift-sdn
<--snip-->
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-opt-cni-bin"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-openshift-sdn"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-dbus"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-config"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-etc-cni-netd"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-lib-cni-networks-openshift-sdn"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-kubernetes"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-sysconfig-node"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-modules"
Normal SuccessfulMountVolume 1h (x3 over 1h) kubelet, qe-ghuang-node-registry-router-1 (combined from similar events): MountVolume.SetUp succeeded for volume "sdn-token-n9vqb"
Normal Pulling 1h kubelet, qe-ghuang-node-registry-router-1 pulling image "registry.reg-aws.openshift.com:443/openshift3/node:v3.10"
Normal Pulled 1h kubelet, qe-ghuang-node-registry-router-1 Successfully pulled image "registry.reg-aws.openshift.com:443/openshift3/node:v3.10"
Normal Pulled 1h kubelet, qe-ghuang-node-registry-router-1 Container image "registry.reg-aws.openshift.com:443/openshift3/node:v3.10" already present on machine
Normal Created 1h (x2 over 1h) kubelet, qe-ghuang-node-registry-router-1 Created container
Warning Failed 1h (x2 over 1h) kubelet, qe-ghuang-node-registry-router-1 Error: failed to start container "sdn": Error response from daemon: error while creating mount source path '/opt/cni/bin': mkdir /opt/cni: read-only file system
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-kubernetes"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-sysconfig-node"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-dbus"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-lib-cni-networks-openshift-sdn"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-var-run-ovs"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-etc-cni-netd"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-modules"
Normal SuccessfulMountVolume 1h kubelet, qe-ghuang-node-registry-router-1 MountVolume.SetUp succeeded for volume "host-config"
Normal SuccessfulMountVolume 1h (x3 over 1h) kubelet, qe-ghuang-node-registry-router-1 (combined from similar events): MountVolume.SetUp succeeded for volume "sdn-token-n9vqb"
Normal Pulled 1h (x3 over 1h) kubelet, qe-ghuang-node-registry-router-1 Container image "registry.reg-aws.openshift.com:443/openshift3/node:v3.10" already present on machine
Normal Created 1h (x3 over 1h) kubelet, qe-ghuang-node-registry-router-1 Created container
Warning Failed 1h (x3 over 1h) kubelet, qe-ghuang-node-registry-router-1 Error: failed to start container "sdn": Error response from daemon: error while creating mount source path '/opt/cni/bin': mkdir /opt/cni: read-only file system
Warning BackOff 2m (x308 over 1h) kubelet, qe-ghuang-node-registry-router-1 Back-off restarting failed container

Note: this is a system container installation on RHEL.

TASK [openshift_node : Install or Update node system container] ****************
Friday 13 April 2018 02:45:49 -0400 (0:01:43.746) 0:02:52.121 **********
changed: [qe-ghuang-master-etcd-1.0413-cot.qe.rhcloud.com] => {"changed": true, "failed": false, "msg": "Extracting to /var/lib/containers/atomic/atomic-openshift-node.0\nCreated file /opt/cni/bin/host-local\nCreated file /opt/cni/bin/openshift-sdn\nCreated file /opt/cni/bin/loopback\nsystemctl daemon-reload\nsystemd-tmpfiles --create /etc/tmpfiles.d/atomic-openshift-node.conf\nsystemctl enable atomic-openshift-node\n"}
changed: [qe-ghuang-node-registry-router-1.0413-cot.qe.rhcloud.com] => {"changed": true, "failed": false, "msg": "Extracting to /var/lib/containers/atomic/atomic-openshift-node.0\nCreated file /opt/cni/bin/host-local\nCreated file /opt/cni/bin/openshift-sdn\nCreated file /opt/cni/bin/loopback\nsystemctl daemon-reload\nsystemd-tmpfiles --create /etc/tmpfiles.d/atomic-openshift-node.conf\nsystemctl enable atomic-openshift-node\n"}

> Warning Failed 1h (x2 over 1h) kubelet, qe-ghuang-node-registry-router-1 Error: failed to start container "sdn": Error response from daemon: error while creating mount source path '/opt/cni/bin': mkdir /opt/cni: read-only file system

/opt/cni/bin needs to be mounted. Created https://github.com/openshift/origin/pull/19427 to fix it in the system container image.

Merged https://github.com/openshift/origin/pull/19427 and the follow-up fix https://github.com/openshift/origin/pull/19445

The fix for this is available in atomic-openshift-3.10.0-0.28.0.git.0.66790cb.el7 and openshift-ansible-3.10.0-0.28.0.git.0.439cb5c.el7

Verified in openshift-ansible-3.10.0-0.28.0.git.0.439cb5c.el7.noarch.rpm

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
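
For anyone rechecking the symptom from the description after updating to the fixed packages, here is a minimal sketch of a helper that filters `oc get pod` output down to pods stuck in CrashLoopBackOff. The function name `crashlooping_pods` is hypothetical (it is not part of `oc` or openshift-ansible), and it assumes the default five-column layout shown in this report (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper, not part of oc or openshift-ansible.
# Reads `oc get pod` output on stdin and prints the NAME of every pod
# whose STATUS (third column) is CrashLoopBackOff. The header row is
# skipped implicitly because its third field is the literal "STATUS".
crashlooping_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (assumes a logged-in oc client):
#   oc get pod -n openshift-sdn | crashlooping_pods
```

Run against the pod listing in the description, this would print sdn-s65k9 and sdn-t8prw. Empty output on a fixed installation suggests the sdn pods are no longer crash-looping, though it does not by itself confirm that /opt/cni/bin is mounted.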