Description of problem:
After stopping the node service, the service enters the failed state.

Version-Release number of selected component (if applicable):
openshift v3.9.11
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. SSH into an OCP node.
2. Check the node service.
3. Stop the node service.
4. Check the node service again.

Actual results:
After stopping the node service, the service enters the failed state.

Expected results:
The node service should be in the inactive state.

Additional info:
# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: failed (Result: exit-code) since Sun 2018-03-18 23:39:05 EDT; 54min ago
     Docs: https://github.com/openshift/origin
  Process: 61696 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 61694 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 61455 ExecStart=/usr/bin/openshift start node --config=${CONFIG_FILE} $OPTIONS (code=exited, status=1/FAILURE)
  Process: 61453 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 61452 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 61455 (code=exited, status=1/FAILURE)

Mar 18 23:39:04 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:04.868180 61455 generic.go:183] GenericPLEG: Relisting
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.144519 61455 fs.go:406] got devicemapper fs capacity stats: capacity: 8829009920 free: 6667370496 available: 6667370496:
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.156792 61455 kubelet.go:1924] SyncLoop (housekeeping)
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.273272 61455 config.go:141] Calling handler.OnEndpointsUpdate
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopping OpenShift Node...
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.865112 61455 docker_server.go:73] Stop docker server
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopped OpenShift Node.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service failed.

# journalctl -u atomic-openshift-node.service
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopping OpenShift Node...
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.865112 61455 docker_server.go:73] Stop docker server
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopped OpenShift Node.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service failed.
Andrew, could you take a look? Basically, the openshift node process exits with status 1 on what should be a clean shutdown, so systemd marks the unit as failed. Not a huge deal, but it is messy.
I am able to reproduce this with 3.9.12 too. Investigating.
PR - https://github.com/openshift/origin/pull/19063
Might still make 3.10, but not a blocker.
Checked with version v3.11.0-0.17.0. After stopping the node service, it enters the inactive state:

# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2018-08-23 04:35:01 EDT; 2s ago
     Docs: https://github.com/openshift/origin
  Process: 4740 ExecStart=/usr/local/bin/openshift-node (code=exited, status=0/SUCCESS)
 Main PID: 4740 (code=exited, status=0/SUCCESS)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652