Bug 1557851 - Node service enters failed state after the node service is stopped.
Summary: Node service enters failed state after the node service is stopped.
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Seth Jennings
QA Contact: DeShuai Ma
Depends On: 1570461
Reported: 2018-03-19 05:20 UTC by Liang Xia
Modified: 2018-10-11 07:20 UTC
CC List: 7 users

Doc Type: No Doc Update
Last Closed: 2018-10-11 07:19:09 UTC

Links
Red Hat Product Errata RHBA-2018:2652 (last updated 2018-10-11 07:20:11 UTC)

Description Liang Xia 2018-03-19 05:20:32 UTC
Description of problem:
After the node service is stopped, systemd marks the unit as failed instead of inactive.

Version-Release number of selected component (if applicable):
openshift v3.9.11
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. SSH into an OCP node.
2. Check the status of the node service.
3. Stop the node service.
4. Check the status of the node service again.
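Steps 3 and 4 above can be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes a systemd host, root privileges, and the unit name atomic-openshift-node.service taken from the status output in this report. The UNIT override variable and the check_clean_stop helper are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: scripts steps 3-4 of the reproducer (run as root on the node).
# UNIT is a hypothetical override; it defaults to the unit named in this report.
UNIT="${UNIT:-atomic-openshift-node.service}"

# check_clean_stop (hypothetical helper): stop the unit, print the state
# systemd reports afterwards, and return non-zero if that state is "failed".
check_clean_stop() {
  systemctl stop "$UNIT" || return 2
  # is-active prints the unit's ActiveState ("inactive", "failed", ...) and
  # exits non-zero for anything other than "active", so its status is ignored.
  state="$(systemctl is-active "$UNIT")" || true
  echo "$state"
  [ "$state" != "failed" ]
}
```

Going by the results in this report, calling check_clean_stop on an affected 3.9 node would be expected to print `failed` and return 1, while a fixed build should print `inactive` and return 0.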

Actual results:
After the node service is stopped, the unit enters the failed state: the main process exits with status 1/FAILURE (see the log below).

Expected results:
The node service should be in the inactive state.

Additional info:
# systemctl status atomic-openshift-node.service 
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: failed (Result: exit-code) since Sun 2018-03-18 23:39:05 EDT; 54min ago
     Docs: https://github.com/openshift/origin
  Process: 61696 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 61694 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 61455 ExecStart=/usr/bin/openshift start node --config=${CONFIG_FILE} $OPTIONS (code=exited, status=1/FAILURE)
  Process: 61453 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 61452 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 61455 (code=exited, status=1/FAILURE)

Mar 18 23:39:04 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:04.868180   61455 generic.go:183] GenericPLEG: Relisting
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.144519   61455 fs.go:406] got devicemapper fs capacity stats: capacity: 8829009920 free: 6667370496 available: 6667370496:
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.156792   61455 kubelet.go:1924] SyncLoop (housekeeping)
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.273272   61455 config.go:141] Calling handler.OnEndpointsUpdate
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopping OpenShift Node...
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.865112   61455 docker_server.go:73] Stop docker server
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopped OpenShift Node.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service failed.


# journalctl -u atomic-openshift-node.service
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopping OpenShift Node...
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 atomic-openshift-node[61455]: I0318 23:39:05.865112   61455 docker_server.go:73] Stop docker server
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Stopped OpenShift Node.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Mar 18 23:39:05 qe-lxia-master-etcd-nfs-1 systemd[1]: atomic-openshift-node.service failed.

Comment 1 Seth Jennings 2018-03-19 23:12:23 UTC
Andrew, could you take a look? Basically, openshift node is returning an error on what should be a clean shutdown. Not a huge deal, but it is messy.

Comment 2 Andrew McDermott 2018-03-21 15:39:15 UTC
I am able to reproduce this with 3.9.12 too. Investigating.

Comment 3 Andrew McDermott 2018-03-22 15:06:57 UTC
PR - https://github.com/openshift/origin/pull/19063

Comment 4 Seth Jennings 2018-05-03 14:18:15 UTC
Might still make 3.10, but not a blocker.

Comment 6 Liang Xia 2018-08-23 08:44:27 UTC
Checked with version v3.11.0-0.17.0: after being stopped, the node service enters the inactive state.

# systemctl status atomic-openshift-node.service 
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2018-08-23 04:35:01 EDT; 2s ago
     Docs: https://github.com/openshift/origin
  Process: 4740 ExecStart=/usr/local/bin/openshift-node (code=exited, status=0/SUCCESS)
 Main PID: 4740 (code=exited, status=0/SUCCESS)
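The fix can also be checked without reading the full status output, by querying the exit status systemd recorded for the main process (shown above as `status=0/SUCCESS`). A minimal sketch, assuming the same unit name; `ExecMainStatus` is a standard `systemctl show` property, while the UNIT override and the main_exit_status helper are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch only: report the exit status systemd recorded for the unit's main
# process. 0 matches the verified 3.11 behavior; 1 matches the original bug.
UNIT="${UNIT:-atomic-openshift-node.service}"

# main_exit_status (hypothetical helper): systemctl prints
# "ExecMainStatus=<n>"; strip the property name and print only <n>.
main_exit_status() {
  out="$(systemctl show -p ExecMainStatus "$UNIT")" || return 2
  echo "${out#ExecMainStatus=}"
}
```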

Comment 8 errata-xmlrpc 2018-10-11 07:19:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

