Bug 1507800 - Node service cannot automatically start after node reboot while enabling system containers
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Michael Gugino
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-31 08:09 UTC by Qin Ping
Modified: 2018-09-10 16:29 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-10 16:29:58 UTC
Target Upstream Version:
Embargoed:


Description Qin Ping 2017-10-31 08:09:57 UTC
Description of problem:
After a node reboot, the node service does not start automatically.

openshift_ansible_vars:
  containerized: true
  openshift_use_system_containers: true
  system_images_registry: registry.ops.openshift.com
  openshift_docker_use_system_container: true
  openshift_docker_systemcontainer_image_override: registry.ops.openshift.com/openshift3/container-engine:v3.7
  openshift_use_crio: true
  openshift_crio_systemcontainer_image_override: registry.ops.openshift.com/openshift3/cri-o:v3.7

Version-Release number of the following components:
openshift openshift v3.7.0-0.184.0 and openshift-ansible-3.7.0-0.184.0

How reproducible:
Always

Steps to Reproduce:
1. Install the OCP cluster using the Ansible vars listed above.
2. Reboot one of the nodes.

Actual results:
The atomic-openshift-node service is not started.

Expected results:
The atomic-openshift-node service is started.

Additional info:
Docker is hardcoded as the `WantedBy` dependency in the node unit file. The node service starts correctly after changing `WantedBy=docker.service` to `WantedBy=container-engine.service`.

# systemctl cat atomic-openshift-node
# /etc/systemd/system/atomic-openshift-node.service
[Unit]
After=container-engine.service
After=openvswitch.service
Wants=container-engine.service
After=atomic-openshift-node-dep.service
After=atomic-openshift.service
Requires=dnsmasq.service
After=dnsmasq.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/atomic-openshift-node
EnvironmentFile=/etc/sysconfig/atomic-openshift-node-dep

ExecStartPre=/bin/bash -c 'export -p > /run/atomic-openshift-node-env'
ExecStart=/bin/runc --systemd-cgroup run 'atomic-openshift-node'
ExecStop=/bin/runc --systemd-cgroup kill 'atomic-openshift-node'
SyslogIdentifier=atomic-openshift-node
Restart=always
RestartSec=5s
WorkingDirectory=/var/lib/containers/atomic/atomic-openshift-node.0
RuntimeDirectory=atomic-openshift-node

[Install]
WantedBy=docker.service
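
A minimal sketch of the workaround described above, assuming the unit file path shown in the `systemctl cat` output. The helper name `retarget_wanted_by` is hypothetical and is not part of openshift-ansible or atomic; it only rewrites an exact `WantedBy=` line. The systemctl commands are left as comments because they need root on the affected node, and the re-enable step is an assumption about how the new `WantedBy` takes effect (the report itself only mentions editing the line).

```shell
# Hypothetical helper: replace an exact "WantedBy=<old>" line with "WantedBy=<new>".
# Usage: retarget_wanted_by <unit-file> <old-unit> <new-unit>
retarget_wanted_by() {
  unit_file=$1
  old_line="WantedBy=$2"
  new_line="WantedBy=$3"
  tmp_file=$(mktemp) || return 1
  # Exact string comparison, so dots in unit names are not treated as regex wildcards.
  awk -v old="$old_line" -v new="$new_line" \
    '$0 == old { print new; next } { print }' "$unit_file" > "$tmp_file" || {
      rm -f "$tmp_file"
      return 1
    }
  # Write the result back in place so the original file's ownership and mode are kept.
  cat "$tmp_file" > "$unit_file"
  rm -f "$tmp_file"
}

# On the affected node (as root), the workaround would look roughly like:
#   retarget_wanted_by /etc/systemd/system/atomic-openshift-node.service \
#     docker.service container-engine.service
#   systemctl daemon-reload
#   systemctl reenable atomic-openshift-node.service
```

Note that, as comment 4 points out, atomic may overwrite the unit file again when the system container is reinstalled or updated, so a manual edit like this is only a stopgap.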

Comment 2 Michael Gugino 2018-01-24 15:06:52 UTC
I'm not sure how this bug is possible.  This was solved quite some time ago:
https://github.com/openshift/openshift-ansible/pull/4131

Comment 3 Qin Ping 2018-01-26 01:52:59 UTC
This problem still exists in OCP version v3.9.0-0.23.0.

Comment 4 Michael Gugino 2018-01-26 20:34:59 UTC
I have figured out what the root of this issue is.

1) We install the origin-node.service / atomic-openshift-node.service unit file before installing the system container.

2) The system container for the node is installed, which causes atomic to overwrite the service unit we created.

3) The service starts correctly after the system container is installed, but since the 'docker' target is not available with CRI-O, the unit is never linked into a 'target.wants' directory.
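
One way to check step 3 on a node is to look for enablement symlinks in systemd's `.wants` directories. The sketch below is an assumption about how one might inspect this, not something taken from the installer; the helper name `find_wants_links` is hypothetical, and /etc/systemd/system is the unit directory shown earlier in this report.

```shell
# Hypothetical helper: print every "<something>.wants/<unit>" symlink under a unit directory.
# No output suggests the unit is not enabled via a .wants link in that directory.
# Usage: find_wants_links <unit-dir> <unit-name>
find_wants_links() {
  find "$1" -type l -path "*.wants/$2" 2>/dev/null
}

# Only inspect the real directory when it exists (for example, on the rebooted node).
if [ -d /etc/systemd/system ]; then
  find_wants_links /etc/systemd/system atomic-openshift-node.service || true
fi
```

`systemctl is-enabled atomic-openshift-node.service` reports the enablement state from systemd's point of view and can be used alongside this check.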

We need to consult with the system container team to figure out the best way forward. Should we avoid installing the node as a system container so that we can create our own unit file? Even if we fix the ordering in Ansible, the unit file would likely be overwritten again if the user updated the container.

Perhaps there is a way to instruct atomic to not create the unit file?

Comment 5 Michael Gugino 2018-01-26 23:19:15 UTC
Looks like we can indeed template the service unit files.

I have submitted a PR against origin here: https://github.com/openshift/origin/pull/18314

Comment 6 Scott Dodson 2018-02-06 02:18:09 UTC
Should be fixed in v3.9.0-0.38.0

Comment 7 Qin Ping 2018-02-06 03:19:38 UTC
Verified in v3.9.0-0.38.0.

