Bug 1737036

Summary: On shutdown, paunch containers can be stopped before the containers they depend on
Product: Red Hat OpenStack
Reporter: Damien Ciabrini <dciabrin>
Component: openstack-tripleo-heat-templates
Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Priority: high
Version: 15.0 (Stein)
CC: emacchi, jschluet, lmiccini, mburns, michele, pkomarov
Target Milestone: rc
Keywords: Triaged
Target Release: 15.0 (Stein)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190802160551.7561281.el8ost python-paunch-4.5.1-0.20190802160541.d105c6e.el8ost
Last Closed: 2019-09-21 11:24:21 UTC
Type: Bug

Description Damien Ciabrini 2019-08-02 11:57:47 UTC
Description of problem:
A paunch container has three systemd units associated with it:
  1. tripleo_*.service - the regular systemd service generated by paunch.
  2. libpod-conmon*.scope - created dynamically by podman. It runs a conmon
     process that creates a pidfile for tripleo_*.service and monitors it.
  3. libpod-*.scope - created dynamically by runc, for cgroup accounting.

The liveness of the scopes is directly tied to that of the podman
container started by tripleo_*.service. Moreover, paunch can only set
start/stop dependencies on 1., not 2. and 3.

On reboot, systemd is allowed to stop 2. or 3. at any time, so it may
stop the container's scopes _before_ the tripleo_*.service itself.

When such an unexpected stop sequence happens, the paunch service can be
stopped before all the services it depends on (e.g. nova-compute can
be stopped before nova-libvirt), and this can cause restart issues
after reboot.

There's no option in podman to configure a scope so that it is not
stopped before the paunch service is stopped. The only workaround so far
is to inject an additional drop-in file for each scope, with extra
dependencies that prevent systemd from stopping the scope before the
paunch service is stopped.
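
As a rough illustration of the workaround above, here is a minimal Python
sketch that renders such a drop-in for both scopes of one container. It is
not the code shipped in the fix. The container ID, the drop-in file name, the
/run/systemd/system base directory and the helper names are all hypothetical.
The use of Before= is an assumption as well: this report does not say which
directives the fix uses. It relies only on the documented systemd rule that
units are stopped in the inverse of their start order, so a scope ordered
Before= the service is stopped after it.

```python
from pathlib import PurePosixPath


def scope_names(container_id):
    """Return the two scope units tied to one container (ID is hypothetical)."""
    return [
        f"libpod-conmon-{container_id}.scope",
        f"libpod-{container_id}.scope",
    ]


def render_dropin(service_unit):
    """Render drop-in content that orders a scope before the paunch service.

    Assumption: Before= is used so that, on shutdown, systemd stops the
    service first and the scope afterwards (inverse of start order).
    """
    return f"[Unit]\nBefore={service_unit}\n"


def dropin_path(scope_unit, base_dir="/run/systemd/system",
                filename="10-hypothetical-ordering.conf"):
    """Return where the drop-in for a scope would live (names are hypothetical)."""
    return PurePosixPath(base_dir) / f"{scope_unit}.d" / filename


if __name__ == "__main__":
    service = "tripleo_nova_compute.service"
    for scope in scope_names("abc123"):
        print(dropin_path(scope))
        print(render_dropin(service))
```

The sketch only builds paths and content. Writing the files and reloading
systemd (systemctl daemon-reload) are left out on purpose.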

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190729150510.74ae8ba.el8ost.noarch

How reproducible:
Depends on systemd shutdown ordering

Steps to Reproduce:
1. Deploy an overcloud
2. Shut down a compute node

Actual results:
systemd may stop nova-compute before nova-libvirt, even though the former depends on the latter.

Expected results:
The ordering should always be respected during shutdown.

Additional info:

Comment 5 pkomarov 2019-08-14 11:57:56 UTC
Verified.

Correct ordering is seen during a shutdown:
[root@compute-0 ~]# journalctl -b -1 |grep 'libvirt\|nova'|tail -n 2
Aug 14 11:40:08 compute-0 systemd[1]: Stopped nova_libvirt container.
Aug 14 11:40:18 compute-0 systemd[1]: Stopped nova_compute container.
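
The same check can be scripted. Below is a minimal Python sketch that
extracts the stop order from journal lines shaped like the two above. The
function names are hypothetical, and the regular expression assumes the
"Stopped <name> container." message format shown in this output.

```python
import re

# Matches systemd "Stopped <name> container." messages as seen in the journal.
STOPPED_RE = re.compile(r"systemd\[\d+\]: Stopped (\S+) container\.")

SAMPLE = [
    "Aug 14 11:40:08 compute-0 systemd[1]: Stopped nova_libvirt container.",
    "Aug 14 11:40:18 compute-0 systemd[1]: Stopped nova_compute container.",
]


def stop_order(journal_lines):
    """Return container names in the order systemd reported them stopped."""
    order = []
    for line in journal_lines:
        match = STOPPED_RE.search(line)
        if match:
            order.append(match.group(1))
    return order


def stopped_before(journal_lines, first, second):
    """Return True if `first` was reported stopped before `second`.

    Raises ValueError if either name is missing from the journal lines.
    """
    order = stop_order(journal_lines)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    print(stop_order(SAMPLE))
    print(stopped_before(SAMPLE, "nova_libvirt", "nova_compute"))
```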

[root@compute-0 ~]# rpm -qa|grep paunch
paunch-services-4.5.1-0.20190802160541.d105c6e.el8ost.noarch
python3-paunch-4.5.1-0.20190802160541.d105c6e.el8ost.noarch
[root@compute-0 ~]# logout
[heat-admin@compute-0 ~]$ logout
Connection to 192.168.24.15 closed.
[stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-10.6.1-0.20190806190500.bdcffcd.el8ost.noarch

Comment 6 Emilien Macchi 2019-08-28 21:28:11 UTC
*** Bug 1710871 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2019-09-21 11:24:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811