Bug 1737036 - On shutdown, paunch containers can be stopped before the containers they depend on
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 15.0 (Stein)
Assignee: Damien Ciabrini
QA Contact: pkomarov
URL:
Whiteboard:
Duplicates: 1710871
Depends On:
Blocks:
Reported: 2019-08-02 11:57 UTC by Damien Ciabrini
Modified: 2020-04-16 07:07 UTC

Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190802160551.7561281.el8ost python-paunch-4.5.1-0.20190802160541.d105c6e.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:24:21 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 674090 | 0 | None | MERGED | Generate addition drop-in dependencies for podman containers | 2021-01-26 17:16:25 UTC
OpenStack gerrit | 674094 | 0 | None | MERGED | Generate addition drop-in dependencies for podman containers | 2021-01-26 17:16:25 UTC
RDO | 21666 | 0 | None | stein-rdo: MERGED | openstack/paunch-distgit: (squash) backporting systemd drop-in script (Ib85ea22c4bc29f7864ddf3d7821524561e7134d2) | 2019-08-06 13:19:37 UTC
Red Hat Product Errata | RHEA-2019:2811 | 0 | None | None | None | 2019-09-21 11:24:41 UTC

Description Damien Ciabrini 2019-08-02 11:57:47 UTC
Description of problem:
A paunch container has three systemd units associated with it:
  1. tripleo_*.service - the regular systemd service generated by paunch.
  2. libpod-conmon*.scope - created dynamically by podman. It runs a conmon
     process that creates a pidfile for tripleo_*.service and monitors it.
  3. libpod-*.scope - created dynamically by runc, for cgroup accounting.

The liveness of the scopes is directly tied to that of the podman
container started by tripleo_*.service. Moreover, paunch can only set
start/stop dependencies on 1., not on 2. or 3.
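
To make the three units concrete, here is a minimal sketch that builds their names for one container. The function name and the way the wildcards are filled in (the container name for the service, the full container ID for the scopes) are assumptions for illustration, not taken from paunch.

```python
def paunch_unit_names(container_name: str, container_id: str) -> dict:
    """Return the three systemd unit names associated with one container.

    Assumption: the '*' in the patterns from the description expands to
    the container name (service) and the container ID (scopes).
    """
    return {
        # 1. regular service generated by paunch
        "service": f"tripleo_{container_name}.service",
        # 2. scope created dynamically by podman for conmon
        "conmon_scope": f"libpod-conmon-{container_id}.scope",
        # 3. scope created dynamically by runc for cgroup accounting
        "container_scope": f"libpod-{container_id}.scope",
    }


units = paunch_unit_names("nova_compute", "0123abcd")
for role, unit in units.items():
    print(f"{role}: {unit}")
```

Only the first of these units is one that paunch can attach start/stop dependencies to.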

On reboot, systemd is allowed to stop 2. or 3. at any time, so it can
stop the container's scopes _before_ the tripleo_*.service itself.

When such an unexpected stop sequence happens, the paunch service can be
stopped before all the services it depends on (e.g. nova-compute can
be stopped before nova-libvirt), and this can cause restart issues
after reboot.

There's no option in podman to configure a scope so that it is not
stopped before the paunch service is stopped. The only workaround so far
is to inject an additional drop-in file for each scope, with extra
dependencies that prevent systemd from stopping the scope before the
paunch service is stopped.
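
As an illustration of the drop-in idea, the sketch below renders a drop-in for one scope. It is not the code from the linked gerrit changes. It assumes the extra dependency is a `Before=` ordering on the paunch service, relying on systemd stopping units in the reverse of their start order. The function names, the drop-in file name and the `/run/systemd/system` base directory are hypothetical choices.

```python
from pathlib import PurePosixPath


def render_scope_dropin(service_unit: str) -> str:
    """Render drop-in content that orders a scope before the service.

    Assumption: 'Before=<service>' on the scope means the service starts
    after the scope, so at shutdown systemd stops the service first and
    the scope only afterwards.
    """
    return f"[Unit]\nBefore={service_unit}\n"


def dropin_path(scope_unit: str,
                base_dir: str = "/run/systemd/system",  # assumed location
                filename: str = "10-paunch-ordering.conf"  # hypothetical name
                ) -> PurePosixPath:
    """Return the path of the drop-in file for the given scope unit."""
    return PurePosixPath(base_dir) / f"{scope_unit}.d" / filename


scope = "libpod-0123abcd.scope"  # hypothetical container ID
print(dropin_path(scope))
print(render_scope_dropin("tripleo_nova_compute.service"))
```

The sketch only renders text and a path. Writing the file and running `systemctl daemon-reload` afterwards would be needed for systemd to pick it up.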

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190729150510.74ae8ba.el8ost.noarch

How reproducible:
Depends on systemd shutdown ordering

Steps to Reproduce:
1. Deploy an overcloud
2. Shut down a compute node

Actual results:
systemd may stop nova-compute before nova-libvirt, although the former depends on the latter

Expected results:
The dependency ordering should always be respected during shutdown

Additional info:

Comment 5 pkomarov 2019-08-14 11:57:56 UTC
Verified.

The correct ordering is seen during a shutdown:
[root@compute-0 ~]# journalctl -b -1 |grep 'libvirt\|nova'|tail -n 2
Aug 14 11:40:08 compute-0 systemd[1]: Stopped nova_libvirt container.
Aug 14 11:40:18 compute-0 systemd[1]: Stopped nova_compute container.
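
The same check could be scripted. This is a minimal sketch that extracts the order of "Stopped ... container." messages from journal lines like the ones above. The function name and the regular expression are assumptions based only on the two log lines shown here.

```python
import re

# Matches lines such as:
#   "... systemd[1]: Stopped nova_libvirt container."
_STOPPED = re.compile(r"systemd\[\d+\]: Stopped (\S+) container\.")


def stop_order(journal_lines):
    """Return container names in the order systemd logged them as stopped."""
    order = []
    for line in journal_lines:
        match = _STOPPED.search(line)
        if match:
            order.append(match.group(1))
    return order


lines = [
    "Aug 14 11:40:08 compute-0 systemd[1]: Stopped nova_libvirt container.",
    "Aug 14 11:40:18 compute-0 systemd[1]: Stopped nova_compute container.",
]
print(stop_order(lines))
```

The input lines could come from `journalctl -b -1`, as in the command above.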

[root@compute-0 ~]# rpm -qa|grep paunch
paunch-services-4.5.1-0.20190802160541.d105c6e.el8ost.noarch
python3-paunch-4.5.1-0.20190802160541.d105c6e.el8ost.noarch
[root@compute-0 ~]# logout
[heat-admin@compute-0 ~]$ logout
Connection to 192.168.24.15 closed.
[stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-10.6.1-0.20190806190500.bdcffcd.el8ost.noarch

Comment 6 Emilien Macchi 2019-08-28 21:28:11 UTC
*** Bug 1710871 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2019-09-21 11:24:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

