Bug 1726442
| Summary: | SIGTERM from systemd to containers/conmon on shutdown causes unexpected results | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Damien Ciabrini <dciabrin> |
| Component: | podman | Assignee: | Jindrich Novy <jnovy> |
| Status: | CLOSED ERRATA | QA Contact: | Alex Jia <ajia> |
| Severity: | low | Priority: | unspecified |
| Version: | 8.3 | CC: | bbaude, dciabrin, dornelas, dwalsh, emacchi, gscrivan, jligon, jnovy, lsm5, mheon, michele, msekleta, pthomas, tsweeney, vrothber, ypu |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | 8.4 | Hardware: | Unspecified |
| OS: | Unspecified | Type: | Bug |
| Fixed In Version: | podman-3.0.0-2.el8 or newer | Last Closed: | 2021-05-18 15:32:02 UTC |
| Bug Blocks: | 1727325 | | |
Comment 1
Damien Ciabrini
2019-07-02 21:04:01 UTC
Potential partial solution at https://github.com/containers/libpod/pull/3474 (unverified at present).

So I'm trying to come up with a small reproducer that mimics the way we set up our containers in OpenStack. I'm not convinced I have a valid reproducer yet, but this is what I have right now:

```
### create a container that mimics how our cluster manager manages podman
### containers (i.e. without systemd)
podman create --name=service_a -d --net=host fedora sleep infinity

### create another container that mimics how the other containers (the
### majority of OpenStack containers) are spawned and monitored by systemd
podman create --name=service_b --conmon-pidfile=/var/run/service_b.pid -d --net=host fedora sleep infinity

### create the systemd services to mimic what is done in OpenStack
### - service_a is our cluster manager, which creates podman containers
###   without systemd or service files
### - service_b is a regular OpenStack podman container, managed by systemd
### - mid_service is a dummy service that serves as a synchronization point to
###   ensure that OpenStack services (here service_b) always stop before
###   containers managed by our cluster manager (here service_a)
cd /etc/systemd/system

cat >service_a.service <<'EOF'
[Unit]
Description=service A

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/podman start service_a
ExecStop=/usr/bin/podman stop -t 10 service_a

[Install]
WantedBy=multi-user.target
EOF

cat >mid_service.service <<'EOF'
[Unit]
Description=Mid-service time checkpoint
After=service_a.service
Before=shutdown.target
RefuseManualStop=yes

[Service]
Type=oneshot
ExecStart=/bin/true
RemainAfterExit=yes
ExecStop=/bin/true

[Install]
WantedBy=multi-user.target
EOF

cat >service_a.service <<'EOF'
[Unit]
Description=service B
After=mid_service.service

[Service]
Restart=always
ExecStart=/usr/bin/podman start service_b
ExecStop=/bin/sh -c "sleep 10 && echo 'arbitrary long sleep before stop' && /usr/bin/podman stop -t 10 service_b"
KillMode=none
Type=forking
PIDFile=/var/run/service_b.pid

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload

### observe that service B starts after service A
systemctl enable service_a service_b mid_service --now
```
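Before exercising the stop path, the intended ordering can be sanity-checked by asking systemd directly. This check is illustrative and not part of the original reproducer; it assumes the corrected file names, since as written above the service B unit is accidentally saved to service_a.service (see the QA comment near the end of this bug).

```
# list the units that systemd orders *before* service_b; mid_service (and,
# transitively, service_a) should appear in the output
systemctl list-dependencies --after service_b.service
```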
```
### observe that when stopping service B, there's a configured 10s delay
### before the container effectively stops
systemctl stop service_b

Jul 03 13:44:42 controller-0 systemd[1]: Stopping service B...
Jul 03 13:44:52 controller-0 sh[101465]: arbitrary long sleep before stop
Jul 03 13:45:03 controller-0 systemd[1]: libpod-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope: Consumed 24ms CPU time
Jul 03 13:45:03 controller-0 sh[101465]: 5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da
Jul 03 13:45:03 controller-0 systemd[1]: Stopped service B.

### restart service_b before last test
systemctl restart service_b

### observe that when rebooting, the conmon scope for service A can be stopped
### by systemd even though service B hasn't fully stopped yet
[root@controller-0 ~]# podman ps
CONTAINER ID  IMAGE                            COMMAND         CREATED      STATUS                PORTS  NAMES
[...a few containers in my env...]
5dcf0c985fca  docker.io/library/fedora:latest  sleep infinity  2 hours ago  Up 59 seconds ago            service_b
16090b176250  docker.io/library/fedora:latest  sleep infinity  2 hours ago  Up About an hour ago         service_a
[...other containers in my env...]

[root@controller-0 ~]# systemctl -a | grep -e 5dcf0c985fca -e 16090b176250
var-lib-containers-storage-overlay\x2dcontainers-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9-userdata-shm.mount loaded active mounted /var/lib/containers/storage/overlay-containers/16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9/userdata/shm
var-lib-containers-storage-overlay\x2dcontainers-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da-userdata-shm.mount loaded active mounted /var/lib/containers/storage/overlay-containers/5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da/userdata/shm
libpod-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope loaded active running libcontainer container 16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9
libpod-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope loaded active running libcontainer container 5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da
libpod-conmon-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope loaded active running libpod-conmon-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope
libpod-conmon-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope loaded active running libpod-conmon-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope

reboot

### after reboot, show the sequence and observe that container 16090b176250,
### spawned by service A, got stopped by systemd before service B finished
### stopping
[root@controller-0 ~]# journalctl --since 12:59:56 -t systemd -t sh | grep -e service -e 5dcf0c985fca -e 16090b176250 -e eboot
Jul 03 12:59:56 controller-0 systemd[1]: local-fs.target: Found dependency on systemd-tmpfiles-setup.service/stop
Jul 03 12:59:56 controller-0 systemd[1]: Stopping libpod-conmon-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope.
Jul 03 12:59:56 controller-0 systemd[1]: Stopping libcontainer container 16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.
Jul 03 12:59:56 controller-0 systemd[1]: Stopping libpod-conmon-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope.
Jul 03 12:59:56 controller-0 systemd[1]: Stopping service B...
Jul 03 12:59:56 controller-0 systemd[1]: user-runtime-dir: Unit not needed anymore. Stopping.
Jul 03 12:59:57 controller-0 systemd[1]: Stopping libcontainer container 5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.
Jul 03 12:59:57 controller-0 systemd[1]: user-runtime-dir: Unit not needed anymore. Stopping.
Jul 03 12:59:57 controller-0 systemd[1]: dnf-makecache.service: Main process exited, code=killed, status=15/TERM
Jul 03 12:59:57 controller-0 systemd[1]: dnf-makecache.service: Failed with result 'signal'.
Jul 03 12:59:57 controller-0 systemd[1]: Starting Show Plymouth Reboot Screen...
Jul 03 12:59:57 controller-0 systemd[1]: Started Show Plymouth Reboot Screen.
Jul 03 12:59:57 controller-0 systemd[1]: Stopped target NFS client services.
Jul 03 13:00:17 controller-0 systemd[1]: Stopped libcontainer container 5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.
Jul 03 13:00:17 controller-0 systemd[1]: libpod-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope: Consumed 27ms CPU time
Jul 03 13:00:17 controller-0 systemd[1]: Unmounted /var/lib/containers/storage/overlay-containers/5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da/userdata/shm.
Jul 03 13:00:17 controller-0 sh[27825]: 5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da
Jul 03 13:00:17 controller-0 systemd[1]: Stopped service B.
Jul 03 13:00:17 controller-0 systemd[1]: Stopping Mid-service time checkpoint...
Jul 03 13:00:17 controller-0 systemd[1]: Stopped libpod-conmon-5dcf0c985fcac4558bd6290f3ef6fecbd0e40a0e2ca67a53839ce57c5069b3da.scope.
Jul 03 13:00:17 controller-0 systemd[1]: Stopped Mid-service time checkpoint.
Jul 03 13:00:17 controller-0 systemd[1]: Stopping service A...
Jul 03 13:00:27 controller-0 systemd[1]: Stopped libcontainer container 16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.
Jul 03 13:00:27 controller-0 systemd[1]: libpod-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope: Consumed 32ms CPU time
Jul 03 13:00:27 controller-0 systemd[1]: Unmounted /var/lib/containers/storage/overlay-containers/16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9/userdata/shm.
Jul 03 13:00:27 controller-0 systemd[1]: Stopped service A.
Jul 03 13:00:27 controller-0 systemd[1]: Stopped libpod-conmon-16090b176250e8e9d12d41ac92773d17a3011d1d5da5a7b4466d651b705c30d9.scope.
Jul 03 13:00:27 controller-0 systemd[1]: Starting Reboot...
-- Reboot --
```

It looks like systemd doesn't honor the ordering and begins stopping conmon for the podman containers of service_a and service_b even though the systemd service for service_b hasn't stopped yet. I don't know how valid that reproducer is, because ultimately I still see the log "Stopped libcontainer container 16090b1..." after the log "Stopped service B."

I'd still like to confirm or disprove the theory that conmon sends SIGTERM to our containers at an unexpected time during reboot, by getting an unambiguous reboot log sequence. But this is essentially the sequence of events that is happening in our OpenStack env:

```
# before reboot, rabbitmq monitor is ok
Jul 03 14:01:43 controller-0 rabbitmq-cluster(rabbitmq)[138382]: DEBUG: rabbitmq monitor :

# a reboot is started, and conmon for the rabbitmq container begins to stop
Jul 03 14:01:51 controller-0 systemd[1]: Stopping libpod-conmon-85b5a880b211a9fd0346166d30383c6fd8f2ad5b5ee0a20970f0d1158be26e43.scope.

# the main pid in the rabbitmq container detects that it got requested to terminate
Jul 03 14:01:51 controller-0 pacemaker-remoted[3954]: notice: Caught 'Terminated' signal

# only now do the regular systemd-managed openstack services stop
# (i.e. rabbitmq shouldn't stop before horizon per systemd dependencies)
Jul 03 14:01:51 controller-0 systemd[1]: Stopping horizon container...
Jul 03 14:01:51 controller-0 pacemaker-controld[2623]: notice: rabbitmq-bundle-0 requested shutdown of its remote connection
Jul 03 14:02:01 controller-0 systemd[1]: Stopped horizon container.
Jul 03 14:02:01 controller-0 systemd[1]: Stopping Paunch Container Shutdown...
Jul 03 14:02:01 controller-0 systemd[1]: Stopped Paunch Container Shutdown.
Jul 03 14:02:01 controller-0 pacemakerd[2596]: notice: Caught 'Terminated' signal

# only now does our cluster manager begin to stop
Jul 03 14:02:01 controller-0 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Jul 03 14:02:01 controller-0 pacemakerd[2596]: notice: Shutting down Pacemaker
Jul 03 14:02:01 controller-0 pacemakerd[2596]: notice: Stopping pacemaker-controld
Jul 03 14:02:01 controller-0 pacemaker-controld[2623]: notice: Caught 'Terminated' signal
[...]
# and apparently the rabbitmq process in the rabbitmq container already got stopped
Jul 03 14:02:16 controller-0 rabbitmq-cluster(rabbitmq)[140011]: INFO: RabbitMQ server is not running
Jul 03 14:02:16 controller-0 rabbitmq-cluster(rabbitmq)[140016]: DEBUG: rabbitmq stop : 0
Jul 03 14:02:16 controller-0 pacemaker-remoted[3954]: notice: rabbitmq_stop_0:87492:stderr [ Error: unable to perform an operation on node 'rabbit@controller-0'. Please see diagnostics information and suggestions below. ]

# only then does the rabbitmq container get stopped
Jul 03 14:02:16 controller-0 systemd[1]: Stopped libcontainer container 85b5a880b211a9fd0346166d30383c6fd8f2ad5b5ee0a20970f0d1158be26e43.
Jul 03 14:02:16 controller-0 systemd[1]: libpod-85b5a880b211a9fd0346166d30383c6fd8f2ad5b5ee0a20970f0d1158be26e43.scope: Consumed 9min 41.625s CPU time
Jul 03 14:02:16 controller-0 systemd[1]: Unmounted /var/lib/containers/storage/overlay-containers/85b5a880b211a9fd0346166d30383c6fd8f2ad5b5ee0a20970f0d1158be26e43/userdata/shm.
Jul 03 14:02:16 controller-0 systemd[1]: Unmounted /var/lib/containers/storage/overlay/41beba6aded9ddfd4d89570f2831d2349b784483904fc54983756e2f7627389a/merged.
Jul 03 14:02:16 controller-0 systemd[1]: Stopped libpod-conmon-85b5a880b211a9fd0346166d30383c6fd8f2ad5b5ee0a20970f0d1158be26e43.scope.
```

I've tested Damien's solution, did a reboot, and here are the logs: http://ix.io/1OKd

We can see that both rabbitmq & galera are stopped *after* the non-HA containers. I think this is a viable option, given that we have no other alternative at this time.

FTR, until we have a programmatic way to configure those dependencies in podman, here are the two workarounds that we've implemented for OpenStack:

1. cluster-managed podman containers: https://bugzilla.redhat.com/show_bug.cgi?id=1738303
2. systemd-managed podman containers: https://bugzilla.redhat.com/show_bug.cgi?id=1737036

Moving this out to RHEL 8.2 since it has a lower priority and will not be fixed for the 8.1 release.

Let's push this to the 8.3 release.

Matt and Valentin, has this been fixed in podman 3.0?

I believe Giuseppe's cgroups=split patch may have provided a resolution to systemd shutdown ordering, but I'm insufficiently familiar with systemd's shutdown process to be sure, and I've never tested this. I think we need some expertise from the systemd team. Pulling in, Michal.

@Michal: I will summarize the issue quickly and describe what I am seeing in my local reproducers. Assume we have two units (A and B). Both are generated via `podman generate systemd`. Unit B is set "After=A.service". When I stop the units, B is stopped before A; the journal clearly indicates that `Stopping A` happens after `Stopped B`. Now when I reboot the machine (systemctl reboot), the stop order changes: A and B are stopped simultaneously. As suggested above, adding a `sleep` before `podman stop` makes that easier to see. Can you give guidance on how to enforce the ordering at shutdown/reboot?
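To make the setup concrete, here is a minimal sketch of what the two units described above could look like. The names, PID-file paths, and unit details are illustrative rather than the exact `podman generate systemd` output; the only detail that matters for the bug is B's `After=A.service` ordering.

```
# A.service (sketch)
[Unit]
Description=Podman container A

[Service]
Type=forking
Restart=on-failure
PIDFile=/run/A-conmon.pid
ExecStart=/usr/bin/podman start A
ExecStop=/usr/bin/podman stop -t 10 A
KillMode=none

[Install]
WantedBy=multi-user.target

# B.service (sketch) -- identical except for the name and the explicit ordering
[Unit]
Description=Podman container B
After=A.service

[Service]
Type=forking
Restart=on-failure
PIDFile=/run/B-conmon.pid
ExecStart=/usr/bin/podman start B
ExecStop=/usr/bin/podman stop -t 10 B
KillMode=none

[Install]
WantedBy=multi-user.target
```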
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

I am going to reopen this, since the bug is serious.

Michal, Giuseppe, Dan and I had a debugging session on Friday. We looked specifically at the shutdown scenario where the stop order among the Podman services seemingly changed. The problem lies in the systemd scope that Podman creates and runs the container in. Systemd does not know that the scope relates to the unit Podman runs in, so as soon as the shutdown starts, such "orphaned" scopes are cleaned up by systemd. This means the container is killed, but note that conmon is not involved. Once the services are about to stop, conmon fails immediately since the container has already been killed.

We want to explore two solutions:

1) Create a split-mode for Cgroups v1.
2) Let Podman send dbus messages to inform systemd about the scope.

Another simple workaround is to use `podman create/run --cgroups=disabled`. This way, the container runs in the unit's Cgroup.
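For illustration, a minimal sketch of a unit using the `--cgroups=disabled` workaround just mentioned; the unit and container names are hypothetical, and this runs the container in the foreground rather than via conmon pidfile tracking:

```
# myapp.service (sketch): with --cgroups=disabled the container processes stay
# in this unit's own cgroup rather than in a detached libpod-*.scope, so
# systemd tracks them as part of the service and applies its normal stop
# ordering at shutdown.
[Unit]
Description=myapp container

[Service]
ExecStart=/usr/bin/podman run --rm --cgroups=disabled --name myapp registry.fedoraproject.org/fedora sleep infinity
ExecStop=/usr/bin/podman stop -t 10 myapp

[Install]
WantedBy=multi-user.target
```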
(In reply to Damien Ciabrini from comment #6)
> So I'm trying to come up with a small reproducer that mimics the way we set
> up our containers in OpenStack.
> I'm not convinced I have a valid reproducer yet, but this is what I have
> right now:

Hi Damien, I tried to use your reproducer to verify this bug on podman-3.0.1-1.module+el8.4.0+10073+30e5ea69 w/ crun-0.18-1.module+el8.4.0+10073+30e5ea69. The following is my test output; please help confirm whether it is enough for you, thanks!

```
[root@ibm-x3650m4-01-vm-02 system]# systemctl daemon-reload
[root@ibm-x3650m4-01-vm-02 system]# systemctl enable service_a service_b mid_service --now
Created symlink /etc/systemd/system/multi-user.target.wants/service_a.service → /etc/systemd/system/service_a.service.
Created symlink /etc/systemd/system/multi-user.target.wants/service_b.service → /etc/systemd/system/service_b.service.
Created symlink /etc/systemd/system/multi-user.target.wants/mid_service.service → /etc/systemd/system/mid_service.service.

[root@ibm-x3650m4-01-vm-02 system]# systemctl stop service_b
NOTE: there's a configured 10s delay in here.

[root@ibm-x3650m4-01-vm-02 system]# systemctl restart service_b
[root@ibm-x3650m4-01-vm-02 system]# podman ps
CONTAINER ID  IMAGE                                     COMMAND         CREATED        STATUS             PORTS  NAMES
f9ef0d2a2255  registry.fedoraproject.org/fedora:latest  sleep infinity  6 minutes ago  Up 58 seconds ago         service_a
c13ade2a0a18  registry.fedoraproject.org/fedora:latest  sleep infinity  6 minutes ago  Up 10 seconds ago         service_b

[root@ibm-x3650m4-01-vm-02 system]# systemctl -a | grep -e f9ef0d2a2255 -e c13ade2a0a18
var-lib-containers-storage-overlay\x2dcontainers-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6-userdata-shm.mount loaded active mounted /var/lib/containers/storage/overlay-containers/c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6/userdata/shm
var-lib-containers-storage-overlay\x2dcontainers-f9ef0d2a22558b1d8d667043ab648f6e8dc22396d513fd72d34c8f92f8303327-userdata-shm.mount loaded active mounted /var/lib/containers/storage/overlay-containers/f9ef0d2a22558b1d8d667043ab648f6e8dc22396d513fd72d34c8f92f8303327/userdata/shm
libpod-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6.scope loaded active running libcrun container
libpod-f9ef0d2a22558b1d8d667043ab648f6e8dc22396d513fd72d34c8f92f8303327.scope loaded active running libcrun container
NOTE: no libpod-conmon-xxx is found in here.

[root@ibm-x3650m4-01-vm-02 system]# journalctl --since 10:40:00 -t systemd -t sh | grep -e service -e f9ef0d2a2255 -e c13ade2a0a18
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Starting service A...
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Starting service B...
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Started service A.
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Starting Mid-service time checkpoint...
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Started Mid-service time checkpoint.
Feb 23 10:46:42 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Started service B.
Feb 23 10:46:45 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Stopping service B...
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: libpod-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6.scope: Succeeded.
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[49951]: var-lib-containers-storage-overlay\x2dcontainers-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6-userdata-shm.mount: Succeeded.
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: var-lib-containers-storage-overlay\x2dcontainers-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6-userdata-shm.mount: Succeeded.
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[5100]: var-lib-containers-storage-overlay\x2dcontainers-c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6-userdata-shm.mount: Succeeded.
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com sh[67534]: c13ade2a0a1889869826dec21a221daacf2058acc2a32cd2729f2bdd465427e6
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: service_b.service: Succeeded.
Feb 23 10:47:06 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Stopped service B.
Feb 23 10:47:30 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Starting service B...
Feb 23 10:47:30 ibm-x3650m4-01-vm-02.ibm2.lab.eng.bos.redhat.com systemd[1]: Started service B.
```

NOTE: I ran 'systemctl daemon-reload' and 'systemctl enable service_a service_b mid_service --now' two times, because there is a typo in your reproducer (there are two 'service_a.service' heredocs and no 'service_b.service').

I also ran a test of the '--cgroups=split' command option:

```
[root@ibm-x3650m4-01-vm-02 ~]# podman run --rm --cgroups=split quay.io/libpod/alpine cat /proc/self/cgroup
Trying to pull quay.io/libpod/alpine:latest...
Getting image source signatures
Copying blob 9d16cba9fb96 done
Copying config 9617696764 done
Writing manifest to image destination
Storing signatures
12:hugetlb:/user.slice/user-0.slice/session-11.scope/container
11:rdma:/
10:devices:/user.slice/user-0.slice/session-11.scope/container
9:memory:/user.slice/user-0.slice/session-11.scope/container
8:freezer:/user.slice/user-0.slice/session-11.scope/container
7:cpu,cpuacct:/user.slice/user-0.slice/session-11.scope/container
6:net_cls,net_prio:/user.slice/user-0.slice/session-11.scope/container
5:cpuset:/user.slice/user-0.slice/session-11.scope/container
4:perf_event:/user.slice/user-0.slice/session-11.scope/container
3:pids:/user.slice/user-0.slice/session-11.scope/container
2:blkio:/user.slice/user-0.slice/session-11.scope/container
1:name=systemd:/user.slice/user-0.slice/session-11.scope/supervisor
```

Hey Alex, comment #6 was a bit unclear, so let me restate what we want to verify. Given the three order-dependent services A, B and mid from comment #6, we want the following to always work:

1. During a shutdown - when _all_ services are stopped at the same time - the stopping of service A should always take place after service B has fully stopped. So the test in comment #39 is not enough: you should perform a `systemctl reboot` of the node and verify that the logs show the services stopped in the right order.

2. Valentin added another good point in comment #27: once the node has been restarted, we want to ensure that the start/stop dependencies are still enforced by systemd. So doing a second `systemctl reboot` should also stop service A only once service B has fully stopped.
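As a hint for the check in step 1, the stop order across a reboot can be read back from the previous boot's journal, assuming journald is configured with persistent storage so that the log survives the reboot:

```
# -b -1 selects the previous boot; "Stopped service B." should appear
# before "Stopping service A..."
journalctl -b -1 -u service_a.service -u service_b.service | grep -E 'Stopp(ing|ed) service'
```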
Thank you Damien! Moving this bug to VERIFIED state according to Damien's testing.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1796