Bug 1850303
Summary: | podman containers are not properly cleaned and restarted when their conmon process is killed | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Takashi Kajinami <tkajinam> |
Component: | python-paunch | Assignee: | Takashi Kajinami <tkajinam> |
Status: | CLOSED ERRATA | QA Contact: | nlevinki <nlevinki> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 16.0 (Train) | CC: | aschultz, emacchi, kecarter, ramishra |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | python-paunch-5.3.3-1.20200810143359.6f44509.el8ost | Doc Type: | If docs needed, set a value |
Last Closed: | 2020-10-28 15:38:11 UTC | Type: | Bug |
Description
Takashi Kajinami
2020-06-24 01:05:16 UTC
After fixing the problem by stopping the podman container and restarting it via systemd, the process is restarted under the conmon process as expected.

~~~
[heat-admin@controller-0 ~]$ sudo podman ps | grep nova_conductor
0edc910c83a4 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:20200416.1 kolla_start 34 hours ago Up About a minute ago nova_conductor
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
|-conmon(21169)-+-dumb-init(21181)---nova-conductor(21206)-+-nova-conductor(21491)
| | |-nova-conductor(21492)
| | |-nova-conductor(21493)
| | `-nova-conductor(21494)
~~~

A fix was recently merged into podman that also configures ExecStopPost in the generated systemd unit files, so that container processes are actually stopped even if the conmon process fails. I think we need to implement the same in tripleo-ansible, so that the generated systemd unit file has ExecStopPost.

https://github.com/containers/libpod/commit/e5c3432944245a740ed443803c654dcc9c3757f0

I tested a systemd unit file with ExecStopPost added:

~~~
[heat-admin@controller-0 ~]$ sudo cat /etc/systemd/system/tripleo_nova_conductor.service
[Unit]
Description=nova_conductor container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start nova_conductor
ExecReload=/usr/bin/podman kill --signal HUP nova_conductor
ExecStop=/usr/bin/podman stop -t 10 nova_conductor
ExecStopPost=/usr/bin/podman stop -t 10 nova_conductor
KillMode=none
Type=forking
PIDFile=/var/run/nova_conductor.pid
[Install]
WantedBy=multi-user.target
[heat-admin@controller-0 ~]$ sudo systemctl daemon-reload
~~~

and confirmed that it does not affect normal stop/start operation:

~~~
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_nova_conductor
● tripleo_nova_conductor.service - nova_conductor container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_conductor.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-24 01:04:27 UTC; 1h 6min ago
 Main PID: 21169 (conmon)
    Tasks: 0 (limit: 26213)
   Memory: 2.4M
   CGroup: /system.slice/tripleo_nova_conductor.service
           ‣ 21169 /usr/bin/conmon --api-version 1 -s -c 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -u 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -r /usr/bin/runc -b /var/lib/containers/storage>

Jun 24 01:04:26 controller-0 systemd[1]: Starting nova_conductor container...
Jun 24 01:04:27 controller-0 podman[21146]: 2020-06-24 01:04:27.220821736 +0000 UTC m=+0.429382454 container init 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rho>
Jun 24 01:04:27 controller-0 podman[21146]: 2020-06-24 01:04:27.24996283 +0000 UTC m=+0.458523513 container start 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rho>
Jun 24 01:04:27 controller-0 podman[21146]: nova_conductor
Jun 24 01:04:27 controller-0 systemd[1]: Started nova_conductor container.
Jun 24 02:10:54 controller-0 systemd[1]: Reloading nova_conductor container.
Jun 24 02:10:54 controller-0 podman[391867]: 2020-06-24 02:10:54.445391064 +0000 UTC m=+0.083626342 container kill 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:10:54 controller-0 podman[391867]: 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38
Jun 24 02:10:54 controller-0 systemd[1]: Reloaded nova_conductor container.
[heat-admin@controller-0 ~]$ sudo systemctl stop tripleo_nova_conductor
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_nova_conductor
● tripleo_nova_conductor.service - nova_conductor container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_conductor.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2020-06-24 02:11:22 UTC; 2s ago
  Process: 394597 ExecStopPost=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=0/SUCCESS)
  Process: 393786 ExecStop=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=0/SUCCESS)
 Main PID: 21169 (code=exited, status=0/SUCCESS)

Jun 24 02:10:54 controller-0 systemd[1]: Reloading nova_conductor container.
Jun 24 02:10:54 controller-0 podman[391867]: 2020-06-24 02:10:54.445391064 +0000 UTC m=+0.083626342 container kill 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:10:54 controller-0 podman[391867]: 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38
Jun 24 02:10:54 controller-0 systemd[1]: Reloaded nova_conductor container.
Jun 24 02:11:18 controller-0 systemd[1]: Stopping nova_conductor container...
Jun 24 02:11:22 controller-0 podman[393786]: 2020-06-24 02:11:22.059051776 +0000 UTC m=+3.421731728 container died 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:11:22 controller-0 podman[393786]: 2020-06-24 02:11:22.060230275 +0000 UTC m=+3.422910246 container stop 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:11:22 controller-0 podman[393786]: 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38
Jun 24 02:11:22 controller-0 podman[394597]: 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38
Jun 24 02:11:22 controller-0 systemd[1]: Stopped nova_conductor container.
[heat-admin@controller-0 ~]$ sudo systemctl start tripleo_nova_conductor
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_nova_conductor
● tripleo_nova_conductor.service - nova_conductor container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_conductor.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-24 02:11:43 UTC; 47s ago
  Process: 394597 ExecStopPost=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=0/SUCCESS)
  Process: 393786 ExecStop=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=0/SUCCESS)
  Process: 396438 ExecStart=/usr/bin/podman start nova_conductor (code=exited, status=0/SUCCESS)
 Main PID: 396524 (conmon)
    Tasks: 0 (limit: 26213)
   Memory: 1.8M
   CGroup: /system.slice/tripleo_nova_conductor.service
           ‣ 396524 /usr/bin/conmon --api-version 1 -s -c 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -u 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -r /usr/bin/runc -b /var/lib/containers/storag>

Jun 24 02:11:43 controller-0 systemd[1]: Starting nova_conductor container...
Jun 24 02:11:43 controller-0 podman[396438]: 2020-06-24 02:11:43.625492561 +0000 UTC m=+0.475655869 container init 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:11:43 controller-0 podman[396438]: 2020-06-24 02:11:43.641841944 +0000 UTC m=+0.492005326 container start 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/r>
Jun 24 02:11:43 controller-0 podman[396438]: nova_conductor
Jun 24 02:11:43 controller-0 systemd[1]: Started nova_conductor container.
~~~

Now systemd can restart the container whose conmon process was killed:
~~~
[heat-admin@controller-0 ~]$ sudo podman ps | grep nova_conductor
0edc910c83a4 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:20200416.1 kolla_start 35 hours ago Up About a minute ago nova_conductor
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
|-conmon(396524)-+-dumb-init(396541)---nova-conductor(396589)-+-nova-conductor(397076)
| | |-nova-conductor(397077)
| | |-nova-conductor(397078)
| | `-nova-conductor(397079)
[heat-admin@controller-0 ~]$ sudo kill -KILL 396524
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
|-dumb-init(396541)---nova-conductor(396589)---nova-conductor(397076)
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
|-conmon(406733)-+-dumb-init(406746)---nova-conductor(406765)
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_nova_conductor
● tripleo_nova_conductor.service - nova_conductor container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_conductor.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-24 02:13:29 UTC; 11s ago
  Process: 405610 ExecStopPost=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=125)
  Process: 393786 ExecStop=/usr/bin/podman stop -t 10 nova_conductor (code=exited, status=0/SUCCESS)
  Process: 406710 ExecStart=/usr/bin/podman start nova_conductor (code=exited, status=0/SUCCESS)
 Main PID: 406733 (conmon)
    Tasks: 0 (limit: 26213)
   Memory: 1.9M
   CGroup: /system.slice/tripleo_nova_conductor.service
           ‣ 406733 /usr/bin/conmon --api-version 1 -s -c 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -u 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 -r /usr/bin/runc -b /var/lib/containers/storag>

Jun 24 02:13:28 controller-0 systemd[1]: Starting nova_conductor container...
Jun 24 02:13:29 controller-0 podman[406710]: 2020-06-24 02:13:29.338664709 +0000 UTC m=+0.428971832 container init 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rh>
Jun 24 02:13:29 controller-0 podman[406710]: 2020-06-24 02:13:29.355447918 +0000 UTC m=+0.445755041 container start 0edc910c83a41e393db9378e01b51df3223371077e169ce8d8c590840cebcf38 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs/r>
Jun 24 02:13:29 controller-0 podman[406710]: nova_conductor
Jun 24 02:13:29 controller-0 systemd[1]: Started nova_conductor container.
[heat-admin@controller-0 ~]$ sudo podman ps | grep nova_conductor
0edc910c83a4 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-conductor:20200416.1 kolla_start 35 hours ago Up 2 minutes ago nova_conductor
[heat-admin@controller-0 ~]$ sudo pstree -p | grep nova-conductor
|-conmon(406733)-+-dumb-init(406746)---nova-conductor(406765)-+-nova-conductor(407012)
| | |-nova-conductor(407013)
| | |-nova-conductor(407014)
| | `-nova-conductor(407015)
[heat-admin@controller-0 ~]$
~~~

Moving to the "paunch" component. I'm not sure whether tripleo-ansible will need a patch; I think we moved to podman-managed systemd units in newer versions. Paunch is used at least in 16.0 and 16.1.

Thank you for the information, Cédric. I'll submit a patch to paunch as well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284
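
For illustration, the unit-file change tested in the description (an ExecStopPost line that repeats the ExecStop command) could be scripted roughly as follows. This is only a sketch: `add_exec_stop_post` is a hypothetical helper, not the patch that was merged into paunch, and the sample unit text is the one shown in the description with the ExecStopPost line left out.

```python
def add_exec_stop_post(unit_text: str) -> str:
    """Return unit_text with an ExecStopPost= line mirroring each ExecStop=.

    Hypothetical helper for illustration only; this is NOT the code that
    paunch or tripleo-ansible uses to generate unit files.
    """
    lines = unit_text.splitlines()
    # Leave the unit untouched if it already defines ExecStopPost.
    if any(line.strip().startswith("ExecStopPost=") for line in lines):
        return unit_text
    out = []
    for line in lines:
        out.append(line)
        stripped = line.strip()
        if stripped.startswith("ExecStop="):
            # Repeat the stop command as ExecStopPost so systemd still runs
            # it when the main process (conmon) exits unexpectedly.
            out.append("ExecStopPost=" + stripped[len("ExecStop="):])
    trailing_newline = "\n" if unit_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Sample unit text, based on the unit file quoted in the description.
UNIT = """\
[Unit]
Description=nova_conductor container
After=paunch-container-shutdown.service

[Service]
Restart=always
ExecStart=/usr/bin/podman start nova_conductor
ExecReload=/usr/bin/podman kill --signal HUP nova_conductor
ExecStop=/usr/bin/podman stop -t 10 nova_conductor
KillMode=none
Type=forking
PIDFile=/var/run/nova_conductor.pid

[Install]
WantedBy=multi-user.target
"""

if __name__ == "__main__":
    print(add_exec_stop_post(UNIT), end="")
```

The helper is idempotent, so running it against a unit that already has ExecStopPost changes nothing. After editing a real unit file, `systemctl daemon-reload` is still needed, as in the test above.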