Description of problem:
Stopping the tripleo_nova_compute and tripleo_nova_libvirt containers fails intermittently. Sometimes the service ends up in a failed state after the stop, and sometimes it stops cleanly. For example:

[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:38:31 UTC; 25s ago
  Process: 78914 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 78931 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.8M
   CGroup: /system.slice/tripleo_nova_compute.service
           ‣ 78931 /usr/bin/conmon --api-version 1 -s -c abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -u abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -r /usr/bin/runc -b /var/lib/containers/>

Oct 05 17:38:31 compute-1 systemd[1]: Starting nova_compute container...
Oct 05 17:38:31 compute-1 podman[78915]: 2020-10-05 17:38:31.382735002 +0000 UTC m=+0.173023960 container init abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:38:31 compute-1 podman[78915]: 2020-10-05 17:38:31.396849632 +0000 UTC m=+0.187138602 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: nova_compute
Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:38:31 compute-1 systemd[1]: Started nova_compute container.
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-10-05 17:39:11 UTC; 1s ago
  Process: 79360 ExecStopPost=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79241 ExecStop=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 78914 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 78931 (code=exited, status=137)

Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:38:31 compute-1 systemd[1]: Started nova_compute container.
Oct 05 17:39:01 compute-1 systemd[1]: Stopping nova_compute container...
Oct 05 17:39:11 compute-1 podman[79241]: 2020-10-05 17:39:11.604369286 +0000 UTC m=+10.225565088 container died abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:39:11 compute-1 podman[79241]: 2020-10-05 17:39:11.606180499 +0000 UTC m=+10.227376295 container stop abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:39:11 compute-1 podman[79241]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:39:11 compute-1 systemd[1]: tripleo_nova_compute.service: Main process exited, code=exited, status=137/n/a
Oct 05 17:39:11 compute-1 podman[79360]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:39:11 compute-1 systemd[1]: tripleo_nova_compute.service: Failed with result 'exit-code'.
Oct 05 17:39:11 compute-1 systemd[1]: Stopped nova_compute container.
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2020-10-05 17:40:32 UTC; 46s ago
  Process: 79994 ExecStopPost=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79911 ExecStop=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79662 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 79686 (code=exited, status=0/SUCCESS)

Oct 05 17:40:11 compute-1 podman[79667]: 2020-10-05 17:40:11.648765956 +0000 UTC m=+0.194652529 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:40:11 compute-1 paunch-start-podman-container[79662]: nova_compute
Oct 05 17:40:11 compute-1 paunch-start-podman-container[79662]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:40:11 compute-1 systemd[1]: Started nova_compute container.
Oct 05 17:40:25 compute-1 systemd[1]: Stopping nova_compute container...
Oct 05 17:40:32 compute-1 podman[79911]: 2020-10-05 17:40:32.141310674 +0000 UTC m=+6.153475271 container died abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:40:32 compute-1 podman[79911]: 2020-10-05 17:40:32.142295241 +0000 UTC m=+6.154459819 container stop abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:40:32 compute-1 podman[79911]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:40:32 compute-1 podman[79994]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:40:32 compute-1 systemd[1]: Stopped nova_compute container.
[root@compute-1 heat-admin]# systemctl start tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:41:25 UTC; 1s ago
  Process: 80128 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 80143 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.7M
   CGroup: /system.slice/tripleo_nova_compute.service
           ‣ 80143 /usr/bin/conmon --api-version 1 -s -c abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -u abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -r /usr/bin/runc -b /var/lib/containers/>

Oct 05 17:41:24 compute-1 systemd[1]: Starting nova_compute container...
Oct 05 17:41:24 compute-1 podman[80129]: 2020-10-05 17:41:24.986328161 +0000 UTC m=+0.148323429 container init abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:41:25 compute-1 podman[80129]: 2020-10-05 17:41:25.001562539 +0000 UTC m=+0.163557796 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:41:25 compute-1 paunch-start-podman-container[80128]: nova_compute
Oct 05 17:41:25 compute-1 paunch-start-podman-container[80128]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:41:25 compute-1 systemd[1]: Started nova_compute container.
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_libvirt
[root@compute-1 heat-admin]# systemctl status tripleo_nova_libvirt
● tripleo_nova_libvirt.service - nova_libvirt container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_libvirt.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-10-05 17:57:09 UTC; 1s ago
  Process: 86666 ExecStopPost=/usr/bin/podman stop -t 10 nova_libvirt (code=exited, status=0/SUCCESS)
  Process: 86635 ExecStop=/usr/bin/podman stop -t 10 nova_libvirt (code=exited, status=0/SUCCESS)
 Main PID: 83508 (code=exited, status=143)

Oct 05 17:50:16 compute-1 paunch-start-podman-container[83492]: Creating additional drop-in dependency for "nova_libvirt" (bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480)
Oct 05 17:50:17 compute-1 systemd[1]: Started nova_libvirt container.
Oct 05 17:57:08 compute-1 systemd[1]: Stopping nova_libvirt container...
Oct 05 17:57:09 compute-1 podman[86635]: 2020-10-05 17:57:09.116397386 +0000 UTC m=+0.207859718 container died bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:09 compute-1 podman[86635]: 2020-10-05 17:57:09.119940705 +0000 UTC m=+0.211402968 container stop bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:09 compute-1 podman[86635]: bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480
Oct 05 17:57:09 compute-1 podman[86666]: bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480
Oct 05 17:57:09 compute-1 systemd[1]: tripleo_nova_libvirt.service: Main process exited, code=exited, status=143/n/a
Oct 05 17:57:09 compute-1 systemd[1]: tripleo_nova_libvirt.service: Failed with result 'exit-code'.
Oct 05 17:57:09 compute-1 systemd[1]: Stopped nova_libvirt container.
[root@compute-1 heat-admin]# systemctl start tripleo_nova_libvirt
[root@compute-1 heat-admin]# systemctl status tripleo_nova_libvirt
● tripleo_nova_libvirt.service - nova_libvirt container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_libvirt.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:57:18 UTC; 1s ago
  Process: 86708 ExecStart=/usr/libexec/paunch-start-podman-container nova_libvirt (code=exited, status=0/SUCCESS)
 Main PID: 86725 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.7M
   CGroup: /system.slice/tripleo_nova_libvirt.service
           ‣ 86725 /usr/bin/conmon --api-version 1 -s -c bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 -u bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 -r /usr/bin/runc -b /var/lib/containers/>

Oct 05 17:57:18 compute-1 systemd[1]: Starting nova_libvirt container...
Oct 05 17:57:18 compute-1 podman[86709]: 2020-10-05 17:57:18.447511872 +0000 UTC m=+0.168163187 container init bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:18 compute-1 podman[86709]: 2020-10-05 17:57:18.463124596 +0000 UTC m=+0.183775946 container start bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:57:18 compute-1 paunch-start-podman-container[86708]: nova_libvirt
Oct 05 17:57:18 compute-1 paunch-start-podman-container[86708]: Creating additional drop-in dependency for "nova_libvirt" (bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480)
Oct 05 17:57:18 compute-1 systemd[1]: Started nova_libvirt container.

Version-Release number of selected component (if applicable):
16.1.2

How reproducible:
Sometimes

Steps to Reproduce:
1. On a compute host, start the tripleo_nova_compute or tripleo_nova_libvirt service.
2. Stop the same service.
3. Check the service status; intermittently it reports a failed stop.

Actual results:
The service status shows failed (Result: exit-code) after the stop.

Expected results:
The service status shows inactive after the stop.

Additional info:
conmon is being killed with SIGKILL and thus returns 137 (128 + 9). The .service for nova-compute should really take this into account, tbh.
https://tldp.org/LDP/abs/html/exitcodes.html
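One way a unit could take these exit codes into account is systemd's SuccessExitStatus= setting, which lists extra exit statuses to treat as a clean exit. The sketch below is only an illustration: the helper name write_exit_status_dropin, the drop-in file name 10-exit-status.conf, and the choice of 137 and 143 (the two codes seen in the logs above) are assumptions, and the actual fix may be implemented differently.

```shell
# Hypothetical helper: writes a systemd drop-in into the directory given
# as $1. The file name and the list of exit codes are assumptions.
write_exit_status_dropin() {
    local dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-exit-status.conf" <<'EOF'
[Service]
# Treat exits caused by SIGKILL (137) and SIGTERM (143) as a clean stop.
SuccessExitStatus=137 143
EOF
}

# On a real host this would be followed by a daemon reload, for example:
#   write_exit_status_dropin /etc/systemd/system/tripleo_nova_compute.service.d
#   systemctl daemon-reload
```

With such a drop-in in place, a stop that ends with status 137 or 143 would be expected to leave the unit inactive rather than failed.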
The following upstream patch should correct this situation:
https://review.opendev.org/756333

There's also a master version of the patch, for tripleo-ansible:
https://review.opendev.org/756339

The commit message explains the whole thing with those weird exit codes. More documentation is available here:
https://tldp.org/LDP/abs/html/exitcodes.html

Namely, we hit the «Fatal error signal "n"» case, meaning we have to subtract 128 from the exit code to get the actual signal.

Cheers,

C.
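The subtraction can be checked from a shell. This is a minimal sketch, assuming bash (or another POSIX shell whose kill builtin accepts a signal number with -l); the function name signal_from_status is made up for illustration.

```shell
# Hypothetical helper: map an exit status above 128 to the name of the
# signal that terminated the process (status - 128).
signal_from_status() {
    local status="$1"
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name for that number
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

signal_from_status 137   # 137 - 128 = 9,  prints KILL
signal_from_status 143   # 143 - 128 = 15, prints TERM
```

So the nova_compute main process (status=137) was terminated by SIGKILL, while the nova_libvirt one (status=143) exited on SIGTERM.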
*** Bug 1885362 has been marked as a duplicate of this bug. ***
sudo systemctl stop tripleo_nova_compute
sudo systemctl status tripleo_nova_compute
sudo systemctl start tripleo_nova_compute
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0817