Bug 1885363 - [OSP16.1]Failed to stop the tripleo_nova_compute /tripleo_nova_libvirt services
Summary: [OSP16.1]Failed to stop the tripleo_nova_compute /tripleo_nova_libvirt services
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Cédric Jeanneret
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 1885362
Depends On:
Blocks:
 
Reported: 2020-10-05 18:04 UTC by Paras Babbar
Modified: 2021-03-17 15:34 UTC (History)

Fixed In Version: python-paunch-5.3.3-1.20200826193408.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:32:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:0817 | 0 | None | None | None | 2021-03-17 15:34:06 UTC

Description Paras Babbar 2020-10-05 18:04:51 UTC
Description of problem:

Stopping the tripleo_nova_compute and tripleo_nova_libvirt containers fails intermittently: sometimes the service ends up in the failed state after a stop, and sometimes it stops cleanly.

For example, see the output below:
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:38:31 UTC; 25s ago
  Process: 78914 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 78931 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.8M
   CGroup: /system.slice/tripleo_nova_compute.service
           ‣ 78931 /usr/bin/conmon --api-version 1 -s -c abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -u abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -r /usr/bin/runc -b /var/lib/containers/>
 
Oct 05 17:38:31 compute-1 systemd[1]: Starting nova_compute container...
Oct 05 17:38:31 compute-1 podman[78915]: 2020-10-05 17:38:31.382735002 +0000 UTC m=+0.173023960 container init abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:38:31 compute-1 podman[78915]: 2020-10-05 17:38:31.396849632 +0000 UTC m=+0.187138602 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: nova_compute
Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:38:31 compute-1 systemd[1]: Started nova_compute container.
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-10-05 17:39:11 UTC; 1s ago
  Process: 79360 ExecStopPost=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79241 ExecStop=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 78914 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 78931 (code=exited, status=137)
 
Oct 05 17:38:31 compute-1 paunch-start-podman-container[78914]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:38:31 compute-1 systemd[1]: Started nova_compute container.
Oct 05 17:39:01 compute-1 systemd[1]: Stopping nova_compute container...
Oct 05 17:39:11 compute-1 podman[79241]: 2020-10-05 17:39:11.604369286 +0000 UTC m=+10.225565088 container died abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:39:11 compute-1 podman[79241]: 2020-10-05 17:39:11.606180499 +0000 UTC m=+10.227376295 container stop abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:39:11 compute-1 podman[79241]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:39:11 compute-1 systemd[1]: tripleo_nova_compute.service: Main process exited, code=exited, status=137/n/a
Oct 05 17:39:11 compute-1 podman[79360]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:39:11 compute-1 systemd[1]: tripleo_nova_compute.service: Failed with result 'exit-code'.
Oct 05 17:39:11 compute-1 systemd[1]: Stopped nova_compute container.
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2020-10-05 17:40:32 UTC; 46s ago
  Process: 79994 ExecStopPost=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79911 ExecStop=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
  Process: 79662 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 79686 (code=exited, status=0/SUCCESS)
 
Oct 05 17:40:11 compute-1 podman[79667]: 2020-10-05 17:40:11.648765956 +0000 UTC m=+0.194652529 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:40:11 compute-1 paunch-start-podman-container[79662]: nova_compute
Oct 05 17:40:11 compute-1 paunch-start-podman-container[79662]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:40:11 compute-1 systemd[1]: Started nova_compute container.
Oct 05 17:40:25 compute-1 systemd[1]: Stopping nova_compute container...
Oct 05 17:40:32 compute-1 podman[79911]: 2020-10-05 17:40:32.141310674 +0000 UTC m=+6.153475271 container died abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:40:32 compute-1 podman[79911]: 2020-10-05 17:40:32.142295241 +0000 UTC m=+6.154459819 container stop abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:40:32 compute-1 podman[79911]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:40:32 compute-1 podman[79994]: abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217
Oct 05 17:40:32 compute-1 systemd[1]: Stopped nova_compute container.
[root@compute-1 heat-admin]# systemctl start tripleo_nova_compute
[root@compute-1 heat-admin]# systemctl status tripleo_nova_compute
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:41:25 UTC; 1s ago
  Process: 80128 ExecStart=/usr/libexec/paunch-start-podman-container nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 80143 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.7M
   CGroup: /system.slice/tripleo_nova_compute.service
           ‣ 80143 /usr/bin/conmon --api-version 1 -s -c abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -u abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 -r /usr/bin/runc -b /var/lib/containers/>
 
Oct 05 17:41:24 compute-1 systemd[1]: Starting nova_compute container...
Oct 05 17:41:24 compute-1 podman[80129]: 2020-10-05 17:41:24.986328161 +0000 UTC m=+0.148323429 container init abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:41:25 compute-1 podman[80129]: 2020-10-05 17:41:25.001562539 +0000 UTC m=+0.163557796 container start abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:41:25 compute-1 paunch-start-podman-container[80128]: nova_compute
Oct 05 17:41:25 compute-1 paunch-start-podman-container[80128]: Creating additional drop-in dependency for "nova_compute" (abc882b186dc743977fc3bd3f41296796ada4cb7d95b65cbfb9882f577889217)
Oct 05 17:41:25 compute-1 systemd[1]: Started nova_compute container.
 
 
[root@compute-1 heat-admin]# systemctl stop tripleo_nova_libvirt
[root@compute-1 heat-admin]# systemctl status tripleo_nova_libvirt
● tripleo_nova_libvirt.service - nova_libvirt container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_libvirt.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-10-05 17:57:09 UTC; 1s ago
  Process: 86666 ExecStopPost=/usr/bin/podman stop -t 10 nova_libvirt (code=exited, status=0/SUCCESS)
  Process: 86635 ExecStop=/usr/bin/podman stop -t 10 nova_libvirt (code=exited, status=0/SUCCESS)
 Main PID: 83508 (code=exited, status=143)
 
Oct 05 17:50:16 compute-1 paunch-start-podman-container[83492]: Creating additional drop-in dependency for "nova_libvirt" (bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480)
Oct 05 17:50:17 compute-1 systemd[1]: Started nova_libvirt container.
Oct 05 17:57:08 compute-1 systemd[1]: Stopping nova_libvirt container...
Oct 05 17:57:09 compute-1 podman[86635]: 2020-10-05 17:57:09.116397386 +0000 UTC m=+0.207859718 container died bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:09 compute-1 podman[86635]: 2020-10-05 17:57:09.119940705 +0000 UTC m=+0.211402968 container stop bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:09 compute-1 podman[86635]: bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480
Oct 05 17:57:09 compute-1 podman[86666]: bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480
Oct 05 17:57:09 compute-1 systemd[1]: tripleo_nova_libvirt.service: Main process exited, code=exited, status=143/n/a
Oct 05 17:57:09 compute-1 systemd[1]: tripleo_nova_libvirt.service: Failed with result 'exit-code'.
Oct 05 17:57:09 compute-1 systemd[1]: Stopped nova_libvirt container.
[root@compute-1 heat-admin]# systemctl start tripleo_nova_libvirt
[root@compute-1 heat-admin]# systemctl status tripleo_nova_libvirt
● tripleo_nova_libvirt.service - nova_libvirt container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_libvirt.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-05 17:57:18 UTC; 1s ago
  Process: 86708 ExecStart=/usr/libexec/paunch-start-podman-container nova_libvirt (code=exited, status=0/SUCCESS)
 Main PID: 86725 (conmon)
    Tasks: 0 (limit: 204317)
   Memory: 2.7M
   CGroup: /system.slice/tripleo_nova_libvirt.service
           ‣ 86725 /usr/bin/conmon --api-version 1 -s -c bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 -u bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 -r /usr/bin/runc -b /var/lib/containers/>
 
Oct 05 17:57:18 compute-1 systemd[1]: Starting nova_libvirt container...
Oct 05 17:57:18 compute-1 podman[86709]: 2020-10-05 17:57:18.447511872 +0000 UTC m=+0.168163187 container init bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osbs>
Oct 05 17:57:18 compute-1 podman[86709]: 2020-10-05 17:57:18.463124596 +0000 UTC m=+0.183775946 container start bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Oct 05 17:57:18 compute-1 paunch-start-podman-container[86708]: nova_libvirt
Oct 05 17:57:18 compute-1 paunch-start-podman-container[86708]: Creating additional drop-in dependency for "nova_libvirt" (bef7003d6eadd86c1148f20063808f5a715419b718430f6b9f8ead06b9dbd480)
Oct 05 17:57:18 compute-1 systemd[1]: Started nova_libvirt container.

Version-Release number of selected component (if applicable):
16.1.2

How reproducible:
Sometimes
Steps to Reproduce:
1. Go to a compute host and start the tripleo_nova_compute/tripleo_nova_libvirt service.
2. Stop the tripleo_nova_compute/tripleo_nova_libvirt service.
3. The service is reported as failed after the stop.

Actual results:
The service status is shown as failed after stopping.

Expected results:
The service status is shown as inactive after stopping.

Additional info:

Comment 1 Lee Yarwood 2020-10-06 10:04:46 UTC
conmon is being killed with SIGKILL and thus returning 137 (128 + 9); the .service unit for nova-compute should really take this into account, tbh.

https://tldp.org/LDP/abs/html/exitcodes.html
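
One way a unit can take such exit statuses into account is systemd's SuccessExitStatus= directive, which lists extra exit statuses to treat as a clean exit. The sketch below writes a drop-in containing it. This is only an illustration: the function name, the drop-in file name, and the directory argument are hypothetical, and this report does not state that the eventual fix takes this form.

```shell
# Sketch: write a systemd drop-in that treats signal-derived exit
# statuses as success. The function name and the drop-in file name
# are hypothetical, chosen for illustration only.
write_exit_status_dropin() {
    dropin_dir=$1   # e.g. /etc/systemd/system/tripleo_nova_compute.service.d
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-success-exit-status.conf" <<'EOF'
[Service]
# 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM)
SuccessExitStatus=137 143
EOF
}
```

If the drop-in is written under the real unit directory, systemd only picks it up after a `systemctl daemon-reload`.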

Comment 3 Cédric Jeanneret 2020-10-06 15:29:05 UTC
The following upstream patch should correct this situation:
https://review.opendev.org/756333

There's also a master version of the patch, for tripleo-ansible: https://review.opendev.org/756339

The commit message explains the whole thing with those weird exit codes.

More documentation is available here:
https://tldp.org/LDP/abs/html/exitcodes.html

Namely, we hit the «Fatal error signal "n"» case, meaning we have to subtract 128 from the exit code to get the actual signal.

Cheers,

C.
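
The "subtract 128" rule described above can be checked from a shell. The helper below is a minimal sketch (the function name is made up for illustration): it maps an exit status such as the 137 and 143 seen in the systemctl output to a signal name, using the shell's built-in `kill -l`.

```shell
# Decode an exit status into the name of the signal that ended the
# process, following the "128 + N" convention. Prints "none" when the
# status is not above 128. The function name is hypothetical.
signal_from_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # N = status - 128 is the signal number; kill -l prints its name.
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

signal_from_exit_status 137   # prints KILL (nova_compute case)
signal_from_exit_status 143   # prints TERM (nova_libvirt case)
```

So the nova_compute stop ended with SIGKILL, while the nova_libvirt stop ended with SIGTERM.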

Comment 5 Lee Yarwood 2020-10-14 12:30:19 UTC
*** Bug 1885362 has been marked as a duplicate of this bug. ***

Comment 14 Jad Haj Yahya 2021-01-25 08:03:59 UTC
sudo systemctl stop tripleo_nova_compute
sudo systemctl status tripleo_nova_compute
sudo systemctl start tripleo_nova_compute

Comment 21 errata-xmlrpc 2021-03-17 15:32:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

