Bug 1708708 - [OSP15] undercloud redeployment failed with "Command '['systemctl', 'stop', 'tripleo_nova_compute.service']' returned non-zero exit status 1."
Summary: [OSP15] undercloud redeployment failed with "Command '['systemctl', 'stop', '...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 15.0 (Stein)
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: beta
: 15.0 (Stein)
Assignee: Cédric Jeanneret
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-05-10 16:06 UTC by Artem Hrechanychenko
Modified: 2020-01-31 08:08 UTC (History)
8 users (show)

Fixed In Version: python-paunch-4.4.1-0.20190523160352.0cd4c64.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:58 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 660561 0 'None' MERGED Add missing link between healthcheck unit and main service 2020-03-28 10:51:47 UTC
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:22:17 UTC

Description Artem Hrechanychenko 2019-05-10 16:06:32 UTC
Description of problem:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-deployment-15-virthost-undercloud-conf_changes-RHELOSP-38657/11/artifact/

redeployment of Undercloud failed with 

TASK [Debug output for task: Start containers for step 5] **********************
fatal: [undercloud-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Job for tripleo_nova_compute.service canceled.",
        "systemctl failed",
        "Traceback (most recent call last):",
        "  File \"/usr/lib/python3.6/site-packages/paunch/utils/systemd.py\", line 123, in service_delete",
        "    subprocess.check_call(['systemctl', 'stop', sysd_f])",
        "  File \"/usr/lib64/python3.6/subprocess.py\", line 311, in check_call",
        "    raise CalledProcessError(retcode, cmd)",
        "subprocess.CalledProcessError: Command '['systemctl', 'stop', 'tripleo_nova_compute.service']' returned non-zero exit status 1.",
        "Command '['systemctl', 'stop', 'tripleo_nova_compute.service']' returned non-zero exit status 1."
    ]
}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
undercloud-0               : ok=288  changed=101  unreachable=0    failed=1    skipped=370  rescued=0    ignored=1

[stack@undercloud-0 ~]$ sudo systemctl status tripleo_nova_compute.service
● tripleo_nova_compute.service - nova_compute container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_compute.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2019-05-10 13:41:03 UTC; 2h 23min ago
  Process: 113287 ExecStop=/usr/bin/podman stop -t 10 nova_compute (code=exited, status=0/SUCCESS)
 Main PID: 51822

May 10 15:50:52 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:52:12 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:53:22 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:54:52 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:56:02 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:57:22 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 15:59:02 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 16:00:36 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 16:02:12 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.
May 10 16:03:42 undercloud-0.redhat.local systemd[1]: nova_compute container is not active.


2aead00e93f7  192.168.24.1:8787/rhosp15/openstack-nova-compute-ironic:20190423.1        dumb-init --singl...  3 hours ago  Exited (137) 2 hours ago         nova_compute

[stack@undercloud-0 ~]$ sudo podman logs nova_compute
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Creating directory /etc/httpd
INFO:__main__:Creating directory /etc/httpd/conf.d
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/10-nova_api_wsgi.conf to /etc/httpd/conf.d/10-nova_api_wsgi.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/access_compat.load to /etc/httpd/conf.d/access_compat.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/actions.load to /etc/httpd/conf.d/actions.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/alias.conf to /etc/httpd/conf.d/alias.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/alias.load to /etc/httpd/conf.d/alias.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/auth_basic.load to /etc/httpd/conf.d/auth_basic.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/auth_digest.load to /etc/httpd/conf.d/auth_digest.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authn_anon.load to /etc/httpd/conf.d/authn_anon.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authn_core.load to /etc/httpd/conf.d/authn_core.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authn_dbm.load to /etc/httpd/conf.d/authn_dbm.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authn_file.load to /etc/httpd/conf.d/authn_file.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_core.load to /etc/httpd/conf.d/authz_core.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_dbm.load to /etc/httpd/conf.d/authz_dbm.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_groupfile.load to /etc/httpd/conf.d/authz_groupfile.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_host.load to /etc/httpd/conf.d/authz_host.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_owner.load to /etc/httpd/conf.d/authz_owner.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/authz_user.load to /etc/httpd/conf.d/authz_user.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/autoindex.conf to /etc/httpd/conf.d/autoindex.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/autoindex.load to /etc/httpd/conf.d/autoindex.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/cache.load to /etc/httpd/conf.d/cache.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/cgi.load to /etc/httpd/conf.d/cgi.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/dav.load to /etc/httpd/conf.d/dav.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/dav_fs.conf to /etc/httpd/conf.d/dav_fs.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/dav_fs.load to /etc/httpd/conf.d/dav_fs.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/deflate.conf to /etc/httpd/conf.d/deflate.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/deflate.load to /etc/httpd/conf.d/deflate.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/dir.conf to /etc/httpd/conf.d/dir.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/dir.load to /etc/httpd/conf.d/dir.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/env.load to /etc/httpd/conf.d/env.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/expires.load to /etc/httpd/conf.d/expires.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/ext_filter.load to /etc/httpd/conf.d/ext_filter.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/filter.load to /etc/httpd/conf.d/filter.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/include.load to /etc/httpd/conf.d/include.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/log_config.load to /etc/httpd/conf.d/log_config.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/logio.load to /etc/httpd/conf.d/logio.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/mime.conf to /etc/httpd/conf.d/mime.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/mime.load to /etc/httpd/conf.d/mime.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/mime_magic.conf to /etc/httpd/conf.d/mime_magic.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/mime_magic.load to /etc/httpd/conf.d/mime_magic.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/negotiation.conf to /etc/httpd/conf.d/negotiation.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/negotiation.load to /etc/httpd/conf.d/negotiation.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/prefork.conf to /etc/httpd/conf.d/prefork.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/prefork.load to /etc/httpd/conf.d/prefork.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/rewrite.load to /etc/httpd/conf.d/rewrite.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/setenvif.conf to /etc/httpd/conf.d/setenvif.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/setenvif.load to /etc/httpd/conf.d/setenvif.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/socache_shmcb.load to /etc/httpd/conf.d/socache_shmcb.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/speling.load to /etc/httpd/conf.d/speling.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/ssl.conf to /etc/httpd/conf.d/ssl.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/ssl.load to /etc/httpd/conf.d/ssl.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/status.conf to /etc/httpd/conf.d/status.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/status.load to /etc/httpd/conf.d/status.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/substitute.load to /etc/httpd/conf.d/substitute.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/suexec.load to /etc/httpd/conf.d/suexec.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/systemd.load to /etc/httpd/conf.d/systemd.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/unixd.load to /etc/httpd/conf.d/unixd.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/usertrack.load to /etc/httpd/conf.d/usertrack.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/version.load to /etc/httpd/conf.d/version.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/vhost_alias.load to /etc/httpd/conf.d/vhost_alias.load
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/wsgi.conf to /etc/httpd/conf.d/wsgi.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf.d/wsgi.load to /etc/httpd/conf.d/wsgi.load
INFO:__main__:Creating directory /etc/httpd/conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf/httpd.conf to /etc/httpd/conf/httpd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/httpd/conf/ports.conf to /etc/httpd/conf/ports.conf
INFO:__main__:Creating directory /etc/my.cnf.d
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
INFO:__main__:Deleting /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/nova/nova.conf to /etc/nova/nova.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/var/spool/cron/nova to /var/spool/cron/nova
INFO:__main__:Creating directory /var/www
INFO:__main__:Creating directory /var/www/cgi-bin
INFO:__main__:Creating directory /var/www/cgi-bin/nova
INFO:__main__:Copying /var/lib/kolla/config_files/src/var/www/cgi-bin/nova/nova-api to /var/www/cgi-bin/nova/nova-api
INFO:__main__:Deleting /etc/iscsi/initiatorname.iscsi
INFO:__main__:Copying /var/lib/kolla/config_files/src-iscsid/initiatorname.iscsi to /etc/iscsi/initiatorname.iscsi
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/log/nova
INFO:__main__:Setting permission for /var/log/nova/nova-manage.log
INFO:__main__:Setting permission for /var/log/nova/nova-conductor.log
INFO:__main__:Setting permission for /var/log/nova/nova-scheduler.log
INFO:__main__:Setting permission for /var/log/nova/nova-api.log
INFO:__main__:Setting permission for /var/log/nova/nova-metadata-api.log
++ cat /run_command
+ CMD=/usr/bin/nova-compute
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/nova ]]
++ mkdir -p /var/log/kolla/nova
+++ stat -c %a /var/log/kolla/nova
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/nova
++ . /usr/local/bin/kolla_nova_extend_start
+ echo 'Running command: '\''/usr/bin/nova-compute'\'''
Running command: '/usr/bin/nova-compute'
+ exec /usr/bin/nova-compute


Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190423.n.1

How reproducible:


Steps to Reproduce:
1. install undercloud 
2. change undercloud.conf
3. re-run undercloud installation

Actual results:
failed to stop tripleo.nova-compute service - nova_compute container in exited state with rc=127

Expected results:
pass

Additional info:
build artifacts - 
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-deployment-15-virthost-undercloud-conf_changes-RHELOSP-38657/11/artifact/undercloud-0.tar.gz

Comment 2 Cédric Jeanneret 2019-05-16 11:17:27 UTC
hmm, the "137" exit code shown for the container is usually when we have a "pkill" or "kill" of the process running in the container.

We have to investigate a bit more, but I suspect the "podman stop -t 10 nova_compute" to cause this. I'm deploying an env to check that.

I'm pretty sure we can get the same exit code without re-deploying the undercloud, that will make things easier to investigate hopefully.

I keep the DFG:Compute in the loop, just in case.

Comment 5 Cédric Jeanneret 2019-05-16 12:45:45 UTC
Me again,

So I think I've found the issue: healthcheck. Yeah, again ;).

Basically, there's a conflict between the tripleo_<container>.service and tripleo_<container>_healthcheck.timer when we want to stop a container using the systemctl thingy.

Proof:

[root@undercloud ~]# podman ps -a | grep nova                                   
e8fd4d9185e5  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  21 minutes ago  Exited (0) 21 minutes ago         nova_cell_v2_discover_hosts
5128249d2dfa  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  23 minutes ago  Exited (0) 22 minutes ago         nova_wait_for_compute_service
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  23 minutes ago  Up 23 minutes ago                 nova_compute
fd70522eb897  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  23 minutes ago  Up 23 minutes ago                 nova_metadata
2b765d1cba8a  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  23 minutes ago  Up 23 minutes ago                 nova_api
7e70a6ac84e2  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo             dumb-init --singl...  23 minutes ago  Up 23 minutes ago                 nova_conductor
e9d8ba1c74bb  docker.io/tripleomaster/centos-binary-nova-scheduler:current-tripleo             dumb-init --singl...  24 minutes ago  Up 24 minutes ago                 nova_scheduler
651f03d20847  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  24 minutes ago  Up 24 minutes ago                 nova_api_cron
7a57344d79a6  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo             dumb-init --singl...  31 minutes ago  Exited (0) 30 minutes ago         nova_db_sync
e0a9e09daa84  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago         nova_api_ensure_default_cell
a559ec5555a5  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago         nova_api_map_cell0
36178df43122  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago         nova_statedir_owner
bec10e9babbf  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  32 minutes ago  Exited (0) 32 minutes ago         nova_api_db_sync
711475c559e2  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago         nova_metadata_init_logs
3164e0e62894  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo             dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago         nova_conductor_init_log
582467e25885  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo                   dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago         nova_api_init_logs
[root@undercloud ~]# systemctl stop tripleo_nova_compute                        
Job for tripleo_nova_compute.service canceled.                                  
[root@undercloud ~]# echo $?                                                    
1                                                                               
[root@undercloud ~]#            

That's what you hit.

And if we take care of the timer:
[root@undercloud ~]# podman ps | grep nova_compute
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  32 minutes ago     Up 12 seconds ago            nova_compute
[root@undercloud ~]# systemctl stop tripleo_nova_compute_healthcheck.timer
[root@undercloud ~]# systemctl stop tripleo_nova_compute
[root@undercloud ~]# echo $? ; podman ps -a | grep nova_compute
0
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo        dumb-init --singl...  33 minutes ago     Exited (0) 14 seconds ago            nova_compute


We're therefore lacking a dependency between the timer (healthcheck) and the main service.

I'll dig in systemd doc in order to find the best way to link them both, and update paunch in order to ensure we do have this link properly set.

I remove the DFG:Compute since it's clearly an issue for DF :). And move it to python-paunch component.

Cheers,

C.

Comment 6 Cédric Jeanneret 2019-05-17 13:00:16 UTC
Heya,

upstream patch is working and solves the issue, at least I could run it 3 times on my lab - would love to get some feedback on it in order to ensure it does solve your issue on osp-15 (I've tested it against Master so far).

Cheers,

C.

Comment 8 Cédric Jeanneret 2019-05-22 12:22:11 UTC
Hey,

patch merged in Master, backport to stable/stein on its way: https://review.opendev.org/#/c/660561/

Once that's done, I'll need to ensure it's downstream, and request a build.

Cheers,

C.

Comment 14 Artem Hrechanychenko 2019-06-10 06:58:16 UTC
VERIFIED

python3-paunch-4.4.1-0.20190523160352.0cd4c64.el8ost.noarch.rpm

Comment 18 errata-xmlrpc 2019-09-21 11:21:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811


Note You need to log in before you can comment on or make changes to this bug.