Bug 1708708
Summary: | [OSP15] undercloud redeployment failed with "Command '['systemctl', 'stop', 'tripleo_nova_compute.service']' returned non-zero exit status 1." | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
Component: | python-paunch | Assignee: | Cédric Jeanneret <cjeanner> |
Status: | CLOSED ERRATA | QA Contact: | Victor Voronkov <vvoronko> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 15.0 (Stein) | CC: | cjeanner, dbecker, emacchi, jcoufal, mburns, morazi, mschuppe, vvoronko |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 15.0 (Stein) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | python-paunch-4.4.1-0.20190523160352.0cd4c64.el8ost | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-09-21 11:21:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Artem Hrechanychenko
2019-05-10 16:06:32 UTC
Hmm, the "137" exit code shown for the container usually means the process running in the container was killed with "pkill" or "kill". We have to investigate a bit more, but I suspect "podman stop -t 10 nova_compute" is causing this. I'm deploying an environment to check that.

I'm pretty sure we can get the same exit code without redeploying the undercloud, which will hopefully make things easier to investigate. I'm keeping DFG:Compute in the loop, just in case.

Me again,

So I think I've found the issue: healthcheck. Yeah, again ;)

Basically, there's a conflict between tripleo_<container>.service and tripleo_<container>_healthcheck.timer when we stop a container with systemctl. Proof:

[root@undercloud ~]# podman ps -a | grep nova
e8fd4d9185e5  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  21 minutes ago  Exited (0) 21 minutes ago  nova_cell_v2_discover_hosts
5128249d2dfa  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  23 minutes ago  Exited (0) 22 minutes ago  nova_wait_for_compute_service
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  23 minutes ago  Up 23 minutes ago  nova_compute
fd70522eb897  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  23 minutes ago  Up 23 minutes ago  nova_metadata
2b765d1cba8a  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  23 minutes ago  Up 23 minutes ago  nova_api
7e70a6ac84e2  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo  dumb-init --singl...  23 minutes ago  Up 23 minutes ago  nova_conductor
e9d8ba1c74bb  docker.io/tripleomaster/centos-binary-nova-scheduler:current-tripleo  dumb-init --singl...  24 minutes ago  Up 24 minutes ago  nova_scheduler
651f03d20847  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  24 minutes ago  Up 24 minutes ago  nova_api_cron
7a57344d79a6  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo  dumb-init --singl...  31 minutes ago  Exited (0) 30 minutes ago  nova_db_sync
e0a9e09daa84  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago  nova_api_ensure_default_cell
a559ec5555a5  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago  nova_api_map_cell0
36178df43122  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  31 minutes ago  Exited (0) 31 minutes ago  nova_statedir_owner
bec10e9babbf  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  32 minutes ago  Exited (0) 32 minutes ago  nova_api_db_sync
711475c559e2  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago  nova_metadata_init_logs
3164e0e62894  docker.io/tripleomaster/centos-binary-nova-conductor:current-tripleo  dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago  nova_conductor_init_log
582467e25885  docker.io/tripleomaster/centos-binary-nova-api:current-tripleo  dumb-init --singl...  35 minutes ago  Exited (0) 35 minutes ago  nova_api_init_logs
[root@undercloud ~]# systemctl stop tripleo_nova_compute
Job for tripleo_nova_compute.service canceled.
[root@undercloud ~]# echo $?
1
[root@undercloud ~]#

That's what you hit. And if we take care of the timer first:

[root@undercloud ~]# podman ps | grep nova_compute
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  32 minutes ago  Up 12 seconds ago  nova_compute
[root@undercloud ~]# systemctl stop tripleo_nova_compute_healthcheck.timer
[root@undercloud ~]# systemctl stop tripleo_nova_compute
[root@undercloud ~]# echo $?; podman ps -a | grep nova_compute
0
397b079ad64b  docker.io/tripleomaster/centos-binary-nova-compute-ironic:current-tripleo  dumb-init --singl...  33 minutes ago  Exited (0) 14 seconds ago  nova_compute

We're therefore missing a dependency between the healthcheck timer and the main service. I'll dig into the systemd documentation to find the best way to link the two, and update paunch to make sure this link is set properly.

I'm removing DFG:Compute since this is clearly a DF issue :), and moving the bug to the python-paunch component.

Cheers,

C.

Heya,

The upstream patch works and solves the issue; at least, I could run it 3 times in my lab. I would love to get some feedback on it to confirm it also solves your issue on OSP 15 (so far I've only tested it against master).

Cheers,

C.

Hey,

The patch is merged in master, and the backport to stable/stein is on its way: https://review.opendev.org/#/c/660561/

Once that's done, I'll make sure it lands downstream and request a build.

Cheers,

C.

The job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-15-virthost-undercloud-conf_changes-RHELOSP-38657/ was executed and passed on RHOS_TRUNK-15.0-RHEL-8-20190528.n.2.

VERIFIED with python3-paunch-4.4.1-0.20190523160352.0cd4c64.el8ost.noarch.rpm

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811
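The first comment reads the container's "137" exit status as a sign that the process was killed. That follows the common shell and container-runtime convention of reporting "terminated by signal N" as 128 + N, so 137 = 128 + 9, i.e. SIGKILL. This is consistent with the suspicion that "podman stop -t 10" forcibly killed the container once its 10-second timeout expired. A minimal Python sketch of that decoding; describe_exit_code is a hypothetical helper, not part of paunch or podman:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell/container exit code using the 128 + N convention.

    Hypothetical helper for illustration only: codes above 128 are assumed
    to mean "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its enum member, e.g. 9 -> SIGKILL.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(0))    # exited with status 0
```

The same rule explains why the "Exited (0)" containers in the podman listings shut down cleanly, while a 137 points at a forced kill rather than an application error.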
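The thread concludes that a systemd dependency between tripleo_<container>_healthcheck.timer and tripleo_<container>.service is missing, but it does not show the actual paunch patch. The following is only a sketch, assuming the link is made with standard systemd directives. PartOf= on the timer propagates stop and restart of the main service to the timer. Requisite= on the healthcheck service makes a healthcheck fail instead of pulling the main service back up when it is not active. The function name render_healthcheck_units, the interval default, and the /openstack/healthcheck script path are all hypothetical:

```python
def render_healthcheck_units(container: str, interval: str = "60") -> dict:
    """Render hypothetical healthcheck timer/service units for one container.

    Returns a mapping of unit file name -> unit file contents.
    """
    service = f"tripleo_{container}.service"

    timer_unit = (
        "[Unit]\n"
        f"Description={container} container healthcheck\n"
        # PartOf=: stopping or restarting the main service also stops/restarts this timer.
        f"PartOf={service}\n"
        "\n"
        "[Timer]\n"
        # OnActiveSec= gives the first run; OnUnitActiveSec= repeats it afterwards.
        f"OnActiveSec={interval}\n"
        f"OnUnitActiveSec={interval}\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )

    check_unit = (
        "[Unit]\n"
        f"Description={container} healthcheck\n"
        f"After={service}\n"
        # Requisite=: fail immediately if the main service is not already active,
        # instead of starting it again the way Requires= would.
        f"Requisite={service}\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        # Hypothetical healthcheck script path inside the container.
        f"ExecStart=/usr/bin/podman exec {container} /openstack/healthcheck\n"
    )

    # A timer named X.timer activates X.service by default, so the names must match.
    return {
        f"tripleo_{container}_healthcheck.timer": timer_unit,
        f"tripleo_{container}_healthcheck.service": check_unit,
    }


units = render_healthcheck_units("nova_compute")
print(units["tripleo_nova_compute_healthcheck.timer"])
```

With units like these, "systemctl stop tripleo_nova_compute" would be expected to stop the healthcheck timer in the same transaction, which is what the manual workaround in the thread did by stopping the timer first.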