Bug 2218455
| Summary: | HAProxy fails to restart during update from 16.1.8 to 16.2.5 - task "copy certificate, chgrp, restart haproxy" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Eric Nothen <enothen> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Luca Miccini <lmiccini> |
| Status: | MODIFIED --- | QA Contact: | Joe H. Rahme <jhakimra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | lmiccini, mburns, tkajinam |
| Target Milestone: | z6 | Keywords: | Triaged |
| Target Release: | 16.2 (Train on RHEL 8.4) | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.6.1-2.20230808225213.9adcac6.el8osttrunk | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
Description of problem:
HAProxy fails to restart after the update, eventually causing the update job to fail on controller-0 when updating from 16.1.8 to 16.2.5.

Version-Release number of selected component (if applicable):
16.2.5

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
At the time HAProxy fails to restart, all of the RPMs have been updated on the controller, and all of the new container images have been pre-fetched.

The error is mostly the same:

$ grep FATAL 0020-mistral.tar.gz/var/log/containers/mistral/package_update.log | grep -c "does not exist"
19

~~~
{
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "set -e\nif podman ps -f \"id=01c2b754fee8\" --format \"{{.Names}}\" | grep -q \"^haproxy-bundle\"; then\n tar -c /etc/pki/tls/private/overcloud_endpoint.pem | podman exec -i 01c2b754fee8 tar -C / -xv\nelse\n podman cp /etc/pki/tls/private/overcloud_endpoint.pem 01c2b754fee8:/etc/pki/tls/private/overcloud_endpoint.pem\nfi\npodman exec --user root 01c2b754fee8 chgrp haproxy /etc/pki/tls/private/overcloud_endpoint.pem\npodman kill --signal=HUP 01c2b754fee8\n",
    "delta": "0:00:00.569902",
    "end": "2023-06-28 12:29:29.541934",
    "failed_when_result": true,
    "item": "01c2b754fee8",
    "msg": "non-zero return code",
    "rc": 125,
    "start": "2023-06-28 12:29:28.972032",
    "stderr": "Error: container \"01c2b754fee8\" does not exist",
    "stderr_lines": [
        "Error: container \"01c2b754fee8\" does not exist"
    ],
    "stdout": "",
    "stdout_lines": []
}
~~~

But there's one that's slightly different:

~~~
{
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "set -e\nif podman ps -f \"id=0185c92bf4a9\" --format \"{{.Names}}\" | grep -q \"^haproxy-bundle\"; then\n tar -c /etc/pki/tls/private/overcloud_endpoint.pem | podman exec -i 0185c92bf4a9 tar -C / -xv\nelse\n podman cp /etc/pki/tls/private/overcloud_endpoint.pem 0185c92bf4a9:/etc/pki/tls/private/overcloud_endpoint.pem\nfi\npodman exec --user root 0185c92bf4a9 chgrp haproxy /etc/pki/tls/private/overcloud_endpoint.pem\npodman kill --signal=HUP 0185c92bf4a9\n",
    "delta": "0:00:00.696201",
    "end": "2023-06-28 12:29:58.321678",
    "failed_when_result": true,
    "item": "0185c92bf4a9",
    "msg": "non-zero return code",
    "rc": 255,
    "start": "2023-06-28 12:29:57.625477",
    "stderr": "Error: OCI runtime error: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1",
    "stderr_lines": [
        "Error: OCI runtime error: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1"
    ],
    "stdout": "",
    "stdout_lines": []
}
~~~
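For triage, the two failure shapes above can be told apart mechanically by return code and stderr text. The following is only a rough sketch, not part of the update tooling: the function names `classify` and `summarize` and the labels `container-missing` and `exec-failed` are made up for illustration, and the sample data is trimmed from the two task results quoted in this report.

```python
from collections import Counter


def classify(result: dict) -> str:
    """Return a coarse, hypothetical label for one failed task result."""
    stderr = result.get("stderr", "")
    if "does not exist" in stderr:
        # podman could not find the container ID the task was given
        return "container-missing"
    if "OCI runtime error" in stderr:
        # the container was found, but exec into it failed
        return "exec-failed"
    return "other"


def summarize(results):
    """Count failures grouped by (label, return code)."""
    return Counter((classify(r), r.get("rc")) for r in results)


# Trimmed copies of the two task results quoted in this report.
SAMPLES = [
    {
        "item": "01c2b754fee8",
        "rc": 125,
        "stderr": 'Error: container "01c2b754fee8" does not exist',
    },
    {
        "item": "0185c92bf4a9",
        "rc": 255,
        "stderr": (
            "Error: OCI runtime error: exec failed: container_linux.go:380: "
            "starting container process caused: process_linux.go:130: "
            "executing setns process caused: exit status 1"
        ),
    },
]

if __name__ == "__main__":
    for (label, rc), count in summarize(SAMPLES).items():
        print(f"{label} rc={rc}: {count}")
```

Run against the full set of FATAL entries, a grouping like this would separate the entries matching "does not exist" from the single exec failure, assuming each entry can first be extracted from package_update.log as a JSON object.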