Bug 2218455

Summary: HAProxy fails to restart during update from 16.1.8 to 16.2.5 - task "copy certificate, chgrp, restart haproxy"
Product: Red Hat OpenStack
Reporter: Eric Nothen <enothen>
Component: openstack-tripleo-heat-templates
Assignee: Luca Miccini <lmiccini>
Status: MODIFIED
QA Contact: Joe H. Rahme <jhakimra>
Severity: medium
Priority: medium
Version: 16.2 (Train)
CC: lmiccini, mburns, tkajinam
Target Milestone: z6
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230808225213.9adcac6.el8osttrunk
Type: Bug

Description Eric Nothen 2023-06-29 08:08:41 UTC
Description of problem:

HAProxy fails to restart during the update from 16.1.8 to 16.2.5, which eventually causes the update job to fail on controller-0.

Version-Release number of selected component (if applicable):
16.2.5

Additional info:

At the time HAProxy fails to restart, all of the RPMs on the controller have been updated and all of the new container images have been pre-fetched.

Almost all of the failures report the same error, that the container does not exist (19 occurrences):

$ grep FATAL 0020-mistral.tar.gz/var/log/containers/mistral/package_update.log | grep -c "does not exist"
19

~~~
{
  "ansible_loop_var": "item",
  "changed": true,
  "cmd": "set -e\nif podman ps -f \"id=01c2b754fee8\" --format \"{{.Names}}\" | grep -q \"^haproxy-bundle\"; then\n  tar -c /etc/pki/tls/private/overcloud_endpoint.pem | podman exec -i 01c2b754fee8 tar -C / -xv\nelse\n  podman cp /etc/pki/tls/private/overcloud_endpoint.pem 01c2b754fee8:/etc/pki/tls/private/overcloud_endpoint.pem\nfi\npodman exec --user root 01c2b754fee8 chgrp haproxy /etc/pki/tls/private/overcloud_endpoint.pem\npodman kill --signal=HUP 01c2b754fee8\n",
  "delta": "0:00:00.569902",
  "end": "2023-06-28 12:29:29.541934",
  "failed_when_result": true,
  "item": "01c2b754fee8",
  "msg": "non-zero return code",
  "rc": 125,
  "start": "2023-06-28 12:29:28.972032",
  "stderr": "Error: container \"01c2b754fee8\" does not exist",
  "stderr_lines": [
    "Error: container \"01c2b754fee8\" does not exist"
  ],
  "stdout": "",
  "stdout_lines": []
}
~~~

One failure is different: it returns rc 255 with an OCI runtime error from an exec call instead of "does not exist":
~~~
{
  "ansible_loop_var": "item",
  "changed": true,
  "cmd": "set -e\nif podman ps -f \"id=0185c92bf4a9\" --format \"{{.Names}}\" | grep -q \"^haproxy-bundle\"; then\n  tar -c /etc/pki/tls/private/overcloud_endpoint.pem | podman exec -i 0185c92bf4a9 tar -C / -xv\nelse\n  podman cp /etc/pki/tls/private/overcloud_endpoint.pem 0185c92bf4a9:/etc/pki/tls/private/overcloud_endpoint.pem\nfi\npodman exec --user root 0185c92bf4a9 chgrp haproxy /etc/pki/tls/private/overcloud_endpoint.pem\npodman kill --signal=HUP 0185c92bf4a9\n",
  "delta": "0:00:00.696201",
  "end": "2023-06-28 12:29:58.321678",
  "failed_when_result": true,
  "item": "0185c92bf4a9",
  "msg": "non-zero return code",
  "rc": 255,
  "start": "2023-06-28 12:29:57.625477",
  "stderr": "Error: OCI runtime error: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1",
  "stderr_lines": [
    "Error: OCI runtime error: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1"
  ],
  "stdout": "",
  "stdout_lines": []
}
~~~
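
For readability, the escaped "cmd" string in both results above expands to the shell script sketched below. The commands are copied from the task output; the function wrapper, its name (copy_cert_and_reload), and the "id" and "cert" variables are hypothetical and only there so the container ID becomes a parameter instead of a hard-coded value.

```shell
# Unescaped form of the "cmd" string from the failed task.
# The wrapper function and variable names are illustrative, not from the template.
copy_cert_and_reload() (
  # Stop at the first failing command, as the original script does
  set -e
  id="$1"
  cert=/etc/pki/tls/private/overcloud_endpoint.pem

  if podman ps -f "id=$id" --format "{{.Names}}" | grep -q "^haproxy-bundle"; then
    # Container name starts with haproxy-bundle: stream the file in via tar
    tar -c "$cert" | podman exec -i "$id" tar -C / -xv
  else
    # Any other container: copy the file with podman cp
    podman cp "$cert" "$id:$cert"
  fi

  # Fix group ownership inside the container, then ask HAProxy to reload
  podman exec --user root "$id" chgrp haproxy "$cert"
  podman kill --signal=HUP "$id"
)
```

Because of "set -e", the script stops at the first podman command that fails, which matches the single stderr line in each result. Both failures occur before the final "podman kill --signal=HUP", so HAProxy never receives the reload signal for those container IDs.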