Bug 1992630

Summary:	Error during 16.0 to 16.1 controller update: TASK [copy certificate, chgrp, restart haproxy]
Product:	Red Hat OpenStack	Reporter:	Eduardo Olivares <eolivare>
Component:	openstack-tripleo	Assignee:	James Slagle <jslagle>
Status:	CLOSED DUPLICATE	QA Contact:	Joe H. Rahme <jhakimra>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	16.1 (Train)	CC:	mburns, michele
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2021-08-11 13:17:28 UTC	Type:	Bug

Description Eduardo Olivares 2021-08-11 12:53:20 UTC

Description of problem:
The OSP 16.0 to 16.1 update fails while updating the controller nodes. More specifically, the task "copy certificate, chgrp, restart haproxy" fails on controller-0 with return code 2, and tar reports "This does not look like a tar archive":
2021-08-10 14:09:13 | TASK [copy certificate, chgrp, restart haproxy] ********************************
2021-08-10 14:09:13 | Tuesday 10 August 2021  14:09:11 +0000 (0:00:00.074)       0:14:21.127 ******** 
2021-08-10 14:09:13 | failed: [controller-0] (item=be8cd2c0a4e3) => {"ansible_loop_var": "item", "changed": true, "cmd": "set -e\nif podman ps -f \"id=be8cd2c0a4e3\" --format \"{{.Names}}\" | grep -q \"^haproxy-bundle\"; then\n  tar -c /etc/pki/tls/private/overcloud_endpoint.pem | podman exec -i be8cd2c0a4e3 tar -C / -xv\nelse\n  podman cp /etc/pki/tls/private/overcloud_endpoint.pem be8cd2c0a4e3:/etc/pki/tls/private/overcloud_endpoint.pem\nfi\npodman exec --user root be8cd2c0a4e3 chgrp haproxy /etc/pki/tls/private/overcloud_endpoint.pem\npodman kill --signal=HUP be8cd2c0a4e3\n", "delta": "0:00:00.618533", "end": "2021-08-10 14:09:12.553196", "item": "be8cd2c0a4e3", "msg": "non-zero return code", "rc": 2, "start": "2021-08-10 14:09:11.934663", "stderr": "tar: Removing leading `/' from member names\ntar: This does not look like a tar archive\ntar: Exiting with failure status due to previous errors\ntime=\"2021-08-10T14:09:12Z\" level=error msg=\"read unixpacket @->/var/run/libpod/socket/d66965777187448eda285e4c9f94f324349bdbda2fcb0dbdd7da76768ab23022/attach: read: connection reset by peer\"\nError: non zero exit code: 2: OCI runtime error", "stderr_lines": ["tar: Removing leading `/' from member names", "tar: This does not look like a tar archive", "tar: Exiting with failure status due to previous errors", "time=\"2021-08-10T14:09:12Z\" level=error msg=\"read unixpacket @->/var/run/libpod/socket/d66965777187448eda285e4c9f94f324349bdbda2fcb0dbdd7da76768ab23022/attach: read: connection reset by peer\"", "Error: non zero exit code: 2: OCI runtime error"], "stdout": "", "stdout_lines": []}
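
The failure line above is hard to read because the task result is a single JSON object with escaped newlines. A minimal sketch of how the payload after " => " could be decoded to inspect "rc", "cmd" and "stderr_lines" follows. The helper name parse_failed_item and the shortened sample string are assumptions made for illustration; the sample keeps only a few fields and is not the full log line.

```python
import json


def parse_failed_item(line: str) -> dict:
    """Return the JSON result dict from an Ansible 'failed: [...] => {...}' line.

    Hypothetical helper for illustration; not part of Ansible or TripleO.
    """
    _, sep, payload = line.partition(" => ")
    if not sep:
        raise ValueError("no ' => ' separator found in line")
    return json.loads(payload)


# Shortened sample modeled on the failure in this report (fields trimmed).
sample = (
    '2021-08-10 14:09:13 | failed: [controller-0] (item=be8cd2c0a4e3) => '
    '{"cmd": "set -e\\npodman kill --signal=HUP be8cd2c0a4e3\\n", '
    '"rc": 2, '
    '"stderr_lines": ["tar: This does not look like a tar archive"]}'
)

result = parse_failed_item(sample)
print("rc:", result["rc"])
print(result["cmd"])  # escaped \n sequences are now real line breaks
for err in result["stderr_lines"]:
    print("stderr:", err)
```

Run against the full line from overcloud_update_run-Controller.log, the same approach would print the shell script from "cmd" one command per line, which makes the tar pipe into podman exec easier to spot.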

Link to the job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-from-16-latest_cdn-to-16.1-passed_phase2-composable-ipv4/44/
Link to the logs: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-from-16-latest_cdn-to-16.1-passed_phase2-composable-ipv4/44/undercloud-0/home/stack/overcloud_update_run-Controller.log.gz


I set the severity to medium because 16.0 is EOL.



Version-Release number of selected component (if applicable):
update from RHOS_TRUNK-16.0-RHEL-8-20200923.n.1 to RHOS-16.1-RHEL-8-20210804.n.0
update from RHOS_TRUNK-16.0-RHEL-8-20200923.n.1 to RHOS-16.1-RHEL-8-20210727.n.1


How reproducible:
2/2 times



Steps to Reproduce:
1. Run the 16.0 to 16.1 OVN update job

Comment 1 Michele Baldessari 2021-08-11 13:17:28 UTC

This is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1988330, whose fix should end up in a 16.1 compose soonish (I hope).

*** This bug has been marked as a duplicate of bug 1988330 ***