Bug 2026451

Summary: Controller upgrade fails due to inactive rabbitmq (PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log')
Product: Red Hat OpenStack
Component: openstack-selinux
Version: 16.1 (Train)
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Reporter: Julie Pichon <jpichon>
Assignee: Julie Pichon <jpichon>
QA Contact: nlevinki <nlevinki>
CC: cjeanner, jfrancoa, jgrosso, jpichon, jpretori, jschluet, lhh, lvrabec, pgrist, spower, ssigwald, tvignaud
Keywords: Triaged
Fixed In Version: openstack-selinux-0.8.24-1.20211201143442.26243bf.el8ost
Doc Type: If docs needed, set a value
Clone Of: 2020210
Last Closed: 2022-03-24 11:02:17 UTC

Description Julie Pichon 2021-11-24 17:01:19 UTC
+++ This bug was initially created as a clone of Bug #2020210 +++

Description of problem:

The OSP13 to OSP16.2 FFWD upgrade CI jobs are consistently failing during the first controller upgrade step. The failure occurs in step 2 of the deploy_tasks:

2021-11-04 05:09:51 | 2021-11-04 05:09:46.313036 | 52540003-fc03-57af-b661-000000001b0d |    WAITING | Wait for containers to start for step 2 using paunch | controller-0 | 545 retries left
2021-11-04 05:09:51 | 2021-11-04 05:09:50.905674 | 52540003-fc03-57af-b661-000000001b0d |      FATAL | Wait for containers to start for step 2 using paunch | controller-0 | error={"ansible_job_id": "407855745283.181049", "attempts": 657, "changed": false, "finished": 1, "msg": "Paunch failed with config_id 

.....

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/undercloud-0/home/stack/overcloud_upgrade_run-controller-0,networker-0.log.gz
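
When scanning a long upgrade log for lines like the two quoted above, it can help to split each line into fields and filter on the status column. The following is only a minimal sketch: the column layout is inferred from the two quoted lines, and `TaskEvent` and `parse_task_line` are hypothetical names, not part of any TripleO or Ansible tooling.

```python
from typing import NamedTuple, Optional


class TaskEvent(NamedTuple):
    """One pipe-delimited task line (layout assumed from the quoted log)."""
    logged_at: str
    task_id: str
    status: str
    task: str
    host: str
    detail: str


def parse_task_line(line: str) -> Optional[TaskEvent]:
    """Return a TaskEvent, or None if the line does not have seven fields."""
    # Limit the split so any " | " inside the trailing detail is preserved.
    fields = [field.strip() for field in line.split(" | ", 6)]
    if len(fields) != 7:
        return None
    # The first field is the wrapper's own timestamp; it is not kept here.
    _, logged_at, task_id, status, task, host, detail = fields
    return TaskEvent(logged_at, task_id, status, task, host, detail)


sample = (
    "2021-11-04 05:09:51 | 2021-11-04 05:09:46.313036 | "
    "52540003-fc03-57af-b661-000000001b0d |    WAITING | "
    "Wait for containers to start for step 2 using paunch | "
    "controller-0 | 545 retries left"
)
event = parse_task_line(sample)
print(event.status, event.host)
```

Filtering parsed events on `status == "FATAL"` would then surface the failing task and host without reading the whole log.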

When checking the containers running on the node, we observe that the rabbitmq_wait_bundle container exited with error code 1:

d83bcc96d57a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:16.2_20211027.1            /container_puppet...  42 minutes ago  Exited (1) 7 minutes ago           rabbitmq_wait_bundle

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/extra/podman/podman_allinfo.log.gz

Indeed, the rabbitmq-bundle failed to start up:

2021-11-04T04:26:21.204527852+00:00 stderr F ERROR:__main__:Failed to change ownership of /var/log/rabbitmq/log to 42439:42439
2021-11-04T04:26:21.204527852+00:00 stderr F Traceback (most recent call last):
2021-11-04T04:26:21.204527852+00:00 stderr F   File "/usr/local/bin/kolla_set_configs", line 359, in set_perms
2021-11-04T04:26:21.204527852+00:00 stderr F     os.chown(path, uid, gid)
2021-11-04T04:26:21.204527852+00:00 stderr F PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log'

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/containers/stdouts/rabbitmq-bundle.log.gz
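
A note on reading the traceback above: `[Errno 13]` is `EACCES`, whereas a process that simply lacks the privilege to change a file's owner normally gets `EPERM` (`[Errno 1]`) from chown. `EACCES` more often means that access to the path itself was refused, for example by directory search permissions or by a mandatory access control layer such as SELinux. The sketch below is not the kolla_set_configs code; `describe_chown_failure` and `try_chown` are hypothetical helpers that only illustrate how the two cases could be told apart during triage.

```python
import errno
import os


def describe_chown_failure(exc: OSError) -> str:
    """Map the errno of a failed chown to a triage hint (hypothetical helper)."""
    if exc.errno == errno.EACCES:
        # Errno 13: access to the path was refused, e.g. by directory
        # search permissions or by a MAC layer such as SELinux.
        return "EACCES: access denied (check path permissions and SELinux denials)"
    if exc.errno == errno.EPERM:
        # Errno 1: the caller is not privileged to change ownership.
        return "EPERM: caller lacks the privilege to change ownership"
    return f"other: errno={exc.errno}"


def try_chown(path: str, uid: int, gid: int, chown=os.chown) -> str:
    """Attempt a chown and return "ok" or a triage hint.

    The `chown` parameter is injectable so the logic can be exercised
    without touching the filesystem.
    """
    try:
        chown(path, uid, gid)
    except OSError as exc:
        return describe_chown_failure(exc)
    return "ok"
```

With an `EACCES` result on a host where SELinux is enforcing, the audit log is a reasonable next place to look.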


--- Additional comment from Julie Pichon on 2021-11-05 10:15:53 UTC ---

(In reply to Julie Pichon from comment #5)
> 16.1 and 16.2 are actually tied to specific, different versions (because of
> 8.2 vs 8.4 streams) so it may not be a problem on 16.1 (yet).

Just to follow up on this point, a RHEL 8.2 VM set up with the container-tools:2.0 stream shows container-selinux-2.124.0-1.module+el8.2.0+11121+714aca16.src.rpm as available, which is old enough NOT to contain the problematic patch.
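
To make this kind of check repeatable, the package file name can be split into name, version, release and architecture. This is a minimal sketch under stated assumptions: `RpmName`, `parse_rpm_filename` and `version_tuple` are hypothetical helpers, epochs are ignored, and the numeric comparison key is a simplification of real RPM version ordering. The version that first contains the problematic patch is not stated here, so no threshold is hard-coded.

```python
from typing import NamedTuple, Tuple


class RpmName(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> RpmName:
    """Split 'name-version-release.arch.rpm' into its parts (no epoch handling)."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM filename: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The architecture is the last dot-separated component of the stem.
    nvr, arch = stem.rsplit(".", 1)
    # Version and release are the last two dash-separated components;
    # the package name itself may contain dashes.
    name, version, release = nvr.rsplit("-", 2)
    return RpmName(name, version, release, arch)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Naive key for purely numeric dotted versions; real RPM ordering has more rules."""
    return tuple(int(part) for part in version.split("."))


pkg = parse_rpm_filename(
    "container-selinux-2.124.0-1.module+el8.2.0+11121+714aca16.src.rpm"
)
print(pkg.name, pkg.version)  # prints "container-selinux 2.124.0"
```

Once the first affected container-selinux version is known, comparing `version_tuple(pkg.version)` against it would tell whether a given stream carries the patch.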


Additional information for 16.1:
--------------------------------

container-tools:2.0 provides an older version of container-selinux that does not conflict with the file context in openstack-selinux, so nothing is currently broken in 16.1. (Bug 2026372 was opened to figure out if the patch may be backported so we can coordinate a fix.)

The 16.2 patch cannot be applied as is because it removes the file context we still rely on in 16.1, but we could still bring in the rule.

Comment 11 errata-xmlrpc 2022-03-24 11:02:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986

Comment 12 Red Hat Bugzilla 2023-09-15 01:17:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days