Bug 2020210

Summary: [FFWD 13 ->16.2] Controller upgrade fails due to inactive rabbitmq (PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log')
Product: Red Hat OpenStack
Reporter: Jose Luis Franco <jfrancoa>
Component: openstack-selinux
Assignee: Julie Pichon <jpichon>
Status: CLOSED ERRATA
QA Contact: Jason Grosso <jgrosso>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.2 (Train)
CC: cjeanner, jpichon, jpretori, lhh, lvrabec, pgrist, spower, tvignaud
Target Milestone: z1
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-selinux-0.8.28-2.20210612124809.el8ost.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2026451
Environment:
Last Closed: 2021-12-09 20:41:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jose Luis Franco 2021-11-04 11:13:09 UTC
Description of problem:

The OSP13 to OSP16.2 FFWD upgrade CI jobs are consistently failing during the first controller upgrade step. The failure occurs in step 2 of the deploy_tasks:

2021-11-04 05:09:51 | 2021-11-04 05:09:46.313036 | 52540003-fc03-57af-b661-000000001b0d |    WAITING | Wait for containers to start for step 2 using paunch | controller-0 | 545 retries left
2021-11-04 05:09:51 | 2021-11-04 05:09:50.905674 | 52540003-fc03-57af-b661-000000001b0d |      FATAL | Wait for containers to start for step 2 using paunch | controller-0 | error={"ansible_job_id": "407855745283.181049", "attempts": 657, "changed": false, "finished": 1, "msg": "Paunch failed with config_id 

.....

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/undercloud-0/home/stack/overcloud_upgrade_run-controller-0,networker-0.log.gz

When checking the containers on the node, we observe that the rabbitmq_wait_bundle container exited with exit code 1:

d83bcc96d57a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:16.2_20211027.1            /container_puppet...  42 minutes ago  Exited (1) 7 minutes ago           rabbitmq_wait_bundle

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/extra/podman/podman_allinfo.log.gz

Indeed, the rabbitmq-bundle container failed to start:

2021-11-04T04:26:21.204527852+00:00 stderr F ERROR:__main__:Failed to change ownership of /var/log/rabbitmq/log to 42439:42439
2021-11-04T04:26:21.204527852+00:00 stderr F Traceback (most recent call last):
2021-11-04T04:26:21.204527852+00:00 stderr F   File "/usr/local/bin/kolla_set_configs", line 359, in set_perms
2021-11-04T04:26:21.204527852+00:00 stderr F     os.chown(path, uid, gid)
2021-11-04T04:26:21.204527852+00:00 stderr F PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log'

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/containers/stdouts/rabbitmq-bundle.log.gz
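
For triage, a small script can pull the failing paths out of a container stdout log such as the one above. This is only a sketch: the function name find_permission_errors and the regular expression are illustrative assumptions based on the traceback lines quoted in this report, not part of kolla or paunch.

```python
import re

# Pattern for the traceback line that names the path os.chown() failed on.
# Derived from the log excerpt in this report; other log formats may differ.
PERMISSION_ERROR = re.compile(
    r"PermissionError: \[Errno (?P<errno>\d+)\] Permission denied: '(?P<path>[^']+)'"
)


def find_permission_errors(log_lines):
    """Return a list of (errno, path) tuples, one per PermissionError line."""
    hits = []
    for line in log_lines:
        match = PERMISSION_ERROR.search(line)
        if match:
            hits.append((int(match.group("errno")), match.group("path")))
    return hits


# Sample input: lines copied from the rabbitmq-bundle log quoted above.
sample = [
    "2021-11-04T04:26:21.204527852+00:00 stderr F ERROR:__main__:Failed to change ownership of /var/log/rabbitmq/log to 42439:42439",
    "2021-11-04T04:26:21.204527852+00:00 stderr F     os.chown(path, uid, gid)",
    "2021-11-04T04:26:21.204527852+00:00 stderr F PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log'",
]

print(find_permission_errors(sample))
```

Errno 13 (EACCES) on a chown performed by a container process is a hint to check SELinux denials on the host in addition to ordinary file ownership.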

Comment 2 Julie Pichon 2021-11-04 11:23:58 UTC
As discussed on IRC, this is due to a change in container-selinux which added a file context for a path that we're also defining in openstack-selinux:

https://github.com/containers/container-selinux/commit/7e5f3cae10e2d805821fb84dff7418b9e3b0cc1f
/var/log/containers(/.*)?							gen_context(system_u:object_r:container_log_t,s0)

https://github.com/redhat-openstack/openstack-selinux/blob/af4b2b8/local_settings.sh.in#L31
/var/log/containers(/.*)?                          all files          system_u:object_r:container_file_t:s0 

Redefining file contexts is not allowed so this file context fails to apply when openstack-selinux is installed.
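
To illustrate the clash, the sketch below compares the two definitions quoted above and reports any path specification that is given different types by two sources. The helper names parse_fcontext and find_conflicts are hypothetical, and the parsing only handles the two line formats shown in this comment; it is not how semanage or the policy tooling detects the conflict.

```python
def parse_fcontext(line):
    """Split a file-context line into (path spec, SELinux type).

    Handles 'spec gen_context(user:role:type,s0)' and
    'spec ... user:role:type:s0', the two formats quoted in this bug.
    """
    fields = line.split()
    spec, context = fields[0], fields[-1]
    if context.startswith("gen_context("):
        # Keep only 'user:role:type' from 'gen_context(user:role:type,s0)'.
        context = context[len("gen_context("):].split(",")[0]
    return spec, context.split(":")[2]


def find_conflicts(*sources):
    """Return specs defined with different types by different sources.

    Each source is a (name, list_of_lines) pair.
    """
    seen = {}
    conflicts = []
    for name, lines in sources:
        for line in lines:
            spec, setype = parse_fcontext(line)
            if spec in seen and seen[spec][1] != setype:
                conflicts.append((spec, seen[spec], (name, setype)))
            else:
                seen.setdefault(spec, (name, setype))
    return conflicts


container_selinux = [
    "/var/log/containers(/.*)?\t\t\tgen_context(system_u:object_r:container_log_t,s0)",
]
openstack_selinux = [
    "/var/log/containers(/.*)?    all files    system_u:object_r:container_file_t:s0",
]

for spec, first, second in find_conflicts(
    ("container-selinux", container_selinux),
    ("openstack-selinux", openstack_selinux),
):
    print(f"{spec}: {first[0]} says {first[1]}, {second[0]} says {second[1]}")
```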

Comment 3 Cédric Jeanneret 2021-11-04 12:07:13 UTC
We have to double-check for 16.1 - is it shipping the same container-selinux change?

Comment 4 Julie Pichon 2021-11-04 12:40:45 UTC
I'm double-checking but I believe the container-tools module stream is tied to the OS, not to the release. So I would expect the same container-selinux package will be available in 16.1 (could we add the container-selinux version to this bug?). If it's a package update and openstack-selinux was already installed though, the problem will probably manifest differently since it might be the container-selinux package that fails.

Comment 5 Julie Pichon 2021-11-04 12:52:05 UTC
16.1 and 16.2 are actually tied to specific, different versions (because of 8.2 vs 8.4 streams) so it may not be a problem on 16.1 (yet).

I'm not sure a straightforward backport will work though... The new 'allow' rule would be fine, but if we backport the file context change and container-selinux is still an old one, we may end up with yet another context that container_t can't manage, probably var_log_t.
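
The backport concern can be sketched with a toy file-context lookup. This is a deliberate simplification: the real matching logic in libselinux has its own precedence rules, and here the longest matching spec simply wins. The function lookup_type and the host path /var/log/containers/rabbitmq are assumptions made for illustration; the /var/log(/.*)? to var_log_t rule comes from the base policy.

```python
import re


def lookup_type(path, rules):
    """Return the type of the longest spec that fully matches path, or None.

    Simplified stand-in for SELinux file-context matching.
    """
    matches = [(spec, setype) for spec, setype in rules if re.fullmatch(spec, path)]
    if not matches:
        return None
    return max(matches, key=lambda rule: len(rule[0]))[1]


base_policy = [("/var/log(/.*)?", "var_log_t")]
# Rule added by the newer container-selinux (see comment 2).
new_container_selinux = [("/var/log/containers(/.*)?", "container_log_t")]

path = "/var/log/containers/rabbitmq"  # hypothetical host path

# Newer container-selinux, openstack-selinux no longer defining the path.
print(lookup_type(path, base_policy + new_container_selinux))

# Older container-selinux without the rule, openstack-selinux definition
# also removed by a backport: only the generic /var/log rule is left.
print(lookup_type(path, base_policy))
```

Under these assumptions the second case falls back to var_log_t, which is the "yet another context" scenario described above.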

Comment 6 Jose Luis Franco 2021-11-04 13:53:10 UTC
Requesting blocker flag. This issue is impacting all our FFWD CI jobs from OSP13 to OSP16.2, and it would block customers upgrading to OSP16.2.

Comment 8 Julie Pichon 2021-11-05 10:15:53 UTC
(In reply to Julie Pichon from comment #5)
> 16.1 and 16.2 are actually tied to specific, different versions (because of
> 8.2 vs 8.4 streams) so it may not be a problem on 16.1 (yet).

Just to follow up on this point, a RHEL 8.2 VM set up with the container-tools:2.0 stream shows container-selinux-2.124.0-1.module+el8.2.0+11121+714aca16.src.rpm as available, which is old enough NOT to contain the problematic patch.

Comment 24 errata-xmlrpc 2021-12-09 20:41:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067