Bug 2020210
| Summary: | [FFWD 13 ->16.2] Controller upgrade fails due to inactive rabbitmq (PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log') | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jose Luis Franco <jfrancoa> | |
| Component: | openstack-selinux | Assignee: | Julie Pichon <jpichon> | |
| Status: | CLOSED ERRATA | QA Contact: | Jason Grosso <jgrosso> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 16.2 (Train) | CC: | cjeanner, jpichon, jpretori, lhh, lvrabec, pgrist, spower, tvignaud | |
| Target Milestone: | z1 | Keywords: | Triaged | |
| Target Release: | 16.2 (Train on RHEL 8.4) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-selinux-0.8.28-2.20210612124809.el8ost.1 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2026451 | Environment: | |
| Last Closed: | 2021-12-09 20:41:59 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
As discussed on IRC, this is due to a change in container-selinux which added a file context for a path that we're also defining in openstack-selinux:

https://github.com/containers/container-selinux/commit/7e5f3cae10e2d805821fb84dff7418b9e3b0cc1f

    /var/log/containers(/.*)? gen_context(system_u:object_r:container_log_t,s0)

https://github.com/redhat-openstack/openstack-selinux/blob/af4b2b8/local_settings.sh.in#L31

    /var/log/containers(/.*)? all files system_u:object_r:container_file_t:s0

Redefining file contexts is not allowed, so this file context fails to apply when openstack-selinux is installed.

We have to double-check for 16.1: is it shipping the same container-selinux change?

I'm double-checking but I believe the container-tools module stream is tied to the OS, not to the release. So I would expect the same container-selinux package will be available in 16.1 (could we add the container-selinux version to this bug?). If it's a package update and openstack-selinux was already installed though, the problem will probably manifest differently, since it might be the container-selinux package that fails.

16.1 and 16.2 are actually tied to specific, different versions (because of 8.2 vs 8.4 streams) so it may not be a problem on 16.1 (yet). I'm not sure a straightforward backport will work though... The new 'allow' rule would be fine, but if we backport the file context change and container-selinux is still an old one, we may end up with yet another context that container_t can't manage, probably var_log_t.

Requesting blocker flag. This issue is impacting all our FFWD CI jobs from OSP13 to OSP16.2 and it would block customers from upgrading to OSP16.2.

(In reply to Julie Pichon from comment #5)
> 16.1 and 16.2 are actually tied to specific, different versions (because of
> 8.2 vs 8.4 streams) so it may not be a problem on 16.1 (yet).

Just to follow up on this point, a RHEL 8.2 VM set up with the container-tools:2.0 stream shows container-selinux-2.124.0-1.module+el8.2.0+11121+714aca16.src.rpm as available, which is old enough NOT to contain the problematic patch.

CI job passed using openstack-selinux version:

    2021-11-14T00:36:17+0000 SUBDEBUG Installed: openstack-selinux-0.8.28-2.20210612124809.el8ost.1.noarch

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovs_vlan-provider-network/71/controller-0/var/log/dnf.rpm.log.gz

CI job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovs_vlan-provider-network/71/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067
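To illustrate the file-context clash described in the first comment, here is a minimal Python sketch that looks for path specifications defined by more than one policy source. It is not how SELinux tooling detects the conflict. The function name `find_redefinitions` and the two lists are hypothetical, and the context strings are normalized by hand from the two rules quoted in this bug.

```python
from typing import List, Tuple

# (path spec, context) pairs; hand-copied from the two rules quoted in this bug.
# The context strings are normalized to user:role:type:level for comparison.
CONTAINER_SELINUX = [
    ("/var/log/containers(/.*)?", "system_u:object_r:container_log_t:s0"),
]
OPENSTACK_SELINUX = [
    ("/var/log/containers(/.*)?", "system_u:object_r:container_file_t:s0"),
]


def find_redefinitions(
    first: List[Tuple[str, str]], second: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (spec, context_in_first, context_in_second) for every path
    spec that appears in both lists, whatever the contexts are."""
    first_map = dict(first)
    return [
        (spec, first_map[spec], ctx)
        for spec, ctx in second
        if spec in first_map
    ]


if __name__ == "__main__":
    for spec, ctx_a, ctx_b in find_redefinitions(CONTAINER_SELINUX, OPENSTACK_SELINUX):
        print(f"{spec}: defined as {ctx_a} and as {ctx_b}")
```

The sketch flags any repeated path spec rather than only differing contexts, because the comment above says redefinition itself is what is not allowed.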
Description of problem:

OSP13 to OSP16.2 FFWD upgrade CI jobs are consistently failing during the first controller upgrade step. The failure occurs in step 2 of the deploy_tasks:

    2021-11-04 05:09:51 | 2021-11-04 05:09:46.313036 | 52540003-fc03-57af-b661-000000001b0d | WAITING | Wait for containers to start for step 2 using paunch | controller-0 | 545 retries left
    2021-11-04 05:09:51 | 2021-11-04 05:09:50.905674 | 52540003-fc03-57af-b661-000000001b0d | FATAL | Wait for containers to start for step 2 using paunch | controller-0 | error={"ansible_job_id": "407855745283.181049", "attempts": 657, "changed": false, "finished": 1, "msg": "Paunch failed with config_id .....

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/undercloud-0/home/stack/overcloud_upgrade_run-controller-0,networker-0.log.gz

When checking the containers running on the node, we observe that the rabbitmq_wait_bundle container exited with error code 1:

    d83bcc96d57a undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-rabbitmq:16.2_20211027.1 /container_puppet... 42 minutes ago Exited (1) 7 minutes ago rabbitmq_wait_bundle

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/extra/podman/podman_allinfo.log.gz

Indeed, the rabbitmq-bundle failed to start up:

    2021-11-04T04:26:21.204527852+00:00 stderr F ERROR:__main__:Failed to change ownership of /var/log/rabbitmq/log to 42439:42439
    2021-11-04T04:26:21.204527852+00:00 stderr F Traceback (most recent call last):
    2021-11-04T04:26:21.204527852+00:00 stderr F File "/usr/local/bin/kolla_set_configs", line 359, in set_perms
    2021-11-04T04:26:21.204527852+00:00 stderr F os.chown(path, uid, gid)
    2021-11-04T04:26:21.204527852+00:00 stderr F PermissionError: [Errno 13] Permission denied: '/var/log/rabbitmq/log'

Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.2-from-13-latest_cdn-3cont_2comp_2net-ipv4-ovn_vlan-provider-network/63/controller-0/var/log/containers/stdouts/rabbitmq-bundle.log.gz

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
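When investigating a `PermissionError` like the one in the traceback above, it can help to check which SELinux type a directory actually carries. The following is a minimal diagnostic sketch and not part of kolla or openstack-selinux. The helper names `selinux_type` and `read_context` are hypothetical. It assumes a Linux host where labels are stored in the `security.selinux` extended attribute.

```python
import os


def selinux_type(context: str) -> str:
    """Return the type field of a context string such as
    'system_u:object_r:container_file_t:s0'."""
    fields = context.split(":")
    if len(fields) < 3:
        raise ValueError(f"not an SELinux context: {context!r}")
    return fields[2]


def read_context(path: str) -> str:
    """Read the SELinux label of `path` from its 'security.selinux'
    extended attribute (Linux only; raises OSError if it is missing)."""
    raw = os.getxattr(path, "security.selinux")
    # The kernel returns the label as NUL-terminated bytes.
    return raw.rstrip(b"\x00").decode()
```

On an affected controller, something like `selinux_type(read_context("/var/log/containers"))` would show whether the directory carries `container_file_t`, as openstack-selinux intends, or some other type. The comments in this bug suggest that other types, such as `var_log_t`, may be ones `container_t` cannot manage.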