Description of problem:

In the Puppet OpenStack project, we are trying to implement support for swift-container-sharder, but at some point the job started failing because of a timeout.

Example build: https://zuul.opendev.org/t/openstack/build/7934c2b3a1a944edb63a99a803374a0b

Looking into the job timeout, we noticed the following command takes more than 1 hour:

$ sudo -E sealert -a /var/log/audit/audit.log

Looking at audit.log, we see a bunch of denial logs about /var/cache/swift/container.recon .

~~~
type=AVC msg=audit(1643970686.673:12167): avc: denied { open } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5282236 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970686.673:12168): avc: denied { getattr } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5282236 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970686.673:12169): avc: denied { ioctl } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5282236 ioctlcmd=0x5401 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970686.674:12170): avc: denied { lock } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5282236 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970686.675:12171): avc: denied { unlink } for pid=83749 comm="swift-container" name="container.recon" dev="xvda1" ino=5282236 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.697:12197): avc: denied { read write } for pid=83749 comm="swift-container" name="container.recon" dev="xvda1" ino=5315251 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.697:12197): avc: denied { open } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5315251 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.697:12198): avc: denied { getattr } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5315251 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.697:12199): avc: denied { ioctl } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5315251 ioctlcmd=0x5401 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.697:12200): avc: denied { lock } for pid=83749 comm="swift-container" path="/var/cache/swift/container.recon" dev="xvda1" ino=5315251 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
type=AVC msg=audit(1643970716.698:12201): avc: denied { unlink } for pid=83749 comm="swift-container" name="container.recon" dev="xvda1" ino=5315251 scontext=system_u:system_r:swift_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=1
~~~

A very strange thing is that these denied operations are executed by container-replicator, not by the container-sharder we are trying to enable.
~~~
swift 80969 1 80969 1.5 0.5 205964 42420 /usr/libexec/platform-python -s /usr/bin/swift-container-sharder /etc/swift/container-server.conf
swift 83749 1 83749 3.4 0.4 197820 33216 /usr/libexec/platform-python -s /usr/bin/swift-container-replicator /etc/swift/container-server.conf
~~~

We don't see these denials in the existing jobs without container-sharder, so it looks like enabling the service somehow triggers these operations. However, the container.recon file is one of the swift data files and this access should not be blocked.

Version-Release number of selected component (if applicable):
openstack-selinux-0.8.29-0.20211104145848.7211283.el8.noarch
openstack-swift-account-2.29.0-0.20220129035748.3429a11.el8.noarch
openstack-swift-container-2.29.0-0.20220129035748.3429a11.el8.noarch
openstack-swift-object-2.29.0-0.20220129035748.3429a11.el8.noarch
openstack-swift-proxy-2.29.0-0.20220129035748.3429a11.el8.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set up CentOS 8 with SELinux permissive
2. Deploy swift with container-sharder enabled
3. Check audit.log (see the example after this comment)

Actual results:
Denials are continuously logged

Expected results:
No denials should be detected

Additional info:
The same issue is observed in the CentOS 9 Stream job as well. The C9S job progresses faster and thus doesn't hit the timeout.
https://zuul.opendev.org/t/openstack/build/a1933e745bb546149aaa581ce489972a
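As a side note, sealert is very slow on an audit.log of this size. A quicker way to pull out just these denials (generic audit tooling, not specific to this job, assuming the standard audit packages are installed) is something like:

~~~
# List the AVC records for the swift-container processes in readable form
sudo ausearch -m avc -c swift-container -i
# Or summarize them as the allow rules that would be needed
sudo ausearch -m avc -c swift-container | audit2allow
~~~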
I'm wondering if this may have something to do with directories not being mounted with :z...

https://github.com/openstack/tripleo-heat-templates/blob/c4aa1e3/deployment/swift/swift-storage-container-puppet.yaml#L697

~~~
- if:
  - {get_param: SwiftContainerSharderEnabled}
  - swift_container_sharder:
      image: *swift_container_image
      net: host
      user: swift
      restart: always
      volumes:
        list_concat:
          - {get_attr: [ContainersCommon, volumes]}
          -
            - /var/lib/kolla/config_files/swift_container_sharder.json:/var/lib/kolla/config_files/config.json:ro
            - /var/lib/config-data/puppet-generated/swift:/var/lib/kolla/config_files/src:ro
            - /srv/node:/srv/node
            - /dev:/dev
            - /var/cache/swift:/var/cache/swift    <------------------ Here
            - /var/log/containers/swift:/var/log/swift:z
~~~

Some of the other mounts in that file have /var/cache/swift:/var/cache/swift:z -- but then again, not all of them. Cedric, any thoughts on whether the cache directory should always be relabelled in that file?

Still poking at the denials as well. As indicated on the PR already, the suggested fix wouldn't work as the rules don't match, but I'm not sure if that's something we want to allow yet as var_t seems pretty generic.
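For reference, a minimal sketch of what the :z suffix does on a bind mount (a generic Podman example with a placeholder image name, not the actual TripleO/kolla invocation): it asks the container runtime to relabel the host path with a shared container file type so the containerized process can access it.

~~~
# Generic illustration only; the image name is a placeholder.
# ':z' makes podman relabel /var/cache/swift with a shared
# container type (container_file_t) before starting the container.
podman run -d --name swift_container_sharder \
  -v /var/cache/swift:/var/cache/swift:z \
  quay.io/example/swift-container:latest
~~~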
I really don't know. Fact is, the context shown in the denials seems to point to a HOST denial instead of a container thing. Moreover, it seems this is from the Puppet CI, not an actual deploy - though I'm missing some more context. So I don't think playing with relabeling will actually help here.

I'm more thinking about some weird install order, where the directory is created before the actual policy setting the setype for this location is in place. Takashi is collecting some more logs in order to get:
- the context of /var/cache/swift (ls -lZ + ls -lZd)
- dnf.log for the package install order

Once we get that info, we should be able to see a bit more light on this case. Fact is, apparently we don't see these issues within OSP/TripleO.

For the record, container_t is allowed on swift_var_cache_t[1] and other swift-related types, so I'm pretty sure it's related to install order at some point.

[1] https://github.com/redhat-openstack/openstack-selinux/blob/master/os-podman.te#L45-L50
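For what it's worth, a quick way to confirm what the installed policy actually allows between these types (generic setools commands, assuming the setools-console package is available) would be something like:

~~~
# Does the policy allow the swift and container domains to use the
# Swift cache type?
sesearch -A -s swift_t -t swift_var_cache_t -c file
sesearch -A -s container_t -t swift_var_cache_t -c file
# And is there anything allowing swift_t on the generic var_t type?
sesearch -A -s swift_t -t var_t -c file
~~~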
The issue is found not in TripleO jobs but in Puppet jobs, which install software from packages.

The CentOS 8 Stream job is still running at the time of writing, but I got results from the CentOS 9 Stream job.
https://zuul.opendev.org/t/openstack/build/2cf5b9eed57d41e5869c93dbb972bcbb

From dnf.log, I confirmed openstack-selinux was installed before the swift packages.
https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/dnf/dnf.rpm.txt

~~~
2022-02-04T13:02:44+0000 INFO --- logging initialized ---
2022-02-04T13:02:45+0000 SUBDEBUG Installed: container-selinux-3:2.173.2-1.el9.noarch
2022-02-04T13:03:07+0000 SUBDEBUG Installed: openstack-selinux-0.8.29-0.20211110070709.7211283.el9.noarch
2022-02-04T13:04:42+0000 INFO --- logging initialized ---
...
2022-02-04T13:06:43+0000 SUBDEBUG Installed: python3-swiftclient-3.13.0-0.20211202131348.3f5d5b0.el9.noarch
2022-02-04T13:06:44+0000 INFO --- logging initialized ---
2022-02-04T13:06:47+0000 SUBDEBUG Installed: liberasurecode-1.6.2-2.el9s.x86_64
2022-02-04T13:06:47+0000 SUBDEBUG Installed: python3-pyeclib-1.6.0-3.el9s.x86_64
2022-02-04T13:06:47+0000 SUBDEBUG Installed: python3-swift-2.29.0-0.20220204071333.e8cecf7.el9.noarch
2022-02-04T13:06:47+0000 INFO useradd warning: swift's uid 160 outside of the SYS_UID_MIN 201 and SYS_UID_MAX 999 range.
2022-02-04T13:06:48+0000 INFO --- logging initialized ---
...
2022-02-04T13:06:48+0000 INFO --- logging initialized ---
2022-02-04T13:06:49+0000 SUBDEBUG Installed: python3-ceilometermiddleware-2.4.0-0.20211202130400.b4c43bf.el9.noarch
2022-02-04T13:06:49+0000 SUBDEBUG Installed: openstack-swift-proxy-2.29.0-0.20220204071333.e8cecf7.el9.noarch
2022-02-04T13:06:51+0000 INFO --- logging initialized ---
...
2022-02-04T13:10:50+0000 INFO --- logging initialized ---
2022-02-04T13:10:51+0000 SUBDEBUG Installed: openstack-swift-account-2.29.0-0.20220204071333.e8cecf7.el9.noarch
2022-02-04T13:10:53+0000 INFO --- logging initialized ---
2022-02-04T13:10:54+0000 SUBDEBUG Installed: openstack-swift-container-2.29.0-0.20220204071333.e8cecf7.el9.noarch
2022-02-04T13:10:57+0000 INFO --- logging initialized ---
2022-02-04T13:10:58+0000 SUBDEBUG Installed: openstack-swift-object-2.29.0-0.20220204071333.e8cecf7.el9.noarch
2022-02-04T13:12:05+0000 INFO --- logging initialized ---
~~~

According to the output of ls -lZ, the .recon files have the swift_var_cache_t type.
https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/var_cache_swift.txt

However, the /var/cache/swift directory itself has var_t.
https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/var_cache_swift_d.txt
Posting the file content, in case the build log expires.

(In reply to Takashi Kajinami from comment #3)
...
> According to the output of ls -lZ, the .recon files have the swift_var_cache_t type.
> https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/var_cache_swift.txt

~~~
total 12
-rw-------. 1 swift swift system_u:object_r:swift_var_cache_t:s0  380 Feb 4 13:59 account.recon
-rw-------. 1 swift swift system_u:object_r:swift_var_cache_t:s0 1443 Feb 4 13:59 container.recon
-rw-------. 1 swift swift system_u:object_r:swift_var_cache_t:s0  862 Feb 4 13:59 object.recon
~~~

> However, the /var/cache/swift directory itself has var_t.
> https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/var_cache_swift_d.txt

~~~
drwxr-xr-x. 2 swift swift system_u:object_r:var_t:s0 4096 Feb 4 13:59 /var/cache/swift
~~~

The full audit.log can be found here (audit.log.txt):
https://945153fb5dc14c8536cd-253b0a19be2181811797fa6cbd0c2b8d.ssl.cf1.rackcdn.com/827850/2/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/2cf5b9e/logs/index.html
Basically, what we implement in Puppet is to install packages, update the .conf files, and start the systemd services. That's all. We don't use containers. We don't use TripleO. So this sounds like something wrong with the packaged bits which are not really used by TripleO, possibly because of podman + container_foo_t.
Thank you for the very helpful feedback, especially comment 4.

I can reproduce the discrepancy in comment 4 locally. It's a problem with this line in https://github.com/redhat-openstack/openstack-selinux/blob/master/local_settings.sh.in#L36:

["$LOCALSTATEDIR/cache/swift(/.*)"]='swift_var_cache_t'

There's a question mark missing to cover the case of the directory itself. It should be:

["$LOCALSTATEDIR/cache/swift(/.*)?"]='swift_var_cache_t'

I think this can possibly explain the issue, because any file that doesn't already exist when the rule is applied will get the wrong context by default when it is created after that. However, it looks like the question mark's been missing for a few years, and in comment 4 container.recon has the correct label, so I'm not 100% sure it will help here - but that's something we'll want to fix either way.
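To illustrate the effect of the missing '?' (a hypothetical manual equivalent, not the actual openstack-selinux script): without the trailing '?' the pattern only matches entries inside /var/cache/swift, never the directory itself, so the directory keeps the generic default type.

~~~
# Hypothetical manual equivalent of the corrected file-context rule;
# openstack-selinux applies its rules through its own scripts.
semanage fcontext -a -t swift_var_cache_t '/var/cache/swift(/.*)?'
restorecon -Rv /var/cache/swift
ls -lZd /var/cache/swift   # should now show swift_var_cache_t on the directory
~~~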
Thanks, Julie, for the investigation and explanation.

I agree it's still strange, but the missing '?' is worth fixing anyway. We hadn't seen this until we enabled container-sharder, and I'm not aware of any clear reason for that. Also, the denial log shows container.recon with var_t, but the actual file on disk has swift_var_cache_t.

Looking at how that file is handled, I found swift creates a temp file in the same directory and then renames it over the target path to replace the old file. It might be possible that some conflicting operation ends up accessing a file that no longer exists, but I'm not quite clear about this.

If we agree to fix that missing '?', I'm inclined to fix that first, and I can check whether the situation improves once a new package with the fix is published.
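For context, a minimal shell sketch of that write-then-rename pattern (purely illustrative; the file name template and content are placeholders, not Swift's actual code). Unless a file-transition rule applies, the freshly created temp file picks up its SELinux type from the parent directory, and the rename keeps that label:

~~~
# Illustrative only; names and content are placeholders.
tmp=$(mktemp --tmpdir=/var/cache/swift container.recon.XXXXXX)
echo '{"replication_last": 0}' > "$tmp"
ls -Z "$tmp"                                  # type inherited from /var/cache/swift
mv "$tmp" /var/cache/swift/container.recon    # rename preserves that label
~~~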
That sounds like a good plan to me. If you'd like to submit the patch, I'll be happy to review it! I can tag a new openstack-selinux version after that.
Thanks Julie, I have created a separate PR to enforce swift_var_cache_t on /var/cache/swift itself. Because it's not clear whether that is the root cause, I used the Related: tag instead of the Fixes: tag.
Thank you for submitting the patch! Related vs Fixes sounds fine to me, I tagged the new commit with the 0.8.30 version tag. Let us know if it helps.
Setting needinfo on me because the next action is to check how our CI behaves with the updated openstack-selinux package. I reran the job last night, but unfortunately the new package was not yet synced to our mirror at that point.
Now we have openstack-selinux-0.8.30-0.20220207092220.33a7e5c.el9.noarch, and we no longer see the denials.

https://zuul.opendev.org/t/openstack/build/4226ea7d3154445093a2f863d1cfde8d
https://bc3cac2a3b209bd45a0b-01a80a447b6e4e9a4734ae7a7e6b5d1e.ssl.cf1.rackcdn.com/827850/3/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/4226ea7/logs/rpm-qa.txt
https://bc3cac2a3b209bd45a0b-01a80a447b6e4e9a4734ae7a7e6b5d1e.ssl.cf1.rackcdn.com/827850/3/check/puppet-openstack-integration-7-scenario002-tempest-centos-9-stream/4226ea7/job-output.txt

I'm still waiting for the fixed package for CentOS 8, but I think we are good to close this now, as we confirmed the solution works at least for CentOS 9. I'll reopen this if we still see the issue after the fixed version is released for CentOS 8 as well.
One more update. As it is taking a bit of time until the fixed version is released for CentOS 8, I implemented a temporary workaround to ensure the proper SELinux type is assigned to /var/cache/swift.
https://review.opendev.org/c/openstack/puppet-openstack-integration/+/829380/

With this change, the job now passes with container-sharder enabled and we no longer see the bunch of denials.
https://review.opendev.org/c/openstack/puppet-openstack-integration/+/829418/
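For the record, a minimal sketch of that kind of workaround (an assumption about the approach, relying on the seltype parameter of Puppet's file resource; the actual change is in the linked review):

~~~
# Minimal Puppet sketch; the real workaround lives in
# puppet-openstack-integration (see the review above).
file { '/var/cache/swift':
  ensure  => directory,
  owner   => 'swift',
  group   => 'swift',
  seltype => 'swift_var_cache_t',
}
~~~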
Thank you for the update. This seems like a decent workaround, although I hope a recent package gets promoted soon.
Going to close this, feel free to open a new bug referencing this one if there are still issues! Thank you.