Moving this from https://issues.redhat.com/browse/OCPBUGS-7651

We landed https://github.com/ostreedev/ostree/pull/2569 to do automatic policy rebuilds, but this fails across major version updates.

Starting from a RHEL8 system (e.g. RHEL8 CoreOS), then:

$ rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev:4.13-9.2
$ systemctl stop ostree-finalize-staged

Feb 28 13:56:54 cosa-devsh systemd[1]: Stopped OSTree Finalize Staged Deployment.
Feb 28 13:56:54 cosa-devsh systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Feb 28 13:56:54 cosa-devsh systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited status=1
Feb 28 13:56:54 cosa-devsh ostree[2466]: error: Child process exited with code 1
Feb 28 13:56:54 cosa-devsh ostree[2472]: semodule: Failed!
Feb 28 13:56:54 cosa-devsh ostree[2472]: libsemanage.semanage_validate_and_compile_fcontexts: setfiles returned error code 255.
Feb 28 13:56:54 cosa-devsh ostree[2473]: invalid context system_u:object_r:avahi_conf_t:s0
Feb 28 13:56:54 cosa-devsh ostree[2473]: libsepol.sepol_context_to_sid: could not convert system_u:object_r:avahi_conf_t:s0 to sid
Feb 28 13:56:54 cosa-devsh ostree[2473]: libsepol.context_from_string: could not create context structure
Feb 28 13:56:54 cosa-devsh ostree[2473]: libsepol.context_from_record: could not create context structure
Feb 28 13:56:54 cosa-devsh ostree[2473]: libsepol.context_from_record: type avahi_conf_t is not defined
Feb 28 13:56:53 cosa-devsh ostree[2472]: The --rebuild-if-modules-changed option is deprecated. Use --refresh instead.
Feb 28 13:56:53 cosa-devsh ostree[2466]: Copying /etc changes: 12 modified, 0 removed, 33 added
Feb 28 13:56:53 cosa-devsh ostree[2466]: Copying /etc changes: 12 modified, 0 removed, 33 added
Feb 28 13:56:53 cosa-devsh kernel: EXT4-fs (vda3): re-mounted. Opts:
Feb 28 13:56:53 cosa-devsh ostree[2466]: Finalizing staged deployment
Feb 28 13:56:53 cosa-devsh systemd[1]: Stopping OSTree Finalize Staged Deployment...
(In reply to Colin Walters from comment #0)
> $ rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev:4.13-9.2

I'm trying to reproduce this locally, but the above command fails with:

error: remote error: reading manifest 4.13-9.2 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized

What am I missing?
Right, you need to write an `/etc/ostree/auth.json` file containing a container pull secret from https://console.redhat.com/openshift/downloads#tool-pull-secret That said there's also a centos version which doesn't require a pull secret; try using ostree-unverified-registry:quay.io/okd/centos-stream-coreos-9:4.12-x86_64
(Sorry, didn't mean to move component, even though the fix may need to be somewhere other than ostree)
(In reply to Colin Walters from comment #2)
> Right, you need to write an `/etc/ostree/auth.json` file containing a
> container pull secret from
> https://console.redhat.com/openshift/downloads#tool-pull-secret

Thanks, that worked.

> That said there's also a centos version which doesn't require a pull secret;
> try using
> ostree-unverified-registry:quay.io/okd/centos-stream-coreos-9:4.12-x86_64

That also worked, but in both cases I didn't get the reported libsepol errors... Is there another step missing to trigger the issue?
> That also worked, but in both cases I didn't get the reported libsepol errors...

Yeah, `semodule -DB` (and I think any local policy modifications really aside from booleans) is enough to trigger.
It's worth stepping back here and understanding the high level goal:

The last 11.5+ years of my life (since https://github.com/ostreedev/ostree/commit/f874ac043dae8a2147ccbb428629888905a32603 ) I've been trying to make general purpose Linux systems have fully transactional upgrades; updates are queued in the background and don't change your running system, *and* if when you reboot you hit a failure, you can just roll back to the previous system state. (But not in a default "you're not root on your computer" way like iOS/ChromeOS/etc).

A key part of the design here is that we don't run new code when performing a rollback. ("run code" here would be e.g. recompiling selinux policy again) This maximizes reliability and means rollbacks are much more likely to work.

Consequently, we must have *two copies* of the selinux policy - stored in /etc.

Now, when going to do an update, per the PR we need to re-apply local changes against the new policy, while the old policy is still loaded. Hence, the policy compilation should not care about what policy is running (in theory in fact, we could support *transactionally* switching between a selinux-disabled state to enabled, etc.)

ISTM (but I didn't dig fully) that something in the libsepolicy path is trying to look at the loaded policy in this case.
I think this probably needs to be an OCP 4.13 blocker because there's now extensive use of e.g. compliance operator and other tooling which wants to inject local selinux policy modifications, and that all worked when doing "rhel8 -> rhel8" upgrades.

Now, if there isn't an easy fix in either ostree or libsemanage or whatever, then...I could imagine doing an awful hack where we build on https://github.com/openshift/os/pull/962 - but the semantics there are quite ugly as we'd be booting into rhel9 userspace with a rhel8 policy then dynamically reloading. I haven't tested it.
One thing I will note here is that with the container-native OS flow, where instead of mutating nodes individually we expect customers to e.g. use a Dockerfile to change policy - this bug doesn't exist, because we're not doing fragile per-machine tricks with containers before rebooting etc. The way the machine updates becomes (correctly) dumb again, we're just writing files.
Hmm...wait, actually this doesn't reproduce for me using a local CIL module like https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/using_selinux/writing-a-custom-selinux-policy_using-selinux If the issue here is really only scoped to `semodule -DB` i.e. enabling debug mode...that's not so bad at all, and probably not a blocker.
Clearing blocker flag per above.
I think I know where the problem is now...

The contents of the policy.linked file actually depend on the disable_dontaudit flag, but we don't include that flag in the checksum. So after the user runs `semodule -DB`, policy.linked will be different from the rpm baseline and thus rpm-ostree won't update it to the new version. Then in the `semodule --refresh` called from `ostree admin finalize-staged` we see that the module content matches the checksum and we skip rebuilding policy.linked. Finally, the `setfiles` command ends up called with the old policy and new file context mappings, so it fails the validation.

The fix is to simply include the effective disable_dontaudit value (and a couple of other inputs as well) in the checksum calculation, so that we correctly detect that the policy needs to be rebuilt.

Switching to libsemanage and assigning to myself.

NOTE: The bug will need to be cloned also to RHEL-8, since the functionality has been backported there as well (see bug 2049186), but let's keep it as a single bug until the fix is upstream for simplicity...
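For anyone following along, here is a rough Python sketch of the idea, not the actual libsemanage code. The function name `inputs_checksum`, the SHA-256 choice, the byte layout, and the sample module data are all assumptions made purely for illustration; only the `disable_dontaudit` name comes from the analysis above.

```python
import hashlib


def inputs_checksum(modules, flags=None):
    """Hash everything the linked policy depends on.

    `modules` maps a (hypothetical) module name to its raw bytes;
    `flags` maps build-flag names to values. Both layouts are
    illustrative assumptions, not libsemanage's real format.
    """
    h = hashlib.sha256()
    for name in sorted(modules):
        h.update(name.encode() + b"\0" + modules[name] + b"\0")
    for key in sorted(flags or {}):
        h.update(f"{key}={flags[key]}".encode() + b"\0")
    return h.hexdigest()


def rebuild_needed(stored, current):
    """A refresh only rebuilds when the stored checksum no longer matches."""
    return stored != current


# Hypothetical module set shipped by the new OS image.
modules = {"base": b"new-module-bytes", "avahi": b"more-new-bytes"}

# Before the fix: the flag is not hashed, so the checksum recorded at
# image build time matches even though the user ran `semodule -DB`.
buggy_stored = inputs_checksum(modules)
buggy_current = inputs_checksum(modules)
rebuild_skipped_by_bug = not rebuild_needed(buggy_stored, buggy_current)

# After the fix: the effective flag value is part of the hash, so the
# mismatch is detected and the policy gets relinked.
fixed_stored = inputs_checksum(modules, {"disable_dontaudit": 0})
fixed_current = inputs_checksum(modules, {"disable_dontaudit": 1})
rebuild_triggered_by_fix = rebuild_needed(fixed_stored, fixed_current)
```

In the first case the rebuild is skipped and the stale policy.linked is what `setfiles` validates against; in the second the checksum mismatch forces a rebuild against the new modules.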
Thanks for digging in! I'm glad to hear agreement this only affects `semodule -DB`.

This would be nice to fix for 9.3. My instincts say that few production use cases will hit this, but if evidence proves otherwise I think we can consider 9.2.z at that time.
Fix posted upstream: https://lore.kernel.org/selinux/20230309143741.346749-1-omosnace@redhat.com/
*** Bug 2174873 has been marked as a duplicate of this bug. ***
The fix is now merged upstream: https://github.com/SELinuxProject/selinux/commit/a171ba62bbba891a8dce2239327b1d905f695b82

Reassigning to Petr.
Retargeting to 9.3, since this doesn't seem to be justifiable as an exception/blocker and we don't have evidence that it breaks normal usage (workaround is to enable dontaudit rules before upgrade and disable them again after if needed). If anyone believes this should be expedited in 9.2, please scream :)