Description of problem:
/var/lib/kolla/ has the wrong SELinux context in some cases, which prevents containers from starting.

Version-Release number of selected component (if applicable): Latest

How reproducible: That customer environment

Steps to Reproduce:
1. unknown
2.
3.

Actual results:
podman containers fail to start due to the wrong SELinux context on /var/lib/kolla. It had the var_lib_t context instead of container_file_t.

Expected results:
Something wicked happened.

Additional info:
Really wicked. Also, libvirtd was enabled, which spawned a dnsmasq that bound to :53 and prevented ironic-dnsmasq from binding to the same port.
Shouldn't we add something like this: semanage fcontext -a -t container_file_t "/var/lib/kolla(/.*)?" ?
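A minimal sketch of that approach, guarded so it only acts when it can (assumes semanage from policycoreutils-python-utils and root; the type name in current container-selinux policy is container_file_t):

```shell
# Sketch only: persist the context so future relabels keep it, then fix the
# files already on disk. Guarded: a no-op without semanage or without root.
PATTERN='/var/lib/kolla(/.*)?'
if command -v semanage >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
  semanage fcontext -a -t container_file_t "$PATTERN"  # slow; errors if the rule exists (use -m then)
  restorecon -Rv /var/lib/kolla                        # targeted relabel, not a system-wide restorecon
fi
# Sanity check that nested config files fall under the managed path:
case "/var/lib/kolla/config_files/config.json" in
  /var/lib/kolla/*) covered=yes ;;
  *) covered=no ;;
esac
echo "covered=$covered"
```

Note the restorecon here is deliberately restricted to /var/lib/kolla, in line with the advice later in this thread to avoid wide relabels on a running system.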
I believe directories under /var/lib/kolla/ are bind-mounted with :z in tripleo-heat-templates, which gives them the right context. But we could also add this to openstack-selinux as an extra safety measure. Were you able to set the right context, and did that resolve the issue with the containers not starting?
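For reference, a hedged sketch of how those bind mounts behave with plain podman; the image name and paths below are placeholders, not the actual tripleo-heat-templates invocation:

```shell
# :ro mounts read-only without relabeling; :z relabels the host path with a
# shared container_file_t context before the container starts.
# Guarded: a no-op on hosts without podman or without root.
if command -v podman >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
  podman run --rm \
    -v /var/lib/kolla/config_files:/var/lib/kolla/config_files:ro \
    -v /var/log/containers/demo:/var/log/demo:z \
    registry.access.redhat.com/ubi8/ubi ls /var/lib/kolla/config_files
fi
status=done
echo "$status"
```

This is why restarting the containers can repair labels on the :z paths, but only on those paths.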
Is it possible a restorecon command was run after starting the containers? That would reset the bind-mounts among other things, and generally shouldn't be done here.
Bonjour Julie! I've tried everything so far: restorecon, reboot, manually setting the permissions on /var/lib/kolla, etc. The only thing that solves their problem is to reboot and put SELinux in permissive mode. Something's utterly broken here in this case. There was a libvirtd service running which prevented the ironic-inspector-dnsmasq process from starting properly, so I'm wondering if it could've snowballed into an SELinux issue. The customer will try deploying a new minimal installation instead of the full installation with GUI and then re-install the undercloud. The main issue they faced was that introspection was failing due to libvirtd, but containers like mistral were in a "Z" state because they couldn't read some .json files. Another thing I found puzzling is that some containers failed to start with "sudo -E kolla_set_configs" whereas other containers like mistral would just fail loading some .json files due to SELinux.
Hello,

Just stepping in. Adding the fcontext is a good idea, but keep in mind this action is slow. Really slow (and it's a pity). Also, fcontext won't update existing files, so on its own it probably won't correct the environment. But with that fcontext in place, we could then run restorecon on that precise location.

That said, most of the files being mounted FROM that location have the :ro flag, meaning read-only, meaning the SELinux type shouldn't really cause any issue. The locations with either flag are:

deployment/database/mysql-pacemaker-puppet.yaml:
- /var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json:rw,z

deployment/glance/glance-api-container-puppet.yaml:
- /var/lib/kolla/config_files/glance_api.json:/var/lib/kolla/config_files/config.json (no flag, implies :rw)

The "z" flag relabels things, but that would affect only the subdirectory.

That's it. Not really sure WHAT creates this location; apparently it's not within tripleo-heat-templates. Maybe a package?

Cheers,
C.
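For what it's worth, the flag semantics above can be checked by splitting a volume entry the same way the container runtime does; a pure string-handling sketch using the mysql entry from the list above:

```shell
# Split a THT-style volume entry into source, destination and flags.
# An empty third field means the default read-write mount with no relabeling.
vol="/var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json:rw,z"
src=${vol%%:*}          # everything before the first ':'
rest=${vol#*:}          # everything after the first ':'
dst=${rest%%:*}         # destination path
flags=${rest#"$dst"}    # whatever follows the destination, if anything
flags=${flags#:}        # strip the leading ':' separator
echo "src=$src dst=$dst flags=${flags:-rw}"
```

Running this on the glance entry (no third field) leaves flags empty, i.e. the default :rw with no relabel, which matches the "(no flag, implies :rw)" note above.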
(In reply to David Hill from comment #4)
> Bonjour Julie! I've tried everything so far: restorecon, reboot, manually
> setting the permissions on /var/lib/kolla, etc. The only thing that solves
> their problem is to reboot and put SELinux in permissive mode. Something's
> utterly broken here in this case.

I would strongly advise against running restorecon commands unless they're very targeted to a single file. This can cause a number of other problems. What are the openstack-selinux and container-selinux versions? It does seem like there are a number of other issues ongoing in addition to the SELinux one...
@julie: so touch /.autorelabel is not advisable? What happens if some kind of rpm update / installation generates these? Does everything blow up? @cedric: I'm not sure either... could be a package or even a python script...
I think we can reproduce this issue just by running restorecon or touch /.autorelabel... the customer might have done something like that.
touch /.autorelabel may be the least bad way to go about it, since it requires a reboot and the bind-mounts should be recreated when the containers restart afterwards... But there are other labels that are e.g. only applied at deploy time in THT. I generally wouldn't recommend doing a wide restorecon on a running system. There are a lot of ways in which it can create problems (e.g. bug 1846540).
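Before reaching for a wide restorecon or a full relabel, a targeted check of a single file's label against policy is cheap. A sketch, assuming matchpathcon from libselinux-utils and GNU stat's %C (SELinux context) format; the path is the config file discussed above:

```shell
# Compare the actual label of one file with what the loaded policy expects.
# Guarded: a no-op on hosts without matchpathcon or without the file.
p=/var/lib/kolla/config_files/config.json
if command -v matchpathcon >/dev/null 2>&1 && [ -e "$p" ]; then
  expected=$(matchpathcon -n "$p")
  actual=$(stat -c %C "$p" 2>/dev/null)
  [ "$expected" = "$actual" ] || echo "label mismatch on $p: $actual (policy says $expected)"
fi
# Pulling just the type field out of a full context string:
ctx="system_u:object_r:container_file_t:s0"
type_field=${ctx%:*}          # drop the level -> system_u:object_r:container_file_t
type_field=${type_field##*:}  # keep the last field -> container_file_t
echo "$type_field"
```

If only a handful of files mismatch, a restorecon limited to those paths is far less risky than relabeling the whole filesystem.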
I'm not sure I agree with this at this point. It should be a permanent context for those paths if it's required, and /.autorelabel shouldn't break anything. If we have SELinux bugs, we should definitely fix them, and I can see many cases where an SELinux relabel could be required.
So basically here, a /.autorelabel will not restore the contexts to what they should be, and then my undercloud deployment is no longer functional. Even if I reboot, it's broken now, and from what I understand, I'd have to run "openstack undercloud install" again to restore those contexts?
I think an update would reset the proper context as well. Also, when the containers are restarted after the reboot, the volumes bind-mounted with :z would also have the correct label.
I re-ran "openstack undercloud install" and it fixed the issue in my lab... so this isn't as bad as it looks, even though it's not how it should be fixed in my book. Adding the SELinux context to the path with semanage would be more appropriate...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0986