Bug 1941922 - Add missing fcontext in templates using the ":z" flag for bind-mounts
Summary: Add missing fcontext in templates using the ":z" flag for bind-mounts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: beta
: 17.0
Assignee: Cédric Jeanneret
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-23 07:33 UTC by Cédric Jeanneret
Modified: 2022-09-21 12:15 UTC (History)
5 users (show)

Fixed In Version: tripleo-ansible-3.1.3-0.20210701131810.8398ce3.el8ost openstack-tripleo-heat-templates-14.1.3-0.20210702225236.ee71cd7.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:14:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github redhat-openstack openstack-selinux pull 73 0 None open Allow container_t to manage swift_data_t files/dirs 2021-03-26 08:17:07 UTC
OpenStack gerrit 782393 0 None NEW Ensure we get the relevant fcontext overrides 2021-03-23 13:16:35 UTC
OpenStack gerrit 782439 0 None NEW Ensure SELinux context persist across restorecon and reboot 2021-03-23 13:16:35 UTC
Red Hat Bugzilla 1941412 1 urgent CLOSED After updating undercloud to 16.1.4, overcloud deploy or "update prepare" fails due to selinux issue on Swift 2022-10-03 14:30:07 UTC
Red Hat Issue Tracker OSP-1409 0 None None None 2022-04-13 20:04:51 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:15:41 UTC

Description Cédric Jeanneret 2021-03-23 07:33:07 UTC
Description of problem:

We rely too much on the ":z" flag to relabel container volumes. This can lead to issues where an SELinux related package is updated, and a restorecon is issued, while the containers aren't restarted (as in "stop", "start"), leading to SELinux denials due to the volume being back to its original context/label.

While some of the fcontext are set via the openstack-selinux package, we can't do it for every single locations, especially since there are users that aren't on tripleo/osp. Making the change in openstack-selinux would then break their environment.

The right thing to do is:
- gather every ":z" mention in t-h-t
- for every location, check if an fcontext rule exists in openstack-selinux[1]
- if not, add the needed call to the Ansible sefecontext module

I think we can avoid applying the context (i.e. no need to run restorecon), since it would be done anyway via the ":z" flag.

We might want to make a new role within tripleo-ansible in order to:
- add the fcontext rule
- create the location with the wanted label
If we run fcontext before creating the location, we ensure it will get the correct labels. But we might hit a situation where the location already exists - so we want to ensure it's correctly labelled.

Notes:
- the ":z" flag will still be needed for *shared* locations
- this will probably lead to far, far less use of ":z" - not all the listed volumes are shared within multiple containers
- the call to sefcontext will make ansible slower, because fcontext itself is a slow thing. We will need to do some checks and PoC in order to compare the elapsed time.

Comment 2 Cédric Jeanneret 2021-03-23 13:16:37 UTC
With the following 2 patches:
- https://review.opendev.org/c/openstack/tripleo-heat-templates/+/782439 (hitting t-h-t)
- https://review.opendev.org/c/openstack/tripleo-ansible/+/782393 (needed, some directory creations are in there, easier to keep things together)

the Undercloud has the following custom fcontext:

/etc/iscsi(/.*)?                                   all files          system_u:object_r:container_file_t:s0 
/etc/target(/.*)?                                  all files          system_u:object_r:container_file_t:s0 
/httpboot(/.*)?                                    all files          system_u:object_r:httpd_sys_content_t:s0 
/tftpboot(/.*)?                                    all files          system_u:object_r:tftpdir_t:s0 
/usr/bin/neutron-rootwrap-daemon                   all files          system_u:object_r:neutron_exec_t:s0 
/usr/bin/neutron-vpn-agent                         all files          system_u:object_r:neutron_exec_t:s0 
/usr/bin/swift-object-reconstructor                all files          system_u:object_r:swift_exec_t:s0 
/usr/bin/swift-object-relinker                     all files          system_u:object_r:swift_exec_t:s0 
/var/cache/swift(/.*)                              all files          system_u:object_r:swift_var_cache_t:s0 
/var/lib/config-data(/.*)?                         all files          system_u:object_r:container_file_t:s0 
/var/lib/container-config-scripts(/.*)?            all files          system_u:object_r:container_file_t:s0 
/var/lib/designate/bind9(/.*)?                     all files          system_u:object_r:named_zone_t:s0 
/var/lib/ironic(/.*)?                              all files          system_u:object_r:container_file_t:s0 
/var/lib/ironic-inspector/dhcp-hostsdir(/.*)?      all files          system_u:object_r:container_file_t:s0 
/var/lib/iscsi(/.*)?                               all files          system_u:object_r:container_file_t:s0 
/var/lib/kolla(/.*)?                               all files          system_u:object_r:container_file_t:s0 
/var/lib/mongodb(/.*)?                             all files          system_u:object_r:mongod_var_lib_t:s0 
/var/lib/mysql(/.*)?                               all files          system_u:object_r:container_file_t:s0 
/var/lib/nova/.ssh(/.*)?                           all files          system_u:object_r:ssh_home_t:s0 
/var/lib/openstack-dashboard                       all files          system_u:object_r:httpd_var_lib_t:s0 
/var/lib/rabbitmq(/.*)?                            all files          system_u:object_r:container_file_t:s0 
/var/lib/tripleo-config(/.*)?                      all files          system_u:object_r:container_file_t:s0 
/var/lib/vhost_sockets(/.*)?                       all files          system_u:object_r:virt_cache_t:s0 
/var/log/aodh/app.log                              all files          system_u:object_r:httpd_log_t:s0 
/var/log/ceilometer/app.log                        all files          system_u:object_r:httpd_log_t:s0 
/var/log/containers(/.*)?                          all files          system_u:object_r:container_file_t:s0 
/var/log/gnocchi/app.log                           all files          system_u:object_r:httpd_log_t:s0 
/var/log/pacemaker.log.*                           all files          system_u:object_r:cluster_var_log_t:s0 
/var/log/panko/app.log                             all files          system_u:object_r:httpd_log_t:s0 
/var/log/zaqar/zaqar.log                           all files          system_u:object_r:httpd_log_t:s0 

Notes:
- the lines that aren't matching container_file_t come mostly from openstack-selinux, and aren't used in the containerized environment (meaning: NOT osp nor tripleo)
- the /var/log/containers line is the only one from openstack-selinux setting an fcontext matching container_file_t
- I don't see a real increase of time due to the fcontext calls, at least for the UC.

Comment 3 Cédric Jeanneret 2021-03-23 13:37:44 UTC
Note:
we have at least one conflict with openstack-selinux:
local_settings.sh.in:["$LOCALSTATEDIR/cache/swift(/.*)"]='swift_var_cache_t'

in TripleO/OSP, we want container_file_t here.... not sure how things will be, but that's not good.

@jpichon any idea? Guess we can't really drop that line from the openstack-selinux, can we? How does `semanage fcontext' behave in this kind of situation? An openstack-selinux update will revert the configuration anyway :(.

Comment 4 Julie Pichon 2021-03-24 10:54:31 UTC
I've been researching different options and not really coming up with any good solution.

We can't just replace the file context line because we have RDO users deploying on baremetal who contribute to openstack-selinux whose environment would break.

The gist of the problem is that baremetal deployments and TripleO deployments have a need for different file contexts to be set up.

- - -

(a) I was wondering if we could take advantage of SELinux module priority but that only applies to .pp files, aka the allow etc rules. File labels are separate because they're not supposed to change that easily, I think.

The way semanage fcontext works when there is a conflict is that it breaks:

$ sudo semanage fcontext -a -t container_file_t "/var/cache/swift(/.*)"
ValueError: File context for /var/cache/swift(/.*) already defined

And I think that's why the rules weren't applied in your environment - the old ones need to be removed first.

$ sudo semanage fcontext -d -t swift_var_cache_t "/var/cache/swift(/.*)"
$ sudo semanage fcontext -a -t container_file_t "/var/cache/swift(/.*)"

But then we come back to what you said: an openstack-selinux update would revert them... Unless we add some logic in local_settings.sh to not delete/add/modify rules that are currently container_file_t...?

(b) If there was some way, any way to know if we're in a TripleO environment at openstack-selinux install time, we could read the environment variable to decide which file contexts to apply... But from what I can tell, openstack-selinux is one of the many dependencies that get installed together with python-tripleoclient so it doesn't look like we have any chance to set up anything before that, since it's the very first step.

We can't create a new SELinux boolean because that applies at the rule levels, not file contexts - we can't adjust file labels based on a boolean that's later set to true or false.

Although... Could we set an environment variable anyway? That could avoid the problem with package updates then? Process becomes:

1. Install openstack-selinux, with the current rules and file contexts
2. Run the install - THT sets up the new container_file_t contexts (may have to delete the conflicting ones first)
3. Set that TRIPLEO_ENV variable or whatever it's called
4. Then on openstack-selinux updates, local_settings.sh will pick that up and make sure we don't override the new file contexts need by TripleO?

(c) New package... openstack-selinux vs tripleo-selinux or something. With only file contexts differences? There are a lot of container-related rules that probably should move over for clarity then, though that might be hard to untangle at this point. I'm not sure how easy it'd be to keep them in sync. Or tripleo-selinux could just have the extra file contexts and depend on openstack-selinux, to make sure the other package gets installed first and the file contexts can be overriden...?

(d) Stop having openstack-selinux manage file contexts giving issues, like you suggest. Just remove the line. Let deployers manage the rules depending on their type of environment. This might not break existing baremetal environments since the rule openstack-selinux originally set up should still exist, but this will break new deployments. Also seems to make the idea of storing SELinux policies in a package redundant if we end up pushing the burden onto the deployer anyway. I don't really support this.

- - -

All of these are messy and I hope I'm missing something and someone else can come up with something better. If not, I think I would lean more toward a variation of (a) or (b)... Thoughts?

Comment 5 Cédric Jeanneret 2021-03-24 11:40:33 UTC
Hello Julie,

Thank you for your thoughts and the possibilities!

I would go for the "A" one: "do not edit things that are set to container_file_t" or, maybe a bit further: "do not edit things that are already custom rules". Though this last step might lead to some issues when we actually WANT to edit existing rules, but it would prevent issues.

But....

But it won't be enough: if the package stops overriding things, we still have the counterpart withing tripleo-ansible and tripleo-heat-templates, where we're pushing new rules or, even, wanting to override existing custom rules. As you said: we'll need to "-d" the existing one (meaning we have to have a prior knowledge of the conflicting rule), then add the new one - this is time consuming, and I think we'll have to get a dedicated ansible action/plugin/handler/whatever to manage it properly. So not immediate remediation, unfortunately, and it might have a non-null impact on the huge efforts done in parallel to make the whole thing faster :(.

Regarding the "B"... we end with the same issue: tht/tripleo-ansible will have to override at least one existing rule, at least once in the stack lifecycle. While it's better, we'll still need to implement the logic that will check for existing custom rules, and take care of the conflicts - this is especially true for update/upgrades, where new ruleset are added.

C) A new package is a no-go, since it means duplicating rules, dependencies changes and conflicts at some point. Sad, but too complicated imho.

D) Dropping the conflicting rules will lead to issues reported by ppl using RDO on baremetal, and we want to prevent this. So indeed, not really something we should do.


Now.... there might be another way. Something a bit bigger, that will have an actual impact, but something that can help with the long term.
As you know, services such as mysql, redis, nova, neutron, swift and others are usually running in a dedicated SELinux context, usually "servicename_t", as in "swift_t", "mysql_t" and so on. Since we're running those services in containers, everything is overridden and runs as "container_t" (save nova_libvirt, for the sVirt support). And this, of course, creates issues: container_t usually can't access "servicename_t" content, meaning we're relying on relabeling things over and over and over, or wanting to override the default file context rules as pushed by the selinux-policy-targeted package.

So... what about switching the container context back to the native service context? I.e. instead of having swift running as container_t, preventing it to access /srv/node without relabeling, we're configuring the container to run as swift_t?
Of course, there WILL be issues: for instance, I'm pretty sure we'll need to do some new policies in order to allow container_t to switch to swift_t context. But that's pretty much all. No more relabeling. No more SELinux access denials on the standard service locations.
Also, there will probably be services that won't be able to switch (nothing comes to my mind right now though) - and we might need to give some thoughts for the shared volumes: what would be their context? Would new rules/policies be needed? Can't we rely on the existing ones already? What about booleans?

But I think this path has to be discussed and explored. It's, of course, not an immediate remediation either - but I think we'll get better, greater success with this one than proposed solutions A or B.

I've proposed this as a topic for the coming Xena PTG[1], hopefully it will gain interest and we'll be able to discuss this matter. In case nothing comes out, I'll just try it anyway - I have the feeling this is the right way to solve our issues.

What do you think?

Cheers,

C.

Comment 6 Cédric Jeanneret 2021-03-24 11:42:05 UTC
Meh - forgot to add the link...

[1] https://etherpad.opendev.org/p/tripleo-xena-topics

Comment 7 Julie Pichon 2021-03-24 17:15:00 UTC
I like the idea in general! Additional thoughts that might need more research would be, could that cause other issues with bypassing SELinux container confinement? Also, if we can't start directly in the native service context and need to allow relabelling, would that mean that any process starting as container_t could relabel itself and pass as somethingelse_t and get access to other directories?
Then again it looks like relabelto and relabelfrom are separate permissions, so we probably could mitigate by only allowing that one way, I think...
I wonder if it may be worth asking the podman / container-selinux people about expected best practices around this?

About allowing the container with the new types access to shared container_file_t volumes, we already have a number of rules around this, so it's fine to add more as needed. Adding new allow access rules is transparent and doesn't break anything, it's only file label changes that could.

Comment 8 Cédric Jeanneret 2021-03-25 06:40:45 UTC
Heya,

A 3rd way would be to analyze things precisely and see if we could allow container_t to access specific locations. For instance, allowing container_t to manage swift_data_t wouldn't represent a really huge flaw imho, while solving the conflict, and avoiding relabeling a location that can be pretty huge....

Frankly, I'd rather go THAT way than any others, but it might be just me knowing what will be on the road if we go with any of the other proposals ;). I don't really expect security issues if we allow container_t to access swift_data_t - the locations are known anyway.
The only thing.... of course if we have this location mounted in multiple containers, we'll still need the :z flag, which will relabel to container_file_t:s0 - but at least, a restorecon will NOT break the service. Which is the goal in the end.

In short:
Solution 1: do some hack in the openstack-selinux, using some env var (what will push it? when? what about the fact openstack-selinux comes in before any other tripleo related package? should we change the dependency tree?)
Solution 2: allow container_t to manage files/directories with swift_data_t
Solution 3: rework container runtime labels (maybe only for specific cases, like swift?)

Solution 1 pros:
might allow to keep the fcontext in tht + tripleo-ansible alongside openstack-selinux, with the need to solve a chicken-and-the-egg issue (and a potential one-time label conflict)
Solution 1 cons:
there's this chicken-and-the-egg issue, and the possibility to solve an fcontext conflict, meaning we'll need to remove one, and add a new one. fcontext actions are slow, but it MIGHT be OK.

Solution 2 pros:
fast to implement, easy to understand
Solution 2 cons:
might create a security flow, further analysis is needed; it probably can't be applied for other conflicts, but, at least for now, swift_data_t is the only one I detected.

Solution 3 pros:
might be clean, as running services in their native context is "right", at least on the paper, without further thoughts
Solution 3 cons:
long way; might have security impact we don't foresee; will need an assessment from the Container Team. On my own, I'm not that convinced this one is a path I want to follow, since it involves a ton of relabeling, or at least new SELinux policies to allow the native context to access container_file_t.

WDYT?

Comment 9 Julie Pichon 2021-03-25 08:59:33 UTC
I agree, that sounds absolutely reasonable to me. Let's go ahead with solution 2.

About solution 3, after reading more description about how it would work I was wondering if that would be similar to reverting to SELinux-less containers like in docker times, which is why I was wondering about asking for more opinions (just because I'm not sure myself).

But solution 2 seems a lot cleaner, easier and manageable so let's start there, and we can revisit later if required. Thanks for all the thoughts!

Comment 10 Cédric Jeanneret 2021-03-26 08:17:07 UTC
Thank you for your agreement, Julie :).

Here's a first patch for swift_data_t then:
https://github.com/redhat-openstack/openstack-selinux/pull/73

This, alongside the 2 others, should do it. I'll update the tht one in order to drop the fcontext for swift location, since it conflicts with the existing one.

Cheers,

C.

Comment 11 Julie Pichon 2021-03-26 13:38:38 UTC
Another option I didn't think about when writing comment 4: a downstream-only patch that changes the paths already defined in openstack-selinux to container_file_t. Downsides are the usual ones with downstream-only patches, chances of conflicts etc, and also that means it wouldn't work for upstream TripleO deployments.

I think the current solution is fine, just adding a note here for future reference in case we do decide to revisit at some point.

Comment 12 Julie Pichon 2021-04-07 10:34:12 UTC
(In reply to Cédric Jeanneret from comment #10)
> Here's a first patch for swift_data_t then:
> https://github.com/redhat-openstack/openstack-selinux/pull/73

For reference, this should now be available in the openstack-selinux-0.8.24-1.20210407093456.26243bf.el8ost build for 16.1, as well as the latest 16.2 and 17.0 builds.

Comment 13 Cédric Jeanneret 2021-08-27 10:50:17 UTC
How to verify:

- get OSP deployed (UC + 1-ctl for instance)
- on both nodes, run the following command, as root: `semanage fcontext -Cln'

There should be *a lot* of entries there, matching for instance:
- /var/lib/rabbitmq(/.*)?
- /var/lib/mysql(/.*)?
- /var/lib/config-data/ansible-generated/unbound(/.*)?
- /var/lib/tripleo-config(/.*)?


(I won't put every single lines, since they also depends on the active services)

Comment 15 David Rosenfeld 2022-06-17 15:31:55 UTC
Used procedure in Comment 13. This is example output from a controller:

[heat-admin@controller-0 ~]$ sudo semanage fcontext -Cln
/etc/iscsi(/.*)?                                   all files          system_u:object_r:container_file_t:s0 
/etc/target(/.*)?                                  all files          system_u:object_r:container_file_t:s0 
/httpboot(/.*)?                                    all files          system_u:object_r:httpd_sys_content_t:s0 
/tftpboot(/.*)?                                    all files          system_u:object_r:tftpdir_t:s0 
/usr/bin/neutron-rootwrap-daemon                   all files          system_u:object_r:neutron_exec_t:s0 
/usr/bin/neutron-vpn-agent                         all files          system_u:object_r:neutron_exec_t:s0 
/usr/bin/swift-object-reconstructor                all files          system_u:object_r:swift_exec_t:s0 
/usr/bin/swift-object-relinker                     all files          system_u:object_r:swift_exec_t:s0 
/var/cache/swift(/.*)?                             all files          system_u:object_r:swift_var_cache_t:s0 
/var/lib/cinder(/.*)?                              all files          system_u:object_r:container_file_t:s0 
/var/lib/config-data(/.*)?                         all files          system_u:object_r:container_file_t:s0 
/var/lib/container-config-scripts(/.*)?            all files          system_u:object_r:container_file_t:s0 
/var/lib/designate/bind9(/.*)?                     all files          system_u:object_r:named_zone_t:s0 
/var/lib/iscsi(/.*)?                               all files          system_u:object_r:container_file_t:s0 
/var/lib/kolla(/.*)?                               all files          system_u:object_r:container_file_t:s0 
/var/lib/mongodb(/.*)?                             all files          system_u:object_r:mongod_var_lib_t:s0 
/var/lib/nova/.ssh(/.*)?                           all files          system_u:object_r:ssh_home_t:s0 
/var/lib/openstack-dashboard                       all files          system_u:object_r:httpd_var_lib_t:s0 
/var/lib/rabbitmq(/.*)?                            all files          system_u:object_r:container_file_t:s0 
/var/lib/tripleo-config(/.*)?                      all files          system_u:object_r:container_file_t:s0 
/var/lib/vhost_sockets(/.*)?                       all files          system_u:object_r:virt_cache_t:s0 
/var/log/aodh/app.log                              all files          system_u:object_r:httpd_log_t:s0 
/var/log/ceilometer/app.log                        all files          system_u:object_r:httpd_log_t:s0 
/var/log/gnocchi/app.log                           all files          system_u:object_r:httpd_log_t:s0 
/var/log/pacemaker.log.*                           all files          system_u:object_r:cluster_var_log_t:s0 
/var/log/panko/app.log                             all files          system_u:object_r:httpd_log_t:s0 
/var/log/zaqar/zaqar.log                           all files          system_u:object_r:httpd_log_t:s0 
[heat-admin@controller-0 ~]$

Comment 21 errata-xmlrpc 2022-09-21 12:14:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543


Note You need to log in before you can comment on or make changes to this bug.