
Bug 1846540

Summary: We should exclude the disk and base files from the default fcontext list
Product: Red Hat OpenStack
Component: openstack-nova
Version: 10.0 (Newton)
Status: CLOSED CANTFIX
Severity: high
Priority: high
Reporter: David Vallee Delisle <dvd>
Assignee: Rajesh Tailor <ratailor>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
CC: berrange, dasmith, eglynn, jhakimra, jzaher, kchamart, ratailor, sbauza, sgordon, vromanso
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-07-03 12:44:54 UTC

Description David Vallee Delisle 2020-06-11 20:07:01 UTC
Description of problem:
Some situations call for a restorecon on /var/lib/nova (e.g. when libvirt crashes and some contexts are not restored, or the console.log file has the wrong context).

Normally we recommend that the operator run restorecon on the specific files, but sometimes an operator simply runs restorecon -R on /var/lib/nova, which prevents qemu from writing to the root disk and can have a severe impact.

I believe I have found a way to prevent this kind of situation from happening: replace "/var/lib/nova(/.*)?" with "/var/lib/nova(/?(?!disk).)*".

Unfortunately, I can't really test it because I can't delete the original rule easily:
# semanage fcontext -d '/var/lib/nova(/.*)?'
ValueError: File context for /var/lib/nova(/.*)? is defined in policy, cannot be deleted

But I know the regex would match perfectly: https://regexr.com/56if1

We should have something similar for the _base files as well.
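
As a rough check of the proposed pattern outside of SELinux, the sketch below runs it against a few sample paths with GNU grep's PCRE mode. It assumes that the file-context matcher anchors the expression to the whole path (approximated here with -x) and that grep -P is available; the helper name matches_rule and the sample paths are made up for illustration.

```shell
# Proposed replacement rule from this report.
pattern='/var/lib/nova(/?(?!disk).)*'

# Hypothetical helper: succeeds when the whole path matches the pattern.
# -x anchors the match to the full line, -P enables PCRE lookaheads.
matches_rule() {
    printf '%s\n' "$1" | grep -qxP -e "$pattern"
}

for path in \
    /var/lib/nova \
    /var/lib/nova/instances/abc/console.log \
    /var/lib/nova/instances/abc/disk \
    /var/lib/nova/instances/abc/disk.info
do
    if matches_rule "$path"; then
        echo "covered by rule:  $path"
    else
        echo "excluded by rule: $path"
    fi
done
```

Because the lookahead is evaluated at every character, this pattern skips any path that contains the string "disk" anywhere below /var/lib/nova (disk.info, for example), not only files named exactly disk.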

Version-Release number of selected component (if applicable):
openstack-nova-common-14.1.0-22.el7ost.noarch


How reproducible:
Always

Steps to Reproduce:
1. Check the current context of disk files: ls -Z /var/lib/nova/instances/*/disk
2. restorecon -R -F -v /var/lib/nova

Actual results:
Context changed: ls -Z /var/lib/nova/instances/*/disk

Expected results:
We believe it shouldn't break

Comment 2 Daniel Berrangé 2020-07-02 13:03:25 UTC
(In reply to David Vallee Delisle from comment #0)
> Description of problem:
> Some situations call for a restorecon on /var/lib/nova (e.g. when libvirt
> crashes and some contexts are not restored, or the console.log file has
> the wrong context).
> 
> Normally we recommend that the operator run restorecon on the specific
> files, but sometimes an operator simply runs restorecon -R on
> /var/lib/nova, which prevents qemu from writing to the root disk and can
> have a severe impact.

This seems to be saying that the restorecon is invoked while existing QEMU processes are still running.

If so, that is a serious mistake.

restorecon must never be invoked while QEMU (or probably any OS workload) is running. It is expected that a running system will make changes to on-disk contexts that are not represented in the master policy, and so invoking restorecon will blow away the live changes. It is only safe to run restorecon in situations where you know the files affected are not currently in use.
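
A minimal sketch of such a pre-check, assuming a Linux host with /proc mounted and guests that show up under a process name starting with "qemu". The process_running helper and the name pattern are assumptions for illustration, not something libvirt or nova provides.

```shell
# Hypothetical helper: succeeds if any process name matches the glob in $1.
# Reads /proc/<pid>/comm, which holds the (possibly truncated) command name.
process_running() {
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        # The process may exit between the glob and the read; skip it then.
        read -r name 2>/dev/null < "$comm" || continue
        case $name in
            $1) return 0 ;;
        esac
    done
    return 1
}

if process_running 'qemu*'; then
    echo "QEMU processes are running; do not relabel their files" >&2
else
    echo "no QEMU process found"
fi
```

This only answers whether a guest is running at that moment; it says nothing about which label the files of a stopped guest should carry.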

> Steps to Reproduce:
> 1. Check the current context of disk files: ls -Z
> /var/lib/nova/instances/*/disk
> 2. restorecon -R -F -v /var/lib/nova
> 
> Actual results:
> Context changed: ls -Z /var/lib/nova/instances/*/disk
> 
> Expected results:
> We believe it shouldn't break

IMHO the usage scenario is simply wrong, and restorecon is working as intended, resetting labels back to their original boot-time defaults.

Comment 3 Kashyap Chamarthy 2020-07-02 13:08:49 UTC
Given comment #2 from Daniel, this is a user error.

David, if you confirm that `restorecon` is indeed being invoked while QEMU processes are running (an invalid usage), then we may just have to close this as NOTABUG, I'm afraid.

Comment 4 David Vallee Delisle 2020-07-02 13:30:51 UTC
Actually, the instance was stopped and couldn't be started anymore because of this bug [1]. While it's probably a bad usage of restorecon, is there any issue in setting a safety net for our customers if we can easily prevent it?

This caused a 24h+ outage for the tenant here, and was considered a major outage for our customer.

Thanks,

DVD

[1] https://access.redhat.com/solutions/3215031

Comment 5 Daniel Berrangé 2020-07-02 17:07:25 UTC
(In reply to David Vallee Delisle from comment #4)
> Actually, the instance was stopped and couldn't be started anymore because
> of this bug [1]. While it's probably a bad usage of restorecon, is there any
> issue in setting a safety net for our customers if we can easily prevent it?
> 
> This caused a 24h + outage for the tenant here, and was considered a major
> outage for our customer.

> [1] https://access.redhat.com/solutions/3215031

This solution is actually a reasonably good example of the usage of "restorecon".  The recommendation in that solution strictly targets just a single file, so it is safe and won't have any unexpected side effects.

The bug description here, though, talks about "restorecon on /var/lib/nova", which is a very bad thing to do on a machine which has any VMs.

Changing the fcontext database rule from "/var/lib/nova(/.*)?" to "/var/lib/nova(/?(?!disk).)*" won't prevent restorecon from doing damage. restorecon will always touch every single file under the path it is given. So the effect of changing fcontext is that it will merely apply a different label from the one it currently does. It'll still be a bad label that breaks QEMU's ability to access its disks.

The only way to address that would be for libvirt to update the fcontext database every single time it starts a VM, listing all the files that the VM is permitted to access. This isn't practical because fcontext database updates are way too slow. It would have even worse behaviour on host crashes too, because the fcontext database is preserved across reboots, whereas labels needed for QEMU are allocated fresh on each boot, so you don't want any trace of previous labels preserved across the reboot.
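
The single-file usage described above can be made harder to misuse with a small wrapper. This is only a sketch: relabel_single_file is a hypothetical helper, it assumes restorecon is installed, and it passes -n (do not change any labels) together with -v so that nothing is modified until the operator has reviewed the output.

```shell
# Hypothetical wrapper: refuses directories so that a recursive relabel of
# a tree such as /var/lib/nova cannot happen by accident.
relabel_single_file() {
    target=$1
    if [ -z "$target" ] || [ ! -e "$target" ]; then
        echo "usage: relabel_single_file FILE" >&2
        return 2
    fi
    if [ -d "$target" ]; then
        echo "refusing to relabel directory: $target" >&2
        return 1
    fi
    # Dry run: report what would be relabelled without touching the file.
    restorecon -n -v "$target"
}
```

Called with one file path (a console.log, for instance), it prints the change restorecon would make; called with a directory it returns an error without running restorecon at all.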

Comment 6 David Vallee Delisle 2020-07-02 22:50:26 UTC
Hello Daniel,

Thanks for the input here. You're right: when we use -F, restorecon touches all the files under a folder, even if there's no predetermined context for a file and even if the regex excludes it. I tested it under another folder in my lab and can confirm this [1].

But when I run restorecon without -F, it looks like we get the expected behavior [2]. Am I reading this right? If this is the case, I believe a similar regex could work, unless the customer uses -F.


Thanks,

DVD

[1]
~~~
[root@ess13latest-scpu-0 ~]# mkdir -p /dvd/{one,two,three} && cd /dvd
[root@ess13latest-scpu-0 dvd]# semanage fcontext -a -t nova_var_lib_t '/dvd(/?(?!two).)*'
[root@ess13latest-scpu-0 dvd]# ls -tralZ
drwxr-xr-x. root root system_u:object_r:root_t:s0      ..
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 one
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 two
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 three
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 .
[root@ess13latest-scpu-0 dvd]# restorecon -R -F -v /dvd
restorecon reset /dvd context unconfined_u:object_r:default_t:s0->system_u:object_r:nova_var_lib_t:s0
restorecon reset /dvd/one context unconfined_u:object_r:default_t:s0->system_u:object_r:nova_var_lib_t:s0
restorecon reset /dvd/two context unconfined_u:object_r:default_t:s0->system_u:object_r:default_t:s0
restorecon reset /dvd/three context unconfined_u:object_r:default_t:s0->system_u:object_r:nova_var_lib_t:s0
[root@ess13latest-scpu-0 dvd]# ls -lZ
drwxr-xr-x. root root system_u:object_r:nova_var_lib_t:s0 one
drwxr-xr-x. root root system_u:object_r:nova_var_lib_t:s0 three
drwxr-xr-x. root root system_u:object_r:default_t:s0   two
~~~

[2]
~~~
[root@ess13latest-scpu-0 ~]# rm -rf /dvd && mkdir -p /dvd/{one,two,three} && cd /dvd
[root@ess13latest-scpu-0 dvd]# restorecon -R -v /dvd
restorecon reset /dvd context unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
restorecon reset /dvd/one context unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
restorecon reset /dvd/three context unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
[root@ess13latest-scpu-0 dvd]# ls -lZ
drwxr-xr-x. root root unconfined_u:object_r:nova_var_lib_t:s0 one
drwxr-xr-x. root root unconfined_u:object_r:nova_var_lib_t:s0 three
drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 two
~~~
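
To make the difference between [1] and [2] easier to see, the verbose lines can be split into their context fields. The sketch below is a hypothetical helper (type_changes is not part of policycoreutils) that assumes the "restorecon reset PATH context OLD->NEW" line format shown above and paths without spaces; it reports whether the type field changed or only another field such as the user.

```shell
# Hypothetical helper: reads "restorecon -v" output on stdin and prints,
# for each reset line, the old and new type (third context field).
type_changes() {
    awk '$1 == "restorecon" && $2 == "reset" && $4 == "context" {
        split($5, ctx, "->")
        split(ctx[1], before, ":")
        split(ctx[2], after, ":")
        verdict = (before[3] == after[3]) ? "type unchanged" : "type changed"
        print $3 ": " before[3] " -> " after[3] " (" verdict ")"
    }'
}

# Two of the lines from [1], fed in as sample input.
type_changes <<'EOF'
restorecon reset /dvd/one context unconfined_u:object_r:default_t:s0->system_u:object_r:nova_var_lib_t:s0
restorecon reset /dvd/two context unconfined_u:object_r:default_t:s0->system_u:object_r:default_t:s0
EOF
```

For /dvd/two in [1] only the user field differs (unconfined_u to system_u) while the type stays default_t, which is consistent with -F resetting the full context rather than the type alone.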

Comment 7 Daniel Berrangé 2020-07-03 11:49:48 UTC
(In reply to David Vallee Delisle from comment #6)
> [root@ess13latest-scpu-0 ~]# rm -rf /dvd && mkdir -p /dvd/{one,two,three} &&
> cd /dvd
> [root@ess13latest-scpu-0 dvd]# restorecon -R -v /dvd
> restorecon reset /dvd context
> unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
> restorecon reset /dvd/one context
> unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
> restorecon reset /dvd/three context
> unconfined_u:object_r:default_t:s0->unconfined_u:object_r:nova_var_lib_t:s0
> [root@ess13latest-scpu-0 dvd]# ls -lZ
> drwxr-xr-x. root root unconfined_u:object_r:nova_var_lib_t:s0 one
> drwxr-xr-x. root root unconfined_u:object_r:nova_var_lib_t:s0 three
> drwxr-xr-x. root root unconfined_u:object_r:default_t:s0 two
> ~~~

The test scenario is invalid because you've left the test files on their original default context when running restorecon. To replicate the qemu situation you need to first set the files to a non-default label. Then you'll see that restorecon will reset them back to the default. The changed fcontext merely affects whether they get reset back to nova_var_lib_t or var_lib_t; either way it is going to break QEMU's access.

Comment 8 David Vallee Delisle 2020-07-03 12:44:54 UTC
And you're right Daniel.

I'll update the KCS with a warning against running restorecon on /var/lib/nova.

I'll close this bug now.

Thanks for the help.

DVD

~~~
[root@ess13latest-scpu-0 dvd]# chcon -t nova_var_lib_t two
[root@ess13latest-scpu-0 dvd]# restorecon -R -v /dvd
restorecon reset /dvd/two context unconfined_u:object_r:nova_var_lib_t:s0->unconfined_u:object_r:default_t:s0
~~~