Bug 571714

Summary: Running virt-df on disk image relabels it, so qemu can no longer write to it.
Product: [Fedora] Fedora Reporter: Richard W.M. Jones <rjones>
Component: udevAssignee: Harald Hoyer <harald>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 13CC: apevec, berrange, clydekunkel7734, dwalsh, eparis, harald, jonathan, kay.sievers, mgrepl, pcfe, torbjorn.lindahl
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard: setroubleshoot_trace_hash:404b47887015b2c8a02f0f868866bb01e6e77935f01ea9bd1605c07881ea4c35
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 575128 (view as bug list) Environment:
Last Closed: 2010-06-25 07:47:11 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Richard W.M. Jones 2010-03-09 05:21:34 EST
Summary:

SELinux is preventing qemu-kvm "write" access on /dev/dm-5.

Detailed Description:

SELinux denied access requested by qemu-kvm. It is not expected that this access
is required by qemu-kvm and this access may signal an intrusion attempt. It is
also possible that the specific version or configuration of the application is
causing it to require additional access.

Allowing Access:

You can generate a local policy module to allow this access - see FAQ
(http://docs.fedoraproject.org/selinux-faq-fc5/#id2961385) Please file a bug
report.

Additional Information:

Source Context                system_u:system_r:svirt_t:s0:c202,c996
Target Context                system_u:object_r:fixed_disk_device_t:s0
Target Objects                /dev/dm-5 [ blk_file ]
Source                        qemu-kvm
Source Path                   /usr/bin/qemu-kvm
Port                          <Unknown>
Host                          (removed)
Source RPM Packages           
Target RPM Packages           
Policy RPM                    selinux-policy-3.7.11-1.fc13
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Plugin Name                   catchall
Host Name                     (removed)
Platform                      Linux (removed)
                              2.6.33-1.fc13.i686.PAE #1 SMP Wed Feb 24 19:54:49
                              UTC 2010 i686 i686
Alert Count                   11629
First Seen                    Tue 09 Mar 2010 10:00:34 AM GMT
Last Seen                     Tue 09 Mar 2010 10:10:28 AM GMT
Local ID                      c5168667-01d2-43d4-a7d0-e64fdebc5f93
Line Numbers                  

Raw Audit Messages            

node=(removed) type=AVC msg=audit(1268129428.874:128494): avc:  denied  { write } for  pid=28490 comm="qemu-kvm" path="/dev/dm-5" dev=devtmpfs ino=166267 scontext=system_u:system_r:svirt_t:s0:c202,c996 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file



Hash String generated from  catchall,qemu-kvm,svirt_t,fixed_disk_device_t,blk_file,write
audit2allow suggests:

#============= svirt_t ==============
allow svirt_t fixed_disk_device_t:blk_file write;
Comment 1 Richard W.M. Jones 2010-03-09 05:24:31 EST
This happened part way through an installation of a
VM (Debian 5 as it happens) using virt-install.

/dev/dm-5 is the VM's disk device.

When this happens, installation appears to hang
and the VM is unresponsive.  I guess it must be
getting loads of disk errors.  Note that installation
got quite far in before this happened.
Comment 2 Richard W.M. Jones 2010-03-09 05:46:59 EST
WTF ..

The reason is because I happened to run 'virt-df' at the same time (I didn't
think that could possibly affect the installation, which is why I didn't
mention it).

The device is labelled like this during the first part of the installation:

brw-rw----. qemu qemu system_u:object_r:svirt_image_t:s0:c129,c671 /dev/dm-5

After running virt-df, the label changes to:

brw-rw----. root disk system_u:object_r:fixed_disk_device_t:s0 /dev/dm-5

I'm not aware of how virt-df could change the label.  All it does is to run
qemu directly on the disk image (with snapshot=on so it won't modify the
disk image, just read from it).
Comment 3 Alan Pevec 2010-03-09 06:34:08 EST
(In reply to comment #2)
> brw-rw----. root disk system_u:object_r:fixed_disk_device_t:s0 /dev/dm-5
> 
> I'm not aware of how virt-df could change the label.  All it does is to run
> qemu directly on the disk image (with snapshot=on so it won't modify the
> disk image, just read from it).    

my guess is that restorecond did it, this is the context from the policy:
/dev/dm-[0-9]+ block device       system_u:object_r:fixed_disk_device_t:s0
Comment 4 Richard W.M. Jones 2010-03-09 07:36:57 EST
I used systemtap to locate the offending process: udevd!

udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
Comment 5 Richard W.M. Jones 2010-03-09 07:42:48 EST
My SystemTap script, in case anyone wasn't to reproduce it:
http://rwmj.wordpress.com/2010/03/09/tip-use-systemtap-to-monitor-selinux-changes-to-files/
Comment 6 Daniel Walsh 2010-03-09 10:45:39 EST
restorecond only watches files listed in /etc/selinux/restorecond.conf

It does not touch /dev/

So the question is why does udev change the label of an existing device.

udev should only be setting the label on device creation.
Comment 7 Harald Hoyer 2010-03-16 05:40:37 EDT
if a device is closed after being open for a write, udev gets a "change" event, and resets all labels, symlinks, etc.
Comment 8 Richard W.M. Jones 2010-03-16 05:53:50 EDT
qemu has some perverse behaviour where it will always open the device
for write, even if snapshot mode is requested (which should only
require read access).

However ...

(In reply to comment #7)
> if a device is closed after being open for a write, udev gets a "change" event,
> and resets all labels, symlinks, etc.    

This doesn't sound good to me.  It seems with sVirt, the intention is:

(1) At device creation, udevd labels the device with
system_u:object_r:fixed_disk_device.

(2) When libvirt starts the domain, it is relabeled with
some temporary svirt label.

(3) When libvirt stops the domain, libvirt relabels it
back to system_u:object_r:fixed_disk_device.

After step (1), libvirt is relying on udevd not to touch the
labels on the device node ever again.

Can we tell udevd not to touch the label on the device
unless it is actually being created?
Comment 9 Harald Hoyer 2010-03-16 06:38:33 EDT
(In reply to comment #8)
> Can we tell udevd not to touch the label on the device
> unless it is actually being created?    

Hmm, keep the device open after (2) ...
Comment 10 Richard W.M. Jones 2010-03-16 06:52:11 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > Can we tell udevd not to touch the label on the device
> > unless it is actually being created?    
> 
> Hmm, keep the device open after (2) ...    

There are two separate and unrelated processes
involved here, so that is not practical.  We need to
have the udevd rules reflect that labels should not
be set unless the device is being created anew.
Comment 11 Daniel Berrange 2010-03-16 06:59:29 EDT
> if a device is closed after being open for a write, udev gets a "change" event,
> and resets all labels, symlinks, etc.    

This is madness. Udev is inferring that 'closed after write' implies the app in question no longer needs/wants the device. This cannot possibly be a safe assumption in the general case, as this BZ report demonstrates.
Comment 12 Harald Hoyer 2010-03-16 12:02:02 EDT
Well, this is because block devices can be formatted, etc. They can get new UUIDs, LABELs, partitions, filesystem types, etc.

Probably owner, group, permissions and selinux context should not be changed on "change" events.
Comment 13 Kay Sievers 2010-03-16 12:57:17 EDT
(In reply to comment #12)
> Well, this is because block devices can be formatted, etc. They can get new
> UUIDs, LABELs, partitions, filesystem types, etc.
> 
> Probably owner, group, permissions and selinux context should not be changed on
> "change" events.    

We don't touch owner, group, mode if they did not change, we only call lsetfilecon().

It's in udev-node.c. We can remove it, if you think we don't need it.
Comment 14 Harald Hoyer 2010-03-16 13:06:06 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > Well, this is because block devices can be formatted, etc. They can get new
> > UUIDs, LABELs, partitions, filesystem types, etc.
> > 
> > Probably owner, group, permissions and selinux context should not be changed on
> > "change" events.    
> 
> We don't touch owner, group, mode if they did not change, we only call
> lsetfilecon().
> 
> It's in udev-node.c. We can remove it, if you think we don't need it.    

Hmm, yes, we should not do that for "change" events.
Comment 15 Daniel Berrange 2010-03-16 13:07:10 EDT
> We don't touch owner, group, mode if they did not change, we only call
> lsetfilecon().

That's the crux of the problem - they will have changed in the context of virtual machines. The owner/group will have been changed to 'qemu:qemu' and file context to 'svirt_image_t'. The mode is the only thing that isn't changed, we just assume it was always mode 0700 or 0770, but that is probably a mistake to rely on too.
Comment 16 Kay Sievers 2010-03-16 14:18:24 EDT
(In reply to comment #15)
> > We don't touch owner, group, mode if they did not change, we only call
> > lsetfilecon().
> 
> That's the crux of the problem - they will have changed in the context of
> virtual machines. The owner/group will have been changed to 'qemu:qemu' and
> file context to 'svirt_image_t'. The mode is the only thing that isn't changed,
> we just assume it was always mode 0700 or 0770, but that is probably a mistake
> to rely on too.    

Changing owner, group, mode by external tools is not supported, udev will restore all that to the configured values with the next event, which can happen any time. Such changes needs to happen with udev rules or by applying ACL's.
Comment 17 Kay Sievers 2010-03-16 14:21:20 EDT
(In reply to comment #14)
> > It's in udev-node.c. We can remove it, if you think we don't need it.    
> 
> Hmm, yes, we should not do that for "change" events.    

We do not distinguish between "add" and "change". So we just remove this line?
  http://git.kernel.org/?p=linux/hotplug/udev.git;a=blob;f=udev/udev-node.c;h=4314cceb79893d8f966ff8aa056ea8e7d265e766;hb=HEAD#l58
Comment 18 Harald Hoyer 2010-03-16 14:45:13 EDT
(In reply to comment #17)
> (In reply to comment #14)
> > > It's in udev-node.c. We can remove it, if you think we don't need it.    
> > 
> > Hmm, yes, we should not do that for "change" events.    
> 
> We do not distinguish between "add" and "change". So we just remove this line?
>  
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=blob;f=udev/udev-node.c;h=4314cceb79893d8f966ff8aa056ea8e7d265e766;hb=HEAD#l58    

Hmm, ok, so restorecon needs to be run after manual creation (which is done already anyway) before udev "add" events.
Comment 19 Richard W.M. Jones 2010-03-16 14:56:42 EDT
(In reply to comment #16)
> (In reply to comment #15)
> > > We don't touch owner, group, mode if they did not change, we only call
> > > lsetfilecon().
> > 
> > That's the crux of the problem - they will have changed in the context of
> > virtual machines. The owner/group will have been changed to 'qemu:qemu' and
> > file context to 'svirt_image_t'. The mode is the only thing that isn't changed,
> > we just assume it was always mode 0700 or 0770, but that is probably a mistake
> > to rely on too.    
> 
> Changing owner, group, mode by external tools is not supported, udev will
> restore all that to the configured values with the next event [...]

You mention "owner, group, mode" there.  Does that also include
SELinux label or not?
Comment 20 Daniel Berrange 2010-03-16 15:09:54 EDT
> Changing owner, group, mode by external tools is not supported, udev will
> restore all that to the configured values with the next event, which can happen
> any time. Such changes needs to happen with udev rules or by applying ACL's.    

Farming this out to udev rules is not practical because it means that at every guest startup attempt, or disk hotplug action, we'd need to write out new udev files, busy-wait while the kernel got around to scheduling udev & udev loading & then processing the new udev rules, before libvirt can continue launching the guest / attaching the disk. This kind of approach was done with Xen & was a complete disaster resulting in very unreliable & slow guest launch when the host was under high loads which delayed udevd processing rules promptly enough.  Compare that complexity with the simple chown() call that is currently done. Udev must not randomly change/reset devices which are already present & in active use on the system.
Comment 21 Daniel Walsh 2010-03-16 15:51:00 EDT
From an Accreditation standard this would also be very bad.  If an admin changes the permissions (MAC/DAC) on a device, they had better stay until reboot.
For example if they change a device to Trusted and udev changes it back to system low, that would be very bad.
Comment 22 Kay Sievers 2010-03-16 15:54:19 EDT
(In reply to comment #19)
> > Changing owner, group, mode by external tools is not supported, udev will
> > restore all that to the configured values with the next event [...]
> 
> You mention "owner, group, mode" there.  Does that also include
> SELinux label or not?    

If the primary ownerships configured by udev rules are not touched, these should not be touched too. At least it can be easily fixed to do it that way.
Comment 23 Kay Sievers 2010-03-16 15:56:57 EDT
(In reply to comment #21)
> For example if they change a device to Trusted and udev changes it back to
> system low, that would be very bad.    

Until now, the udev rules are the primary source for that, and no external tool is expected to change stuff behind its back. At least temporary udev rules need to be placed on the system to reflect such intended changes.
Comment 24 Kay Sievers 2010-03-16 16:04:40 EDT
(In reply to comment #20)
>  Compare that complexity with the simple chown() call that is currently done.
> Udev must not randomly change/reset devices which are already present & in
> active use on the system.    

You could still chmod(), but only after you have installed temporary udev rules, which specify the same permissions, so that udev in case of receiving an update event needs what it should do.

Using ACLs, and not changing primary and standardized permissions, if possible, would be generally preferred though.