Bug 851981

Summary: The migration with macvtap network was denied by the target when i set "setenforce 1" in the target
Product: Red Hat Enterprise Linux 6 Reporter: zhenfeng wang <zhwang>
Component: libvirtAssignee: Gunannan Ren <gren>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.4CC: acathrow, berrange, dallan, diego.ml, dwalsh, dyasny, dyuan, eparis, gsun, jdenemar, laine, mzhan, rwu
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.10.2-5.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 07:22:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
the audit log on the target
none
the audit log when migrate from target to source (target enforce 1)
none
when migrate the guest from target to the source (the source enforce 0) none

Description zhenfeng wang 2012-08-27 08:20:00 UTC
Created attachment 607160 [details]
the audit log on the target

Description of problem:
The migration with bridged network was denied by the target when i set "setenforce 1" in the target

Version-Release number of selected component (if applicable):
 libvirt-0.10.0-0rc1.el6.x86_64
 qemu-kvm-0.12.1.2-2.305.el6.x86_64
 kernel-2.6.32-298.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
 2 prepare virtual network on both side
 [Source ~]# virsh net-list
 Name                 State      Autostart
 -----------------------------------------
 bridge-net           active     yes
 default              active     yes

 [Target ~]# virsh net-list
 Name                 State      Autostart
 -----------------------------------------
 bridge-net           active     yes
 default              active     yes

2. start a guest with bridged network
 # virsh start testkf8

 [Source ~]# virsh dumpxml testkf8
 ..
  <interface type='network'>
       <mac address='52:54:00:1e:f2:53'/>
       <source network='bridge-net'/>
       <target dev='macvtap0'/>
       <model type='virtio'/>
       <alias name='net0'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
       function='0x0'/>
     </interface>
 ..
3 migrate the guest from the source to the target

 [Source ~]# virsh migrate --live testkf8
 qemu+ssh://10.66.85.232/system --verbose
 root.85.232's password:
 Migration: [ 97 %]error: Requested operation is not valid: domain
 'testkf8' is not processing incoming migration

4 check the audit log in the target

 type=AVC msg=audit(1346051412.352:5399): avc:  denied  { read write }
 for  pid=9365 comm="qemu-kvm" path="/dev/tap54" dev=devtmpfs
 ino=124022 scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023
 tcontext=system_u:object_revice_t:s0 tclass=chr_file
 type=SYSCALL msg=audit(1346051412.352:5399): arch=c000003e syscall=59
 success=yes exit=0 a0=7f7868185230 a1=7f78680067f0 a2=7f78680efef0
 a3=18 items=0 ppid=1 pid=9365 auid=0 uid=107 gid=107 euid=107
 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=519
 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm"
 subj=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 key=(null)
 type=AVC msg=audit(1346051412.443:5400): avc:  denied  { getattr }
 for  pid=9365 comm="qemu-kvm" name="/" dev=0:1b ino=101
 scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023
 tcontext=system_u:object_r:nfs_t:s0 tclass=filesystem
 type=SYSCALL msg=audit(1346051412.443:5400): arch=c000003e
 syscall=138 success=no exit=-13 a0=9 a1=7fffd39ae980 a2=3 a3=48
 items=0 ppid=1 pid=9365 auid=0 uid=107 gid=107 euid=107 suid=107
 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=519
 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm"
 subj=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 key=(null)
  
Actual results:
the migration fail

Expected results:
the migration should be successufll between the source and target

Additional info:

Comment 3 Laine Stump 2012-09-06 14:34:24 UTC
Please attach the output of "virsh net-dumpxml bridge-net" on both machines.

This appears to be not a bridged connection, but a macvtap connection (although the forward mode in the <network> definition may say "bridged", that is a different use of the word "bridge" - it's just saying that forwarding to the rest of the network is handled at L2 rather than with IP routing as in the other forward modes).

Also please try the migration again after running "setenforce permissive" on the destination host, and re-collect the data from audit.log - that way if there are multiple AVCs that need to be taken care of, we can solve them all at once.

Comment 4 Laine Stump 2012-09-06 14:35:26 UTC
BTW, just to verify - the destination and source of the migration are both running the latest versions of everything, correct?

Comment 5 zhenfeng wang 2012-09-07 07:47:21 UTC
Created attachment 610647 [details]
the audit log when migrate from target to source (target enforce 1)

Comment 6 zhenfeng wang 2012-09-07 07:50:03 UTC
the source bridge-netinfo
[source ~]# virsh net-dumpxml bridge-net
<network connections='1'>
  <name>bridge-net</name>
  <uuid>11ad93d5-0edb-3bcc-3925-355c96205b64</uuid>
  <forward dev='eth0' mode='bridge'>
    <interface dev='eth0' connections='1'/>
  </forward>
</network>

the target bridge-netinfo
[target ~]# virsh net-dumpxml bridge-net
<network>
  <name>bridge-net</name>
  <uuid>62681045-178d-0eef-1016-8e2f37d23b26</uuid>
  <forward dev='eth0' mode='bridge'>
    <interface dev='eth0'/>
  </forward>
</network>


I re-test it in the previous versions ,i can only reproduce this problem when i migrate the guest from the target to the source

# virsh migrate --live testkf8 qemu+ssh://10.66.82.125/system --verbose
root.82.125's password: 
Migration: [ 95 %]error: Requested operation is not valid: domain 'testkf8' is not processing incoming migration

but I can only see the first avc:type=AVC msg=audit(1346051412.352:5399): avc:  denied  { read write }

can't see the second avc type=AVC msg=audit(1346051412.443:5400): avc:  denied  { getattr } in the audit.log

Comment 7 zhenfeng wang 2012-09-07 07:54:30 UTC
Created attachment 610648 [details]
when migrate the guest from target to the source  (the source enforce 0)

Comment 8 zhenfeng wang 2012-09-07 08:05:22 UTC
I can reproduce the problem in the latest pkg
libvirt-0.10.1-1.el6.x86_64

Comment 9 Gunannan Ren 2012-10-10 09:54:08 UTC
When using macvtap, a character device gets created by udev with name /dev/tapN. According to selinux rule, the selinux context of /dev/tapN should automatically be

"system_u:object_r:tun_tap_device_t"

For example to create a macvtap interface by hand
 # ip link add link eth0 name macvtap0 type macvtap
list the selinux context for the corresponding character device:
ls -Z /dev/tap35
crw-rw----. root root system_u:object_r:tun_tap_device_t:s0 /dev/tap35

In this case, when the migration is near to the end, the selinux reports denied error which shows the selinux context of /dev/tapN is 

"system_u:object_r:device_t" 
Note: the object type changed from tun_tap_device_t to device_t
this is the reason of failure.

type=AVC msg=audit(1349858424.233:42507): avc:  denied  { read write } for  pid=19926 comm="qemu-kvm" path="/dev/tap33" dev=devtmpfs ino=131524 scontext=unconfined_u:system_r:svirt_t:s0:c598,c908 tcontext=system_u:object_r:device_t:s0 tclass=chr_file

the AVC message show, the qemu-kvm process have not read/write permission on
character device /dev/tap33 with its context.

I couldn't reproduce this problem on Fedora17.
I compiled the newest code on RHEL6 on reporter's machine, the problem only happend when the guest is migrated back to source machine.

Comment 10 Dave Allan 2012-10-10 12:59:49 UTC
(In reply to comment #9)
> For example to create a macvtap interface by hand
>  # ip link add link eth0 name macvtap0 type macvtap
> list the selinux context for the corresponding character device:
> ls -Z /dev/tap35
> crw-rw----. root root system_u:object_r:tun_tap_device_t:s0 /dev/tap35

Users should not be creating macvtap devices by hand.  If reproducing this behavior requires doing so this BZ should be closed as NOTABUG.

Comment 11 Gunannan Ren 2012-10-10 13:55:23 UTC
no, it is not necessary to create macvtap device by hand, I just take this as an example with right selinux context.

I suspect there is probably an race. when I add usleep(5 * 1000000) after creating macvtap by virNetDevMacVLanCreate(), this problem disappeared.

The bug alwasy occurs when the migration has completed 98/99 hundred percent.
Maybe we changed something in "Finish" phase of migration, I know little about
macvtap code. See if Laine could help out.

Comment 12 Laine Stump 2012-10-10 15:30:54 UTC
I was just looking for the other bug about macvtap that seemed to be timing related, but it seems to be a different situation (Bug 786648) - in that case a macvtap device which was opened by libvirt and apparently functioning properly, was seen by qemu as an invalid file descriptor (so, no AVCs involved).

I think in this case you need to concentrate on the difference in the sequence/timing of netdev initialization for a migration startup vs. a standard cold startup. Perhaps when it's a migration, we're completely missing a codepath, or maybe it's being done by another thread or something. Sorry, that's as much as I can say right now.

Comment 13 Laine Stump 2012-10-10 16:48:42 UTC
I just did a simple test and do see that when the macvtap device is first created, it is system_u:object_r:device_t:s0, and is very shortly afterward changed to system_u:object_r:tun_tap_device_t:so.

According to a discussion on IRC just now, this is because udev is changing the context as soon as its notified of the creation of the device, and what needs to happen is for the device to instead be created directly with the correct context, i.e. probably an selinux policy change.

Comment 14 Eric Paris 2012-10-10 17:07:33 UTC
So in F17+ we have rules in selinux policy that look like this:

type_transition virtd_t device_t : chr_file tun_tap_device_t "tap1"; 

This says that if virtd_t creates a file in a directory labeled device_t with the name "tap1" it should be labeled tun_tap_device_t.  So things should 'just work' in Fedora.  The problem is this only works for files tun0-tun29.  After that Fedora will have this same problem.  RHEL cannot handle these rule types at all.

So what is most likely happening is that udev get notification when new files are created in /dev or when a file is closed by the last process.  It will then jump in and relabel that file back to the expected default.  This is why it works with a delay.

So to fix both RHEL and Fedora for tun devices >29, libvirt is going to need to do it.  I would suggest libirt load the default file context database.

matchpathcon_init()

ask what the label on the new file should be

matchpathcon("tun0", S_IFCHR, &context);

then tell the kernel that it should use this context for the next file created

setfscreatecon(context);

Do the creation of the tun file

[whatever]

The unset the fscreatecon

setfscreatecon(NULL)

Then clean up the matchpathcon database:

matchpathcon_fini()


I'd actually suggest loading the DB once at libvirt start and just reusing it.  Maybe you guys already load it, I don't know.  But that should solve this issue properly...

Comment 15 Laine Stump 2012-10-10 18:15:46 UTC
(In an irc exchange, Eric himself prefaced his description of what needs to be done with the phrase "ugly hack"...)

Thanks for the analysis/description, Eric.

So, basically we need to cover up for the fact that the device that's supposed to be created having selinux context "x" instead temporarily has selinux context "y", even after we've gotten a return from netlink telling us that the device was successfully created and is ready for use - the creation isn't yet complete, though, and it's *not* ready to use.

The entire concept that the selinux policy can/should be set asynchronously by udev at some later time after the device's creation seems flawed, and this seems like a problem that will bite other programs, not just libvirt. Is a better/permanent fix likely?

The question that remains is how to fit the above into libvirt's selinux security driver model.

Comment 16 Gunannan Ren 2012-10-11 08:12:19 UTC
(In reply to comment #14) 
> 
> I'd actually suggest loading the DB once at libvirt start and just reusing
> it.  Maybe you guys already load it, I don't know.  But that should solve
> this issue properly...

Thanks for your solution.
There are two ways in my mind to fix it. One is to load selinux file context DB, libvirt doesn't load selinux DB right now. This would be relatively big move on selinux driver API. This way means we are helping udev with labeling work.

The other is to check for the right selinux context before going on.
In virNetDevMacVLanTapOpen(), add the checking int the following loop code.
This is not perfect I have to admit. This way still leaves the relabeling
work to udev.

    while (1) {
        /* may need to wait for udev to be done */
        tapfd = open(tapname, O_RDWR);
        if (tapfd < 0 && retries > 0) {
            retries--;
            usleep(20000);
            continue;
        }
        break;
    }

Actually The following type rules are perfect solution.

type_transition virtd_t device_t : chr_file tun_tap_device_t "tap1"; 

As we talked in IRC, set needinfo to Daniel for advice.

Comment 17 Daniel Berrangé 2012-10-11 09:43:40 UTC
We don't want SELinux code in the tap device creation code. We need to add support to the security driver framework to let us pass in a list of TAP device FDs (or maybe filenames?), and then relabel them at the same place we do all other relabelling

Comment 18 Gunannan Ren 2012-10-11 14:02:41 UTC
(In reply to comment #17)
> We don't want SELinux code in the tap device creation code. We need to add
> support to the security driver framework to let us pass in a list of TAP
> device FDs (or maybe filenames?), and then relabel them at the same place we
> do all other relabelling

okay, the fsetfilecon() should work well, so we need to add a new security
driver function to relabel TAP fd(tapfd in code) just after we get it in
virNetDevMacVLanTapOpen().
I will try to write a patch for it.
Thanks for these help from all guys here.

Comment 19 Daniel Walsh 2012-10-11 15:55:29 UTC
Yes allowing svirt_t to write to device_t or to tun_tap_device_t:s0 is not what we want. It has to be labeled tun_tap_device_t:MCS label.

Comment 20 Gunannan Ren 2012-10-11 16:17:33 UTC
(In reply to comment #19)
> Yes allowing svirt_t to write to device_t or to tun_tap_device_t:s0 is not
> what we want. It has to be labeled tun_tap_device_t:MCS label.

We could get MCS value as the same as svirt_t process and combine it with tun_tap_device_t to form the right label, is it right?
If so, I don't know if udev are going to relable it back to the label(tun_tap_device_t:s0) which is from file context database when udev get the notification of the creation of /dev/tapN.

Comment 21 Daniel Walsh 2012-10-11 17:18:00 UTC
If it does we would need to handle this with udev.  IE Udev should not be changing labels of devices with the correct type.

Comment 22 Gunannan Ren 2012-10-15 07:13:51 UTC
Patch sent to upstream for review.
https://www.redhat.com/archives/libvir-list/2012-October/msg00675.html

Comment 23 Daniel Berrangé 2012-10-15 08:55:07 UTC
(In reply to comment #19)
> Yes allowing svirt_t to write to device_t or to tun_tap_device_t:s0 is not
> what we want. It has to be labeled tun_tap_device_t:MCS label.

That doesn't make any sense. We have *never* labelled TAP devices with the MCS label and it has always worked just fine thus far, so the current policy must already be allowing access to the  tun_tap_device_t:s0.

Comment 24 Gunannan Ren 2012-10-15 11:49:57 UTC
my patch is trying to label the tapfd with MCS like
system_u:object_r:tun_tap_device_t:s0:c586,c965

the default context is
system_u:object_r:tun_tap_device_t:s0

Both of them are working fine. Need to nail it down :)

Comment 25 Daniel Walsh 2012-10-15 13:31:12 UTC
Dan, yes we allow the acess now, but there is nothing preventing one svirt_t from grabbing another svirt_t tun_tap_device.  If this is not a potential problem, then we can continue with what we have, and try  to fix the race condition on the tun_tap_device_t > 29 that is happening now.

Comment 30 Eric Paris 2012-10-15 18:37:39 UTC
In response to comment #25.  In RHEL6, we have the race for all of them, not just >29.

Comment 31 Eric Paris 2012-10-16 18:24:51 UTC
Is this relabeling both /dev/net/tun and /dev/net/tunX   ?

eblake just reported an SELinux AVC:

type=AVC msg=audit(1350411375.542:759325): avc:  denied  { read } for  pid=4773 comm="openvpn" path="/dev/net/tun" dev=devtmpfs ino=9557 scontext=system_u:system_r:openvpn_t:s0 tcontext=system_u:object_r:tun_tap_device_t:s0:c1,c489 tclass=chr_file

Which looks like /dev/net/tun has an MCS category.  We really only need to label the /dev/net/tunX chr file which is created, not /dev/net/tun itself....

Comment 32 Eric Blake 2012-10-16 18:31:59 UTC
The 3 patches at http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg00852.html are insufficient, as they introduce a NEW problem - libvirtd would now be assigning a label to /dev/net/tun, which makes it impossible for anyone else (like openvpn) to also open a tun device.  Libvirt should only be labeling /dev/net/tunX.

Comment 33 Gunannan Ren 2012-10-17 03:42:19 UTC
Sorry this a big mistake I made, I generated patch on testing machine after testing , but I made some modification on patch on another testing machine.

the patch should relabel "tapfd" of virtual network of type VIR_DOMAIN_NET_TYPE_DIRECT
rather than
VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE

Libvirt should be just labeling fd from /dev/tapN.
Patch sent out already.

Comment 35 Jiri Denemark 2012-10-23 10:46:29 UTC
Final version of the patchset: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg01056.html

Comment 37 zhenfeng wang 2012-10-26 08:48:06 UTC
Verified this bug in the new  pkgs
# rpm -qa libvirt qemu-kvm kernel
libvirt-0.10.2-5.el6.x86_64
qemu-kvm-0.12.1.2-2.330.el6.x86_64
kernel-2.6.32-309.el6.x86_64
steps
1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
 2 prepare virtual network on both side
 [Source ~]# virsh net-list
 Name                 State      Autostart
 -----------------------------------------
 bridge-net           active     yes
 default              active     yes

 [Target ~]# virsh net-list
 Name                 State      Autostart
 -----------------------------------------
 bridge-net           active     yes
 default              active     yes

2. start a guest with bridged network
 # virsh start testkf8

 [Source ~]# virsh dumpxml testkf8
 ..
  <interface type='network'>
       <mac address='52:54:00:1e:f2:53'/>
       <source network='bridge-net'/>
       <target dev='macvtap0'/>
       <model type='virtio'/>
       <alias name='net0'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
       function='0x0'/>
     </interface>
3 migrate the guest from the source to the target and from the target to the source ,and the guest can be migrated successfully in the two direction. and the lable was correct in the dev folder
# ls -Z /dev/tap17 
crw-rw----. root root system_u:object_r:tun_tap_device_t:s0:c723,c762 /dev/tap17
There was not any avc denied information in the audit log . besides, I can reproduce this bug in the pkg 
libvirt-0.10.1-1.el6.x86_64,so this is fixed.

Comment 38 errata-xmlrpc 2013-02-21 07:22:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html