Bug 1013309

Summary: Add support for MACVLAN_FLAG_NOPROMISC when creating macvtap interfaces
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Qian Guo <qiguo>
Component: libvirtAssignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DEFERRED QA Contact: Luyao Huang <lhuang>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.0CC: ailan, dyuan, gsun, jdenemar, jsuchane, jtomko, juzhang, laine, mst, mzhan, psutter, xuzhang, yalzhang
Target Milestone: rcKeywords: FutureFeature, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 1013312 1013584 (view as bug list) Environment:
Last Closed: 2020-03-18 10:13:06 UTC Type: Feature Request
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1013312    

Description Qian Guo 2013-09-29 08:08:08 UTC
Description of problem:
If create a passthru mode macvtap interface, the system will set the physical ineterface promiscuous by default.

Version-Release number of selected component (if applicable):
# uname -r
3.10.0-29.el7.x86_64
# rpm -q iptables 
iptables-1.4.19.1-1.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1.Create a passthru mode macvtap interface, and set it up.
# ip link add link p4096p4 name ptru1 type macvtap mode passthru
# ip link set ptru1 up

2.Check in dmesg

3.

Actual results:
dmesg shows:
device p4096p4 entered promiscuous mode

Expected results:
physical interface should not entered promiscuous mode.

Additional info:

Comment 2 Qian Guo 2013-09-29 09:08:06 UTC
Test w/ virt-manager, physical interface entered promiscuous mode.
Aattach the xml infos

# virsh dumpxml rhel7
<domain type='kvm' id='5'>
  <name>rhel7</name>
  <uuid>7fd345c7-9e97-44c3-9c7f-0980b22cff30</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/rhel7cp1.qcow2_v3'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:f1:3b:75'/>
      <source dev='p4096p4' mode='passthrough'/>
      <target dev='macvtap0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/5'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/5'>
      <source path='/dev/pts/5'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <sound model='ich6'>
      <alias name='sound0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c442,c719</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c442,c719</imagelabel>
  </seclabel>
</domain>

Comment 3 Jiri Denemark 2013-09-30 11:31:55 UTC
And how is this libvirt related? As you said in bug description, libvirt is not needed to reproduce this issue. Also, iptables have nothing to do with this, the "ip" command comes from iproute package.

Anyway, with my knowledge of networking, I'd say this is inevitable since the physical network interface needs to be able to catch frames addressed to several different MAC addresses. However, I have to admit I haven't really followed all the new virt-related networking techniques and thus I'm moving this bug to kernel component to decide whether this is a bug or not (I guess it isn't).

Comment 4 Michael S. Tsirkin 2013-09-30 11:54:55 UTC
It's not inevitable, and it's not a kernel bug.

macvlans have a flag MACVLAN_FLAG_NOPROMISC which let users
disable promisc in passthrough mode: default can
not be changed without breaking ABIs.

iproute does not seem to know how to activate this flag, and neither does libvirt.
they both should learn to set this flag.

so moving back to libvirt, and will clone to iproute.

Comment 5 Michael S. Tsirkin 2013-09-30 12:04:33 UTC
note: libvirt will need to program mac addresses for it to be really useful

Comment 7 Laine Stump 2014-11-13 18:18:50 UTC
Reassigning this to me, as I recently did some related work (Bug 848199)

Comment 8 Laine Stump 2015-03-16 13:49:51 UTC
Micheal,

 I have a few questions about MACVLAN_FLAG_NOPROMISC:

1) Is it only for passthru mode, or is it also useful for other modes when we are monitoring guest rxfilter events and updating the macvtap device mac address and multicast list appropriately?

2) If the answer to (1) is "works for other modes too", does the physdev promiscuous mode get turned back on if one of the attached macvtaps is put into promiscuous mode? Or would that lead to a guest in promiscuous mode that wasn't really seeing all the traffic?

3) I see the #define for the flag in if_link.h, and the enum for the attribute IFLA_MACVLAN_FLAGS, but there is no corresponding struct ifla_macvlan_flags. So where does attribute go in the netlink packet? Is it a separate attribute like ifla_vlan_flags? Or does it go into the ifinfomsg.ifi_flags (doesn't make sense, as there would then be no use for the IFLA_MACVLAN_FLAGS attribute type) Or is it a simple u32 attribute?

Comment 9 Vlad Yasevich 2015-03-19 15:06:53 UTC
(In reply to Laine Stump from comment #8)
> Micheal,
> 
>  I have a few questions about MACVLAN_FLAG_NOPROMISC:
> 
> 1) Is it only for passthru mode, or is it also useful for other modes when
> we are monitoring guest rxfilter events and updating the macvtap device mac
> address and multicast list appropriately?

This particular flag is only affects passthru devices, so you can only set it then.  This also appears to be the only way to control the promiscuity setting
of the lower level device through the macv* device (which is a bit odd).  

As such, I can think 2 possible scenarios:
 1) User explicitely specify "nopromisc" in some xml setting.  This might
    be problematic as it would allow the user to shoot him/her self in the foot
    by creating a possibly broken configuration.

 2) Allow this setting to take place only when we trust and apply the guest
    rxfilter settings.  This would allow non-promisc passthru devices to
    exist as long as all of guest filters are passed through to device so
    that the guest would receive all the traffic it is interested in.
   
> 
> 2) If the answer to (1) is "works for other modes too", does the physdev
> promiscuous mode get turned back on if one of the attached macvtaps is put
> into promiscuous mode? Or would that lead to a guest in promiscuous mode
> that wasn't really seeing all the traffic?
> 
> 3) I see the #define for the flag in if_link.h, and the enum for the
> attribute IFLA_MACVLAN_FLAGS, but there is no corresponding struct
> ifla_macvlan_flags. So where does attribute go in the netlink packet? Is it
> a separate attribute like ifla_vlan_flags? Or does it go into the
> ifinfomsg.ifi_flags (doesn't make sense, as there would then be no use for
> the IFLA_MACVLAN_FLAGS attribute type) Or is it a simple u32 attribute?

It's just a u16 attribute.

-vlad

Comment 10 Laine Stump 2015-03-19 15:29:06 UTC
(In reply to Vlad Yasevich from comment #9)

> 
> This particular flag is only affects passthru devices, so you can only set
> it then.  This also appears to be the only way to control the promiscuity
> setting
> of the lower level device through the macv* device (which is a bit odd).  

What if you set NOPROMISC when the macvtap device is created, and the guest later sets promisc mode (which libvirt responds to by setting IFF_PROMISC on the macvtap interface, using ioctl(SOICSIFFLAGS))? Would the physdev correctly switch to promisc mode? Or would this be a broken config? (same question for what happens when the guest later turns *off* promisc mode)

Comment 11 Vlad Yasevich 2015-03-19 15:45:48 UTC
(In reply to Laine Stump from comment #10)
> (In reply to Vlad Yasevich from comment #9)
> 
> > 
> > This particular flag is only affects passthru devices, so you can only set
> > it then.  This also appears to be the only way to control the promiscuity
> > setting
> > of the lower level device through the macv* device (which is a bit odd).  
> 
> What if you set NOPROMISC when the macvtap device is created, and the guest
> later sets promisc mode (which libvirt responds to by setting IFF_PROMISC on
> the macvtap interface, using ioctl(SOICSIFFLAGS))? Would the physdev
> correctly switch to promisc mode? Or would this be a broken config? (same
> question for what happens when the guest later turns *off* promisc mode)

This is what I was referring to by my comment above.  It would appear
that setting IFF_PROMISC on the macvtap device does not change the promiscuity
setting on the physical device.  The only way to affect change on
the physical device would be:
  1) Directly control the setting on the physical device.
  2) Use the nopromisc option only on passthru macv* devices.

I am looking to see if this is a bug or oversight..

-vlad

Comment 12 Laine Stump 2015-03-25 15:50:13 UTC
Okay. To get back to basics, there are 3 devices involved:

G - guest emulated ethernet device
M - macvtap device on the host
P - physical device on the host

(assuming passthrough mode in the following discussion)

If we don't set MACVLAN_FLAG_NOPROMISC when M is created, then P and M will just always be IFF_PROMISC. Would it do any good to change them both at some later time (using a separate ioctl(SOICSIFFLAGS)?

If we *do* set MACVLAN_FLAG_NOPROMISC when M is created, then we can (and do, if configured) follow the IFF_PROMISC flag from G by paying attention to RX_FILTER_CHANGED. Currently we set M to mirror the state of this flag from G. Is it true that we should *also* set P to mirror the state of the flag, and is that enough for proper operation?

Comment 13 Vlad Yasevich 2015-03-25 17:02:50 UTC
(In reply to Laine Stump from comment #12)
> Okay. To get back to basics, there are 3 devices involved:
> 
> G - guest emulated ethernet device
> M - macvtap device on the host
> P - physical device on the host
> 
> (assuming passthrough mode in the following discussion)
> 
> If we don't set MACVLAN_FLAG_NOPROMISC when M is created, then P and M will
> just always be IFF_PROMISC. Would it do any good to change them both at some
> later time (using a separate ioctl(SOICSIFFLAGS)?

I am not sure what you mean by "would it do any good"...  If you are asking if
we could enable promiscuity using the ioctls, then yes, absolutely, you can
do that and it would sort of emulate setting of MACVLAN_FLAG_NOPROMISC.

> 
> If we *do* set MACVLAN_FLAG_NOPROMISC when M is created, then we can (and
> do, if configured) follow the IFF_PROMISC flag from G by paying attention to
> RX_FILTER_CHANGED. Currently we set M to mirror the state of this flag from
> G. Is it true that we should *also* set P to mirror the state of the flag,
> and is that enough for proper operation?

Yes, that should be enough.  Mirroring the flag to M allows macvtap to properly
forward multicast traffic.  Mirroring the flag P would allow the hw to actually
receive said traffic and pass it to M.

I think though that we should fix macvtap driver to properly propagate flag changes so that we don't have do the workaround.   I am testing patches to do
just that.

-vlad

Comment 19 Phil Sutter 2016-01-25 21:32:33 UTC
I don't see how this should block bug 1013584 as iproute does not depend on libvirt in any way. Therefore removing the probably incorrect dependency.

Comment 20 Laine Stump 2016-01-26 01:45:26 UTC
Yep, that was just an artifact of Bug 1013584 being created by cloning this bug - a newly cloned bug is automatically marked as dependent on the bug it was cloned from, which in my experience is wrong much more than half the time, but it's easy to forget to remove the dependency before hitting save...

Comment 21 Phil Sutter 2016-01-26 13:21:49 UTC
Hi Laine,

(In reply to Laine Stump from comment #20)
> Yep, that was just an artifact of Bug 1013584 being created by cloning this
> bug - a newly cloned bug is automatically marked as dependent on the bug it
> was cloned from, which in my experience is wrong much more than half the
> time, but it's easy to forget to remove the dependency before hitting save...

Indeed, but in my opinion wrong dependencies are still better than none at all, as the information where a ticket originated from is often vital. Sure, failure to fix this up is one of the likely pitfalls of using the system. :)

Comment 26 Laine Stump 2020-02-10 15:18:26 UTC
Bug 1727263 is adding support for a Linux host bridge option BR_ISOLATED, which can be applied to the bridge port of a guest's conventional) tap device. This is a completely unrelated thing, except that it's "an option to be applied to a port", and so possibly could live in the same config space.

Comment 27 Jaroslav Suchanek 2020-03-18 10:13:06 UTC
This bug was closed deferred as a result of bug triage.

Please reopen if you disagree and provide justification why this bug should
get enough priority. Most important would be information about impact on
customer or layered product. Please indicate requested target release.