Bug 1608521 - Backport libvirt virt driver vrouter multiqueue support
Summary: Backport libvirt virt driver vrouter multiqueue support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z10
Target Release: 10.0 (Newton)
Assignee: Artom Lifshitz
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-25 17:08 UTC by Andreas Karis
Modified: 2023-03-21 18:57 UTC
CC List: 17 users

Fixed In Version: openstack-nova-14.1.0-35.el7ost
Doc Type: Enhancement
Doc Text:
This update introduces vrouter multi-queue support required by a Juniper plug-in. One of Juniper's plug-ins relies on nova to create interfaces with the correct mode - multi-queue or single-queue. vrouter VIFs (OpenContrail) now support multi-queue mode, which allows network performance to scale across multiple vCPUs. To use this feature, create an instance with more than one vCPU from an image that has the `hw_vif_multiqueue_enabled` property set to `true`. (A CLI sketch of this workflow follows the header fields below.)
Clone Of:
Environment:
Last Closed: 2019-01-16 17:09:01 UTC
Target Upstream Version:
Embargoed:
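As a quick illustration of the Doc Text workflow above, a hedged CLI sketch (image, flavor, and guest names are invented placeholders, not taken from this bug):
~~~
# Sketch only; adjust names to your environment.
# Tag the image so nova requests multi-queue virtio interfaces:
$ openstack image set --property hw_vif_multiqueue_enabled=true rhel7-guest
# Boot from it with a flavor that has more than one vCPU:
$ openstack server create --image rhel7-guest --flavor m1.xlarge mq-test
# Inside the guest, enable the extra queues (up to the vCPU count):
$ ethtool -L eth0 combined 4
~~~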




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-11496 0 None None None 2021-12-10 16:53:07 UTC
Red Hat Knowledge Base (Solution) 3544081 0 None None None 2018-10-02 14:33:56 UTC
Red Hat Product Errata RHBA-2019:0074 0 None None None 2019-01-16 17:09:13 UTC

Description Andreas Karis 2018-07-25 17:08:20 UTC
Description of problem:

One of Juniper's plugins relies on nova to create interfaces with the correct mode - multi-queue or single-queue.

Can we backport the nova networking multiqueue feature, together with the respective code in the vrouter tap creation path?

For Juniper's Contrail implementation, called from https://github.com/openstack/nova/blob/newton-eol/nova/virt/libvirt/vif.py#L749

Compare upstream Newton:
https://github.com/openstack/nova/blob/newton-eol/nova/network/linux_net.py#L1314
~~~
(...)
def create_tap_dev(dev, mac_address=None):
(...)
~~~

To upstream master:
https://github.com/openstack/nova/blob/master/nova/network/linux_utils.py#L74
~~~
def create_tap_dev(dev, mac_address=None, multiqueue=False):
~~~

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L757
~~~
            multiqueue = self._is_multiqueue_enabled(instance.image_meta,
                                                     instance.flavor)
            linux_net_utils.create_tap_dev(dev, multiqueue=multiqueue)
~~~

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L192
~~~
    def _is_multiqueue_enabled(self, image_meta, flavor):
        _, vhost_queues = self._get_virtio_mq_settings(image_meta, flavor)
        return vhost_queues > 1 if vhost_queues is not None else False

    def _get_virtio_mq_settings(self, image_meta, flavor):
        """A methods to set the number of virtio queues,
           if it has been requested in extra specs.
        """
        driver = None
        vhost_queues = None
        if not isinstance(image_meta, objects.ImageMeta):
            image_meta = objects.ImageMeta.from_dict(image_meta)
        img_props = image_meta.properties
        if img_props.get('hw_vif_multiqueue_enabled'):
            driver = 'vhost'
            max_tap_queues = self._get_max_tap_queues()
            if max_tap_queues:
                vhost_queues = (max_tap_queues if flavor.vcpus > max_tap_queues
                    else flavor.vcpus)
            else:
                vhost_queues = flavor.vcpus

        return (driver, vhost_queues)

    def _get_max_tap_queues(self):
        # NOTE(kengo.sakai): In kernels prior to 3.0,
        # multiple queues on a tap interface is not supported.
        # In kernels 3.x, the number of queues on a tap interface
        # is limited to 8. From 4.0, the number is 256.
        # See: https://bugs.launchpad.net/nova/+bug/1570631
        kernel_version = int(os.uname()[2].split(".")[0])
        if kernel_version <= 2:
            return 1
        elif kernel_version == 3:
            return 8
        elif kernel_version == 4:
            return 256
        else:
            return None
~~~
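To make the queue-count logic above concrete, here is a minimal standalone sketch (my paraphrase of the quoted nova code; function names are mine, not nova's):
~~~
# Compute the vhost queue count for a given kernel release and vCPU count,
# mirroring _get_max_tap_queues / _get_virtio_mq_settings above.
def max_tap_queues(kernel_release):
    major = int(kernel_release.split(".")[0])
    if major <= 2:
        return 1      # pre-3.0 kernels: tap multiqueue unsupported
    elif major == 3:
        return 8      # 3.x kernels: tap limited to 8 queues
    elif major == 4:
        return 256    # 4.x kernels: limit raised to 256
    return None       # newer kernels: nova applies no cap


def vhost_queues(kernel_release, vcpus):
    cap = max_tap_queues(kernel_release)
    return min(vcpus, cap) if cap else vcpus


# A 4-vCPU flavor on a RHEL 7 (3.10.x) kernel gets 4 queues; a 16-vCPU
# flavor on the same kernel is capped at 8.
print(vhost_queues("3.10.0-862.el7.x86_64", 4))   # -> 4
print(vhost_queues("3.10.0-862.el7.x86_64", 16))  # -> 8
~~~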

Comment 1 Andreas Karis 2018-07-25 17:08:43 UTC
We need to ensure a few things:
* MQ support in `create_tap_dev`
* enable the `multiqueue` flag in all necessary places in `vif.py` (probably not only `plug_vrouter`)
* update nova/rootwrap.d/compute.filters / nova/rootwrap.d/network.filters (???) - see the filter sketch below
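For reference, the rootwrap entries that a multiqueue-capable `create_tap_dev` relies on would look roughly like this (a sketch assuming standard oslo.rootwrap CommandFilter syntax; which of the two filter files actually needs updating is exactly the open question above):
~~~
[Filters]
# create_tap_dev shells out to 'ip tuntap add ... multi_queue' and may
# fall back to 'tunctl'; both run as root via rootwrap.
ip: CommandFilter, ip, root
tunctl: CommandFilter, tunctl, root
~~~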

Comment 2 Andreas Karis 2018-07-25 17:10:25 UTC
To give some more background: if nova sets up the tap interface as single-queue while the domain XML requests multiple queues, then libvirt/qemu will not start the domain:
~~~
error: Failed to start domain instance-00000008
error: Unable to create tap device tap0003: Invalid argument
~~~

The reason is that the instance XML is created with e.g.:
~~~
    <interface type='ethernet'>
      <mac address='02:d9:f2:fb:00:03'/>
      <target dev='tap0003'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </interface>
~~~

However, nova creates that tap interface in single-queue mode before libvirt starts the domain. This can be reproduced manually:

I modified an instance that was booted with nova and added a second interface to it.

## Test 1 ##

Just creating a multi-queue ethernet port:
~~~
[root@overcloud-compute-1 ~]# diff -u instance-00000008.xml instance-00000008.ethernet.xml
--- instance-00000008.xml	2018-07-25 15:36:57.435311864 +0000
+++ instance-00000008.ethernet.xml	2018-07-25 15:44:54.886835979 +0000
@@ -75,22 +75,22 @@
       <target dev='tapb87f9248-45'/>
       <model type='virtio'/>
       <driver name='vhost' queues='2'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
+    </interface>
+    <interface type='ethernet'>
+      <mac address='02:d9:f2:fb:00:03'/>
+      <target dev='tap0003'/>
+      <model type='virtio'/>
+      <driver name='vhost' queues='2'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
     </interface>
-    <serial type='file'>
-      <source path='/var/lib/nova/instances/f00c48ca-6824-4585-a4b0-4aede96438d7/console.log'/>
-      <target type='isa-serial' port='0'>
-        <model name='isa-serial'/>
-      </target>
-    </serial>
     <serial type='pty'>
       <target type='isa-serial' port='1'>
         <model name='isa-serial'/>
       </target>
     </serial>
-    <console type='file'>
-      <source path='/var/lib/nova/instances/f00c48ca-6824-4585-a4b0-4aede96438d7/console.log'/>
-      <target type='serial' port='0'/>
+    <console type='pty'>
+      <target type='serial' port='1'/>
     </console>
     <input type='tablet' bus='usb'>
       <address type='usb' bus='0' port='1'/>
~~~


The below works, as seen from the instance:
~~~
[root@rhel-test2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc mq state UP qlen 1000
    link/ether fa:16:3e:c5:20:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 brd 192.168.0.255 scope global dynamic eth0
       valid_lft 86373sec preferred_lft 86373sec
    inet6 2000:192:168:1:f816:3eff:fec5:20ba/64 scope global noprefixroute dynamic 
       valid_lft 86376sec preferred_lft 14376sec
    inet6 fe80::f816:3eff:fec5:20ba/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 02:d9:f2:fb:00:03 brd ff:ff:ff:ff:ff:ff
[root@rhel-test2 ~]# ethtool -l eth1
Channel parameters for eth1:
Pre-set maximums:
RX:		0
TX:		0
Other:		0
Combined:	2
Current hardware settings:
RX:		0
TX:		0
Other:		0
Combined:	1
~~~

And as seen from the hypervisor:
~~~
[root@overcloud-compute-1 ~]# ip -d link ls tap0003
30: tap0003: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fe:d9:f2:fb:00:03 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    tun addrgenmode eui64 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535 
~~~

## Test 2 ##

Same XML definition as above, but this time pre-creating the tap device in single-queue mode:
~~~
[root@overcloud-compute-1 ~]# ip tuntap add tap0003 mode tap
[root@overcloud-compute-1 ~]# ip link set tap0003 up
[root@overcloud-compute-1 ~]# virsh start instance-00000008
error: Failed to start domain instance-00000008
error: Unable to create tap device tap0003: Invalid argument
~~~

## Test 3 ##

~~~
[root@overcloud-compute-1 ~]# ip tuntap add tap0003 mode tap multi_queue
[root@overcloud-compute-1 ~]# ip link set dev tap0003 up
[root@overcloud-compute-1 ~]# virsh start instance-00000008
Domain instance-00000008 started
~~~

## Conclusion ##

tuntap interfaces cannot change from single-queue to multi-queue after creation. Because nova pre-creates the tap interface as single-queue, libvirt cannot reopen it as multi-queue and fails. You either need to pre-create the interface with the `multi_queue` flag set, or not pre-create the tap interface at all, because libvirt will create it for you.
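For reference, upstream fixes this by threading a `multiqueue` flag into tap creation. A simplified, self-contained sketch of that approach (plain `subprocess` instead of nova's rootwrap-based `utils.execute`, and without the `tunctl` fallback, so treat it as an approximation rather than the shipped helper):
~~~
import subprocess

def create_tap_dev(dev, mac_address=None, multiqueue=False):
    """Create a tap device, optionally multi-queue (sketch of the upstream
    helper; nova's real version runs through rootwrap and falls back to
    'tunctl' when 'ip tuntap' is unavailable)."""
    cmd = ['ip', 'tuntap', 'add', dev, 'mode', 'tap']
    if multiqueue:
        # Must be requested at creation time; as shown above, the queue
        # mode of an existing tuntap device cannot be changed afterwards.
        cmd.append('multi_queue')
    subprocess.check_call(cmd)
    if mac_address:
        subprocess.check_call(['ip', 'link', 'set', dev,
                               'address', mac_address])
    subprocess.check_call(['ip', 'link', 'set', dev, 'up'])
~~~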

Comment 3 Andreas Karis 2018-07-25 17:35:24 UTC
To finalize the theoretical part of this, I created the following small C binary:
~~~
/*
  Author: akaris
  Taking some inspiration from https://www.kernel.org/doc/Documentation/networking/tuntap.txt
  Copy this code into taptest.c
  Compile and run:
    gcc taptest.c -o taptest
    ./taptest <interface name> <mode>    # mode = [ sq | mq]
*/

#include <fcntl.h>
#include <string.h> /* memset */
#include <unistd.h> /* close */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/socket.h> 
#include <linux/if.h>
#include <linux/if_tun.h>

int tun_alloc(char *dev)
{
    struct ifreq ifr;
    int fd, err;

    if( (fd = open("/dev/net/tun", O_RDWR)) < 0 ) {
      printf("Cannot open /dev/net/tun\n");
      exit(1);
    }

    memset(&ifr, 0, sizeof(ifr));

    /* Flags: IFF_TUN   - TUN device (no Ethernet headers) 
     *        IFF_TAP   - TAP device  
     *
     *        IFF_NO_PI - Do not provide packet information  
     */ 
    ifr.ifr_flags = IFF_TAP; 
    if( *dev )
       strncpy(ifr.ifr_name, dev, IFNAMSIZ);

    if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ){
       close(fd);
       return err;
    }
    strcpy(dev, ifr.ifr_name);
    return fd;
}

int tun_alloc_mq(char *dev, int queues, int *fds)
{
    struct ifreq ifr;
    int fd, err, i;

    if (!dev)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    /* Flags: IFF_TUN   - TUN device (no Ethernet headers)
     *        IFF_TAP   - TAP device
     *
     *        IFF_NO_PI - Do not provide packet information
     *        IFF_MULTI_QUEUE - Create a queue of multiqueue device
     */
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
    strcpy(ifr.ifr_name, dev);

    for (i = 0; i < queues; i++) {
        if ((fd = open("/dev/net/tun", O_RDWR)) < 0) {
           err = fd; /* propagate the open error instead of uninitialized err */
           goto err;
        }
        err = ioctl(fd, TUNSETIFF, (void *)&ifr);
        if (err) {
           close(fd);
           goto err;
        }
        fds[i] = fd;
    }

    return 0;
err:
    for (--i; i >= 0; i--)
        close(fds[i]);
    return err;
}

int main(int argc, char **argv) {
  if(argc > 1) {
    char * device_name = argv[1];
    char * mode = "sq";
    if(argc > 2) {
        mode = argv[2];
    }

    if(!strcmp("sq",mode)) {
        int fd = tun_alloc(device_name);
        if(fd < 0) {
            printf("Cannot create tunnel %s\n",device_name);
            exit(1);
        }
        printf("FD is %d\n",fd);
        sleep(300);
    } else if(!strcmp("mq",mode)) {
        int fds[2];
        int ret_val = tun_alloc_mq(device_name,2,fds);
        if(ret_val < 0) {
            printf("Cannot create tunnel %s\n",device_name);
            exit(1);
        }
        printf("Multiqueu FDs are %d,%d\n",fds[0],fds[1]);
        sleep(300);
    } else {
        printf("Mode '%s' is not supported\n",mode);
    }

  } else { 
      printf("Please provide the name of the tunnel interface to be created\n");
      exit(1);
  }
}
~~~

Test:
~~~
[root@overcloud-compute-1 ~]# ip tuntap add sq mode tap
[root@overcloud-compute-1 ~]# ip tuntap add mq mode tap multi_queue
[root@overcloud-compute-1 ~]# ./taptest sq sq
FD is 3
[root@overcloud-compute-1 ~]# ./taptest sq mq
Cannot create tunnel sq
[root@overcloud-compute-1 ~]# ./taptest mq sq
Cannot create tunnel mq
[root@overcloud-compute-1 ~]# ./taptest mq mq
Multiqueue FDs are 3,4
~~~

Comment 5 Andreas Karis 2018-07-25 18:28:48 UTC
It's *not* in OSP 10 downstream:
~~~
 799     def plug_vrouter(self, instance, vif):
 800         """Plug into Contrail's network port
 801 
 802         Bind the vif to a Contrail virtual port.
 803         """
 804         dev = self.get_vif_devname(vif)
 805         ip_addr = '0.0.0.0'
 806         ip6_addr = None
 807         subnets = vif['network']['subnets']
 808         for subnet in subnets:
 809             if not subnet['ips']:
 810                 continue
 811             ips = subnet['ips'][0]
 812             if not ips['address']:
 813                 continue
 814             if (ips['version'] == 4):
 815                 if ips['address'] is not None:
 816                     ip_addr = ips['address']
 817             if (ips['version'] == 6):
 818                 if ips['address'] is not None:
 819                     ip6_addr = ips['address']
 820 
 821         ptype = 'NovaVMPort'
 822         if (CONF.libvirt.virt_type == 'lxc'):
 823             ptype = 'NameSpacePort'
 824 
 825         cmd_args = ("--oper=add --uuid=%s --instance_uuid=%s --vn_uuid=%s "
 826                     "--vm_project_uuid=%s --ip_address=%s --ipv6_address=%s"
 827                     " --vm_name=%s --mac=%s --tap_name=%s --port_type=%s "
 828                     "--tx_vlan_id=%d --rx_vlan_id=%d" % (vif['id'],
 829                     instance.uuid, vif['network']['id'],
 830                     instance.project_id, ip_addr, ip6_addr,
 831                     instance.display_name, vif['address'],
 832                     vif['devname'], ptype, -1, -1))
 833         try:
 834             linux_net.create_tap_dev(dev)
 835             utils.execute('vrouter-port-control', cmd_args, run_as_root=True)
 836         except processutils.ProcessExecutionError:
 837             LOG.exception(_LE("Failed while plugging vif"), instance=instance)
~~~

It's in OSP 11 downstream:
~~~
 808     def plug_vrouter(self, instance, vif):
 809         """Plug into Contrail's network port
 810 
 811         Bind the vif to a Contrail virtual port.
 812         """
 813         dev = self.get_vif_devname(vif)
 814         ip_addr = '0.0.0.0'
 815         ip6_addr = None
 816         subnets = vif['network']['subnets']
 817         for subnet in subnets:
 818             if not subnet['ips']:
 819                 continue
 820             ips = subnet['ips'][0]
 821             if not ips['address']:
 822                 continue
 823             if (ips['version'] == 4):
 824                 if ips['address'] is not None:
 825                     ip_addr = ips['address']
 826             if (ips['version'] == 6):
 827                 if ips['address'] is not None:
 828                     ip6_addr = ips['address']
 829 
 830         ptype = 'NovaVMPort'
 831         if (CONF.libvirt.virt_type == 'lxc'):
 832             ptype = 'NameSpacePort'
 833 
 834         cmd_args = ("--oper=add --uuid=%s --instance_uuid=%s --vn_uuid=%s "
 835                     "--vm_project_uuid=%s --ip_address=%s --ipv6_address=%s"
 836                     " --vm_name=%s --mac=%s --tap_name=%s --port_type=%s "
 837                     "--tx_vlan_id=%d --rx_vlan_id=%d" % (vif['id'],
 838                     instance.uuid, vif['network']['id'],
 839                     instance.project_id, ip_addr, ip6_addr,
 840                     instance.display_name, vif['address'],
 841                     vif['devname'], ptype, -1, -1))
 842         try:
 843             multiqueue = self._is_multiqueue_enabled(instance.image_meta,
 844                                                      instance.flavor)
 845             linux_net.create_tap_dev(dev, multiqueue=multiqueue)
 846             utils.execute('vrouter-port-control', cmd_args, run_as_root=True)
 847         except processutils.ProcessExecutionError:
 848             LOG.exception(_LE("Failed while plugging vif"), instance=instance)
~~~
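For clarity, the effective delta between the two listings above boils down to the multiqueue lookup and the extra argument to `create_tap_dev`:
~~~
         try:
-            linux_net.create_tap_dev(dev)
+            multiqueue = self._is_multiqueue_enabled(instance.image_meta,
+                                                     instance.flavor)
+            linux_net.create_tap_dev(dev, multiqueue=multiqueue)
             utils.execute('vrouter-port-control', cmd_args, run_as_root=True)
~~~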

Comment 8 Sahid Ferdjaoui 2018-07-31 08:28:19 UTC
Hmm, perhaps we should just remove line 843 and let libvirt create the TAP device, so we ensure that the multiqueue flag is set correctly in all places.

I guess we added that code upstream to support libvirt versions below 1.3.1, but that is not necessary for OSP 10, which ships with RHEL 7.

 843             multiqueue = self._is_multiqueue_enabled(instance.image_meta,
 844                                                      instance.flavor)

Comment 9 Andreas Karis 2018-07-31 18:15:18 UTC
Hi Sahid,

I don't know *why* we are creating that tap interface with nova. From my tests, it looks redundant ... libvirt seems to create it quite alright. However, I don't see how:
~~~
 843             multiqueue = self._is_multiqueue_enabled(instance.image_meta,
 844                                                      instance.flavor)
~~~
would be related to the creation of the tap interface ;-)

- Andreas

Comment 10 Andreas Karis 2018-07-31 18:18:39 UTC
I suppose you mean: https://github.com/openstack/nova/blob/newton-eol/nova/virt/libvirt/vif.py#L784

?

Comment 11 Sahid Ferdjaoui 2018-07-31 18:40:40 UTC
(In reply to Andreas Karis from comment #10)
> I suppose you mean:
> https://github.com/openstack/nova/blob/newton-eol/nova/virt/libvirt/vif.
> py#L784
> 
> ?

Yes right :-)

Is it possible for you to try without this line?

Comment 13 Bartosz Kupidura 2018-08-01 07:20:58 UTC
Guys, a few things:
1) Why don't we simply backport the fix, instead of inventing a new change that requires additional testing?
2) If you plan to remove that line, IMHO you should also fix the other places - not only the vrouter part...

Comment 14 Andreas Karis 2018-08-01 16:31:06 UTC
Hi,

From what I understand:

* Red Hat does not support combining nova networking with our own networking solutions in Red Hat OpenStack Platform 10. vrouter is pretty much the only consumer that is a) using nova networking code and b) needing the multiqueue feature, so we only need to address this particular part of the code, and not all the other places.

* Backporting the fix is more complex than simply removing that one line. And given that we control the libvirt version here, which creates the tap interface itself, we don't need to keep that particular part of the code which creates the tap interface.

* Backporting that fix would be a feature backport (nova networking did not have the multiqueue feature until Mitaka).

OpenStack Platform 10 entered maintenance support in June of this year:
https://access.redhat.com/support/policy/updates/openstack/platform

The above link also stipulates the terms of the maintenance support phase:
~~~
Full Support:

During the Production Phase, qualified Critical and Important Security Advisories (RHSAs) and urgent and selected High Priority Bug Fix Advisories (RHBAs) may be released as they become available. Other errata advisories may be delivered as appropriate.

If available, select enhanced software functionality may be provided at the discretion of Red Hat.

Maintenance releases will also include available and qualified errata advisories (RHSAs, RHBAs, and RHEAs). Maintenance releases are cumulative and include the contents of previously released updates. The focus for minor releases during this phase lies on resolving defects of medium or higher priority.

In addition, and only during Full Support:

    Customers and Partners have the ability to request new features which are introduced upstream to be selectively backported, pending the review of Red Hat Product Management and Engineering, until the end of this phase
    The installer components may be updated until the end of this phase
    Partner may introduce new plugins to be certified with the version until the end of this phase

Maintenance Support:

Same as Full Support, excluding:

    introduction of new features through backports
    introduction of additional partner plugins
~~~

Kind regards,

Andreas

Comment 15 Bartosz Kupidura 2018-08-02 06:30:37 UTC
So project maintenance/support is a good reason :)
I will try to check this fix on an env where we have the problem and come back with info on whether it helps.

Comment 20 smooney 2018-08-10 13:38:14 UTC
Renaming, as this is not related to nova-network.

vrouter integration with OpenStack is provided via neutron; however, in OSP 10 vrouter support in nova is not delegated to os-vif.

The changes required to enable this multiqueue support from nova appear to be confined to the plug logic in vif.py, the helper functions in linux_net.py, and the related test code.

In Rocky these will be factored out into a separate vrouter os-vif plugin, but they are in-tree in this release.

The plug logic in vif.py is common to nova-network and neutron; however, the code paths used for vrouter are only executed in a neutron deployment.
Comment 21 Andreas Karis 2018-08-16 21:43:48 UTC
Hi,

What's the status for this one?

Thanks!

- Andreas

Comment 24 Artom Lifshitz 2018-08-17 14:46:24 UTC
Check with Andrew and/or Joe; we could consider releasing this as OtherQA.

Comment 35 Andreas Karis 2018-10-18 18:21:05 UTC
The customer can wait for the official release.

Thanks for the work on this!

Comment 46 errata-xmlrpc 2019-01-16 17:09:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0074

