Bug 1418544

Summary: With vCPU count greater than vhostuser queues, instance not able to bring interface up.
Product: Red Hat OpenStack
Reporter: VIKRANT <vaggarwa>
Component: openstack-neutron
Assignee: Karthik Sundaravel <ksundara>
Status: CLOSED CANTFIX
QA Contact: Toni Freger <tfreger>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: ailan, amuller, atelang, chrisw, fbaudin, fleitner, jhsiao, jraju, ksundara, nyechiel, psahoo, sacpatil, sgordon, srevivo, supadhya, tamar.inbar-shelach, vaggarwa, vchundur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Story Points: ---
Clone Of:
Cloned As: 1463220
Environment:
Last Closed: 2017-06-21 06:57:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1463220

Description VIKRANT 2017-02-02 04:44:03 UTC
Description of problem:

When enabling multi-queue inside a VM, the customer observed that the number of vCPUs in the VM must not exceed the number of vhostuser queues; otherwise the vNIC will not come up. There is no such constraint when multi-queue is not enabled inside the VM.

~~~
(1) vhostuser queues#=2, vCPUs#=2: good
il-yardstick@IL-yardstick:~$ grep eth good-vm-console.log
ci-info: | eth0 | True | 10.0.1.3 | 255.255.255.0 | xx:xx:xx:b2:8b:1c |
ci-info: | 0 | 0.0.0.0 | 10.0.1.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.1.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 10.0.1.2 | 255.255.255.255 | eth0 | UGH |
"eth0" added, 100 Mbit bandwidth limit.


(2) vhostuser queues#=2, vCPUs#=4: not good, eth0 fails to come up
il-yardstick@IL-yardstick:~$ grep eth bad-vm-console.log
ci-info: | eth0 | True | . | . | xx:xx:xx:bf:53:fe |
"eth0" added, 100 Mbit bandwidth limit.


(3) vhostuser queues#=6, vCPUs#=4: good
il-yardstick@IL-yardstick:~$ grep eth good-vm-console.log
ci-info: | eth0 | True | 10.0.1.4 | 255.255.255.0 | xx:xx:xx:2e:1a:95 |
ci-info: | 0 | 0.0.0.0 | 10.0.1.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.0.1.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 10.0.1.2 | 255.255.255.255 | eth0 | UGH |
"eth0" added, 100 Mbit bandwidth limit.
~~~

Here is the output of the DPDK PMD RX queue mapping:

~~~
# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 25:
port: dpdk0 queue-id: 2
port: vhu1f292b02-0f queue-id: 0 2
pmd thread numa_id 0 core_id 26:
port: dpdk0 queue-id: 3
port: vhu1f292b02-0f queue-id: 1 3
pmd thread numa_id 0 core_id 1:
port: dpdk0 queue-id: 0 4
port: vhu1f292b02-0f queue-id: 4
pmd thread numa_id 0 core_id 2:
port: dpdk0 queue-id: 1 5
port: vhu1f292b02-0f queue-id: 5
~~~

Version-Release number of selected component (if applicable):
OSP 8, OVS-dpdk 2.5

How reproducible:
Every time for the customer.

Steps to Reproduce:
1. Enable multi-queue via the glance image properties and set the vCPU count via the nova flavor, then boot a VM (see the sketch below).
2. Spawn instances using the different queue/vCPU combinations mentioned in the description.
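A minimal sketch of step 1, assuming a pre-existing image and a DPDK-backed network; the IDs and names below are placeholders, and the image property is the hw_vif_multiqueue_enabled metadata key discussed later in this bug:

~~~
# Enable virtio multiqueue for guests booted from this image (placeholder image ID).
glance image-update --property hw_vif_multiqueue_enabled=true <image-id>

# The vCPU count comes from the flavor; 4 vCPUs here to exceed the 2 vhostuser queues.
nova flavor-create m1.mq.test auto 4096 20 4

# Boot the test instance on the DPDK-backed provider network (placeholder network ID).
nova boot --flavor m1.mq.test --image <image-id> \
    --nic net-id=<dpdk-network-id> testinstance001
~~~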

Actual results:
The instance is not able to bring the interface up when the vCPU count is greater than the number of vhostuser queues.

Expected results:
The instance should be able to bring the interface up in all cases.

Additional info:

Comment 1 Franck Baudin 2017-02-02 17:57:56 UTC
Can you document the use case for having more queues than vCPUs? I'm not arguing here, but I would like to understand whether this is a VNF requirement, a test matrix requirement, or something else.

Comment 7 VIKRANT 2017-03-10 13:48:28 UTC
Hi Steve,

The customer wants to confirm whether this is a bug or design intent. I am setting needinfo for you as per comment #3.

Comment 8 Stephen Gordon 2017-03-10 14:54:19 UTC
(In reply to VIKRANT from comment #7)
> Hi Steve,
> 
> The customer wants to confirm whether this is a bug or design intent. I am
> setting needinfo for you as per comment #3.

From a Nova POV it's expected; the number of queues is scaled with the number of vCPUs of the guest by design. The problem, as I understand it, is that the OVS side of things isn't dynamic and expects to be configured for a fixed maximum number of queues, which doesn't work when a guest with a different vCPU count can be spawned at any time.

Comment 14 VIKRANT 2017-03-21 08:54:08 UTC
Thanks Assaf.

When we spawn an instance with more vCPUs than DPDK RX queues, the instance is not able to obtain an IP address from DHCP. This is what we see in the nova console-log of that instance:

~~~
(2) vhostuser queues#=2, vCPUs#=4: not good, eth0 fails to come up
il-yardstick@IL-yardstick:~$ grep eth bad-vm-console.log
ci-info: | eth0 | True | . | . | xx:xx:xx:bf:53:fe |
"eth0" added, 100 Mbit bandwidth limit.
~~~

Everything works fine when the vCPU count is equal to the RX queue count, or when the RX queue count is greater than the vCPU count.

As per Stephen Gordon's input: from a Nova POV it's expected; the number of queues scales with the number of vCPUs of the guest by design. The problem is that the OVS side isn't dynamic and expects to be configured for a fixed maximum number of queues, which doesn't work when a guest with a different vCPU count can be spawned at any time.

The customer wants to confirm: is this a bug or design intent?

Also, I am trying to confirm whether we should document that the only supported configuration is an equal number of vCPUs and RX queues. If this is a bug, do I need to open a separate bug against openvswitch for this issue?

Comment 16 Flavio Leitner 2017-03-29 02:53:44 UTC
Hi,

OVS 2.5 has a static number of RX queues (n-dpdk-rxqs), which is per datapath, and the requirement is that qemu needs to be configured with the right number of vectors:

The $q below is the number of queues.
The $v is the number of vectors, which is '$q x 2 + 2'.
Then the qemu line looks like this:
   -chardev socket,id=char2,path=/var/run/openvswitch/vhost-user-2
   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v

So, there is no requirement related to vCPUs. Of course, it doesn't make sense to enable multiple queues on a VM without enough vCPUs, but the side effect would just be poor performance. Perhaps OSP does some math to estimate the number of vectors based on the number of vCPUs, because that is somehow related to the number of queues? Just check the qemu command line while the VM is reproducing the issue.
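A quick way to sanity-check that relationship on the compute node is to read the queues= and vectors= values out of the running qemu command line; a sketch, where the instance name is just an example:

~~~
# q queues require v = q*2 + 2 vectors on the virtio-net-pci device.
q=2
v=$((q * 2 + 2))
echo "queues=$q expects vectors=$v"

# Compare against what qemu was actually started with (example instance name).
ps -ef | grep 'instance-00000056' | grep -o 'queues=[0-9]*\|vectors=[0-9]*'
~~~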

If that is correct, please attach a sosreport while reproducing the issue from the host and the guest.

Thanks!
fbl

Comment 17 VIKRANT 2017-03-29 10:40:31 UTC
Thanks for the update Flavio.

Here are the in-line responses.

(In reply to Flavio Leitner from comment #16)
> Hi,
> 
> OVS 2.5 has a static number of RX queues (n-dpdk-rxqs) which is per data
> path and the requirement is that qemu needs to configured with the right
> number of vectors:
> 
> The $q below is the number of queues.
> The $v is the number of vectors, which is '$q x 2 + 2'.
> Then the qemu line looks like this:
>    -chardev socket,id=char2,path=/var/run/openvswitch/vhost-user-2
>    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
>    -device
> virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
> 

Does that mean that, to change the number of RX queues inside the instance, we do not need to change the "n-dpdk-rxqs" parameter of OVS-DPDK? If yes, then how can we control the RX queue count inside the instance?

> So, there is no requirement related to vCPUs.  Of course, it doesn't make
> sense to enable multiple queues on a VM with not enough vCPUs, but the side
> effect would be just bad performance. Perhaps OSP does some math to estimate
> the number of vectors based on the number of vCPUs because that somehow is
> related to the number of queues?  Just check the qemu command line while the
> VM is reproducing the issue.
> 

The scenario here has more vCPUs than RX queues. The customer is only concerned about this scenario, because the instance is not able to obtain an IP address in it.

> If that is correct, please attach a sosreport while reproducing the issue
> from the host and the guest.
> 
> Thanks!
> fbl

Comment 18 Flavio Leitner 2017-03-29 19:06:16 UTC
(In reply to VIKRANT from comment #17)
> Does that mean to change the number of rx queues inside the instance, we
> need not  to change the "n-dpdk-rxqs" parameter of ovs-dpdk? If yes, then
> how can we control the rx queue inside the instance. 

The OVS datapath should be configured with the maximum number of queues supported by the solution. Each guest can then use anywhere from one up to that maximum number of queues. By default the guest starts with 1 queue enabled.

> Here scenario is having more number of VCPUs then rx queue count. Cu. is
> only concern about this scenario because instance is not able to pick the IP
> address in this scenario. 

I understand. My point is that multiple queues have little to do with vCPUs. In practice you could have 16 queues mapped to a single vCPU, or one queue and 16 vCPUs. All of those should work.

Without more information, my guess would be that OSP is mapping the number of vCPUs to the number of queues, but it's not updating the OVS datapath, just the number of vectors for the guest.
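For reference, the number of queues actually enabled inside the guest (up to the pre-set maximum) can be checked and changed with ethtool; a sketch, assuming eth0 is the vhost-user backed interface:

~~~
# Show the pre-set maximum vs. currently enabled combined channels (queues).
ethtool -l eth0

# Enable 2 of the available queues; this cannot exceed the pre-set maximum
# negotiated with the vswitch back-end.
ethtool -L eth0 combined 2
~~~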

Comment 19 Stephen Gordon 2017-04-03 12:01:42 UTC
*** Bug 1438314 has been marked as a duplicate of this bug. ***

Comment 25 Flavio Leitner 2017-04-19 20:21:26 UTC
Hi,

I got these configs from your end:
instance-00000050:
  -smp 2,sockets=2,cores=1,threads=1
  -netdev type=vhost-user,id=hostnet0,chardev=charnet0,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=fa:16:3e:48:d9:93,bus=pci.0,addr=0x3

instance-00000051:
  -smp 2,sockets=2,cores=1,threads=1
  -netdev type=vhost-user,id=hostnet0,chardev=charnet0,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=fa:16:3e:3d:62:ba,bus=pci.0,addr=0x3

Both instances are identical and they seem to be correct. Do you see an issue with those?

BTW, reviewing comment #20 again: the config used in the bad case was 2 PMDs and 1 queue for the vswitch, but qemu was configured with 2 vCPUs, 2 queues, and 6 vectors. That is incompatible with the vswitch configuration and explains why it doesn't work.

You most probably should have seen an error message in the log:
2017-04-17T09:36:35.256Z|00035|dpdk(vhost_thread1)|ERR|vHost Device '/var/run/openvswitch/vhu88eb58b8-e1' 1 can't be added - too many queues 2 > 1

So, in order to test with fewer CPUs, do the following (see the sketch after this list):
1) Downgrade the OVS number of queues to 1, as you are doing.
2) Reconfigure the instance to use 1 queue, if possible.
3) Reconfigure the instance to have 2 vCPUs.
4) Boot the instance.
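A minimal sketch of step 1 on the compute node, assuming OVS 2.5 where the RX queue count is a global other_config option (restarting ovs-vswitchd may be needed for the change to take effect):

~~~
# Set the datapath-wide RX queue count that OVS 2.5 uses for DPDK/vhost-user ports.
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=1

# Verify how the RX queues are now distributed across the PMD threads.
ovs-appctl dpif-netdev/pmd-rxq-show
~~~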

I tried vCPUs > queues in my environment without OSP and it works:

Number of CPUS:
[root@localhost ~]# lscpu
 Architecture:          x86_64
 CPU op-mode(s):        32-bit, 64-bit
 Byte Order:            Little Endian
>CPU(s):                4


Checking the number of queues:
# ethtool -l eth1 | grep ombin
Combined:       2
Combined:       1

So, I have up to 2 queues, but only 1 enabled.

This is the qemu command line:
-netdev type=vhost-user,id=hostnet1,chardev=charnet1,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=52:54:00:96:41:e3,bus=pci.0,addr=0x3

Note that vectors(6) = 2xqueues(2) + 2.

This is the IP address of this guest:
# ip a show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 52:54:00:96:41:e3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.47.2/24 scope global eth1

Ping the gateway
# ping -c 3 192.168.47.1
PING 192.168.47.1 (192.168.47.1) 56(84) bytes of data.
64 bytes from 192.168.47.1: icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from 192.168.47.1: icmp_seq=2 ttl=64 time=0.132 ms
64 bytes from 192.168.47.1: icmp_seq=3 ttl=64 time=0.116 ms

--- 192.168.47.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.116/0.186/0.312/0.090 ms

openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

Thanks,
fbl

Comment 26 VIKRANT 2017-05-03 06:26:17 UTC
Many thanks for the update. Sorry for the late response; I was caught up with other issues.

Kindly find the in-line response: 

==> Both instances are identical and they seem to be correct. Do you see an issue with those?

Yes, when the vCPU count is greater than the RX queue count, the instance is not able to obtain an IP address via DHCP.

==> You most probably should have seen an error message in the log:
2017-04-17T09:36:35.256Z|00035|dpdk(vhost_thread1)|ERR|vHost Device '/var/run/openvswitch/vhu88eb58b8-e1' 1 can't be added - too many queues 2 > 1

I checked but didn't see any error for the previously spawned instances on the compute node. The following command returned nothing:

~~~
# grep -ir 'too many' /var/
~~~

I deleted the old instance, which was spawned with vCPU count > RX queue count, and spawned a new instance with the same configuration. Yes, I can see this error message:

~~~
[root@compute-0 ~]# grep -ir 'too many' /var/ | grep -v Binary
/var/log/openvswitch/ovs-vswitchd.log:2017-05-03T05:45:18.658Z|00041|dpdk(vhost_thread1)|ERR|vHost Device '/var/run/openvswitch/vhu3da0c14b-7c' 1 can't be added - too many queues 2 > 1
/var/log/messages:May  3 05:45:18 compute-0 ovs-vswitchd[19646]: ovs|00041|dpdk(vhost_thread1)|ERR|vHost Device '/var/run/openvswitch/vhu3da0c14b-7c' 1 can't be added - too many queues 2 > 1
~~~

As you can see, the instance is not reachable after boot.

~~~
[stack@dell-fc430-1 ~]$ nova list --name=testinstance002
+--------------------------------------+-----------------+--------+------------+-------------+---------------------------------+
| ID                                   | Name            | Status | Task State | Power State | Networks                        |
+--------------------------------------+-----------------+--------+------------+-------------+---------------------------------+
| 290d9d99-fe3e-4394-8a1e-a733c1948c60 | testinstance002 | ACTIVE | -          | Running     | dpdk-provider-170=10.65.199.163 |
+--------------------------------------+-----------------+--------+------------+-------------+---------------------------------+
[stack@dell-fc430-1 ~]$ ping 10.65.199.163
PING 10.65.199.163 (10.65.199.163) 56(84) bytes of data.
From 10.65.177.253 icmp_seq=1 Destination Host Unreachable
From 10.65.177.253 icmp_seq=3 Destination Host Unreachable
^C
--- 10.65.199.163 ping statistics ---
3 packets transmitted, 0 received, +2 errors, 100% packet loss, time 2000ms
~~~

qemu command line for that particular instance:

~~~
[root@compute-0 ~]# ps -ef | grep 'instance-00000056'
qemu      141565       1  7 05:45 ?        00:00:48 /usr/libexec/qemu-kvm -name guest=instance-00000056,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-instance-00000056/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Haswell-noTSX,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+dca,+osxsave,+f16c,+rdrand,+arat,+tsc_adjust,+xsaveopt,+pdpe1gb,+abm -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=1073741824,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid 290d9d99-fe3e-4394-8a1e-a733c1948c60 -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.2-7.el7ost,serial=e1622fe8-eb7d-44d0-a2d5-7ce6991b2120,uuid=290d9d99-fe3e-4394-8a1e-a733c1948c60,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-13-instance-00000056/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/290d9d99-fe3e-4394-8a1e-a733c1948c60/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu3da0c14b-7c -netdev type=vhost-user,id=hostnet0,chardev=charnet0,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=fa:16:3e:76:d3:03,bus=pci.0,addr=0x3 -add-fd set=0,fd=32 -chardev file,id=charserial0,path=/dev/fdset/0,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
root      142615  140115  0 05:55 pts/2    00:00:00 grep --color=auto instance-00000056
~~~

Again, sorry for repeating the question, but how can I control the queue count of the instance? We can see that by default it picks a value of "2". You want me to set the queue count to "1" while spawning the instance, but I didn't find a way to do it using glance image metadata.

I got this output from the instance after spawning it. Similar to yours, only one queue is active.

~~~
# ethtool -l eth0 | grep combin
Combined:       2
Combined:       1
~~~

Comment 27 VIKRANT 2017-05-03 06:27:59 UTC
I forgot to mention in the last comment that the OVS-DPDK queue count was only one.

Comment 28 Flavio Leitner 2017-05-03 14:26:08 UTC
(In reply to VIKRANT from comment #26)
> I deleted the old instance which was spawned with number of vcpu > number of
> rx queue and spawned the new instance again with same configuration. yes, I
> can see this warning message: 
> 
> ~~~
> [root@compute-0 ~]# grep -ir 'too many' /var/ | grep -v Binary
> /var/log/openvswitch/ovs-vswitchd.log:2017-05-03T05:45:18.
> 658Z|00041|dpdk(vhost_thread1)|ERR|vHost Device
> '/var/run/openvswitch/vhu3da0c14b-7c' 1 can't be added - too many queues 2 >
> 1
> /var/log/messages:May  3 05:45:18 compute-0 ovs-vswitchd[19646]:
> ovs|00041|dpdk(vhost_thread1)|ERR|vHost Device
> '/var/run/openvswitch/vhu3da0c14b-7c' 1 can't be added - too many queues 2 >
> 1
> ~~~

Right, so the interface back-end is broken and will not work. There is a misconfiguration: the instance is asking to enable more queues than the vswitch can provide.

> Again, sorry for same question, but how can i control the queues count of
> instance. Because we can see that by default it's picking the value of "2".
> You want me to control the queue count to "1" while spawning instance but I
> didn't find the way to do it using glance image metadata.

I hope someone from OpenStack/NFV can help you.
Vijay, could you please have a look?
Thanks,

Comment 29 Karthik Sundaravel 2017-05-03 18:59:42 UTC
Vikrant,
I understand from the earlier comments that the number of queues in the guest will be the same as the number of vCPUs in the guest.

I haven't tried it, but it looks like we can set the image property
hw_vif_multiqueue_enabled=false
so that multiqueue is disabled for the guest.
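A sketch of how that image property could be set (placeholder image ID); with multiqueue disabled, the guest should fall back to a single queue regardless of its vCPU count:

~~~
# Disable virtio multiqueue for guests booted from this image (placeholder image ID).
glance image-update --property hw_vif_multiqueue_enabled=false <image-id>

# Confirm the property is applied.
glance image-show <image-id> | grep hw_vif_multiqueue_enabled
~~~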

Steve, 
Can you please let us know if the above is right?

Comment 30 VIKRANT 2017-05-04 06:17:29 UTC
Hi Karthik,

I am spawning an instance with two vCPUs and multiqueue enabled; by default it will pick a queue count equal to the number of vCPUs, which in this case is 2. But I want to keep the queue count at 1 while still using 2 vCPUs.

We need to test whether, when the queue count is 1 (equal to the OVS-DPDK RX queue count), the instance is able to obtain a DHCP IP address. I can't use a vCPU count of 1 because we have to keep it at 2 for this testing.

Do you know any way to control the queue count irrespective of the vCPU count?

Comment 31 Stephen Gordon 2017-05-04 10:18:54 UTC
(In reply to Karthik Sundaravel from comment #29)
> Vikrant,
> I understand from the earlier comments,  the number of queues in guest will
> be the same as the number of vCPUs in guests.
> 
> I haven't tried, but it looks like we can set the image property
> hw_vif_multiqueue_enabled=false
> so that the multiqueue is disabled for the guest. 
> 
> Steve, 
> Can you please let us know if the above is right?

That's correct, the available options at the Nova level are either:

a) Multi-queue disabled (default) - single queue per guest.
b) Multi-queue enabled (opt-in) - single queue per guest vCPU (so 2 for Vikrant's example).

Finer-grained tuning is not possible via Nova, and exposing such knobs directly to the user at that level is generally frowned upon.

Comment 32 VIKRANT 2017-05-06 01:16:51 UTC
Hi Flavio,

It seems that we are out of luck here. Is there any other idea or test you would like me to perform?

Comment 33 Flavio Leitner 2017-05-08 17:52:11 UTC
Hi VIKRANT,

I admit that at this point I am not sure whether you have found a corner case in your environment because you have fewer resources available, or whether the issue is exactly what the customer is hitting in their environment.

Anyway, from the OVS point of view, that is an invalid configuration. The VMs cannot request more queues than what is configured in the vswitch, and both parameters are under OSP control. OVS doesn't care at all about the number of vCPUs; it cares about the number of queues requested by the VM and whether or not it can comply with the configuration provided by OSP.

Comment 34 VIKRANT 2017-05-15 10:46:02 UTC
Hi Flavio,

Thanks for the update. We are good to close the bug. Before closing it, can you please confirm whether we are good to publish the following KCS solution related to this query:

https://access.redhat.com/solutions/2995841

Comment 35 VIKRANT 2017-05-17 03:55:19 UTC
Flavio, if the following solution looks good to you, kindly let me know so that I can publish it.

https://access.redhat.com/solutions/2995841

Comment 36 Tamar Inbar Shelach 2017-05-17 10:06:15 UTC
Hi, sorry for joining the conversation late, and apologies if I've missed something:

I feel like this is a bug that would be hard to explain to customers. 
Let's say we have 4 queues on the host.
The customer wants a VM with 8 vCPUs: 4 to be used for network processing and 4 additional ones for some compute processes, which is theoretically possible.

From what I understood, the layer that connects to OVS counts the number of queues as the number of vCPUs, and this is a wrong assumption.

Then the deployment fails. 


So our recommendation to customers should be to deploy DPDK multi-queue with the maximum number of queues, or at least the maximum number of vCPUs for tenants? Doesn't this take a memory toll on the host?

I'm not sure it's a reasonable recommendation.

Comment 37 Stephen Gordon 2017-06-01 21:00:21 UTC
(In reply to Tamar Inbar Shelach from comment #36)

> so our recommendation to customers should be to deploy the DPDK multi queue
> with max number of queues or at least the max number of vcpus for tenants?
> doesn't this have a memory toll on the host? 
> 
> I'm not sure it's a reasonable recommendation.

The problem is identifying what *is* a reasonable way to determine the optimal number of queues at instance launch (and ideally not being so restrictive at host setup) in a "cloudy" way - that is, the cloud user should not have to explicitly set it.

See also the long running discussions in the design discussion:

    https://review.openstack.org/#/c/128825/

In particular Dan's comment:

"We could start with a simple impl, where we just have a hardcoded policy for the number of queues we assign, when multiqueue is enabled. As mentioned before, this would not be satisfactory for all use cases, but it might be sufficient for a reasonable number of use cases. We could then spend more time considering how best to express the greater configurability."

This compromise was required to get any solution that allowed a # of queues > 1 into Nova. Scaling # of queues based on # VCPUs was the hardcoded policy selected in lieu of anyone being able to come up with a better heuristic.

Comment 41 Jaison Raju 2017-06-21 06:57:36 UTC
Hello ,

Having discussed this with engineering and PM, we have considered looking for a solution for the current version of OVS.
I have raised a new request to fix this issue with the current OVS 2.6 on RHOS 10:
https://bugzilla.redhat.com/show_bug.cgi?id=1463220
Closing this bug.

Regards,
Jaison R