Bug 1738751

Summary: NFV live migration fails with dpdk "--iova-mode va": Failed to load virtio-net:virtio
Product: Red Hat Enterprise Linux 8
Component: dpdk
Version: 8.1
Target Release: 8.0
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: high
Priority: high
Keywords: Regression
Reporter: Pei Zhang <pezhang>
Assignee: Adrián Moreno <amorenoz>
QA Contact: Pei Zhang <pezhang>
CC: aadam, amorenoz, chayang, dmarchan, jinzhao, juzhang, jwboyer, kanderso, maxime.coquelin, ovs-qe, tredaelli
Doc Type: If docs needed, set a value
Clones: 1763815, 1764000
Bug Blocks: 1763815, 1764000
Type: Bug
Last Closed: 2019-11-21 08:18:15 UTC

Description Pei Zhang 2019-08-08 05:08:23 UTC
Description of problem:

NFV live migration fails while packets are flowing through the VM. Both of the scenarios below hit this issue:

(1) Live migration with ovs + vhost-user + dpdk.
(2) Live migration with dpdk (in host) + vhost-user + dpdk (in guest).

Version-Release number of selected component (if applicable):
4.18.0-128.el8.x86_64
qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.x86_64

upstream dpdk: git://dpdk.org/dpdk  master 
# git log -1
commit de4473d911323ae1d26581c2f85abee5d1aaeb81 (HEAD -> master, origin/master, origin/HEAD)
Author: Jerin Jacob <jerinj>
Date:   Wed Aug 7 08:41:34 2019 +0530

    net/memif: fix build with gcc 9.1
    
    gcc-9 is stricter on NULL arguments for printf.
    Fix the following build error by avoiding NULL argument to printf.
    
    In file included from drivers/net/memif/memif_socket.c:26:
    In function 'memif_socket_create',
    inlined from 'memif_socket_init' at net/memif/memif_socket.c:965:12:
    net/memif/rte_eth_memif.h:35:2: error:
    '%s' directive argument is null [-Werror=format-overflow=]
       35 |  rte_log(RTE_LOG_ ## level, memif_logtype, \
          |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       36 |   "%s(): " fmt "\n", __func__, ##args)
          |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    Fixes: 09c7e63a71f9 ("net/memif: introduce memory interface PMD")
    
    Signed-off-by: Jerin Jacob <jerinj>


How reproducible:
100%


Steps to Reproduce:
1. Boot testpmd on the source and destination hosts, refer to [1]

2. Start the VM, refer to [2]

3. Start testpmd in the guest and start MoonGen on another host; the guest can receive packets, refer to [3]

4. Migrate the guest from the source to the destination host; it fails.

# /bin/virsh migrate --verbose --persistent --live rhel8.0 qemu+ssh://192.168.1.2/system
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2019-08-08T04:53:36.910570Z qemu-system-x86_64: -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser0.sock,server
2019-08-08T04:53:37.158587Z qemu-system-x86_64: -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser1.sock,server
2019-08-08T04:53:52.371518Z qemu-system-x86_64: VQ 0 size 0x400 < last_avail_idx 0x4d6b - used_idx 0x3ffa
2019-08-08T04:53:52.371633Z qemu-system-x86_64: Failed to load virtio-net:virtio
2019-08-08T04:53:52.371639Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0:00.0/virtio-net'
2019-08-08T04:53:52.373273Z qemu-system-x86_64: load of migration failed: Operation not permitted


Actual results:
Migration fails.

Expected results:
Migration should work well.

Additional info:
1. This bug can only be reproduced with the latest upstream dpdk; downstream and upstream stable dpdk both work well.

(1) git://dpdk.org/dpdk-stable 18.11  works well
# git log -1
commit 6fc5c48f0b6fb7066e4a192f4e16a142d151bd5a (HEAD, origin/18.11) 

(2) dpdk-18.11-8.el8.x86_64  works well


Reference:

[1]
# /home/nfv-virt-rt-kvm/packages/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 2,4,6,8,10 \
--socket-mem 1024,1024 \
-n 4 \
-d /home/nfv-virt-rt-kvm/packages/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
--vdev net_vhost0,iface=/tmp/vhostuser0.sock,queues=1,client=1,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhostuser1.sock,queues=1,client=1,iommu-support=1 \
-- \
--portmask=f \
-i \
--rxd=512 --txd=512 \
--rxq=1 --txq=1 \
--nb-cores=4 \
--forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start

[2]
<domain type='kvm' id='9'>
  <name>rhel8.0</name>
  <uuid>c67628f0-b996-11e9-8d0c-a0369fc7bbea</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='31'/>
    <vcpupin vcpu='1' cpuset='29'/>
    <vcpupin vcpu='2' cpuset='30'/>
    <vcpupin vcpu='3' cpuset='28'/>
    <vcpupin vcpu='4' cpuset='26'/>
    <vcpupin vcpu='5' cpuset='24'/>
    <emulatorpin cpuset='1,3,5,7,9,11'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel7.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <pmu state='off'/>
    <vmport state='off'/>
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='threads' iommu='on' ats='on'/>
      <source file='/mnt/nfv//rhel8.0.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'>
      <alias name='usb'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x0'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x0'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x0'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x0'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x0'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x0'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x0'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x0'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='18:66:da:5f:dd:01'/>
      <source bridge='switch'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:02'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:03'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c555,c801</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c555,c801</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+107</label>
    <imagelabel>+0:+107</imagelabel>
  </seclabel>
</domain>

[3]
/home/nfv-virt-rt-kvm/packages/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 1,2,3 \
-n 4 \
-d /home/nfv-virt-rt-kvm/packages/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_virtio.so \
-w 0000:06:00.0 -w 0000:07:00.0 \
-- \
--nb-cores=2 \
-i \
--disable-rss \
--rxd=512 --txd=512 \
--rxq=1 --txq=1

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1381992    RX-missed: 0          RX-bytes:  82919520
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1177961    TX-errors: 0          TX-bytes:  70677660

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 1178923    RX-missed: 0          RX-bytes:  70735380
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1381077    TX-errors: 0          TX-bytes:  82864620

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################
testpmd>

Comment 1 David Marchand 2019-08-08 07:07:19 UTC
Do you know which last revision of upstream version was working fine?
This would help bisect the issue.

Comment 2 Pei Zhang 2019-08-11 11:46:41 UTC
(In reply to David Marchand from comment #1)
> Do you know which last revision of upstream version was working fine?
> This would help bisect the issue.

Hi David,

v19.08-rc2 is the first version that hits this problem.

commit 07efd6ddc0499688eb11ae4866d3532295d6db2b (tag: v19.05, origin/releases)  works well
commit cc091931dc05212db32ddbd7da3031104ca4963f (tag: v19.08-rc1)               works well
commit 83a124fb73c50b051ee20ef6b1998c81be7e65df (tag: v19.08-rc2)               fail
commit 0710d87b7f5d0a2cd01861d44c4689efd4714b5f (tag: v19.08-rc4)               fail

Best regards,

Pei

Comment 3 Pei Zhang 2019-08-12 03:12:00 UTC
We have also filed a bug in upstream dpdk:
https://bugs.dpdk.org/show_bug.cgi?id=337

Comment 4 Adrián Moreno 2019-09-13 11:58:43 UTC
I have been able to reproduce this issue and bisected it to the following commit:

commit bbe29a9bd7ab6feab9a52051c32092a94ee886eb
Author: Jerin Jacob <jerinj>
Date:   Mon Jul 22 14:56:53 2019 +0200

    eal/linux: select IOVA as VA mode for default case
    
    When bus layer reports the preferred mode as RTE_IOVA_DC then
    select the RTE_IOVA_VA mode:
    
    - All drivers work in RTE_IOVA_VA mode, irrespective of physical
    address availability.
    
    - By default, a mempool asks for IOVA-contiguous memory using
    RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
    may affect the application boot time.
    
    Signed-off-by: Jerin Jacob <jerinj>
    Acked-by: Anatoly Burakov <anatoly.burakov>
    Signed-off-by: David Marchand <david.marchand>

This commit only changes the default IOVA mode from IOVA_PA to IOVA_VA, so it is just revealing an underlying problem.
I confirmed this by verifying that upstream dpdk works fine with "--iova-mode pa", and that stable downstream dpdk fails in the same manner when "--iova-mode va" is used.

On the qemu side, the code that detects the error is:

            vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
                                           vdev->vq[i].used_idx);
            if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
                error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
                             "used_idx 0x%x",
                             i, vdev->vq[i].vring.num,
                             vdev->vq[i].last_avail_idx,
                             vdev->vq[i].used_idx);
                return -1;
            }

One of the times I reproduced it, I looked at the index values on the sending qemu just before it sent the vmstate:
size 0x100 | last_avail_idx 0x3aa0 | used_idx 0x3aa0
And just after loading the vmstate on the receiving qemu:
VQ 0 size 0x100 < last_avail_idx 0x3aa0 - used_idx 0xbda0

At first I suspected an endianness issue, but then confirmed that virtio_lduw_phys_cached handles it properly.

So, it might be that the memory caches don't get properly synchronized before the migration takes place.

Comment 6 Adrián Moreno 2019-10-11 08:58:25 UTC
Although the problem was detected as a regression in guest dpdk (and therefore in the RHEL product), the problem is actually on the host side. I have sent a patch upstream that fixes it [1], so I suggest moving this bug to the FD stream.

With regard to the possibility of customers being affected by this problem when upgrading to rhel8.1.1, I suggest adding a note in the documentation explaining the workaround, which is either:
a) upgrade the host to the FD version that contains the fix
b) add "--iova-mode pa" to the EAL's parameters

Comment 20 Adrián Moreno 2019-11-21 08:18:15 UTC
Closing this bug as the issue has to be fixed in the host and BZ 1763815 will take care of that.