Bug 1820120 - After hot-unplugging the virtio device and netdev, hot-unplugging the failover VF causes a qemu core dump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: x86_64
OS: Linux
medium
unspecified
Target Milestone: rc
: ---
Assignee: Juan Quintela
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-02 10:13 UTC by Yanghang Liu
Modified: 2020-07-28 07:13 UTC (History)

Fixed In Version: qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 07:12:15 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
(gdb) t a a bt full (51.28 KB, text/plain)
2020-04-02 10:29 UTC, Yanghang Liu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3172 0 None None None 2020-07-28 07:13:08 UTC

Description Yanghang Liu 2020-04-02 10:13:58 UTC
Description of problem:
After hot-unplugging the virtio device and netdev, hot-unplugging the failover VF causes a qemu core dump.

Version-Release number of selected component (if applicable):
host:
4.18.0-193.el8.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64
guest:
4.18.0-193.el8.x86_64


How reproducible:
100%

Steps to Reproduce:
1.create a NetXtreme BCM57810 VF and set the MAC address of the VF
# ip link set enp131s0f0 vf 0  mac 22:2b:62:bb:a9:82

2.start a source guest with the NetXtreme BCM57810 VF and failover enabled
/usr/libexec/qemu-kvm -name rhel8-2 -M q35 -enable-kvm \
-monitor stdio \
-nodefaults \
-m 4G \
-boot menu=on \
-cpu Haswell-noTSX-IBRS \
-device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
-device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
-device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
-device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
-smp 2,sockets=1,cores=2,threads=2,maxcpus=4 \
-qmp tcp:0:5555,server,nowait \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/nfsmount/migra_test/192.qcow2,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root.1 \
-device VGA,id=video1,bus=root.2  \
-vnc :5 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:83:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \
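
The last two -device options above form the failover pair: the virtio-net-pci device sets failover=on, and the vfio-pci device points at it through failover_pair_id. A minimal sketch of a sanity check for that pairing follows. The helper names (parse_devices, check_failover_pairs) are hypothetical and not part of qemu; the check only looks at the option strings and does not validate anything inside qemu itself.

```python
import shlex

def parse_devices(cmdline):
    """Return one dict per -device option: {'driver': ..., key: value, ...}."""
    # Join shell line continuations before tokenizing.
    tokens = shlex.split(cmdline.replace("\\\n", " "))
    devices = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag != "-device":
            continue
        driver, _, rest = value.partition(",")
        props = {"driver": driver}
        for item in filter(None, rest.split(",")):
            key, _, val = item.partition("=")
            props[key] = val
        devices.append(props)
    return devices

def check_failover_pairs(cmdline):
    """Return a list of problems; an empty list means the pairing looks consistent."""
    devices = parse_devices(cmdline)
    # Standby candidates: virtio-net-pci devices that enable failover.
    standby_ids = {
        d["id"] for d in devices
        if d["driver"] == "virtio-net-pci" and d.get("failover") == "on" and "id" in d
    }
    problems = []
    for dev in devices:
        pair_id = dev.get("failover_pair_id")
        if pair_id is not None and pair_id not in standby_ids:
            problems.append(
                f"{dev.get('id')}: no virtio-net-pci with failover=on and id={pair_id}"
            )
    return problems

# Options copied from step 2 of this report.
SAMPLE = (
    "-netdev tap,id=hostnet0,vhost=on "
    "-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on "
    "-device vfio-pci,host=0000:83:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0"
)
print(check_failover_pairs(SAMPLE))
```

Note that the VF MAC address set in step 1 matches the mac= value of the virtio-net-pci device; the sketch does not check that, because the VF MAC is not part of the qemu command line.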



3.check the network status in guest
# ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.226  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 2620:52:0:4920:202b:62ff:febb:a982  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::202b:62ff:febb:a982  prefixlen 64  scopeid 0x20<link>
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 2166  bytes 154192 (150.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 149  bytes 64606 (63.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 2035  bytes 134389 (131.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 131  bytes 19803 (19.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 149  bytes 64606 (63.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfc800000-fc807fff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 52  bytes 4852 (4.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52  bytes 4852 (4.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0



4.hotunplug the virtio device and netdev

{"execute": "device_del", "arguments": {"id": "net0"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1585820529, "microseconds": 793648}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1585820529, "microseconds": 891586}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}


{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
output:
{"return": {}}
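
The QMP exchange in step 4 can also be scripted against the -qmp tcp:0:5555 socket from step 2. The following is a minimal sketch, assuming the default one-JSON-message-per-line QMP output; the function names are hypothetical. Note that the first DEVICE_DELETED event above carries only a "path" (the virtio-backend child), so the helper waits for the event whose "device" field matches the id.

```python
import json
import socket

def qmp_command(name, **arguments):
    """Serialize one QMP command as a JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode()

def is_device_deleted(message, device_id):
    """True if `message` is the DEVICE_DELETED event for the device itself."""
    return (message.get("event") == "DEVICE_DELETED"
            and message.get("data", {}).get("device") == device_id)

def unplug_and_wait(host, port, device_id, timeout=30.0):
    """Send device_del and block until the matching DEVICE_DELETED event arrives."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        json.loads(stream.readline())                  # QMP greeting banner
        stream.write(qmp_command("qmp_capabilities"))  # leave capabilities negotiation mode
        stream.flush()
        json.loads(stream.readline())                  # {"return": {}}
        stream.write(qmp_command("device_del", id=device_id))
        stream.flush()
        for line in stream:
            reply = json.loads(line)
            if "error" in reply:
                raise RuntimeError(reply["error"])
            if is_device_deleted(reply, device_id):
                return reply
    raise ConnectionError("QMP connection closed before DEVICE_DELETED")

# Usage against the guest from step 2 (not executed here):
#   unplug_and_wait("127.0.0.1", 5555, "net0")
```

The unplug only completes once the guest releases the device, so the timeout is a guess; netdev_del for hostnet0 would be sent the same way after the event arrives.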



5.check the network status in guest
# dmesg
[  616.736605] virtio_net virtio1 enp3s0: failover standby slave:enp3s0nsby unregistered
[  617.169557] virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered
[  617.170686] virtio_net virtio1 enp3s0: failover master:enp3s0 unregistered

# ifconfig -a
enp4s0: flags=4098<BROADCAST,MULTICAST>  mtu 1500   <--- failover VF status is down at this time
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 131  bytes 15177 (14.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 86  bytes 7840 (7.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfc800000-fc807fff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 56  bytes 5380 (5.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 56  bytes 5380 (5.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
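
The numeric flags values in the ifconfig output are bitmasks of the Linux IFF_* interface flags, which is how the drop from 4163 to 4098 on enp4s0 shows that the VF lost UP and RUNNING. A small sketch that decodes them, using the standard IFF_* bit values from the kernel's if.h (the function name is hypothetical):

```python
# Linux interface flag bits (include/uapi/linux/if.h), lowest 13 bits only.
IFF_FLAGS = {
    0x1: "UP", 0x2: "BROADCAST", 0x4: "DEBUG", 0x8: "LOOPBACK",
    0x10: "POINTOPOINT", 0x20: "NOTRAILERS", 0x40: "RUNNING", 0x80: "NOARP",
    0x100: "PROMISC", 0x200: "ALLMULTI", 0x400: "MASTER", 0x800: "SLAVE",
    0x1000: "MULTICAST",
}

def decode_flags(value):
    """Return the names of the flag bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]

print(decode_flags(4163))  # enp4s0 before the virtio device is unplugged
print(decode_flags(4098))  # enp4s0 after the virtio device is unplugged
```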

6.hotunplug the failover VF
{"execute": "device_del", "arguments": {"id": "hostdev0"}}
output:
{"return": {}}


Actual results:
qemu core dump happens

Expected results:
The failover VF can be hot-unplugged successfully.
The VM should keep working.

Additional info:
(1) The problem can also be reproduced with XL710, XXV710, 82599ES, 82576 and Mellanox MT27800 NICs.
(2) The backtrace of the qemu core dump is in the attachment.

Comment 1 Yanghang Liu 2020-04-02 10:29:38 UTC
Created attachment 1675671 [details]
(gdb) t a a bt full

Comment 2 Rick Barry 2020-04-09 20:30:09 UTC
I'm not sure whether this is Live Migration or virtio device hot-unplugging, but I'm leaving the subcomponent as Live Migration until this can be triaged.

Comment 5 Juan Quintela 2020-07-03 12:33:09 UTC
Hi

I sent the fix upstream:

https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01163.html

brew: 29861942

Comment 8 Yanghang Liu 2020-07-07 05:48:56 UTC
I did a quick test with the build mentioned in comment 5
(https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=29861949)

# uname -r 
4.18.0-193.12.1.el8_2.x86_64
# rpm -q qemu-kvm
qemu-kvm-4.2.0-28.module+el8.2.1+6815+1c792dc8.quintela202007031210.x86_64


Step:
1.create NetXtreme BCM57810 VF and set the mac address of the VF
# ip link set enp130s0f0 vf 0  mac 22:2b:62:bb:a9:82

2.start a source guest with NetXtreme BCM57810 VF which enables failover
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:82:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \


3.hotunplug the virtio device and netdev
{"execute": "device_del", "arguments": {"id": "net0"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1585820529, "microseconds": 793648}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1585820529, "microseconds": 891586}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}


{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
output:
{"return": {}}

4.check dmesg in guest
# dmesg
...
virtio_net virtio1 enp3s0: failover standby slave:enp3s0nsby unregistered
virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered
virtio_net virtio1 enp3s0: failover master:enp3s0 unregistered


5.hotunplug the failover VF
{"execute": "device_del", "arguments": {"id": "hostdev0"}}
output:
{"return": {}}


6.check the status of vm
The vm works well and qemu core dump does not occur.


7.reboot the vm
The vm still works well after rebooting the vm.

Comment 9 Yanghang Liu 2020-07-07 06:05:18 UTC
This bug can be reproduced in the following test environment:

# rpm -q qemu-kvm
qemu-kvm-4.2.0-28.module+el8.2.1+7211+16dfe810.x86_64
# uname -r 
4.18.0-193.12.1.el8_2.x86_64

Comment 14 Yanghang Liu 2020-07-08 02:11:49 UTC
Verification:


Test env:
host:
4.18.0-193.12.1.el8_2.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64


Step:
1.create NetXtreme BCM57810 VF and set the mac address of the VF
# ip link set enp130s0f0 vf 0  mac 22:2b:62:bb:a9:82


2.start a source guest with NetXtreme BCM57810 VF which enables failover
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:82:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \


3.hotunplug the failover virtio device and netdev
The failover virtio device and netdev are hot-unplugged from the VM successfully.
3.1
{"execute": "device_del", "arguments": {"id": "net0"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1594172986, "microseconds": 25092}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1594172986, "microseconds": 122707}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
3.2
{"execute": "netdev_del", "arguments": {"id": "hostnet0"}}
output:
{"return": {}}


4.check the vm status
# ifconfig -a
enp4s0: flags=4098<BROADCAST,MULTICAST>  mtu 1500 
        ether 22:2b:62:bb:a9:82  txqueuelen 1000  (Ethernet)
        RX packets 393  bytes 39898 (38.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 245  bytes 33650 (32.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xfc800000-fc807fff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2  bytes 168 (168.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 168 (168.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# dmesg
...
virtio_net virtio1 enp3s0: failover standby slave:enp3s0nsby unregistered
virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered
virtio_net virtio1 enp3s0: failover master:enp3s0 unregistered
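
The three dmesg lines above are what confirm that the guest tore down the whole failover set (standby, primary and master) before the VF is unplugged in the next step. A minimal sketch of a check for that, assuming the message format shown in this report stays the same; the regular expression and function name are hypothetical:

```python
import re

# Matches lines such as:
#   virtio_net virtio1 enp3s0: failover standby slave:enp3s0nsby unregistered
FAILOVER_RE = re.compile(
    r"virtio_net \S+ (?P<master>\S+): failover "
    r"(?P<role>standby slave|primary slave|master):(?P<dev>\S+) unregistered"
)

def unregistered_roles(dmesg_text):
    """Map each failover role (standby/primary/master) to the unregistered netdev."""
    roles = {}
    for match in FAILOVER_RE.finditer(dmesg_text):
        roles[match.group("role").split()[0]] = match.group("dev")
    return roles

SAMPLE_DMESG = """\
virtio_net virtio1 enp3s0: failover standby slave:enp3s0nsby unregistered
virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered
virtio_net virtio1 enp3s0: failover master:enp3s0 unregistered
"""

roles = unregistered_roles(SAMPLE_DMESG)
print(roles)
print("failover fully torn down:", set(roles) == {"standby", "primary", "master"})
```

The pattern also tolerates the "[  616.736605]" timestamp prefix seen in the original description, because it searches within each line rather than anchoring at the start.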


5.hotunplug the failover VF
{"execute": "device_del", "arguments": {"id": "hostdev0"}}
output:
{"return": {}}


6.check the status of vm
The vm works well and qemu core dump does not occur.


7.reboot the vm
The vm still works well after rebooting the vm.

Comment 15 Yanghang Liu 2020-07-08 02:17:27 UTC
By the way, the problem with hot-unplugging the failover VF can be tracked through "Bug 1819991 - Hostdev type interface with net failover enabled exists in domain xml and doesn't reattach to host after hot-unplug"

Comment 16 Yanghang Liu 2020-07-08 02:19:12 UTC
Based on comment 9 and comment 14, I am moving the bug status to VERIFIED.

Comment 18 errata-xmlrpc 2020-07-28 07:12:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3172

