Bug 1738821

Summary: Ping from/to guest failed when hot-plugging an e1000e NIC
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Lei Yang <leiyang>
Component: qemu-kvm Assignee: Yvugenfi <yvugenfi>
qemu-kvm sub component: Devices QA Contact: Lei Yang <leiyang>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: medium CC: chayang, ddepaula, jasowang, jinzhao, juzhang, pezhang, virt-maint, ybendito, yvugenfi
Version: 8.1 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-12 22:37:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1744438    

Description Lei Yang 2019-08-08 08:39:36 UTC
Description of problem:
Hot-plug an e1000e NIC into a RHEL 8 guest and run dhclient on it: the NIC obtains an IP address and ping from/to the guest succeeds. Then hot-unplug the virtual NIC and hot-plug it again: the NIC still obtains an IP address, but ping from/to the guest fails.

Version-Release number of selected component (if applicable):
v4.1.0-rc3 and qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start the guest and hot-plug an e1000e NIC.

QEMU CLI:
/usr/libexec/qemu-kvm -name rhel8 \
-M q35,kernel-irqchip=split -m 4G \
-cpu Haswell-noTSX \
-enable-kvm \
-nodefaults \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/rhel8.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,drive=my,id=virtio-blk0,bus=root.1 \
-qmp tcp:0:5555,server,nowait \
-vnc :2 \
-vga qxl \
-monitor stdio \
-boot menu=on \

# telnet 10.73.73.73 5555
{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "netdev_add", "arguments": {"type":"tap","id":"hostnet0","script":"/etc/qemu-ifup"}}
{"return": {}}
{"execute": "device_add", "arguments": { "driver":"e1000e","netdev":"hostnet0","mac":"00:1a:4a:42:0b:01","id": "net0","bus":"root.3"}}
{"return": {}}
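The telnet session in step 1 can also be scripted, which makes the plug/unplug cycle easier to repeat. A minimal sketch, assuming Python 3 and the `-qmp tcp:0:5555,server,nowait` socket from the CLI above; `qmp_message` and `run_qmp` are hypothetical helpers written for illustration, not part of QEMU. It reads the greeting, sends `qmp_capabilities`, then the same `netdev_add`/`device_add` pair, collecting command replies and asynchronous events separately.

```python
import json
import socket


def qmp_message(command, **arguments):
    """Serialize one QMP command as a newline-terminated JSON string."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"


def run_qmp(host, port, commands):
    """Negotiate capabilities, then send each (command, arguments) pair.

    Returns (replies, events): the "return" payload of every command,
    starting with qmp_capabilities, and any asynchronous events seen.
    Raises RuntimeError on the first QMP error reply.
    """
    replies, events = [], []
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("rw", encoding="utf-8") as stream:
        json.loads(stream.readline())  # greeting: {"QMP": {...}}
        for command, arguments in [("qmp_capabilities", {})] + list(commands):
            stream.write(qmp_message(command, **arguments))
            stream.flush()
            while True:
                line = stream.readline()
                if not line:
                    raise ConnectionError("QMP socket closed")
                reply = json.loads(line)
                if "event" in reply:
                    events.append(reply)  # async event; keep reading
                elif "error" in reply:
                    raise RuntimeError(f"{command}: {reply['error']['desc']}")
                else:
                    replies.append(reply["return"])
                    break
    return replies, events


# Usage (step 1 above; host/port taken from the telnet session):
# replies, events = run_qmp("10.73.73.73", 5555, [
#     ("netdev_add", {"type": "tap", "id": "hostnet0",
#                     "script": "/etc/qemu-ifup"}),
#     ("device_add", {"driver": "e1000e", "netdev": "hostnet0",
#                     "mac": "00:1a:4a:42:0b:01", "id": "net0",
#                     "bus": "root.3"}),
# ])
```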

2. Hot-unplug the virtual NIC and then hot-plug it again.

{"execute":"device_del","arguments":{"id":"net0"}}
{"return": {}}
{"timestamp": {"seconds": 1565248844, "microseconds": 989565}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute": "device_add", "arguments": { "driver":"e1000e","netdev":"hostnet0","mac":"00:1a:4a:42:0b:01","id": "net0","bus":"root.3"}}
{"return": {}}
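`device_del` only requests removal: QMP answers `{"return": {}}` right away, and the device is actually gone only when QEMU emits `DEVICE_DELETED` for it, since PCIe hot-unplug typically requires guest cooperation. A replug script should therefore wait for that event before issuing `device_add` with the same `id`. A minimal sketch, assuming the QMP output is available as an iterable of JSON lines; `wait_device_deleted` is a hypothetical helper. It matches on `data.device`, so an event for a child object such as `.../virtio-backend`, which carries only a `path`, is not mistaken for removal of the device itself.

```python
import json


def wait_device_deleted(lines, device_id):
    """Consume QMP output lines until DEVICE_DELETED for device_id arrives.

    Returns the matching event, or None if the stream ends first.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("event") != "DEVICE_DELETED":
            continue
        # Events for child objects carry only "path"; the device itself
        # also reports its qdev id in "device".
        if msg.get("data", {}).get("device") == device_id:
            return msg
    return None
```

In the virtio-net-pci transcript of comment 4, only the `virtio-backend` event arrives, so this helper would not report completion for `idguH3SC`.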

3. Run dhclient on this NIC, then ping 10.73.75.254 from the guest.

# dhclient
PING 10.73.75.254 (10.73.75.254) from 10.73.74.223 enp3s0: 56(84) bytes of data

--- 10.73.75.254 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 213ms

4. Ping the guest from an external host.

# ping -c 30 10.73.74.223
PING 10.73.74.223 (10.73.74.223) 8192(8220) bytes of data.
^C
--- 10.73.74.223 ping statistics ---
30 packets transmitted, 0 received, 100% packet loss, time 319ms
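Steps 3 and 4 judge the result from ping's closing summary line. When the check is scripted, that line can be parsed instead of read by eye. A minimal sketch, assuming the iputils `ping` summary format shown above; `parse_ping_summary` and `ping_ok` are hypothetical helpers, and `max_loss` is an illustrative threshold.

```python
import re

# Matches the iputils summary line, e.g.
# "30 packets transmitted, 0 received, 100% packet loss, time 319ms"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received,.*?([\d.]+)% packet loss"
)


def parse_ping_summary(output):
    """Return (transmitted, received, loss_percent) from ping output."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    transmitted, received, loss = match.groups()
    return int(transmitted), int(received), float(loss)


def ping_ok(output, max_loss=0.0):
    """True if at least one reply came back and loss is within max_loss percent."""
    transmitted, received, loss = parse_ping_summary(output)
    return received > 0 and loss <= max_loss
```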


Actual results:
Ping from/to the guest fails.

Expected results:
Ping from/to the guest succeeds.

Additional info:
1. qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.x86_64 also hits this problem.

2. v4.1.0-rc3 and qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64 share the following problems:
 (1) No "NIC_RX_FILTER_CHANGED" event is emitted on QMP after plugging the e1000e NIC.
 (2) No "{"timestamp": {"seconds": 1565248555, "microseconds": 709764}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}" event is emitted on QMP after unplugging the e1000e device.
 I'm not sure whether this is a new problem. If it is, please let me know and I will open a new BZ. Thanks a lot.
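The two observations above compare which QMP events each NIC model produces. When captured output is compared across runs or NIC models, it can help to summarize the events by name and QOM path. A minimal sketch, assuming the captured QMP output is a list of JSON lines; `summarize_events` is a hypothetical helper.

```python
import json
from collections import defaultdict


def summarize_events(lines):
    """Map each QMP event name to the list of QOM paths it reported."""
    summary = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "event" in msg:
            summary[msg["event"]].append(msg.get("data", {}).get("path"))
    return dict(summary)
```

For the e1000e unplug transcript in step 2, the summary contains only `DEVICE_DELETED` with `/machine/peripheral/net0`.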

Comment 1 Lei Yang 2019-08-08 09:01:36 UTC
No problem with the same test steps using a virtio-net-pci NIC; ping from/to the guest succeeds.

Comment 3 Lei Yang 2019-08-09 07:07:36 UTC
The same test steps work well on a win2019 guest; ping from/to the guest succeeds.

Comment 4 Lei Yang 2019-12-03 03:11:00 UTC
Hi Yan,

I can reproduce this bug on a RHEL 8.2 host + win2019 guest (q35 + seabios + virtio-net-pci).

host version:
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64
kernel-4.18.0-158.el8.x86_64
virtio-win-prewhql-0.1-172.iso

Reproduce steps:
1. Boot a guest.
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine q35  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x1 \
-m 14336  \
-smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
-cpu 'EPYC',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt  \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
-device scsi-cd,id=cd1,drive=drive_cd1 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-qmp tcp:0:5555,server,nowait \
-device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-device pcie-root-port,id=pcie_extra_root_port_1,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
-device pcie-root-port,id=pcie_extra_root_port_4,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
-device pcie-root-port,id=pcie_extra_root_port_5,slot=8,chassis=8,addr=0x8,bus=pcie.0 \
-monitor stdio \

2. Hot-plug a virtio-net NIC.
# telnet 10.73.196.43 5555
{"execute":"qmp_capabilities"}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idJxmmIZ','vhost':'on'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'driver': 'virtio-net-pci', 'netdev': 'idJxmmIZ', 'mac': '9a:90:e8:73:1c:72', 'id': 'idguH3SC', 'bus': 'pcie_extra_root_port_0'}}
{"return": {}}
{"timestamp": {"seconds": 1575271737, "microseconds": 753018}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "idguH3SC", "path": "/machine/peripheral/idguH3SC/virtio-backend"}}

3. Hot-unplug the NIC (only one "DEVICE_DELETED" event is returned).
{'execute': 'device_del', 'arguments': {'id':'idguH3SC'}}
{"return": {}}
{"timestamp": {"seconds": 1575271884, "microseconds": 833278}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/idguH3SC/virtio-backend"}}
{'execute': 'netdev_del', 'arguments': {'id':'idJxmmIZ'}}
{"return": {}}

4. Hot-plugging the NIC again fails.

{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idJxmmIZ','vhost':'on'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'driver': 'virtio-net-pci', 'netdev': 'idJxmmIZ', 'mac': '9a:90:e8:73:1c:72', 'id': 'idguH3SC', 'bus': 'pcie_extra_root_port_0'}}
{"error": {"class": "GenericError", "desc": "Duplicate ID 'idguH3SC' for device"}}

I'm not sure whether this is a new issue. If it is, please let me know and I will file a new BZ.

Thanks & Regards
LeiYang

Comment 5 Yvugenfi@redhat.com 2019-12-12 11:15:41 UTC
Let's keep it as a single issue for now

Comment 6 ybendito 2019-12-26 05:17:23 UTC
The problem described in comment #4 for virtio-net-pci is not related to this BZ.
See https://bugzilla.redhat.com/show_bug.cgi?id=1708480 for virtio-net-pci

Comment 7 ybendito 2019-12-26 05:46:07 UTC
Please provide more information about the problem with e1000e:
1. What delay between device_del and the following device_add reproduces the problem? Does it happen if the delay is significant (15 seconds)?
2. When the problem happened, what is the status of the nic in the guest? Does it have the IP address?
3. After the problem happens, does device_del actually remove the device? (info network)

Comment 8 Lei Yang 2019-12-26 07:53:01 UTC
(In reply to ybendito from comment #7)
> Please provide more information about the problem with e1000e:
> 1. What delay between device_del and the following device_add reproduces
> the problem? Does it happen if the delay is significant (15 seconds)?
Hi

With a delay of more than 15 seconds between device_del and device_add, the issue persists.

> 2. When the problem happened, what is the status of the nic in the guest?
> Does it have the IP address?

When the problem happens, the NIC is in state UP in the guest, but it cannot obtain an IP address after dhclient.
==>inside guest
# ip -d link show eth0
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 00:1a:4a:42:0b:01 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9212 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
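To confirm from inside the guest that the replugged NIC is the one added over QMP, its operstate and MAC can be read from `ip -d link show`. A minimal sketch, assuming the iproute2 text format shown above; `parse_link` is a hypothetical helper, and `ip -j link show` (JSON output) would be a sturdier source where available.

```python
import re

# First line: "5: eth0: <FLAGS> ... state UP ...", then "link/ether MAC ..."
LINK_RE = re.compile(
    r"^\d+: (?P<ifname>[^:@]+)[^:]*: <(?P<flags>[^>]*)>.*?\bstate (?P<state>\S+)"
    r".*?link/ether (?P<mac>[0-9a-f:]{17})",
    re.DOTALL,
)


def parse_link(output):
    """Return ifname, flags, operstate and MAC from `ip -d link show DEV` output."""
    m = LINK_RE.search(output)
    if m is None:
        raise ValueError("unrecognized ip link output")
    return {
        "ifname": m["ifname"],
        "flags": m["flags"].split(","),
        "state": m["state"],
        "mac": m["mac"],
    }
```

For the output above this yields state `UP` and MAC `00:1a:4a:42:0b:01`, matching the `mac` passed to `device_add`.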

> 3. After the problem happens, does device_del actually remove the device?
> (info network)

1. After device_del of the NIC:
{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "netdev_add", "arguments": {"type":"tap","id":"hostnet0","script":"/etc/qemu-ifup"}}
{"return": {}}
{"execute": "device_add", "arguments": { "driver":"e1000e","netdev":"hostnet0","mac":"00:1a:4a:42:0b:01","id": "net0","bus":"root.3"}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"net0"}}
{"return": {}}
{"timestamp": {"seconds": 1577345622, "microseconds": 418992}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
==>inside guest 
# lspci | grep Eth   (no output)
==>host hmp
(qemu) info network 
hostnet0: index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown

2. Hot-plug this NIC again:
{"execute": "device_add", "arguments": { "driver":"e1000e","netdev":"hostnet0","mac":"00:1a:4a:42:0b:01","id": "net0","bus":"root.3"}}
{"return": {}}
==>inside guest
# lspci | grep Eth
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
==>host hmp
(qemu) info network
net0: index=0,type=nic,model=e1000e,macaddr=00:1a:4a:42:0b:01
 \ hostnet0: index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
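The `info network` checks above can be automated by looking for a NIC entry attached to the netdev. A minimal sketch, assuming the HMP text format shown above (a NIC line `id: index=...,type=nic,model=...` followed by an indented `\ peer: ...` line); `parse_info_network` is a hypothetical helper, and HMP output is meant for humans, so its format is not a stable interface.

```python
def parse_info_network(text):
    """Return {nic_id: {"model": ..., "peer": ...}} from HMP 'info network' text."""
    nics = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Indented "\ peer: ..." line belongs to the preceding entry.
            if current is not None:
                nics[current]["peer"] = line[1:].strip().split(":", 1)[0]
            continue
        name, _, props = line.partition(":")
        fields = dict(p.split("=", 1) for p in props.strip().split(",") if "=" in p)
        if fields.get("type") == "nic":
            current = name.strip()
            nics[current] = {"model": fields.get("model"), "peer": None}
        else:
            current = None  # netdev-only entry such as a bare tap backend
    return nics
```

For the two outputs above, the NIC map is empty after `device_del` and contains `net0` (model `e1000e`, peer `hostnet0`) after the second `device_add`.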

Comment 9 Ademar Reis 2020-02-05 23:02:20 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 10 Lei Yang 2020-05-28 08:47:27 UTC
Tested the e1000e device on qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64 and hit the same issue.

==>Test steps
1. Boot a RHEL 8.3 guest.
qemu cli:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35 \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 7168  \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
-cpu 'Haswell-noTSX',+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel830-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \

2. Hot-plug the NIC through QMP (no "NIC_RX_FILTER_CHANGED" event is emitted on QMP after plugging the e1000e NIC).
# telnet 10.73.224.38 5555
Trying 10.73.224.38...
Connected to 10.73.224.38.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 5}, "package": "qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idW9F3zA'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'driver': 'e1000e', 'netdev': 'idW9F3zA', 'mac': '9a:af:77:01:18:a2', 'id': 'idjmgn2T', 'bus': 'pcie_extra_root_port_0', 'addr': '0x0'}}
{"return": {}}

3. Hot-unplug the e1000e device; only one "DEVICE_DELETED" event is returned.
{'execute': 'device_del', 'arguments': {'id':'idjmgn2T'}}
{"return": {}}
{"timestamp": {"seconds": 1590654515, "microseconds": 560138}, "event": "DEVICE_DELETED", "data": {"device": "idjmgn2T", "path": "/machine/peripheral/idjmgn2T"}}

4. Hot-plug the device again; the guest cannot obtain an IP address.
{'execute': 'device_add', 'arguments': {'driver': 'e1000e', 'netdev': 'idW9F3zA', 'mac': '9a:af:77:01:18:a2', 'id': 'idjmgn2T', 'bus': 'pcie_extra_root_port_0', 'addr': '0x0'}}
{"return": {}}

Comment 12 ybendito 2020-11-19 12:39:25 UTC
Posted to qemu-devel

https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg04494.html

Comment 15 Lei Yang 2020-12-25 06:53:49 UTC
==Verified with qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64
1. Boot the guest
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35 \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 6144  \
-smp 6,maxcpus=6,cores=3,threads=1,dies=1,sockets=2  \
-cpu 'Haswell-noTSX',+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \

2. Hot-plug the NIC through QMP
# telnet 10.73.224.38 5555
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 5}, "package": "qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a"}, "capabilities": ["oob"]}}
{"execute":"qmp_capabilities"}
{"return": {}}
{'execute': 'netdev_add', 'arguments': {'type': 'tap', 'id': 'idW9F3zA'}}
{"return": {}}
{'execute': 'device_add', 'arguments': {'driver': 'e1000e', 'netdev': 'idW9F3zA', 'mac': '9a:af:77:01:18:a2', 'id': 'idjmgn2T', 'bus': 'pcie_extra_root_port_0', 'addr': '0x0'}}
{"return": {}}

3. Hot-unplug the e1000e device
{'execute': 'device_del', 'arguments': {'id':'idjmgn2T'}}
{"return": {}}
{"timestamp": {"seconds": 1608878587, "microseconds": 832085}, "event": "DEVICE_DELETED", "data": {"device": "idjmgn2T", "path": "/machine/peripheral/idjmgn2T"}}

4. Hot-plug the device again; the guest gets an IP address and can ping the external host.
{'execute': 'device_add', 'arguments': {'driver': 'e1000e', 'netdev': 'idW9F3zA', 'mac': '9a:af:77:01:18:a2', 'id': 'idjmgn2T', 'bus': 'pcie_extra_root_port_0', 'addr': '0x0'}}
{"return": {}}

5. Ping the external host from the guest
# ping 10.73.224.38 -c 4
PING 10.73.224.38 (10.73.224.38) 56(84) bytes of data.
64 bytes from 10.73.224.38: icmp_seq=1 ttl=64 time=0.499 ms
64 bytes from 10.73.224.38: icmp_seq=2 ttl=64 time=0.229 ms
64 bytes from 10.73.224.38: icmp_seq=3 ttl=64 time=0.168 ms
64 bytes from 10.73.224.38: icmp_seq=4 ttl=64 time=0.229 ms

--- 10.73.224.38 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3062ms
rtt min/avg/max/mdev = 0.168/0.281/0.499/0.128 ms

So this bug has been fixed. Moving to 'VERIFIED'.

Comment 20 Lei Yang 2021-01-14 08:15:20 UTC
Hi Danilo

Because the current deadline for this bug has passed, I reset the ITM. Please help add it to the errata.

Best regards
Lei Yang

Comment 21 Danilo de Paula 2021-01-19 19:13:53 UTC
Adding TestOnly as this didn't require any code change downstream.
Moving to ON_QA so QE can verify it.
Granting devel+
Needs QA_ACK

Comment 22 Chao Yang 2021-01-20 02:06:48 UTC
(In reply to Danilo Cesar Lemes de Paula from comment #21)
> Adding TestOnly as this didn't require any code change downstream.
> Moving to ON_QA so QE can verify it.
> Granting devel+
> Needs QA_ACK

Hi Danilo,

I'd like to remove "TestOnly", as there actually is a code change in QEMU (see comments 12, 13, 14). QE confirmed the fix is in qemu-5.2 after the rebase.