Bug 1762906

Summary: Intermittent error 'Hotplug nic can not get IP address' when executing test case nic_hotplug.vhost_nic.nic_virtio
Product: Red Hat Enterprise Linux 8 Reporter: Karen Mezick <kmezick>
Component: qemu-kvmAssignee: Juan Quintela <quintela>
qemu-kvm sub component: Networking QA Contact: Lei Yang <leiyang>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: medium CC: aadam, ailan, chayang, ehabkost, juzhang, ldoktor, leiyang, ngu, pezhang, qzhang, rbalakri, virt-maint, wainersm, wyu, yihyu
Version: 8.1Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-26 09:57:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Log file showing nic_virtio test case error 'Hotplug nic can not get ip address'
none
Log file showing nic_hotplug test case passing (attached for comparison purposes)
none
Log file showing nic_virtio test case error with Avocado 71.0 none

Description Karen Mezick 2019-10-17 18:56:26 UTC
Created attachment 1626925 [details]
Log file showing nic_virtio test case error 'Hotplug nic can not get ip address'

Description of problem:

An intermittent error occurs on rhel81-rhel ppc64le when executing the test case type_specific_io_github_autotest_qemu_nic_hotplug_vhost_nic_nic_virtio
on rhel81-rhel-ppc64le:

    --> "Hotplug nic can not get ip address"


Version-Release number of selected component (if applicable):

  Arch: ppc64le
  Avocado 69.0
  Red Hat Enterprise Linux release 8.1 Beta (Ootpa)
  qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.ppc64le
  kernel-4.18.0-141.el8.ppc64le


How reproducible: Fails 1 in 4 test runs


Steps to Reproduce:
1. Install latest Red Hat Enterprise Linux release 8.1
2. Install qemu-kvm-2.12.0-88
3. To install guest, execute the command:

/root/venv/bin/avocado --show=test run unattended_install.url.extra_cdrom_ks.default_install.aio_native --vt-type qemu --vt-guest-os Linux.RHEL.7.devel.ppc64le --vt-machine-type pseries --vt-extra-params url=http://download.devel.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.1.0-Snapshot-3.0/compose/BaseOS/ppc64le/os/ unattended_file=unattended/RHEL-8-tmp.ks kernel_params=ks=cdrom nic delay=60 console=hvc0 console=tty0 inst.repo=http://download.devel.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.1.0-Snapshot-3.0/compose/BaseOS/ppc64le/os/ inst.addrepo=appstream,http://download.devel.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.1.0-Snapshot-3.0/compose/AppStream/ppc64le/os/

4. To run test, execute the command:

avocado run --vt-type qemu --vt-guest-os Linux.RHEL.7.devel.ppc64le --vt-machine-type pseries type_specific.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio --vt-extra-params nettype=bridge skip_cluster_leak_warn=yes


Actual results:

2019-10-16 16:47:35,634 nic_hotplug      L0244 INFO | Re-enabling the primary link(s)
2019-10-16 16:47:35,635 nic_hotplug      L0247 INFO | Reboot vm to verify it alive after hotunplug nic(s)
2019-10-16 16:47:35,635 error_context    L0097 INFO | Context: rebooting 'avocado-vt-vm1'
2019-10-16 16:47:35,635 virt_vm          L1153 DEBUG| Attempting to log into 'avocado-vt-vm1' (timeout 240s)
2019-10-16 16:51:35,931 virt_vm          L1338 DEBUG| Attempting to log into 'avocado-vt-vm1' via serial console (timeout 240s)
2019-10-16 16:51:46,944 client           L1126 DEBUG| Sending command: echo %OS%
2019-10-16 16:51:47,045 client           L1162 DEBUG| Sending command (safe): ifconfig -a
2019-10-16 16:51:47,150 client           L1162 DEBUG| Sending command (safe): ip link | grep -B1 '' -i
2019-10-16 16:51:47,254 client           L1126 DEBUG| Sending command: test -d /sys/class/net/enp0s0
2019-10-16 16:51:47,355 client           L1126 DEBUG| Sending command: echo $?
2019-10-16 16:51:47,456 client           L1126 DEBUG| Sending command: cat /sys/class/net/enp0s0/address
2019-10-16 16:51:47,559 client           L1162 DEBUG| Sending command (safe): ifconfig enp0s0 || ip address show enp0s0
2019-10-16 16:51:47,662 utils_net        L3448 WARNI| No VM's NIC got IP address
2019-10-16 16:51:58,713 client           L1126 DEBUG| Sending command: ipconfig || ifconfig
2019-10-16 16:51:59,215 client           L1126 DEBUG| Sending command: ip route || route print
2019-10-16 16:51:59,318 virt_vm          L1143 ERROR| Guest network status:
 bash: ipconfig: command not found...
enp0s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 9a:b5:b6:b7:b8:b9  txqueuelen 1000  (Ethernet)
        RX packets 393  bytes 20728 (20.2 KiB)
        RX errors 0  dropped 54  overruns 0  frame 0
        TX packets 1  bytes 325 (325.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 8  bytes 480 (480.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 480 (480.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Guest route table:

2019-10-16 16:51:59,321 stacktrace       L0041 ERROR|
2019-10-16 16:51:59,321 stacktrace       L0044 ERROR| Reproduced traceback from: /root/venv/lib/python3.6/site-packages/avocado_vt/test.py:438
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR| Traceback (most recent call last):
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR|   File "/var/lib/avocado/data/avocado-vt/test-providers.d/downloads/io-github-autotest-qemu/qemu/tests/nic_hotplug.py", line 188, in run
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR|     test.fail("Hotplug nic can not get ip address")
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR|   File "/root/venv/lib/python3.6/site-packages/avocado/core/test.py", line 1030, in fail
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR|     raise exceptions.TestFail(message)
2019-10-16 16:51:59,325 stacktrace       L0047 ERROR| avocado.core.exceptions.TestFail: Hotplug nic can not get ip address



Expected results: IP address should be found and test case execution should proceed and not fail. A log file demonstrating this nic_hotplug test case passing is attached.


Additional info: Let me know if you would like me to set up a reserved machine with the necessary s/w installed to reproduce the problem.

Comment 1 Karen Mezick 2019-10-17 18:58:05 UTC
Created attachment 1626926 [details]
Log file showing nic_hotplug test case passing (attached for comparison purposes)

Comment 2 Yihuang Yu 2019-10-18 03:27:51 UTC
(In reply to Karen Mezick from comment #0)
> enp0s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         ether 9a:b5:b6:b7:b8:b9  txqueuelen 1000  (Ethernet)
>         RX packets 393  bytes 20728 (20.2 KiB)
>         RX errors 0  dropped 54  overruns 0  frame 0
>         TX packets 1  bytes 325 (325.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
Hello Karen,
Sometimes we may encounter mac address conflicts due to the way the MAC address generation is implemented (avocado-vt).
For more details, you can see this issue: https://github.com/avocado-framework/avocado-vt/issues/2121
Fortunately, it has been fixed at avocado-vt 71.
Can you update your test environment and retest it? If it still fails, please let me know.

Comment 3 Karen Mezick 2019-10-18 17:54:16 UTC
Created attachment 1627242 [details]
Log file showing nic_virtio test case error with Avocado 71.0

Comment 4 Karen Mezick 2019-10-18 17:58:17 UTC
Hi Yihuang

Thank you for your quick response.
I tried running this with Avocado 71.0 and got the same error.
Please see attached log.

# avocado -v
Avocado 71.0

ppc64le
Red Hat Enterprise Linux release 8.1 Beta (Ootpa)
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.ppc64le
kernel-4.18.0-141.el8.ppc64le

Thanks,
Karen

Comment 5 Karen Mezick 2019-10-18 18:01:09 UTC
One clarification: In this latest test with Avocado 71.0, the error is slightly different:

avocado.core.exceptions.TestFail: Hotplug nic can't get ip after reboot vm

I have also seen this error previously when running with Avocado 69.0.

Comment 7 Yihuang Yu 2019-10-21 08:38:36 UTC
This is some result of the log analysis:

1. Hot plug a nic device with "9a:2e:0b:20:95:0c"
2019-10-18 13:13:12,692 qemu_monitor     L1920 DEBUG| Send command: {'execute': 'device_add', 'arguments': {'driver': 'virtio-net-pci', 'netdev': 'id1hXOy1', 'mac': '9a:2e:0b:20:95:0c', 'id': 'hotplug_nic1', 'vectors': 4}, 'id': 'kxorZOVp'}

2. Guest get ip address "192.168.122.76"
2019-10-18 13:13:20,747 nic_hotplug      L0189 INFO | Got the ip address of new nic: 192.168.122.76

3. Avocado-vt will drop the address cache during reboot
2019-10-18 13:15:45,686 ip_sniffing      L0068 DEBUG| Dropped the address cache of HWADDR (9a:2e:0b:20:95:0c)

4. When guest can not get ip address, we can see from arp table, it has cleaned up the address cache but does not refresh
2019-10-18 13:22:58,924 process          L0437 DEBUG| [stdout] 10.0.1.254               ether   e8:b6:c2:10:dc:81   C                     enP68p1s0f2
2019-10-18 13:22:58,924 process          L0437 DEBUG| [stdout] 192.168.122.186          ether   9a:16:f1:d7:a6:ea   C                     virbr0
2019-10-18 13:22:58,924 process          L0437 DEBUG| [stdout] 192.168.122.76                   (incomplete)                              virbr0    <--------------------------

Actually, the latest code remove this step, I don't know why, will cc the author(@wyu) here.
Please not that this not only happens on ppc, I can also reproduce it on x86 after add this step in the code.
 (1/5) repeat1.Host_RHEL.m8.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.1.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: PASS (200.10 s)
 (2/5) repeat2.Host_RHEL.m8.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.1.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: FAIL: Hotplug nic can't get ip after reboot vm (718.15 s)
 (3/5) repeat3.Host_RHEL.m8.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.1.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: FAIL: Hotplug nic can not get ip address (610.05 s)
 (4/5) repeat4.Host_RHEL.m8.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.1.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: PASS (199.41 s)
 (5/5) repeat5.Host_RHEL.m8.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.1.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: FAIL: Hotplug nic can not get ip address (580.89 s)

It's hard to say whether it's an issue with automation or kvm(network?). Need some time for further analysis.

Comment 8 Yihuang Yu 2019-10-28 06:57:09 UTC
RHEL7 cannot reproduce this issue, so it may be a regression.

Pei and Lei, I am CC'ing you because this is not a ppc-specific problem.
You can modify the automatic script to reproduce this problem in RHEL8 host  as follows:
diff --git a/qemu/tests/nic_hotplug.py b/qemu/tests/nic_hotplug.py
index 00b3609..5156f79 100644
--- a/qemu/tests/nic_hotplug.py
+++ b/qemu/tests/nic_hotplug.py
@@ -221,8 +221,11 @@ def run(test, params, env):
                 err_msg = "New nic failed ping test, error info: '%s'"
                 test.fail(err_msg % output)
 
+            logging.info("Reboot vm after hotplug nic")
             # reboot vm via serial port since some guest can't auto up
             # hotplug nic and next step will check is hotplug nic works.
+            s_session = vm.reboot(session=s_session, serial=True)      
+            vm.verify_alive()
 
             hotnic_ip = get_hotplug_nic_ip(
                 vm,

After that, repeat this case some times, and you will see this case reports "FAIL: Hotplug nic can not get ip address" or "FAIL: Hotplug nic can't get ip after reboot vm"
I can see that ip-sniffer.log has different points, hopeful you can help to debug it.

Thanks,
Yihuang

Comment 9 Gu Nini 2019-10-29 06:36:43 UTC
Lei, could you help trying to reproduce the bug on x86 according to comment 8?

Comment 10 Lei Yang 2019-10-29 06:48:37 UTC
(In reply to Gu Nini from comment #9)
> Lei, could you help trying to reproduce the bug on x86 according to comment
> 8?

Hi,Nini

I have reproduced this bug on x86_64 server with automation method.But i can not reproduce this bug from manually.

Test result:
(21/24) Host_RHEL.m8.u2.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.0.x86_64.io-github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: FAIL: Hotplug nic can not get ip address 

Best regards,
LeiYang

Comment 11 Gu Nini 2019-10-29 07:02:35 UTC
(In reply to Lei Yang from comment #10)
> (In reply to Gu Nini from comment #9)
> > Lei, could you help trying to reproduce the bug on x86 according to comment
> > 8?
> 
> Hi,Nini
> 
> I have reproduced this bug on x86_64 server with automation method.But i can
> not reproduce this bug from manually.
> 
> Test result:
> (21/24)
> Host_RHEL.m8.u2.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.0.x86_64.io-
> github-autotest-qemu.nic_hotplug.vhost_nic.nic_virtio.q35: FAIL: Hotplug nic
> can not get ip address 
> 
> Best regards,
> LeiYang

Thanks Lei. Move the bug to be shared arch.

Comment 17 Ademar Reis 2020-02-05 23:07:23 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 23 Lei Yang 2020-12-23 01:57:32 UTC
Hit same issue:
Test Version:
qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a.x86_64
kernel-4.18.0-262.el8.dt3.x86_64