Bug 1371930

Summary: i40e: "x-req" does not work correctly when unbinding the PF from the host or unloading the PF's driver
Product: Red Hat Enterprise Linux 7
Reporter: Yanan Fu <yfu>
Component: qemu-kvm-rhev
Assignee: Eric Auger <eric.auger>
Status: CLOSED NOTABUG
QA Contact: Yanan Fu <yfu>
Severity: high
Priority: high
Version: 7.3
CC: abologna, ailan, alex.williamson, antonio.lopezgracia, chayang, dhill, ebarrera, eric.auger, hhuang, inetkach, jasowang, jinzhao, jjung, jthomas, juzhang, knoel, laine, mschuppe, mst, pbonzini, slopezpa, tvvcox, virt-maint, yfu
Target Milestone: rc
Hardware: x86_64
OS: Windows
Last Closed: 2017-09-26 07:45:31 UTC
Type: Bug

Description Yanan Fu 2016-08-31 12:49:03 UTC
Description of problem:
There are two scenarios, but I think they may be the same issue.
1. Boot a guest with a VF assigned and "x-req=off", then unbind the PF from the host or unload the PF driver. This operation blocks, and dmesg shows "No device request channel registered, blocked until released by user". Executing "device_del" does not delete the VF and produces no other output.

2. Boot a guest with more than one VF assigned (without "x-req", which defaults to "x-req=on"), then unbind the PF from the host or unload the PF driver. This operation blocks, and nothing in dmesg reports the block. In QEMU, only the first VF disappears; in the guest, all the VFs still exist. device_del cannot delete the remaining VF and produces no other output.

If these turn out to be different issues, I will open a new bug later.

Version-Release number of selected component (if applicable):
qemu-kvm: qemu-kvm-rhev-2.6.0-22.el7.x86_64
kernel: 3.10.0-495.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
Scenario 1:
1. Bring up the VFs and bind them to vfio-pci.

2. Boot the guest with one VF assigned:
-device vfio-pci,host=04:02.0,id=vf-02.0,x-req=off

3. After the guest boots up, unload the PF's driver on the host:
# modprobe -r i40e
This operation blocks, and dmesg shows:
kernel: vfio-pci 0000:04:02.0: No device request channel registered, blocked until released by user

4. Delete the VF with device_del via QMP:
{"execute":"device_del","arguments":{"id":"vf-02.0"}}
The command returns without any error, but "info pci" still lists the VF.
The VF also still exists in the guest.

5. Force the guest to quit. "modprobe -r i40e" then finishes, the VF disappears from the host as expected, and dmesg shows, as expected:
kernel: iommu: Removing device 0000:04:02.0 from group 56
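Step 4 can also be scripted instead of typed into an interactive monitor. A minimal sketch, assuming socat is installed (it is not used anywhere in this report) and that QMP listens on the unix socket /var/tmp/qmpmonitor, as in the reporter's command line; the function name is hypothetical. QMP starts in capabilities-negotiation mode, so qmp_capabilities has to be sent first. Per the QMP documentation, device_del only asks the guest to begin hot removal, so an empty success reply does not mean the device is gone; QEMU emits a DEVICE_DELETED event when removal actually completes.

```shell
#!/bin/sh
# Hypothetical helper: print the QMP commands needed to hot-unplug a device
# by its qdev id. QMP requires qmp_capabilities before any other command.
qmp_device_del_cmds() {  # $1 = device id given with -device ...,id=<id>
    printf '{"execute":"qmp_capabilities"}\n'
    printf '{"execute":"device_del","arguments":{"id":"%s"}}\n' "$1"
}

# Usage (assumes socat is installed and QMP listens on /var/tmp/qmpmonitor):
#   qmp_device_del_cmds vf-02.0 | socat - UNIX-CONNECT:/var/tmp/qmpmonitor
# A {"return": {}} reply only means the unplug request was accepted; wait for
# a DEVICE_DELETED event (or re-check "info pci") to confirm the removal.
```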


Actual results:
device_del does not work, and "modprobe -r i40e" stays blocked.

Expected results:
After device_del, the VF should disappear from both QEMU and the guest, "modprobe -r i40e" should finish normally, and all VFs should disappear from the host.


Additional info:
i40e (Intel XL710)  + win2012r2 guest ------> NG (fails)
i40e (Intel XL710)  + RHEL 7.3 guest  ------> OK
ixgbe (Intel 82599) + win2012r2 guest ------> OK

The results are the same when "modprobe -r i40e" is replaced with the following (unbinding the PF from the host):
# echo 0000:04:00.0 > /sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
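To confirm the block without leaving the current shell stuck, the unbind (or modprobe -r) can be started in the background and checked after a delay. A minimal sketch, assuming a Linux /proc; the helper name still_blocked_after is hypothetical. It deliberately does not try to kill the command, since the kernel may not let a pending device removal be aborted.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background and report whether
# it is still running after a delay (i.e. it looks blocked).
# Returns 0 if it is still running, 1 if it has already finished.
still_blocked_after() {  # $1 = seconds to wait, remaining args = command
    secs=$1
    shift
    "$@" &
    pid=$!
    sleep "$secs"
    # Read the scheduler state from /proc; an empty result (process already
    # reaped) or Z (zombie) means the command has exited.
    state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
    case $state in
        '' | Z) return 1 ;;
        *) echo "PID $pid still running after ${secs}s (state $state)" >&2
           return 0 ;;
    esac
}

# Usage on the host (as root), with the PF address from this report:
#   still_blocked_after 10 sh -c \
#     'echo 0000:04:00.0 > /sys/bus/pci/devices/0000:04:00.0/driver/unbind'
#   still_blocked_after 10 modprobe -r i40e
```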

# ethtool -i p6p1  (the PF's interface on the host)
driver: i40e
version: 1.5.10-k
firmware-version: 5.02 0x80002400 17.5.9
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Guest i40e driver (Intel Network Adapter Driver for Windows Server 2012 R2) download link:
https://downloadcenter.intel.com/download/23073/Intel-Network-Adapter-Driver-for-Windows-Server-2012-R2-?product=75021

CLI:
/usr/libexec/qemu-kvm \
    -name 'win2012.r2'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl \
    -global kvm-pit.lost_tick_policy=delay \
    -chardev socket,id=qmp_monitor,path=/var/tmp/qmpmonitor,server,nowait \
    -mon chardev=qmp_monitor,mode=control  \
    -device pvpanic,ioport=0x505,id=idkP1Yip  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -drive id=virtio-blk-drive,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/win2012r2-virtio-blk.qcow2 \
    -device virtio-blk-pci,bus=pci.0,drive=virtio-blk-drive,id=virtio-blk-disk,bootindex=0 \
    -m 2048  \
    -smp 4,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0 \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait  \
    -monitor unix:/home/socket,server,nowait  \
    -device vfio-pci,host=04:02.0,id=vf-02.0,x-req=off

Comment 1 Yanan Fu 2016-08-31 13:00:07 UTC
Steps to Reproduce for scenario 2:
1. Bring up the VFs and bind them to vfio-pci.

2. Boot the guest with more than one VF assigned, for example:
    -device vfio-pci,host=04:02.0,id=vf-02.0  \
    -device vfio-pci,host=04:02.1,id=vf-02.1  \

3. After the guest boots up, unload the PF's driver on the host:
# modprobe -r i40e
This operation blocks, and dmesg shows:
kernel: vfio-pci 0000:04:02.0: Relaying device request to user (#0)
kernel: iommu: Removing device 0000:04:02.0 from group 56
kernel: vfio-pci 0000:04:02.1: Relaying device request to user (#0)
kernel: vfio-pci 0000:04:02.1: Relaying device request to user (#10)
...
kernel: vfio-pci 0000:04:02.1: Relaying device request to user (#30)

The first VF, "vf-02.0", disappears from "info pci"; sometimes it also disappears from the guest, but sometimes it remains in the guest.
The second VF, "vf-02.1", still exists in both "info pci" and the guest.

4. Delete the second VF with device_del via QMP:
{"execute":"device_del","arguments":{"id":"vf-02.1"}}
The command returns without any error, but "info pci" still lists the VF.
The VF also still exists in the guest.

5. Force the guest to quit. "modprobe -r i40e" then finishes, the VF disappears from the host as expected, and dmesg shows, as expected:
kernel: iommu: Removing device 0000:04:02.1 from group 57
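To see how far the device request got for each VF, the vfio-pci "Relaying device request to user (#N)" lines can be summarized per device. A small awk-based sketch; the function name is hypothetical. The number after # is the request counter, and the kernel does not log every request (the #0, #10, ... #30 sequence in step 3 reflects that), so the highest N is reported rather than a line count. Fed the step 3 log, it reports 0 for 0000:04:02.0 and 30 for 0000:04:02.1, i.e. only the second VF kept receiving requests.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print, for each
# vfio-pci device, the highest "Relaying device request to user (#N)" counter.
max_relay_requests() {
    awk '/vfio-pci .*Relaying device request to user/ {
        # The PCI address is the field right after "vfio-pci", e.g. "0000:04:02.1:".
        for (i = 1; i < NF; i++)
            if ($i == "vfio-pci") { bdf = $(i + 1); break }
        sub(/:$/, "", bdf)
        # The last field looks like "(#30)"; keep only the digits.
        n = $NF
        gsub(/[^0-9]/, "", n)
        if (!(bdf in max) || n + 0 > max[bdf]) max[bdf] = n + 0
    }
    END { for (d in max) print d, max[d] }' | sort
}

# Usage on the host:
#   dmesg | max_relay_requests
```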

Comment 5 Eric Auger 2016-11-30 12:29:10 UTC
Hi Yanan,

I am not able to reproduce case 2). I launch the VM through virt-manager with the 2 VFs assigned. When unloading the i40e driver on the host, the operation is not blocked as you describe. In dmesg I can immediately see:

[ 2422.411811] i40e 0000:03:00.1: i40e_ptp_stop: removed PHC on enp3s0f1
[ 2422.412030] vfio-pci 0000:03:0a.0: Relaying device request to user (#0)
[ 2422.615941] iommu: Removing device 0000:03:0a.0 from group 33
[ 2423.709047] i40e 0000:03:00.1: Deleted LAN device PF1 bus=0x00 func=0x01
[ 2423.711468] i40e 0000:03:00.0: i40e_ptp_stop: removed PHC on enp3s0f0
[ 2423.711683] vfio-pci 0000:03:02.0: Relaying device request to user (#0)
[ 2423.914928] iommu: Removing device 0000:03:02.0 from group 32
[ 2425.007045] i40e 0000:03:00.0: Deleted LAN device PF0 bus=0x00 func=0x00

In the virtual machine "Details" window, I see the 2 VFs disappear as expected. My guest is win2012r2, as you mentioned, with the i40evf drivers installed.

I am using kernel 3.10.0-514.el7.x86_64 + qemu-kvm-rhev-2.6.0-27.el7.

As for case 1, using x-req=false (not the default): my understanding is that it is never used by the virt tools. For my own knowledge, what is the rationale behind using it? I will try to run it though...

Thanks

Eric

Comment 6 Eric Auger 2016-11-30 14:59:05 UTC
For use case 1), I cannot reproduce it either. As soon as I issue the device_del command in QEMU, modprobe -r completes. See the actions and traces below.

# modprobe -v vfio-pci
# modprobe -v i40e
insmod /lib/modules/3.10.0-514.el7.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
# ifconfig enp3s0f0 up
# ifconfig enp3s0f1 up
# echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/sriov_numvfs
# echo 1 > /sys/bus/pci/devices/0000\:03\:00.1/sriov_numvfs

# ./tools/dpdk-devbind.py --bind=vfio-pci 03:02.0
# ./tools/dpdk-devbind.py --bind=vfio-pci 03:0a.0
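dpdk-devbind.py is a DPDK tool; the same setup can also be done directly through sysfs. A minimal sketch, assuming vfio-pci is already loaded and the host kernel provides the PCI driver_override attribute (upstream since Linux 3.16; check that the kernel in use has it). The SYSFS variable and the function names are illustrative, not part of any existing tool.

```shell
#!/bin/sh
# Sketch: create SR-IOV VFs on a PF and hand a VF over to vfio-pci via sysfs.
# SYSFS defaults to /sys and can be overridden, e.g. for a dry run on a fake tree.
SYSFS=${SYSFS:-/sys}

enable_vfs() {  # $1 = PF address (e.g. 0000:03:00.0), $2 = number of VFs
    # The kernel rejects changing a non-zero VF count directly;
    # write 0 first if VFs already exist.
    echo "$2" > "$SYSFS/bus/pci/devices/$1/sriov_numvfs"
}

bind_to_vfio() {  # $1 = VF address (e.g. 0000:03:02.0)
    dev="$SYSFS/bus/pci/devices/$1"
    # Only let vfio-pci match this device from now on.
    echo vfio-pci > "$dev/driver_override"
    # Detach it from its current host driver (e.g. i40evf), if it has one.
    if [ -e "$dev/driver" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi
    # Ask the PCI core to probe the device again; vfio-pci should claim it.
    echo "$1" > "$SYSFS/bus/pci/drivers_probe"
}

# Usage (as root), matching the addresses above:
#   enable_vfs 0000:03:00.0 1 && enable_vfs 0000:03:00.1 1
#   bind_to_vfio 0000:03:02.0 && bind_to_vfio 0000:03:0a.0
```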

# /usr/libexec/qemu-kvm-rhev \
-name win2k12r2 \
-machine pc \
-sandbox off  \
-cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
-enable-kvm \
-m 2048 -realtime mlock=off \
-smp 8,sockets=8,cores=1,threads=1 \
-nodefaults \
-qmp unix:./qmp-sock,server,nowait \
-rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \
-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-drive file=/var/lib/libvirt/images/win2k12r2.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 \
-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-drive file=/var/lib/libvirt/images/en_windows_server_2012_r2_x64_dvd_2707946.iso,format=raw,if=none,id=drive-ide0-0-1,readonly=on \
-device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 \
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vga qxl \
-vnc :0 \
-device vfio-pci,host=03:02.0,id=vf0,x-req=off

# modprobe -r i40e
dmesg output:
[11400.162981] i40e 0000:03:00.1: i40e_ptp_stop: removed PHC on enp3s0f1
[11400.163311] iommu: Removing device 0000:03:0a.0 from group 33
[11401.261076] i40e 0000:03:00.1: Deleted LAN device PF1 bus=0x00 func=0x01
[11401.263594] i40e 0000:03:00.0: i40e_ptp_stop: removed PHC on enp3s0f0
[11401.263794] vfio-pci 0000:03:02.0: No device request channel registered, blocked until released by user

# qemu-kvm/scripts/qmp/qmp-shell /home/augere/qmp-sock
QMP> device_del id=vf0

At that moment, modprobe -r completes.
dmesg contains:
[11476.022236] iommu: Removing device 0000:03:02.0 from group 32
[11477.120047] i40e 0000:03:00.0: Deleted LAN device PF0 bus=0x00 func=0x00

Can you please check whether the bug still exists?

Thanks

Eric

Comment 11 Eric Auger 2016-12-02 15:29:43 UTC
Hi Alex, Yanan,

So I was eventually able to reproduce the issue on Yanan's machine and added a few traces in QEMU.

It looks like vfio_req_notifier_handler is properly called for both QEMU vfio-pci devices. For the first one, 0000:04:02.0, I can see the vfio_instance_finalize function being called. However, for the second one it is never called.

In both cases it takes the following path:

- qdev unplug with hotplug_dev=PIIX4_PM
- acpi_pcihp_device_unplug_cb is called and acpi_send_gpe_event is sent. Eventually interrupts are sent (SCI? See docs/specs/acpi_pci_hotplug.txt).

Could it be a BIOS issue? (Note: the same use case 2 works fine for me with another win2012r2 image.)

Thanks

Eric

Comment 12 Alex Williamson 2016-12-07 15:31:15 UTC
(In reply to Eric Auger from comment #11)
> Could it be a BIOS issue? (Note: the same use case 2 works fine for me with
> another win2012r2 image.)

A system BIOS issue?  How is the system BIOS involved?  Or do you mean a SeaBIOS issue?

Comment 13 Eric Auger 2016-12-07 17:26:29 UTC
I meant a SeaBIOS issue.

thanks

Eric

Comment 48 Eric Auger 2017-08-25 12:37:13 UTC
Hi Igor,

OK. Actually, we have not yet been able to identify the source of the hang. In the last SOS report we got "missed kernel messages" at the crucial debug moment. We should reproduce the procedure requested in https://bugzilla.redhat.com/show_bug.cgi?id=1371930#c20 and make sure we get all the kernel logs in the SOS report or /var/log/messages.

I think this is a matter of journald configuration.
1) save existing config file:
cp /etc/systemd/journald.conf /tmp/journald.conf.sauv 
2) edit /etc/systemd/journald.conf
The RateLimitInterval and RateLimitBurst lines are probably commented out.
If so, add the following lines; otherwise, change their values:
RateLimitInterval=0
RateLimitBurst=0
3) restart systemd-journald:
systemctl restart systemd-journald
4) reproduce the procedure detailed in comment #20:
on the node where it hangs, once the hang is observed, issue:
echo t >  /proc/sysrq-trigger
5) generate the SOS report. It should not contain any messages like
"Aug  2 11:21:38 compute-1 journal: Missed 130 kernel messages"
Alternatively, run:
journalctl -k
6) send the SOS report or the journalctl output
7) restore the original journald.conf file
mv /tmp/journald.conf.sauv /etc/systemd/journald.conf
8) restart systemd-journald:
systemctl restart systemd-journald
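Steps 1-2 and 7 can be wrapped in small functions so the original configuration is restored reliably. A sketch assuming GNU sed (for -i) and the stock journald.conf layout, where [Journal] is the only section so the appended lines land in it; the function names are hypothetical. According to journald.conf(5), setting either value to 0 disables rate limiting.

```shell
#!/bin/sh
# Sketch: temporarily disable journald rate limiting, then restore it.
# $1 = journald.conf path, $2 = backup path (defaults match the steps above).

disable_journald_ratelimit() {
    conf=${1:-/etc/systemd/journald.conf}
    backup=${2:-/tmp/journald.conf.sauv}
    cp "$conf" "$backup"                       # step 1: keep the original
    # Step 2: drop existing RateLimit settings, commented out or not ...
    sed -i -e '/^[#[:space:]]*RateLimitInterval=/d' \
           -e '/^[#[:space:]]*RateLimitBurst=/d' "$conf"
    # ... and append values that turn rate limiting off.
    printf 'RateLimitInterval=0\nRateLimitBurst=0\n' >> "$conf"
}

restore_journald_conf() {
    conf=${1:-/etc/systemd/journald.conf}
    backup=${2:-/tmp/journald.conf.sauv}
    mv "$backup" "$conf"                       # step 7
}

# Usage (as root):
#   disable_journald_ratelimit && systemctl restart systemd-journald
#   ... reproduce the hang, then: echo t > /proc/sysrq-trigger
#   journalctl -k > kernel.log
#   restore_journald_conf && systemctl restart systemd-journald
```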

The above change requires root access, though. If you know an easier or cleaner way to make that change, please correct me. We think getting a full log would be useful.

Thanks

Eric

Comment 80 Eric Auger 2017-09-26 07:45:31 UTC
The original issue can no longer be reproduced on the original qemu/kernel combination (after the mandated NIC driver update in the guest), nor with RHEL 7.4. So, after discussion with Yanan, we suggest closing it.