Bug 1362729
Summary: | [RFE] log hot unplug requests | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | jingzhao <jinzhao> | |
Component: | qemu-kvm-rhev | Assignee: | Alex Williamson <alex.williamson> | |
Status: | CLOSED ERRATA | QA Contact: | jingzhao <jinzhao> | |
Severity: | low | Docs Contact: | ||
Priority: | low | |||
Version: | 7.3 | CC: | alex.williamson, chayang, huding, jen, juzhang, knoel, marcel, mrezanin, virt-maint, yfu, yiwei | |
Target Milestone: | rc | Keywords: | FutureFeature | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1447196 | Environment: | |
Last Closed: | 2017-08-01 23:32:13 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: |
Description
jingzhao
2016-08-03 02:33:55 UTC
[root@hp-z800-01 rhel6.8]# lspci -vvv -s 03:00.0
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter
    Physical Slot: 1
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 35
    Region 0: Memory at e4800000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at e4000000 (32-bit, non-prefetchable) [size=4M]
    Region 2: I/O ports at c000 [size=32]
    Region 3: Memory at e4840000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-1b-21-ff-ff-42-33-84
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
        IOVSta: Migration-
        Initial VFs: 8, Total VFs: 8, Number of VFs: 4, Function Dependency Link: 00
        VF offset: 128, stride: 2, Device ID: 10ca
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 00000000e4848000 (64-bit, non-prefetchable)
        Region 3: Memory at 00000000e4868000 (64-bit, non-prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Kernel modules: igb

dmesg on the host will report that it's waiting for the device every 10s, but QEMU does not report anything out if an eject is attempted on a non-hotplug slot. This would be the same as a non-cooperative guest. libvirt won't automatically create such a configuration.

Not 7.3 material, deferring.
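The host-side symptom mentioned above (the kernel repeatedly waiting for the device) can be spotted by scanning host dmesg for the vfio-pci "Relaying device request to user" lines. What follows is only a minimal sketch, assuming the line format seen in the host dmesg output quoted in this report; the function names `relayed_requests` and `seconds_per_request` are hypothetical helpers, not part of any existing tool.

```python
import re

# Line format assumed from the host dmesg output quoted in this report, e.g.
# "[ 1289.850640] vfio-pci 0000:04:02.0: Relaying device request to user (#10)"
RELAY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+vfio-pci\s+(?P<dev>[0-9a-fA-F:.]+):\s+"
    r"Relaying device request to user \(#(?P<count>\d+)\)"
)


def relayed_requests(dmesg_text):
    """Return {device: [(timestamp, request_count), ...]} in log order."""
    found = {}
    for match in RELAY_RE.finditer(dmesg_text):
        found.setdefault(match.group("dev"), []).append(
            (float(match.group("ts")), int(match.group("count")))
        )
    return found


def seconds_per_request(samples):
    """Average seconds per relayed request between the first and last sample.

    Returns None when the request counter did not advance.
    """
    (first_ts, first_count), (last_ts, last_count) = samples[0], samples[-1]
    if last_count == first_count:
        return None
    return (last_ts - first_ts) / (last_count - first_count)


# Three of the dmesg lines from this report, used as sample input.
SAMPLE = """\
[ 1199.407215] vfio-pci 0000:04:02.0: Relaying device request to user (#0)
[ 1289.850640] vfio-pci 0000:04:02.0: Relaying device request to user (#10)
[ 1389.859927] vfio-pci 0000:04:02.0: Relaying device request to user (#20)
"""
```

Calling `relayed_requests(SAMPLE)` groups the samples by device address, and `seconds_per_request` on that device's samples gives the average retry interval. A request counter that keeps climbing for one device suggests the unplug request is not being honored, either because of a configuration problem or an uncooperative guest; the sketch cannot tell those two cases apart.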
This is really an RFE, where I believe the request is for QEMU to log hot unplug requests. The only thing particularly special about vfio in this case is the eventfd we have for receiving device requests from the kernel. Really the same problem could be identified for any sort of hotplug from libvirt to QEMU: some logging of whether QEMU actually received an unplug request. Then we need to ask why hotplug is special. Should QEMU log every event request it receives from libvirt or elsewhere, e.g. suspend, resume, migrate, device add, ballooning, etc.?

Patch posted upstream to report whether qdev_unplug() encounters an error:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04945.html

This should resolve the problem encountered in comment 0, where the vfio-pci device is attached to a non-hotplug bus; QEMU will report an error in this case.

The more general request in the title, "log hot unplug requests", is, I believe, too verbose. The request from the kernel to the user is already logged in dmesg. QEMU logging every request it receives feels like debugging level output that should maybe occur via tracing. I don't intend to tackle that issue here and will confine this bug to reporting detectable configuration issues which prevent an unplug event from reaching the guest. It will not warn if the guest is simply uncooperative and ignores the request.

1. Reproduce the bz on qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

2. Tested it on qemu-kvm-rhev-2.9.0-1.el7.x86_64 and following are the detailed info

Pre: created vf in host
1) Boot guest with qemu command line[1]

2) unbind pf in the host
[root@dell-per730-28 ~]# echo 0000:04:00.0 >/sys/bus/pci/devices/0000\:04\:00.0/driver/unbind

Test Result:
a. qemu prompted warning info
(qemu) qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support hotplugging
qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support hotplugging
qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support hotplugging
qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support hotplugging
qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support hotplugging

b. [root@dell-per730-28 ~]# lspci |grep Ether
pcilib: Cannot open /sys/bus/pci/devices/0000:04:02.0/config
lspci: Unable to read the standard configuration space header of device 0000:04:02.0
.........

c. the host dmesg info:
[ 1199.395817] i40e 0000:04:00.0: i40e_ptp_stop: removed PHC on p6p1
[ 1199.407215] vfio-pci 0000:04:02.0: Relaying device request to user (#0)
[ 1219.837856] vfio-pci 0000:04:02.0: Device is currently in use, task "bash" (2681) blocked until device is released
[ 1289.850640] vfio-pci 0000:04:02.0: Relaying device request to user (#10)
[ 1389.859927] vfio-pci 0000:04:02.0: Relaying device request to user (#20)
[ 1489.869183] vfio-pci 0000:04:02.0: Relaying device request to user (#30)
[ 1589.878449] vfio-pci 0000:04:02.0: Relaying device request to user (#40)
[ 1689.887815] vfio-pci 0000:04:02.0: Relaying device request to user (#50)

d. check the vf info in qmp:
(qemu) info pci
Bus 0, device 5, function 0:
Ethernet controller: PCI device 8086:154c
BAR0: 64 bit prefetchable memory at 0xfea00000 [0xfea0ffff].
BAR3: 64 bit prefetchable memory at 0xfea20000 [0xfea23fff].
id "vf-10.2"
Bus 0, device 6, function 0:
Ethernet controller: PCI device 8086:154c
BAR0: 64 bit prefetchable memory at 0xfea10000 [0xfea1ffff].
BAR3: 64 bit prefetchable memory at 0xfea24000 [0xfea27fff].
id "vf-10.3"

e.

hello, alex

just want to confirm with you about the behavior, is this the expected behavior?

especially test result "b".

Could you help to confirm it?

Thanks
Jing

(In reply to jingzhao from comment #8)
> 1. Reproduce the bz on qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
>
> 2. Tested it on qemu-kvm-rhev-2.9.0-1.el7.x86_64 and following are the
> detailed info
>
> Pre: created vf in host
> 1) Boot guest with qemu command line[1]
>
> 2) unbind pf in the host
> [root@dell-per730-28 ~]# echo 0000:04:00.0
> >/sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
>
> Test Result:
> a. qemu prompted warning info
> (qemu) qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> hotplugging
> qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> hotplugging
> qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> hotplugging
> qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> hotplugging
> qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> hotplugging
>
> b. [root@dell-per730-28 ~]# lspci |grep Ether
> pcilib: Cannot open /sys/bus/pci/devices/0000:04:02.0/config
> lspci: Unable to read the standard configuration space header of device
> 0000:04:02.0
> .........
>
> c. the host dmesg info:
> [ 1199.395817] i40e 0000:04:00.0: i40e_ptp_stop: removed PHC on p6p1
> [ 1199.407215] vfio-pci 0000:04:02.0: Relaying device request to user (#0)
> [ 1219.837856] vfio-pci 0000:04:02.0: Device is currently in use, task
> "bash" (2681) blocked until device is released
> [ 1289.850640] vfio-pci 0000:04:02.0: Relaying device request to user (#10)
> [ 1389.859927] vfio-pci 0000:04:02.0: Relaying device request to user (#20)
> [ 1489.869183] vfio-pci 0000:04:02.0: Relaying device request to user (#30)
> [ 1589.878449] vfio-pci 0000:04:02.0: Relaying device request to user (#40)
> [ 1689.887815] vfio-pci 0000:04:02.0: Relaying device request to user (#50)
>
> d. check the vf info in qmp:
> (qemu) info pci
> Bus 0, device 5, function 0:
> Ethernet controller: PCI device 8086:154c
> BAR0: 64 bit prefetchable memory at 0xfea00000 [0xfea0ffff].
> BAR3: 64 bit prefetchable memory at 0xfea20000 [0xfea23fff].
> id "vf-10.2"
> Bus 0, device 6, function 0:
> Ethernet controller: PCI device 8086:154c
> BAR0: 64 bit prefetchable memory at 0xfea10000 [0xfea1ffff].
> BAR3: 64 bit prefetchable memory at 0xfea24000 [0xfea27fff].
> id "vf-10.3"
>
> e.
>
> hello, alex
>
> just want to confirm with you about the behavior, is this the expected
> behavior?
>
> especially test result "b".
>
> Could you help to confirm it?

Test "b" is a bit undesirable, but the device is in the process of being removed so it's not entirely unexpected. This bug and the update included in QEMU 2.9 to address it certainly does not change the behavior of anything other than test "a".

(In reply to Alex Williamson from comment #10)
> (In reply to jingzhao from comment #8)
> > 1. Reproduce the bz on qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
> >
> > 2. Tested it on qemu-kvm-rhev-2.9.0-1.el7.x86_64 and following are the
> > detailed info
> >
> > Pre: created vf in host
> > 1) Boot guest with qemu command line[1]
> >
> > 2) unbind pf in the host
> > [root@dell-per730-28 ~]# echo 0000:04:00.0
> > >/sys/bus/pci/devices/0000\:04\:00.0/driver/unbind
> >
> > Test Result:
> > a. qemu prompted warning info
> > (qemu) qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> > hotplugging
> > qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> > hotplugging
> > qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> > hotplugging
> > qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> > hotplugging
> > qemu-kvm: vfio warning: 0000:04:02.0: Bus 'pcie.0' does not support
> > hotplugging
> >
> > b. [root@dell-per730-28 ~]# lspci |grep Ether
> > pcilib: Cannot open /sys/bus/pci/devices/0000:04:02.0/config
> > lspci: Unable to read the standard configuration space header of device
> > 0000:04:02.0
> > .........
> >
> > c. the host dmesg info:
> > [ 1199.395817] i40e 0000:04:00.0: i40e_ptp_stop: removed PHC on p6p1
> > [ 1199.407215] vfio-pci 0000:04:02.0: Relaying device request to user (#0)
> > [ 1219.837856] vfio-pci 0000:04:02.0: Device is currently in use, task
> > "bash" (2681) blocked until device is released
> > [ 1289.850640] vfio-pci 0000:04:02.0: Relaying device request to user (#10)
> > [ 1389.859927] vfio-pci 0000:04:02.0: Relaying device request to user (#20)
> > [ 1489.869183] vfio-pci 0000:04:02.0: Relaying device request to user (#30)
> > [ 1589.878449] vfio-pci 0000:04:02.0: Relaying device request to user (#40)
> > [ 1689.887815] vfio-pci 0000:04:02.0: Relaying device request to user (#50)
> >
> > d. check the vf info in qmp:
> > (qemu) info pci
> > Bus 0, device 5, function 0:
> > Ethernet controller: PCI device 8086:154c
> > BAR0: 64 bit prefetchable memory at 0xfea00000 [0xfea0ffff].
> > BAR3: 64 bit prefetchable memory at 0xfea20000 [0xfea23fff].
> > id "vf-10.2"
> > Bus 0, device 6, function 0:
> > Ethernet controller: PCI device 8086:154c
> > BAR0: 64 bit prefetchable memory at 0xfea10000 [0xfea1ffff].
> > BAR3: 64 bit prefetchable memory at 0xfea24000 [0xfea27fff].
> > id "vf-10.3"
> >
> > e.
> >
> > hello, alex
> >
> > just want to confirm with you about the behavior, is this the expected
> > behavior?
> >
> > especially test result "b".
> >
> > Could you help to confirm it?
>
> Test "b" is a bit undesirable, but the device is in the process of being
> removed so it's not entirely unexpected. This bug and the update included
> in QEMU 2.9 to address it certainly does not change the behavior of anything
> other than test "a".

Hi Alex,

Thanks for confirming.

Tried the same test steps with the VF connected to the pcie-root-port, and didn't hit test result "b".

So QE thinks we should close the bz and open a new one for tracking the undesirable test result "b".
Do you agree?
Thanks
Jing

Changed the bz to verified and opened a new bz for tracking test result "b" of comment 8 (bz 1447196)

Thanks
Jing

(In reply to jingzhao from comment #11)
> Tried the same test steps with the VF connected to the pcie-root-port, and
> didn't hit test result "b"

Likely because the guest responded to the hotplug and released the device without getting caught in the state where test "b" can trigger.

> So QE thinks we should close the bz and open a new one for tracking the
> undesirable test result "b"
> Do you agree?

Yep, as has already been done.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
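As a recap of the behavior verified in this report, the toy model below sketches which unplug requests produce a QEMU warning after the fix. It is not QEMU source code: the `Bus` class, the `unplug_warning` function, and the bus name `rp0` are hypothetical, and only the warning text is copied from the output in comment 8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bus:
    """Hypothetical stand-in for the bus a vfio-pci device is attached to."""
    name: str
    hotpluggable: bool


def unplug_warning(device: str, bus: Bus) -> Optional[str]:
    """Return a warning if the configuration blocks the unplug request.

    Returns None when the request can be forwarded to the guest. As noted
    in the comments above, nothing is reported if the guest then ignores it.
    """
    if not bus.hotpluggable:
        # Message format taken from the QEMU output quoted in comment 8.
        return (
            f"vfio warning: {device}: "
            f"Bus '{bus.name}' does not support hotplugging"
        )
    return None
```

For example, `unplug_warning("0000:04:02.0", Bus("pcie.0", False))` yields the message seen in test result "a", while a device on a hotpluggable bus such as the hypothetical `Bus("rp0", True)` yields `None`, whether or not the guest cooperates.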