Bug 1053469
| Summary: | Detach-device will lose the driver of 82579LM network card. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jincheng Miao <jmiao> | ||||||
| Component: | kernel | Assignee: | David Arcari <darcari> | ||||||
| kernel sub component: | NIC Drivers | QA Contact: | zenghui.shi <zshi> | ||||||
| Status: | CLOSED NOTABUG | Docs Contact: | |||||||
| Severity: | medium | ||||||||
| Priority: | medium | CC: | alex.williamson, chayang, dyuan, gsun, jdenemar, jfeeney, jkc, juzhang, mzhan, network-qe, virt-maint | ||||||
| Version: | 7.0 | ||||||||
| Target Milestone: | rc | ||||||||
| Target Release: | 7.3 | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2019-01-08 13:01:51 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Alex, can the described behavior be a result of using allow_unsafe_interrupts? Or do you have another idea on what could cause this?

(In reply to Jiri Denemark from comment #2)
> Alex, can the described behavior be a result of using
> allow_unsafe_interrupts? Or do you have another idea on what could cause
> this?

No, allow_unsafe_interrupts is just an opt-in to allow vfio to work when the IOMMU doesn't provide interrupt remapping support. I'd say it sounds more like bug 868098 where we found that this device fails to reset occasionally. Unfortunately that bug was closed as un-reproducible; I suspect the problem still appears occasionally.

Please try the tests that are described here:

https://bugzilla.redhat.com/show_bug.cgi?id=868098#c4

and here:

https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

If you can reproduce it with the 2nd test, then it should be reassigned to kernel, but if it can only be reproduced with the 1st test (and not the 2nd) then I guess it should be reassigned to qemu-kvm (is that right Alex?)

Hi Laine,

I tested it on qemu-kvm with the vfio driver; it can be reproduced with the 1st test. Following https://bugzilla.redhat.com/show_bug.cgi?id=868098#c4 , after the 'device_del mydevice' command:

    # echo 0000:00:19.0 > /sys/bus/pci/drivers_probe
    kernel: e1000e: probe of 0000:00:19.0 failed with error -2

So even without libvirt, the passthrough operation leaves the 82579LM unable to get its driver back. And with the pci-stub driver, qemu-kvm could not pass the device through at all: "Device initialization failed".

I'm reassigning this to qemu-kvm/alex, but mainly because Jincheng verified that the first test in Comment 4 fails (it uses qemu), but did not say that the 2nd test in Comment 4 fails (no use of qemu). So either test (1) is incorrect, or the failure is not dependent on libvirt (and possibly not on qemu, but that isn't yet certain). My expectation is that Alex will end up reassigning to kernel, but this seemed like a safer course of action.
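For anyone collecting these failures from a log, here is a minimal sketch that pulls the driver name, PCI address, and error code out of lines shaped like the "probe of ... failed with error" message quoted above. The function name `probe_failures` is hypothetical, and the sketch assumes a `sed` that accepts `-E` (GNU and BSD sed both do). It deliberately does not interpret the numeric code, since the report does not establish what -2 or -3 mean for e1000e.

```shell
#!/bin/sh
# Read kernel log lines on standard input and print
# "<driver> <pci-address> <error-code>" for every line of the form
# "<driver>: probe of <address> failed with error <code>".
# Lines that do not match are silently skipped.
probe_failures() {
    sed -nE 's/^(.*[[:space:]])?([[:alnum:]_-]+): probe of ([0-9a-fA-F:.]+) failed with error (-?[0-9]+).*$/\2 \3 \4/p'
}
```

Typical use would be something like `journalctl -k | probe_failures` or `probe_failures < /var/log/messages`, depending on where the host keeps its kernel log.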
BTW, since the PCI device ID is hardcoded in test 1, I want to verify - the ethernet adapter you are testing does have device ID 8086:1502, correct? You can learn this with the following command:

    virsh nodedev-dumpxml pci_0000_00_19_0

Look at the <product> and <vendor> elements.

Still need to know if this is reproducible with the stand-alone script:

https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

(In reply to Laine Stump from comment #6)
> BTW, since the PCI device ID is hardcoded in test 1, I want to verify - the
> ethernet adapter you are testing does have device ID 8086:1502, correct? You
> can learn this with the following command:
>
> virsh nodedev-dumpxml pci_0000_00_19_0
>
> Look at the <product> and <vendor> elements.

Here are the <product> and <vendor> elements in the XML:

    # virsh nodedev-dumpxml pci_0000_00_19_0
    <device>
      <name>pci_0000_00_19_0</name>
      <path>/sys/devices/pci0000:00/0000:00:19.0</path>
      <parent>computer</parent>
      <driver>
        <name>e1000e</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>0</bus>
        <slot>25</slot>
        <function>0</function>
        <product id='0x1502'>82579LM Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <iommuGroup number='4'>
          <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
        </iommuGroup>
      </capability>
    </device>

(In reply to Alex Williamson from comment #7)
> Still need to know if this is reproducible with the stand-alone script:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

I modified your script a little:

    --- attachment.txt 2014-01-20 09:30:10.442930947 +0800
    +++ attachment2.txt 2014-01-20 09:32:42.604930943 +0800
    @@ -11,11 +11,11 @@
         echo $DEV > "/sys/bus/pci/devices/$DEV/driver/unbind"
         echo $DEV > /sys/bus/pci/drivers/pci-stub/bind
         echo "$VID $DID" > /sys/bus/pci/drivers/pci-stub/remove_id
    -    echo 1 > "/sys/bus/pci/devices/$DEV/enable"
    +    echo 1 > "/sys/bus/pci/devices/$DEV/enabled"
     }

     bind_to_e1000e() {
    -    echo 0 > "/sys/bus/pci/devices/$DEV/enable"
    +    echo 0 > "/sys/bus/pci/devices/$DEV/enabled"
         echo $DEV > "/sys/bus/pci/devices/$DEV/driver/unbind"
         echo $DEV > /sys/bus/pci/drivers_probe
     }

And it passes 200+ times (I stopped it manually). It seems that the device works well with the pci-stub driver.

After that, I changed 'pci-stub' to 'vfio-pci' in the script, and it still passes 100+ times.

(In reply to Alex Williamson from comment #7)
> Still need to know if this is reproducible with the stand-alone script:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

I changed the reset interval to 10ms and the bug is reproduced: it fails at the second reset, with both pci-stub and vfio-pci.

Based on Comment 9, this is still reproducible without virt/vfio, re-assigning to kernel. This device does not reset reliably and needs some sort of device specific reset. But 868098 should probably have never been closed.

Jincheng, please upload your modified script for reproducing to this bz. Thanks.

(In reply to Alex Williamson from comment #10)
> Based on Comment 9, this is still reproducible without virt/vfio,
> re-assigning to kernel. This device does not reset reliably and needs some
> sort of device specific reset. But 868098 should probably have never been
> closed.
>
> Jincheng, please upload your modified script for reproducing to this bz.
> Thanks.

OK, I will upload two test scripts: 'test_e1000e_kvm.sh' is for pci-stub, and the other, 'test_e1000e_vfio.sh', is for vfio-pci. The reset interval is changed to 10ms in order to reproduce this bug.

Created attachment 853564 [details]
test kvm's pci-stub attach/detach
Created attachment 853565 [details]
test vfio
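The attached scripts are not reproduced in this thread, so the following is only a rough sketch of the kind of bind/rebind loop discussed above, not the attachments themselves. It reuses the sysfs writes visible in the diff (`driver/unbind`, `bind`, `remove_id`, `drivers_probe`). The initial `new_id` write is an assumption about what precedes the quoted hunk, and the `enable`/`enabled` writes and any explicit reset step are left out because the thread does not show them in full. The names `sysfs_write`, `has_e1000e`, `stress`, and the `DRY_RUN` switch are hypothetical; a fractional `sleep` such as 0.01 assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch of a detach/re-attach stress loop; NOT the attached scripts.
DEV=${DEV:-0000:00:19.0}      # PCI address from this report
VID=${VID:-8086}              # vendor ID from nodedev-dumpxml
DID=${DID:-1502}              # product ID from nodedev-dumpxml
STUB=${STUB:-pci-stub}        # or vfio-pci
INTERVAL=${INTERVAL:-0.01}    # 10 ms between unbind and rebind
PCI=${PCI:-/sys/bus/pci}
DRY_RUN=${DRY_RUN:-1}         # hypothetical switch: 1 = only print the writes

# Write a value to a sysfs attribute, or just print the write in dry-run mode.
sysfs_write() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

# Hand the device to the stub driver (new_id step is an assumption).
bind_to_stub() {
    sysfs_write "$VID $DID" "$PCI/drivers/$STUB/new_id"
    sysfs_write "$DEV" "$PCI/devices/$DEV/driver/unbind"
    sysfs_write "$DEV" "$PCI/drivers/$STUB/bind"
    sysfs_write "$VID $DID" "$PCI/drivers/$STUB/remove_id"
}

# Release the device and ask the kernel to probe for its normal driver.
bind_to_e1000e() {
    sysfs_write "$DEV" "$PCI/devices/$DEV/driver/unbind"
    sysfs_write "$DEV" "$PCI/drivers_probe"
}

# Succeeds when the driver symlink names e1000e (always succeeds in dry-run).
has_e1000e() {
    [ "$DRY_RUN" = 1 ] && return 0
    [ "$(basename "$(readlink "$PCI/devices/$DEV/driver")")" = e1000e ]
}

# Run up to $1 cycles; stop at the first cycle where e1000e does not come back.
stress() {
    i=1
    while [ "$i" -le "$1" ]; do
        bind_to_stub
        sleep "$INTERVAL"
        bind_to_e1000e
        if ! has_e1000e; then
            echo "cycle $i: e1000e did not rebind" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

With the default `DRY_RUN=1`, `stress 3` only prints the writes it would perform. Running with `DRY_RUN=0` as root performs them for real and, as the report describes, can leave the NIC without a driver until the next boot.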
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.

The comment above is incorrect. The correct version is below. I'm sorry for any inconvenience.

---------------------------------------------------------------

This request was NOT resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you need to escalate this bug.

This has been open for a while and does not seem to be getting anywhere. Can it be put out of its misery by being closed? Thank you for your consideration in this matter.

I would say this probably does need to be closed. I haven't been the maintainer of e1000e for some time so I'm going to reassign to the current maintainer for review.
Description of problem:

Detach-device will lose the driver of the 82579LM network card. Some errors are reported from the e1000e driver, but it is not clear whether this is an e1000e bug or a firmware/hardware bug.

Version-Release number of selected component (if applicable):

libvirt-1.1.1-18.el7.x86_64
kernel-3.10.0-67.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Prepare the vfio driver:

        # modprobe vfio_pci
        # echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

2. Start the guest:

        # virsh start r7

3. Prepare the hostdev XML:

        # cat hostdev.xml
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
          </source>
        </hostdev>

4. Try to attach this device:

        # virsh attach-device r7 hostdev.xml

5. Once the attach succeeds, detach it:

        # virsh detach-device r7 hostdev.xml

6. The driver is gone:

        # virsh nodedev-dumpxml pci_0000_00_19_0 | grep -1 driver
        # ll /sys/devices/pci0000:00/0000:00:19.0 | grep driver

   and there is an error in /var/log/messages:

        # cat /var/log/messages
        ...
        systemd-machined: Machine qemu-r7 terminated.
        kernel: e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
        kernel: e1000e: probe of 0000:00:19.0 failed with error -3
        ...

7. This device can't get the e1000e driver back until the next OS boot:

        # reboot

Actual results:

The driver is gone.

Expected results:

The driver is restored to e1000e.
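Step 6 above checks by eye whether the driver is still bound. A minimal sketch of the same check as shell functions follows, assuming the usual sysfs layout in which a bound device has a `driver` symlink. The function names `nodedev_to_pci_addr` and `pci_driver_of` are hypothetical, and the second argument of `pci_driver_of` exists only so the function can be pointed at a directory other than the real sysfs tree.

```shell
#!/bin/sh
# Convert a libvirt node-device name such as pci_0000_00_19_0 into the
# sysfs PCI address 0000:00:19.0. Input that does not look like a PCI
# node-device name is printed unchanged.
nodedev_to_pci_addr() {
    printf '%s\n' "$1" |
        sed -E 's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}

# Print the name of the driver bound to a PCI device, or "none".
# $1 = PCI address, $2 = devices directory (defaults to /sys/bus/pci/devices).
pci_driver_of() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

On the reporter's host, `pci_driver_of "$(nodedev_to_pci_addr pci_0000_00_19_0)"` would be expected to print `e1000e` before the attach and `none` after the failed re-probe.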