Bug 1053469

Summary: Detach-device will lose the driver of 82579LM network card.
Product: Red Hat Enterprise Linux 7
Component: kernel
kernel sub component: NIC Drivers
Reporter: Jincheng Miao <jmiao>
Assignee: David Arcari <darcari>
QA Contact: zenghui.shi <zshi>
CC: alex.williamson, chayang, dyuan, gsun, jdenemar, jfeeney, jkc, juzhang, mzhan, network-qe, virt-maint
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: 7.0
Target Milestone: rc
Target Release: 7.3
Hardware: x86_64
OS: Linux
Last Closed: 2019-01-08 13:01:51 UTC
Type: Bug
Attachments:
test kvm's pci-stub attach/detach (flags: none)
test vfio (flags: none)

Description Jincheng Miao 2014-01-15 10:14:09 UTC
Description of problem:
After detach-device, the 82579LM network card loses its driver.
The e1000e driver reports errors, but it is not clear whether this is
an e1000e bug or a firmware/hardware bug.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-18.el7.x86_64
kernel-3.10.0-67.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare vfio driver
# modprobe vfio_pci

# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

2. start the guest
# virsh start r7

3. prepare the device XML
# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='vfio'/>
    <source>
        <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
    </source>
</hostdev>

4. try to attach this device
# virsh attach-device r7 hostdev.xml

5. once the attach succeeds, detach it
# virsh detach-device r7 hostdev.xml

6. the driver is gone
# virsh nodedev-dumpxml pci_0000_00_19_0 | grep -1 driver

# ll /sys/devices/pci0000:00/0000:00:19.0 | grep driver

and an error appears in /var/log/messages:
# cat /var/log/messages
...
systemd-machined: Machine qemu-r7 terminated.
kernel: e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
kernel: e1000e: probe of 0000:00:19.0 failed with error -3
...

7. the device cannot get the e1000e driver back until the next OS boot
# reboot

Actual results:
driver gone

Expected results:
driver is restored to e1000e
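
The check in step 6 can also be made by reading the device's driver symlink in sysfs directly. The following is a minimal sketch, not part of the original report; the function name bound_driver and the SYSFS_ROOT override are assumptions introduced here so that the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override for this sketch; it defaults to /sys.
bound_driver() {
    dev=$1
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/e1000e
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Running `bound_driver 0000:00:19.0` before the attach and again after the detach shows whether the device got its driver back.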

Comment 2 Jiri Denemark 2014-01-15 13:38:07 UTC
Alex, can the described behavior be a result of using allow_unsafe_interrupts? Or do you have another idea on what could cause this?

Comment 3 Alex Williamson 2014-01-15 16:18:16 UTC
(In reply to Jiri Denemark from comment #2)
> Alex, can the described behavior be a result of using
> allow_unsafe_interrupts? Or do you have another idea on what could cause
> this?

No, allow_unsafe_interrupts is just an opt-in to allow vfio to work when the IOMMU doesn't provide interrupt remapping support.  I'd say it sounds more like bug 868098, where we found that this device occasionally fails to reset.  Unfortunately that bug was closed as unreproducible, and I suspect the problem still appears occasionally.

Comment 4 Laine Stump 2014-01-16 14:36:25 UTC
Please try the tests that are described here:

  https://bugzilla.redhat.com/show_bug.cgi?id=868098#c4

and here:

  https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27


If you can reproduce it with the 2nd test, then it should be reassigned to kernel, but if it can only be reproduced with the 1st test (and not the 2nd) then I guess it should be reassigned to qemu-kvm (is that right Alex?)

Comment 5 Jincheng Miao 2014-01-17 03:49:40 UTC
Hi Laine,

I tested it on qemu-kvm with the vfio driver; it can be reproduced with the 1st test.

Following https://bugzilla.redhat.com/show_bug.cgi?id=868098#c4 ,
after the 'device_del mydevice' command:
# echo 0000:00:19.0 > /sys/bus/pci/drivers_probe
kernel: e1000e: probe of 0000:00:19.0 failed with error -2

So even without libvirt, the passthrough operation leaves the 82579LM unable
to get its driver back.

With the pci-stub driver, qemu-kvm could not pass the device through at all; it reported "Device initialization failed".
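
To spot these probe failures without reading the whole log, the kernel messages can be filtered for the "probe of ... failed with error" pattern quoted in this report. This is only a sketch; the function name probe_failures is an assumption made for this illustration, and it reads log lines from standard input so it works with any log source.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<device> <error>" for every
# failed probe reported by the given driver.
# probe_failures is a hypothetical helper name used for this sketch.
probe_failures() {
    drv=$1
    sed -n "s/.*${drv}: probe of \([0-9a-fA-F:.]*\) failed with error \(-[0-9]*\).*/\1 \2/p"
}
```

It could be fed from the kernel ring buffer, for example `dmesg | probe_failures e1000e`.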

Comment 6 Laine Stump 2014-01-17 08:48:43 UTC
I'm reassigning this to qemu-kvm/alex, but mainly because Jincheng verified that the first test in Comment 4 fails (it uses qemu), but did not say that the 2nd test in Comment 4 fails (no use of qemu). So either test (1) is incorrect, or the failure is not dependent on libvirt (and possibly not on qemu, but that isn't yet certain). My expectation is that Alex will end up reassigning to kernel, but this seemed like a safer course of action.

BTW, since the PCI device ID is hardcoded in test 1, I want to verify - the ethernet adapter you are testing does have device ID 8086:1502, correct? You can learn this with the following command:

   virsh nodedev-dumpxml pci_0000_00_19_0

Look at the <product> and <vendor> elements.
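
The same IDs can be read straight from sysfs, which avoids parsing the XML. A minimal sketch follows; the function name pci_ids and the SYSFS_ROOT override are assumptions introduced for this illustration, not something the test scripts use.

```shell
#!/bin/sh
# Print "vendor:device" (hex, without the 0x prefix) for a PCI device.
# SYSFS_ROOT is a hypothetical override for this sketch; it defaults to /sys.
pci_ids() {
    dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
    vid=$(cat "$dir/vendor") || return 1
    did=$(cat "$dir/device") || return 1
    # sysfs reports the values as 0x8086 and 0x1502; strip the prefix.
    echo "${vid#0x}:${did#0x}"
}
```

For the adapter in this report, `pci_ids 0000:00:19.0` should print 8086:1502 if the hardcoded ID in test 1 matches.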

Comment 7 Alex Williamson 2014-01-17 16:30:00 UTC
Still need to know if this is reproducible with the standalone script:

https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

Comment 8 Jincheng Miao 2014-01-20 02:10:01 UTC
(In reply to Laine Stump from comment #6)
> BTW, since the PCI device ID is hardcoded in test 1, I want to verify - the
> ethernet adapter you are testing does have device ID 8086:1502, correct? You
> can learn this with the following command:
> 
>    virsh nodedev-dumpxml pci_0000_00_19_0
> 
> Look at the <product> and <vendor> elements.

There are <product> and <vendor> elements in xml:
# virsh nodedev-dumpxml pci_0000_00_19_0
<device>
  <name>pci_0000_00_19_0</name>
  <path>/sys/devices/pci0000:00/0000:00:19.0</path>
  <parent>computer</parent>
  <driver>
    <name>e1000e</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>0</bus>
    <slot>25</slot>
    <function>0</function>
    <product id='0x1502'>82579LM Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <iommuGroup number='4'>
      <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
    </iommuGroup>
  </capability>
</device>


(In reply to Alex Williamson from comment #7)
> Still need to know if this is reproducible with the standalone script:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

I modified your script a little:
--- attachment.txt	2014-01-20 09:30:10.442930947 +0800
+++ attachment2.txt	2014-01-20 09:32:42.604930943 +0800
@@ -11,11 +11,11 @@
 	echo $DEV > "/sys/bus/pci/devices/$DEV/driver/unbind"
 	echo $DEV > /sys/bus/pci/drivers/pci-stub/bind
 	echo "$VID $DID" > /sys/bus/pci/drivers/pci-stub/remove_id
-	echo 1 > "/sys/bus/pci/devices/$DEV/enable"
+	echo 1 > "/sys/bus/pci/devices/$DEV/enabled"
 }
 
 bind_to_e1000e() {
-	echo 0 > "/sys/bus/pci/devices/$DEV/enable"
+	echo 0 > "/sys/bus/pci/devices/$DEV/enabled"
 	echo $DEV > "/sys/bus/pci/devices/$DEV/driver/unbind"
 	echo $DEV > /sys/bus/pci/drivers_probe
 }

With this change it passed 200+ iterations (I stopped it manually).
It seems that the device works well with the pci-stub driver.

After that, I changed 'pci-stub' to 'vfio-pci' in the script,
and it still passed 100+ iterations.

Comment 9 Jincheng Miao 2014-01-20 03:31:20 UTC
(In reply to Alex Williamson from comment #7)
> Still need to know if this is reproducible with the standalone script:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=868098#c27

After I changed the reset interval to 10 ms, the bug reproduced: it failed at the second reset, with both pci-stub and vfio-pci.
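
For readers without access to the attachments, the kind of loop involved can be sketched as follows. This is NOT the attached script: the function name stress_rebind, its argument order, the SYSFS_ROOT override, and the use of the sysfs reset attribute are all assumptions made for this illustration. Only the unbind/bind/remove_id/drivers_probe writes are taken from the diff in comment 8. A fractional sleep such as 0.01 assumes a sleep implementation that accepts it (GNU coreutils does).

```shell
#!/bin/sh
# Sketch only; this is NOT the script attached to this bug.
# stress_rebind and SYSFS_ROOT are hypothetical names for this illustration.
# Usage: stress_rebind DEV VID DID STUB NATIVE COUNT INTERVAL
#   e.g. stress_rebind 0000:00:19.0 8086 1502 pci-stub e1000e 200 0.01
stress_rebind() {
    dev=$1; vid=$2; did=$3; stub=$4; native=$5; count=$6; interval=$7
    pci="${SYSFS_ROOT:-/sys}/bus/pci"
    i=1
    while [ "$i" -le "$count" ]; do
        # Hand the device to the stub driver (pci-stub or vfio-pci).
        echo "$vid $did" > "$pci/drivers/$stub/new_id"
        echo "$dev" > "$pci/devices/$dev/driver/unbind"
        echo "$dev" > "$pci/drivers/$stub/bind"
        echo "$vid $did" > "$pci/drivers/$stub/remove_id"

        # Reset the function, then wait for the configured interval.
        echo 1 > "$pci/devices/$dev/reset"
        sleep "$interval"

        # Release the device and ask the PCI core to re-probe it.
        echo "$dev" > "$pci/devices/$dev/driver/unbind"
        echo "$dev" > "$pci/drivers_probe"

        # The cycle passes only if the native driver is bound again.
        bound=none
        if [ -L "$pci/devices/$dev/driver" ]; then
            bound=$(basename "$(readlink "$pci/devices/$dev/driver")")
        fi
        if [ "$bound" != "$native" ]; then
            echo "failed at iteration $i: bound driver is $bound"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed $count iterations"
}
```

Passing the interval as an argument makes it easy to compare a long delay with the 10 ms delay that triggers the failure.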

Comment 10 Alex Williamson 2014-01-21 20:03:52 UTC
Based on Comment 9, this is still reproducible without virt/vfio, re-assigning to kernel.  This device does not reset reliably and needs some sort of device specific reset.  But 868098 should probably have never been closed.

Jincheng, please upload your modified script for reproducing to this bz.  Thanks.

Comment 11 Jincheng Miao 2014-01-22 02:25:19 UTC
(In reply to Alex Williamson from comment #10)
> Based on Comment 9, this is still reproducible without virt/vfio,
> re-assigning to kernel.  This device does not reset reliably and needs some
> sort of device specific reset.  But 868098 should probably have never been
> closed.
> 
> Jincheng, please upload your modified script for reproducing to this bz. 
> Thanks.

OK, I will upload two test scripts: 'test_e1000e_kvm.sh' is for pci-stub, and 'test_e1000e_vfio.sh' is for vfio-pci. The reset interval is changed to 10 ms in order to reproduce this bug.

Comment 12 Jincheng Miao 2014-01-22 02:27:07 UTC
Created attachment 853564
test kvm's pci-stub attach/detach

Comment 13 Jincheng Miao 2014-01-22 02:28:02 UTC
Created attachment 853565
test vfio

Comment 14 Ludek Smid 2014-06-26 10:51:37 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 15 Ludek Smid 2014-06-26 11:16:18 UTC
The comment above is incorrect. The correct version is below.
I'm sorry for any inconvenience.
---------------------------------------------------------------

This request was NOT resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you need
to escalate this bug.

Comment 17 John Feeney 2019-01-07 23:30:51 UTC
This has been open for a while and does not seem to be getting anywhere. Can it
be put out of its misery by being closed?

Thank you for your consideration in this matter.

Comment 18 Ken Cox 2019-01-08 13:01:51 UTC
I would say this probably does need to be closed.  I haven't been the maintainer of e1000e for some time so I'm going to reassign to the current maintainer for review.