Bug 698879
Summary: | The pci resource for vf is not released after hot-removing Intel 82576 NIC | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Mark Wu <dwu>
Component: | kernel | Assignee: | Don Dutile (Red Hat) <ddutile>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 5.6 | CC: | chrisw, dhoward, jarod, juzhang, moshiro, mstowe, prarit, qcai, tburke
Target Milestone: | rc | Keywords: | Regression, ZStream
Target Release: | 5.8 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Hot removing a PCIe device and subsequently hot plugging it again caused a kernel panic. This was due to a PCI resource for the SR-IOV Virtual Function (VF) not being released after the hot removal, causing the memory area in the pci_dev struct to be used by another process. With this update, when a PCIe device is removed from a system, all resources are properly released, and the kernel panic no longer occurs.
|
Last Closed: | 2011-07-21 10:05:40 UTC | Bug Blocks: | 684637, 707606, 707899
Comment 11
Don Dutile (Red Hat)
2011-05-06 22:04:13 UTC
The patch in c#11 won't work b/c sriov_disable() is invoked by the driver before pci_remove_bus_device() is called, which will always make dev->sriov == NULL, and thus, the patch in c#11 will never invoke release_resources().

Thus, the release has to occur at sriov disable time, like this patch:

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index c182696..4e525b5 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -396,11 +396,20 @@ failed:
 static void sriov_release(struct pci_dev *dev)
 {
+	int i;
+
 	BUG_ON(dev->sriov->nr_virtfn);
 
 	if (dev != dev->sriov->dev)
 		pci_dev_put(dev->sriov->dev);
 
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		struct resource *res = dev->sriov->res + i;
+		if (!res->parent)
+			continue;
+		release_resource(res);
+	}
+
 	mutex_destroy(&dev->sriov->lock);
 
 	kfree(dev->sriov);

Testing with fakephp (remove-only) shows the leak doesn't occur.

Also solves another problem where a cat /proc/iomem would eventually crash after hot-unplug w/o this patch since the sriov-struct-contained resource structure, once freed, would get used by kernel eventually, and corrupt iomem-list traversal.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 498551
Screenshot
Hello, the customer in case 00456823 requests a z-stream erratum for this bug.
=> Nominating for 5.6.z inclusion.

Verified on kernel-2.6.18-261.el5 using the steps in comment 23; I tried 5 times. After step 4, the host still works well:

# cat /proc/iomem | grep igb
cf338000-cf33bfff : igb
cf33c000-cf33ffff : igb
cf340000-cf35ffff : igb
cf3a0000-cf3bffff : igb
cf400000-cf7fffff : igb
cf800000-cfbfffff : igb

Patch(es) available in kernel-2.6.18-261.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Hot removing a PCIe device and subsequently hot plugging it again caused a kernel panic. This was due to a PCI resource for the SR-IOV Virtual Function (VF) not being released after the hot removal, causing the memory area in the pci_dev struct to be used by another process. With this update, when a PCIe device is removed from a system, all resources are properly released, and the kernel panic no longer occurs.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html