Bug 786610
Summary: PCI device reset can cause a kernel bug
Product: Red Hat Enterprise Linux 6
Reporter: Don Dutile (Red Hat) <ddutile>
Component: kernel
Assignee: Don Dutile (Red Hat) <ddutile>
Status: CLOSED ERRATA
QA Contact: Endre "Hrebicek" Balint-Nagy <endre>
Severity: unspecified
Priority: unspecified
Version: 6.2
CC: benl, kzhang, mjenner
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-2.6.32-238.el6
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-06-20 08:21:15 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Test script to reproduce error, and verify (posted) fix:

    #!/bin/bash
    for i in {1..1000}; do
        echo "[-- $i iteration -- $(date) -- ]"
        echo 1 > /sys/bus/pci/devices/0000\:05\:00.0/reset
        echo "sleep 1 secs"
        sleep 1
    done

Patch(es) available on kernel-2.6.32-238.el6

Good job! The 220.el6 kernel hung before the second iteration of the reproducer; the 238.el6 kernel has survived 744+ iterations so far. After the 1000th iteration I'll set this BZ to VERIFIED state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html
Created attachment 558926 [details]
Test script used to reproduce error, and verify fix.

Description of problem:
pci_block_user_cfg_access was designed for the use case in which a single context, the IPR driver, temporarily delays user-space accesses to the config space via sysfs. This assumption became invalid when pci_dev_reset was added as a locking instance. Today, if you run two loops in parallel that reset the same device via sysfs, you end up with a kernel BUG, because pci_block_user_cfg_access detects the broken assumption.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Log in as root (to be able to write to the sysfs entries of a device).
2. Run two shell scripts, each resetting the same PCI device, with a 1 second delay between resets. Pick a device that the host is not using (an extra NIC, for example).
3.

Actual results:
Host hangs within 5 seconds.

Expected results:
The two threads can run indefinitely.

Additional info:
A backport of upstream commit fb51ccbf217c1c994607b6519c7d85250928553d resolves this problem.
Note: a straight cherry-pick/backport will break kABI, since the commit renames a pci_dev structure element, so the backport must be modified to maintain kABI.
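
The two-scripts-in-parallel reproduction described in the steps could also be driven from a single script. The following is only a minimal sketch, not the attached test script: the function names `reset_loop` and `run_parallel`, the command-line arguments, and the default of 1000 iterations are assumptions made for illustration. It does nothing unless a PCI address is passed on the command line, and it must be run as root against a device the host is not using.

```shell
#!/bin/bash
# Sketch: two concurrent reset loops against the same PCI device.
# Function names and arguments are hypothetical, not from the bug report.

# reset_loop TAG RESET_FILE COUNT DELAY
# Writes "1" to RESET_FILE COUNT times, sleeping DELAY seconds after each write.
reset_loop() {
    local tag=$1 reset_file=$2 count=$3 delay=$4
    local i
    for ((i = 1; i <= count; i++)); do
        echo "[-- loop $tag: iteration $i -- $(date) -- ]"
        echo 1 > "$reset_file" || return 1
        sleep "$delay"
    done
}

# run_parallel RESET_FILE COUNT DELAY
# Starts two reset loops in the background and waits for both to finish.
# Returns success only if both loops completed without a write error.
run_parallel() {
    local reset_file=$1 count=$2 delay=$3
    reset_loop A "$reset_file" "$count" "$delay" &
    local pid_a=$!
    reset_loop B "$reset_file" "$count" "$delay" &
    local pid_b=$!
    wait "$pid_a"; local rc_a=$?
    wait "$pid_b"; local rc_b=$?
    [ "$rc_a" -eq 0 ] && [ "$rc_b" -eq 0 ]
}

# Only touch sysfs when a PCI address is given explicitly,
# e.g. 0000:05:00.0 (the address used in the test script in this report).
if [ "$#" -ge 1 ]; then
    run_parallel "/sys/bus/pci/devices/$1/reset" "${2:-1000}" 1
fi
```

Going by the results reported here, an affected kernel would be expected to hang within the first few iterations, while a fixed kernel should let both loops run to completion.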