Bug 717132 - Kernel OOPS, eventual subsequent PANIC when unbinding ehci_hcd device
Summary: Kernel OOPS, eventual subsequent PANIC when unbinding ehci_hcd device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-28 05:35 UTC by Zach C
Modified: 2011-09-26 19:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-26 19:12:03 UTC
Type: ---


Attachments:
Crash snapshot (1.38 MB, image/jpeg), 2011-06-30 08:02 UTC, Zach C
Crash snapshot 2 (1.44 MB, image/jpeg), 2011-06-30 08:03 UTC, Zach C

Description Zach C 2011-06-28 05:35:38 UTC
Description of problem:

Following the instructions for the workaround at bug 694191 -- that is, adding a script to /etc/pm/sleep.d/20_custom-ehci_hcd to echo device IDs to "unbind" in order to detach from those USB hubs -- I discovered that, before the suspend happens, the kernel will OOPS and then PANIC soon after. 
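
For readers unfamiliar with the setup, a hook of the kind described above might look roughly like the following. This is only a sketch: the exact script from bug 694191 is not reproduced in this report, so the function names, the overridable DRIVER_DIR and DEVICES variables, and the rebind-on-resume branch are assumptions. The device IDs are the two ehci_hcd controllers listed further down. On the affected kernel, the unbind step is what leads to the crash reported here.

```shell
#!/bin/sh
# Sketch of a pm-utils hook such as /etc/pm/sleep.d/20_custom-ehci_hcd.
# DRIVER_DIR and DEVICES are overridable (assumption) so the sketch can be
# exercised against a scratch directory instead of the real sysfs tree.
DRIVER_DIR="${DRIVER_DIR:-/sys/bus/pci/drivers/ehci_hcd}"
DEVICES="${DEVICES:-0000:00:1a.0 0000:00:1d.0}"

# Write each device ID to the given control file ("unbind" or "bind").
ehci_write_all() {
    for dev in $DEVICES; do
        printf '%s' "$dev" > "$DRIVER_DIR/$1"
    done
}

# pm-utils passes the power event as the first argument.
ehci_hook() {
    case "$1" in
        suspend|hibernate) ehci_write_all unbind ;;
        resume|thaw)       ehci_write_all bind ;;   # rebinding is an assumption
    esac
}

ehci_hook "$1"
```

pm-utils only runs hooks that are executable, so a script like this would also need its execute bit set.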

Version-Release number of selected component (if applicable):

2.6.38.8-32.fc15.x86_64


How reproducible:

Always

Steps to Reproduce:
1. echo -n '0000:00:1a.0' > /sys/bus/pci/drivers/ehci_hcd/unbind
2. Wait anywhere from a few seconds to a couple of minutes (clicking somewhere in my status bar also triggers it)
3. Watch OOPS and panic messages come up
  
Actual results:

Kernel throws OOPSes and either the machine will simply stop responding or the kernel will panic.

Expected results:

The USB hub safely detaches. 

Additional info:

Machine is an ASUS G73Jh-A1 with Core i7 720 QM, dying battery, 8 GB of RAM, Mobility Radeon HD 5870, BIOS 211, VBIOS updated to fix a known (but unrelated) issue, two 500 GB hard drives, and an Atheros AR9285 chipset.

==================
lspci -nn:

00:00.0 Host bridge [0600]: Intel Corporation Core Processor DMI [8086:d132] (rev 11)
00:03.0 PCI bridge [0604]: Intel Corporation Core Processor PCI Express Root Port 1 [8086:d138] (rev 11)
00:08.0 System peripheral [0880]: Intel Corporation Core Processor System Management Registers [8086:d155] (rev 11)
00:08.1 System peripheral [0880]: Intel Corporation Core Processor Semaphore and Scratchpad Registers [8086:d156] (rev 11)
00:08.2 System peripheral [0880]: Intel Corporation Core Processor System Control and Status Registers [8086:d157] (rev 11)
00:08.3 System peripheral [0880]: Intel Corporation Core Processor Miscellaneous Registers [8086:d158] (rev 11)
00:10.0 System peripheral [0880]: Intel Corporation Core Processor QPI Link [8086:d150] (rev 11)
00:10.1 System peripheral [0880]: Intel Corporation Core Processor QPI Routing and Protocol Registers [8086:d151] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation 5 Series/3400 Series Chipset HECI Controller [8086:3b64] (rev 06)
00:1a.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 06)
00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b56] (rev 06)
00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 06)
00:1c.1 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 [8086:3b44] (rev 06)
00:1c.5 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 [8086:3b4c] (rev 06)
00:1d.0 USB Controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 06)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev a6)
00:1f.0 ISA bridge [0601]: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller [8086:3b09] (rev 06)
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA AHCI Controller [8086:3b29] (rev 06)
00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 06)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Broadway XT [Mobility Radeon HD 5800 Series] [1002:68a0]
01:00.1 Audio device [0403]: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series] [1002:aa58]
03:00.0 Network controller [0280]: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) [168c:002b] (rev 01)
04:00.0 Ethernet controller [0200]: Atheros Communications AR8131 Gigabit Ethernet [1969:1063] (rev c0)
ff:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-Core Registers [8086:2c52] (rev 04)
ff:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2c81] (rev 04)
ff:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2c90] (rev 04)
ff:02.1 Host bridge [0600]: Intel Corporation Core Processor QPI Physical 0 [8086:2c91] (rev 04)
ff:03.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller [8086:2c98] (rev 04)
ff:03.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Target Address Decoder [8086:2c99] (rev 04)
ff:03.4 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Test Registers [8086:2c9c] (rev 04)
ff:04.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Control Registers [8086:2ca0] (rev 04)
ff:04.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Address Registers [8086:2ca1] (rev 04)
ff:04.2 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Rank Registers [8086:2ca2] (rev 04)
ff:04.3 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Thermal Control Registers [8086:2ca3] (rev 04)
ff:05.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Control Registers [8086:2ca8] (rev 04)
ff:05.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Address Registers [8086:2ca9] (rev 04)
ff:05.2 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Rank Registers [8086:2caa] (rev 04)
ff:05.3 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Thermal Control Registers [8086:2cab] (rev 04)
==================

==================
lsmod:

Module                  Size  Used by
vfat                    8720  0 
fat                    44848  1 vfat
usb_storage            45615  0 
uas                     7783  0 
fuse                   62289  3 
8021q                  18739  0 
garp                    6087  1 8021q
stp                     1951  1 garp
llc                     4716  2 garp,stp
cpufreq_ondemand        9466  8 
acpi_cpufreq            7001  1 
freq_table              3963  2 cpufreq_ondemand,acpi_cpufreq
mperf                   1505  1 acpi_cpufreq
ip6t_REJECT             4048  2 
nf_conntrack_ipv6       7978  1 
nf_defrag_ipv6          9531  1 nf_conntrack_ipv6
ip6table_filter         1695  1 
ip6_tables             16850  1 ip6table_filter
xfs                   694097  1 
exportfs                3486  1 xfs
snd_hda_codec_hdmi     22998  1 
snd_hda_codec_realtek   325262  1 
snd_hda_intel          23660  4 
snd_hda_codec          80838  3 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel
snd_hwdep               6368  1 snd_hda_codec
arc4                    1457  2 
snd_seq                52438  0 
ath9k                  91484  0 
mac80211              234498  1 ath9k
uvcvideo               54609  0 
snd_seq_device          6001  1 snd_seq
snd_pcm                78484  4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
ath9k_common            2633  1 ath9k
i7core_edac            16084  0 
iTCO_wdt               11480  0 
videodev               63342  1 uvcvideo
snd_timer              19593  2 snd_seq,snd_pcm
ath9k_hw              272585  2 ath9k,ath9k_common
asus_laptop            15827  0 
ath                    14564  2 ath9k,ath9k_hw
edac_core              40712  3 i7core_edac
i2c_i801                9213  0 
joydev                  9651  0 
serio_raw               4426  0 
snd                    62670  16 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_seq,snd_seq_device,snd_pcm,snd_timer
soundcore               6299  1 snd
btusb                  14740  0 
iTCO_vendor_support     2634  1 iTCO_wdt
v4l2_compat_ioctl32     6697  1 videodev
bluetooth              91191  1 btusb
sparse_keymap           3302  1 asus_laptop
cfg80211              135802  3 ath9k,mac80211,ath
snd_page_alloc          7431  2 snd_hda_intel,snd_pcm
rfkill                 16552  4 asus_laptop,bluetooth,cfg80211
microcode              18117  0 
atl1c                  30962  0 
ipv6                  282108  35 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
video                  12432  0 
radeon                690835  3 
ttm                    55120  1 radeon
drm_kms_helper         27515  1 radeon
drm                   187984  5 radeon,ttm,drm_kms_helper
i2c_algo_bit            5014  1 radeon
i2c_core               25468  6 videodev,i2c_i801,radeon,drm_kms_helper,drm,i2c_algo_bit
==================

==================
ls -alh /sys/bus/pci/drivers/ehci_hcd:

drwxr-xr-x.  2 root root    0 Jun 27 20:34 .
drwxr-xr-x. 22 root root    0 Jun 27 20:34 ..
lrwxrwxrwx.  1 root root    0 Jun 27 22:24 0000:00:1a.0 -> ../../../../devices/pci0000:00/0000:00:1a.0
lrwxrwxrwx.  1 root root    0 Jun 27 22:24 0000:00:1d.0 -> ../../../../devices/pci0000:00/0000:00:1d.0
--w-------.  1 root root 4.0K Jun 27 22:24 bind
lrwxrwxrwx.  1 root root    0 Jun 27 22:24 module -> ../../../../module/ehci_hcd
--w-------.  1 root root 4.0K Jun 27 22:24 new_id
--w-------.  1 root root 4.0K Jun 27 22:24 remove_id
--w-------.  1 root root 4.0K Jun 27 20:34 uevent
--w-------.  1 root root 4.0K Jun 27 22:24 unbind
==================
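
In the listing above, the devices currently bound to ehci_hcd appear as symlinks named by PCI address. A small sketch for printing just those addresses, for example to confirm whether an unbind took effect, could look like this (the list_bound helper name is hypothetical):

```shell
#!/bin/sh
# Print the PCI addresses currently bound to a driver, one per line.
# Bound devices show up in the driver directory as symlinks whose names
# look like 0000:00:1a.0; control files (bind, unbind, ...) are skipped.
list_bound() {
    for entry in "$1"/*:*:*.*; do
        [ -L "$entry" ] && basename "$entry"
    done
    return 0
}

list_bound "${1:-/sys/bus/pci/drivers/ehci_hcd}"
```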

Comment 1 Chuck Ebbert 2011-06-29 16:47:40 UTC
(In reply to comment #0)
> Description of problem:
> 
> Following the instructions for the workaround at bug 694191 -- that is, adding
> a script to /etc/pm/sleep.d/20_custom-ehci_hcd to echo device IDs to "unbind"
> in order to detach from those USB hubs -- I discovered that, before the suspend
> happens, the kernel will OOPS and then PANIC soon after. 
> 

You need to post the actual messages you get, otherwise there's not much anyone can do about them.

Comment 2 Zach C 2011-06-29 17:44:16 UTC
Is there an easy way to do that besides transcribing them twice?

Comment 3 Chuck Ebbert 2011-06-30 07:26:50 UTC
(In reply to comment #2)
> Is there an easy way to do that besides transcribing them twice?

Take a picture of the screen with a digital camera and attach that.

Comment 4 Zach C 2011-06-30 07:41:20 UTC
I did manage to transcribe this kernel panic after one occurrence: 

Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G       D     2.6.38.8-32.fc15.x86_64 #1
Call Trace:
<IRQ> [<ffffffff8146c6e6>] panic+0x91/0x19c
[<ffffffff81476cc6>] oops_end+0xb4/0xc5
[<ffffffff8100d454>] die+0x5a/0x66
[<ffffffff814765c8>] do_trap+0x121/0x130
[<ffffffff8100aeaa>] do_invalid_op+0x94/0x9d
[<ffffffff81257bbd>] ? alloc_iova+0x184/0x1dc
[<ffffffff8106acc7>] ? queue_work_on+0x37/0x45
[<ffffffff8106ad0e>] ? ieee80211_queue_work+0x2e/0x35 [mac80211]
[<ffffffff8100a85b>] invalid_op+0x1b/0x20
[<ffffffff81257bbd>] ? alloc_iova+0x184/0x1dc
[<ffffffff814759c4>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[<ffffffff810615b0>] ? __mod_timer+0x138/0x14a
[<ffffffff8125ab98>] intel_alloc_iova+0x86/0xbc
[<ffffffff8125af94>] __intel_map_single+0x9b/0x171
[<ffffffff8125b06a>] ? intel_map_page+0x0/0x43
[<ffffffff8125b0ab>] intel_map_page+0x41/0x43
[<ffffffffa04dd341>] dma_map_single_attrs.constprop.7+0x65/0x80 [ath9k]
[<ffffffffa04de432>] ath_rx_tasklet+0x8fa/0x12f6 [ath9k]
[<ffffffff810615b0>] ? __mod_timer+0x138/0x14a
[<ffffffff814759c4>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[<ffffffffa04dc24f>] ath9k_tasklet+0xa3/0x11b [ath9k]
[<ffffffff8105a849>] tasklet_action+0x7f/0xd2
[<ffffffff8105ae4c>] __do_softirq+0xd2/0x19d
[<ffffffff810226b9>] ? ack_APIC_irq+0x15/0x17
[<ffffffff8100fc99>] ? paravirt_read_tsc+0x9/0xd
[<ffffffff8100aadc>] call_softirq+0x1c/0x30
[<ffffffff8100c101>] do_softirq+0x46/0x81
[<ffffffff8105afd0>] irq_exit+0x49/0x8b
[<ffffffff8147c006>] do_IRQ+0x8e/0xa5
[<ffffffff81475f13>] ret_from_intr+0x0/0x15
<EOI>  [<ffffffff8100fc99>] ? paravirt_read_tsc+0x9/0xd
[<ffffffff81274567>] ? intel_idle+0xdb/0x100
[<ffffffff81274546>] ? intel_idle+0xba/0x100
[<ffffffff81398d54>] cpuidle_idle_call+0xe7/0x166
[<ffffffff81008321>] cpu_idle+0xa5/0xdf
[<ffffffff81454cde>] rest_init+0x72/0x74
[<ffffffff81b58c2f>] start_kernel+0x3f2/0x3fe
[<ffffffff81b582c4>] x86_64_start_reservations+0xaf/0xb3
[<ffffffff81b58140>] ? early_idt_handler+0x0/0x71
[<ffffffff81b583cf>] x86_64_start_kernel+0x107/0x116
panic occurred, switching back to text console

----

I would have gotten an example of an OOPS, but those moved way too fast for me to type up.

Comment 5 Zach C 2011-06-30 08:02:45 UTC
Created attachment 510603 [details]
Crash snapshot

One picture taken of the screen during a panic/oops

Comment 6 Zach C 2011-06-30 08:03:42 UTC
Created attachment 510605 [details]
Crash snapshot 2

Another crash snapshot taken (sorry, neither the camera nor the cameraman is anywhere near the quality needed for this ;) )

Comment 7 Zach C 2011-06-30 08:14:55 UTC
These all only ever occur after unbinding ehci_hcd from the PCI devices it's bound to, even though they all look like they fail in different places. This failure also occurs on a kernel I compiled myself (2.6.39-ck2). I've also tried recompiling with ehci_hcd as a module and simply doing an rmmod on it; that panics the kernel far more quickly! 

For comparison (if it helps at all), the previous version of Linux Mint worked, and the last time I checked, Ubuntu 11.04 worked as well. No kernel I have for F15 has worked so far. I have tried disabling KMS (with both the nomodeset and radeon.modeset=0 boot args) and booting with pcie_aspm=force, on both the Fedora kernel and the aforementioned 2.6.39-ck2. I also applied the fix at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=drivers/pci/iova.c;h=c5c274ab5c5a034abe91fb1f1f5dcf6380c9315e;hp=9606e599a47552f9119425c077f62a0c807d3b9f;hb=b0af8dfdd67699e25083478c63eedef2e72ebd85;hpb=25985edcedea6396277003854657b5f3cb31a628 to 2.6.39-ck2 (manually, but double-checked), and still no luck. (Why that last one? Some of the messages mentioned alloc_iova being at fault...)

Comment 8 Zach C 2011-07-08 08:11:21 UTC
Vanilla kernel 3.0.0-rc6 fixes this issue for me and suspend again works as expected.

Comment 9 Josh Boyer 2011-09-26 19:12:03 UTC
(In reply to comment #8)
> Vanilla kernel 3.0.0-rc6 fixes this issue for me and suspend again works as
> expected.

F15 is currently based on the final 3.0 release (2.6.40 is 3.0 renamed).  I'm going to close this bug out given the fix should be included there.  If this is still a problem, please reopen the bug.
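
To check whether a given machine is already on a kernel at or past the 2.6.40 (3.0) rebase mentioned above, something along these lines may help. It is a sketch that assumes GNU sort with version-sort support (-V); the version_at_least helper name is hypothetical.

```shell
#!/bin/sh
# Succeed if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
if version_at_least "$running" "2.6.40"; then
    echo "kernel $running is 2.6.40 (3.0) or later"
else
    echo "kernel $running predates 2.6.40 (3.0)"
fi
```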

