Bug 954252
Summary: | Kernel soft lockup during reboot on Fedora 19 pre-alpha
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 19
Hardware: | ppc64
OS: | All
Status: | CLOSED ERRATA
Severity: | medium
Priority: | unspecified
Reporter: | IBM Bug Proxy <bugproxy>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | gansalmon, gustavold, itamar, jkachuck, jonathan, kernel-maint, madhu.chinakonda, wgomerin
Fixed In Version: | kernel-3.9.5-301.fc19
Doc Type: | Bug Fix
Last Closed: | 2013-06-14 04:50:25 UTC
Bug Blocks: | 920770

Description
IBM Bug Proxy
2013-04-22 05:01:03 UTC
Created attachment 738429 [details]
reboot log
Created attachment 738430 [details]
dmesg (after lpar restart)
Created attachment 738431 [details]
/var/log/messages
------- Comment From hbabu.com 2013-04-23 07:20 EDT-------
Rebooting.
[12658.386814] irq 18: nobody cared (try booting with the "irqpoll" option)
[12658.386824] Call Trace:
[12658.386831] [c0000007affbfb90] [c000000000015bf0] .show_stack+0x130/0x200 (unreliable)
[12658.386839] [c0000007affbfc60] [c0000000001671e8] .__report_bad_irq+0x58/0x150
[12658.386844] [c0000007affbfd00] [c0000000001678c8] .note_interrupt+0x218/0x300
[12658.386849] [c0000007affbfdb0] [c000000000163f24] .handle_irq_event_percpu+0x184/0x310
[12658.386854] [c0000007affbfe90] [c000000000164110] .handle_irq_event+0x60/0xb0
[12658.386859] [c0000007affbff10] [c000000000168e74] .handle_fasteoi_irq+0xd4/0x1f0
[12658.386865] [c0000007affbff90] [c000000000023a84] .call_handle_irq+0x1c/0x2c
[12658.386870] [c0000000012d3770] [c000000000010f44] .do_IRQ+0x244/0x2c0
[12658.386875] [c0000000012d3820] [c000000000002364] hardware_interrupt_common+0x164/0x180
[12658.386882] --- Exception: 501 at .plpar_hcall_norets+0x84/0xd4
[12658.386882] LR = .check_and_cede_processor+0x2c/0x40
[12658.386888] [c0000000012d3b10] [c000000000082218] .check_and_cede_processor+0x18/0x40 (unreliable)
[12658.386894] [c0000000012d3b80] [c0000000000822c8] .dedicated_cede_loop+0x88/0x150
[12658.386901] [c0000000012d3c40] [c00000000069a30c] .cpuidle_enter+0x2c/0x40
[12658.386906] [c0000000012d3cb0] [c00000000069ad28] .cpuidle_idle_call+0xf8/0x310
[12658.386911] [c0000000012d3d60] [c000000000072358] .pSeries_idle+0x18/0x40
[12658.386916] [c0000000012d3dd0] [c000000000017b58] .cpu_idle+0x168/0x2b0
[12658.386921] [c0000000012d3e80] [c00000000000bbd8] .rest_init+0x98/0xb0
[12658.386926] [c0000000012d3ef0] [c000000000b6470c] .start_kernel+0x4b4/0x4d0
[12658.386931] [c0000000012d3f90] [c000000000009d20] .start_here_common+0x20/0x80
[12658.386935] handlers:
[12658.386939] [<c0000000012b2ca8>] .usb_hcd_irq
[12658.386943] Disabling IRQ #18

We are seeing dropped IRQs for the USB HCD, and the other CPUs are getting soft lockups.
https://lkml.org/lkml/2013/2/9/167 shows a similar issue.

Brian, can someone on your team look at this issue?

------- Comment From thadeul.com 2013-04-23 13:32 EDT-------
This seems to be a bug in tg3 that is triggered by the same set of patches that affected USB. To find out which one is really causing the issue, can you remove either one and retest, and then do it once again with the other card removed instead?

Thanks.
Cascardo.

------- Comment From gusld.com 2013-04-23 19:42 EDT-------
I removed the tg3 card from the profile using HMC, but the reboot issue remains. I also tried removing the USB device, and the reboot issue still remains.

------- Comment From gusld.com 2013-05-07 13:30 EDT-------
I'm still experiencing this issue with the latest kernel available on Fedora 19 (kernel-3.9.0-301.fc19). I tried both the ppc64 and ppc64p7 flavors, and both have this issue.

Created attachment 744743 [details]
nosmp
------- Comment on attachment From gusld.com 2013-05-07 13:49 EDT-------
Attached the results of passing nosmp on the boot command line, as requested by Thadeu.
Created attachment 744748 [details]
maxcpus=1
------- Comment on attachment From gusld.com 2013-05-07 13:51 EDT-------
Attaching the results of passing maxcpus=1 on the boot command line, as requested by Thadeu.
------- Comment From thadeul.com 2013-05-07 14:02 EDT-------
I guess this will have no effect at all, but can you try booting with irqpoll?

Regards.
Thadeu Cascardo.

Created attachment 744868 [details]
irqpoll
------- Comment on attachment From gusld.com 2013-05-07 18:56 EDT-------
Attaching the results of passing irqpoll on the boot command line, as requested by Thadeu.
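
For test boots like the ones above, it can help to confirm that the parameter actually reached the kernel before comparing logs. What follows is a minimal Python sketch, not something used in this report. It assumes the standard Linux `/proc/cmdline` file is readable on the LPAR; the helper names `parse_cmdline` and `read_cmdline` and the sample string are illustrative only, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read and parse the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return parse_cmdline(handle.read())


if __name__ == "__main__":
    # Hypothetical sample; on a real system call read_cmdline() instead.
    sample = "root=/dev/sda2 ro maxcpus=1 irqpoll"
    params = parse_cmdline(sample)
    for name in ("nosmp", "maxcpus", "irqpoll"):
        print(name, params.get(name, "not set"))
```

Checking the parsed result for `nosmp`, `maxcpus`, or `irqpoll` before each reboot test makes it clear which log belongs to which configuration.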
------- Comment From thadeul.com 2013-05-08 19:10 EDT-------
Hi, Gustavo.

Can you try mainline and, if it fails, do a bisect?

Regards.
Cascardo.

------- Comment From gusld.com 2013-05-13 18:23 EDT-------
I did a git bisect, and there seem to be two different bugs. The irq messages seem to be unrelated to the soft-lockup issue: old kernel versions (3.4.0) print the irq messages but don't trigger the soft lockup.

My git bisect pointed to the following as the offending commit (the one that introduced the soft-lockup issue):
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=56d6aa33d3f68471466cb183d6e04b508dfb296f

------- Comment From gusld.com 2013-05-14 15:15 EDT-------
The following patch fixes this issue:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96b04db9f2c16e77c31ef0e17e143da1e0cbfd78

Can we get it added to f19?

Proposing as a Beta blocker per the release criterion "All release-blocking desktops' offered mechanisms (if any) for shutting down, logging out and rebooting must work".

(In reply to comment #14)
> ------- Comment From gusld.com 2013-05-14 15:15 EDT-------
> The following patch fixes this issue:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96b04db9f2c16e77c31ef0e17e143da1e0cbfd78
>
> Can we get it added to f19?

I would suggest you send it upstream and get it included in the 3.9.y stable kernel series.

------- Comment From bjking1.com 2013-05-15 15:55 EDT-------
I would have expected the following patch to make the warning go away on shutdown:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/ipr.c?id=bfae7820b87c61c5065338b55405b304d9890085

Created attachment 751455 [details]
1 of 2 patch: fix_config_restore_after_eeh (against 3.9-y stable kernel)
------- Comment (attachment only) From wenxiong.com 2013-05-22 01:27 EDT-------
Created attachment 751456 [details]
2 of 2: driver_lockdep_patch (against 3.9-y stable kernel)
------- Comment (attachment only) From wenxiong.com 2013-05-22 01:29 EDT-------
Created attachment 751457 [details]
series file
------- Comment (attachment only) From wenxiong.com 2013-05-22 01:30 EDT-------
Created attachment 751781 [details]
reboot log for 3.9.3-301 + patches
------- Comment on attachment From gusld.com 2013-05-22 15:52 EDT-------
Thanks Wendy!
I tested the latest F19 kernel (3.9.3-301.fc19.ppc64) with the 2 patches Wendy posted here. They fixed the reboot hangs, though the kernel still prints irq messages during the reboot. I attached the full log of a reboot.
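
Because the bisect showed the irq messages and the soft lockup to be separate problems, a retest log can be checked for each symptom independently. The following is a rough Python sketch, not something used in this report. The function name `classify_log` is hypothetical, the "nobody cared" pattern comes from the trace earlier in this bug, and the "soft lockup" wording is an assumption about how the lockup appears in the attached logs.

```python
import re

# Pattern taken from the "irq 18: nobody cared" line quoted in this report.
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
# Assumption: lockup reports contain the phrase "soft lockup".
LOCKUP_RE = re.compile(r"soft lockup", re.IGNORECASE)


def classify_log(lines):
    """Return (sorted unhandled IRQ numbers, number of soft-lockup lines)."""
    irqs = set()
    lockups = 0
    for line in lines:
        match = IRQ_RE.search(line)
        if match:
            irqs.add(int(match.group(1)))
        if LOCKUP_RE.search(line):
            lockups += 1
    return sorted(irqs), lockups


if __name__ == "__main__":
    sample = [
        '[12658.386814] irq 18: nobody cared (try booting with the "irqpoll" option)',
        "[12658.386943] Disabling IRQ #18",
    ]
    irqs, lockups = classify_log(sample)
    print("unhandled IRQs:", irqs)
    print("soft-lockup lines:", lockups)
```

A log from a patched kernel would be expected to still list the unhandled IRQ while reporting no soft-lockup lines, which matches the result described in the comment above.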
The patches for this are all now included in the 3.9.5-300 release currently building.

kernel-3.9.5-301.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.9.5-301.fc19

Package kernel-3.9.5-301.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.5-301.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-10689/kernel-3.9.5-301.fc19
then log in and leave karma (feedback).

kernel-3.9.5-301.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.