Bug 1248741

Summary:

kernel crash when bluetooth mouse is used, usually on reboot

Product:

[Fedora] Fedora

Reporter:

Dimitris <dimitris.on.linux>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

benh, gansalmon, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab

Hardware:

x86_64

OS:

Linux

Fixed In Version:

kernel-4.1.5-100.fc21

Doc Type:

Bug Fix

Last Closed:

2015-08-19 08:04:03 UTC

Type:

Bug

Attachments:

Description	Flags
screenshot	none
panic with BT mouse	none
another panic with BT mouse	none
panic with slub_debug	none
potential fix	none

Description Dimitris 2015-07-30 18:01:34 UTC

Created attachment 1057783 [details]
screenshot

Description of problem:

Kernel crashes on reboot

Version-Release number of selected component (if applicable):

4.1.3-100.fc21

How reproducible:

Every time

Steps to Reproduce:
1. Reboot
2. Kernel crashes (see attachment with screen photo - log didn't seem to make it to journal)

Actual results:

Kernel crashes, machine stuck, needs hard reset.

Expected results:

reboot

Additional info:

This is on a Thinkpad X200s.  Reverted to previous 4.0.8-200.fc21.x86_64, behavior back to normal.

Comment 1 Laura Abbott 2015-07-30 21:24:28 UTC

There was a previous 4.0.9 update which we decided to skip and go to 4.1 instead. Does that work for you? https://admin.fedoraproject.org/updates/kernel-4.0.9-200.fc21 (ignore the negative karma, that's related to out of tree modules)

Comment 2 Dimitris 2015-07-31 17:04:46 UTC

Hi, 4.0.9-200 works so far - I can reboot without issues.

Comment 3 Laura Abbott 2015-07-31 18:50:01 UTC

Thanks for the confirmation. I suspect something in 4.1 broke it. Bisection is probably going to be the fastest way to test this. You can either do this on the official upstream tree or try the scripts I wrote: https://pagure.io/fedbisect

./fedbisect.sh start v4.0.9 v4.1.3
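For the upstream-tree route, a minimal sketch of how the bisection could be started with plain git. The helper name start_kernel_bisect is made up for illustration, and it assumes you are inside a kernel git checkout that contains both tags (for stable tags like these, probably a linux-stable clone).

```shell
# Sketch only: start a bisection between a known-good and a known-bad tag.
# start_kernel_bisect is a hypothetical helper name, not part of fedbisect.
# Assumes the current directory is a kernel git tree containing both tags.
start_kernel_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Usage (not run here):
#   start_kernel_bisect v4.0.9 v4.1.3
#   ...build and boot the revision git checks out, try the reboot, then:
#   git bisect good      # if the reboot worked
#   git bisect bad       # if the kernel crashed
#   git bisect reset     # when finished
```

Since the two tags sit on different stable branches, git will likely ask you to test their common ancestor first; that is expected.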

You can also try to see if you can get more kernel logs. Can you try switching to a dedicated tty and trying the reboot command? It looks like the backtrace has been going off for a while so it would be helpful to see what the first oops is.

Comment 4 Laura Abbott 2015-07-31 18:53:22 UTC

Can you also try

# echo 1 > /proc/sys/kernel/panic_on_oops
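To confirm the setting took effect before trying the reboot, something like the following could be used. This is only a sketch: panic_on_oops_enabled is a made-up helper name, and its optional path argument exists only so the check can be tried against an ordinary file.

```shell
# Sketch: report whether the kernel is set to panic on the first oops.
# panic_on_oops_enabled is a hypothetical helper. The optional path argument
# is an illustration-only convenience; by default the real procfs entry is read.
panic_on_oops_enabled() {
    file="${1:-/proc/sys/kernel/panic_on_oops}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

# Usage (as root, not run here):
#   sysctl -w kernel.panic_on_oops=1     # same effect as the echo above
#   panic_on_oops_enabled && echo "kernel will panic on the first oops"
```

The value is not persistent, so it has to be set again after every boot.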

Comment 5 Dimitris 2015-07-31 20:50:04 UTC

You're right, the original oops did scroll off by the time I got the camera ready.  I'll try to reproduce and catch it.

In the meantime I got this under 4.0.9-200, but I recall having seen this before with various versions so not sure if related to this bug:

Jul 31 13:30:59 gaspode kernel: ------------[ cut here ]------------
Jul 31 13:30:59 gaspode kernel: WARNING: CPU: 1 PID: 0 at drivers/gpu/drm/i915/intel_display.c:9756 intel_check_page_flip+0xdb/0xf0 [i915]()
Jul 31 13:30:59 gaspode kernel: Kicking stuck page flip: queued at 709754, now 709772
Jul 31 13:30:59 gaspode kernel: Modules linked in: hidp rfcomm fuse ccm nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJE
Jul 31 13:30:59 gaspode kernel:  wmi snd_timer mei_me snd tpm_tis rfkill soundcore tpm mei shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd 
Jul 31 13:30:59 gaspode kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.9-200.fc21.x86_64 #1
Jul 31 13:30:59 gaspode kernel: Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET72WW (3.22 ) 10/25/2012
Jul 31 13:30:59 gaspode kernel:  0000000000000000 d18adbc99f5f8a54 ffff88023bc83cb8 ffffffff8177d708
Jul 31 13:30:59 gaspode kernel:  0000000000000000 ffff88023bc83d10 ffff88023bc83cf8 ffffffff8109d6ca
Jul 31 13:30:59 gaspode kernel:  ffff88023bc83cf8 ffff88022fae7800 ffff880036a33000 0000000000000001
Jul 31 13:30:59 gaspode kernel: Call Trace:
Jul 31 13:30:59 gaspode kernel:  <IRQ>  [<ffffffff8177d708>] dump_stack+0x45/0x57
Jul 31 13:30:59 gaspode kernel:  [<ffffffff8109d6ca>] warn_slowpath_common+0x8a/0xc0
Jul 31 13:30:59 gaspode kernel:  [<ffffffff8109d755>] warn_slowpath_fmt+0x55/0x70
Jul 31 13:30:59 gaspode kernel:  [<ffffffffa020613b>] intel_check_page_flip+0xdb/0xf0 [i915]
Jul 31 13:30:59 gaspode kernel:  [<ffffffffa01cf475>] i915_handle_vblank+0x55/0xb0 [i915]
Jul 31 13:30:59 gaspode kernel:  [<ffffffffa01e4004>] ? gen2_write32+0x34/0xa0 [i915]
Jul 31 13:30:59 gaspode kernel:  [<ffffffffa01d0d76>] i965_irq_handler+0x2b6/0x390 [i915]
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810f5017>] handle_irq_event_percpu+0x77/0x1a0
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810f517b>] handle_irq_event+0x3b/0x60
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810f837e>] handle_edge_irq+0x6e/0x120
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810174e4>] handle_irq+0x74/0x140
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810bd1ba>] ? atomic_notifier_call_chain+0x1a/0x20
Jul 31 13:30:59 gaspode kernel:  [<ffffffff81786b1f>] do_IRQ+0x4f/0xf0
Jul 31 13:30:59 gaspode kernel:  [<ffffffff8178486d>] common_interrupt+0x6d/0x6d
Jul 31 13:30:59 gaspode kernel:  <EOI>  [<ffffffff816150b3>] ? cpuidle_enter_state+0x63/0x160
Jul 31 13:30:59 gaspode kernel:  [<ffffffff816150a1>] ? cpuidle_enter_state+0x51/0x160
Jul 31 13:30:59 gaspode kernel:  [<ffffffff816151e7>] cpuidle_enter+0x17/0x20
Jul 31 13:30:59 gaspode kernel:  [<ffffffff810e042d>] cpu_startup_entry+0x37d/0x420
Jul 31 13:30:59 gaspode kernel:  [<ffffffff8104b635>] start_secondary+0x1a5/0x1f0
Jul 31 13:30:59 gaspode kernel: ---[ end trace 33c12756931d9068 ]---

Comment 6 Dimitris 2015-07-31 22:30:25 UTC

abrt matches the last one to bug 1236721 so looks unrelated

Comment 7 Dimitris 2015-08-02 01:10:11 UTC

With /proc/sys/kernel/panic_on_oops set to 1, I tried to narrow this down. At first it seemed to be display-related (I use a second display via Ultrabase).

It actually ends up correlating perfectly with whether I have connected my bluetooth mouse or not.  Shortest steps to reproduce:

- BT mouse already paired (might want to do with a different kernel first).
- BT mouse off.
- Boot up, get to GDM screen, don't log in.
- BT mouse on, move around until mouse connects.
- Ctrl-Alt-F2, log in to tty.
- set /proc/sys/kernel/panic_on_oops to 1.
- shutdown -r now
- oops/panic.

At least once I also managed to get a crash without a reboot, just by turning the mouse off instead.

Without the BT mouse, system runs and reboots normally so far.

I'll add a couple of panic screenshots.

Comment 8 Dimitris 2015-08-02 01:15:27 UTC

Created attachment 1058425 [details]
panic with BT mouse

Comment 9 Dimitris 2015-08-02 01:16:18 UTC

Created attachment 1058426 [details]
another panic with BT mouse

Comment 10 Dimitris 2015-08-02 01:27:49 UTC

BT controller on my thinkpad is also a broadcom one:

Bus 004 Device 004: ID 0a5c:217f Broadcom Corp. BCM2045B (BDC-2.1)

Comment 11 Dimitris 2015-08-03 01:13:59 UTC

Now on F22, running 4.1.3-201.fc22.x86_64.  No change in behavior, staying with my old USB wired mouse for now.

Comment 12 Laura Abbott 2015-08-07 14:54:01 UTC

There was a similar report of corruption on shutdown on LKML. Can you try adding 'slub_debug' to your kernel command line to see if it changes the backtrace you get?
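After rebooting with the extra parameter, it is worth checking that it actually reached the kernel. A rough sketch follows; cmdline_has_param is a made-up helper name, and its second argument is only there so it can be tried on an ordinary file instead of /proc/cmdline.

```shell
# Sketch: check whether a parameter is present on the kernel command line.
# cmdline_has_param is a hypothetical helper. The second argument (file to
# read) is an illustration-only convenience and defaults to /proc/cmdline.
cmdline_has_param() {
    param="$1"
    file="${2:-/proc/cmdline}"
    for word in $(cat "$file"); do
        case "$word" in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Usage (not run here):
#   cmdline_has_param slub_debug && echo "slub_debug is active"
# Assuming grubby manages the boot entries, one way to add the parameter is:
#   grubby --update-kernel=ALL --args="slub_debug"
```

For a one-off test, editing the kernel line in the boot menu works as well and leaves the installed configuration untouched.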

Comment 13 Dimitris 2015-08-07 21:08:58 UTC

Created attachment 1060483 [details]
panic with slub_debug

reproduced with slub_debug in the kernel command line, running 4.1.3-201.fc22.x86_64

Comment 14 Laura Abbott 2015-08-07 22:15:42 UTC

Created attachment 1060510 [details]
potential fix

Can you test the following patch? While going through LKML backlog I found this which sounds promising based on your backtrace

Comment 15 Dimitris 2015-08-08 18:57:08 UTC

(In reply to Laura Abbott from comment #14)
> Created attachment 1060510 [details]
> potential fix
> 
> Can you test the following patch? While going through LKML backlog I found
> this which sounds promising based on your backtrace

Patch works :)

Comment 16 Fedora Update System 2015-08-12 00:21:06 UTC

kernel-4.1.5-100.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/kernel-4.1.5-100.fc21

Comment 17 Fedora Update System 2015-08-12 00:23:02 UTC

kernel-4.1.5-200.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/kernel-4.1.5-200.fc22

Comment 18 Fedora Update System 2015-08-13 16:54:18 UTC

Package kernel-4.1.5-100.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-4.1.5-100.fc21'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-13391/kernel-4.1.5-100.fc21
then log in and leave karma (feedback).

Comment 19 Benjamin Herrenschmidt 2015-08-16 21:58:44 UTC

*** Bug 1253854 has been marked as a duplicate of this bug. ***

Comment 20 Fedora Update System 2015-08-19 08:04:03 UTC

kernel-4.1.5-200.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2015-08-19 08:11:58 UTC

kernel-4.1.5-100.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.