Bug 728314 - abrt should capture all kernel problems
Summary: abrt should capture all kernel problems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: abrt
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Denys Vlasenko
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-08-04 16:56 UTC by Nicolas Mailhot
Modified: 2011-11-21 22:51 UTC (History)
9 users (show)

Fixed In Version: abrt-2.0.6-1.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-21 22:51:22 UTC


Attachments (Terms of Use)

Description Nicolas Mailhot 2011-08-04 16:56:28 UTC
Fedora (and even more fedora-devel) kernels are built with lots of expensive debuging options to help improve the kernel code faster, conforming to Fedora's mission.

However the resulting errors are completely ignored by abrt and as far as I can tell all those computing cycles spent verifying kernel correctness are completely wasted. abrt should capture all kernel traces, even if they are informationnal, and send them to bugzilla/kerneloops for analysis

Example of such an unreported trace:

   28.556610] =============================================
[   28.557007] [ INFO: possible recursive locking detected ]
[   28.557007] 3.1.0-0.rc0.git19.1.fc17.x86_64 #1
[   28.557007] ---------------------------------------------
[   28.557007] modprobe/684 is trying to acquire lock:
[   28.557007]  (&hdl->lock){+.+...}, at: [<ffffffffa02919ba>] find_ref_lock+0x24/0x46 [videodev]
[   28.557007] 
[   28.557007] but task is already holding lock:
[   28.557007]  (&hdl->lock){+.+...}, at: [<ffffffffa029380f>] v4l2_ctrl_add_handler+0x49/0x97 [videodev]
[   28.557007] 
[   28.557007] other info that might help us debug this:
[   28.557007]  Possible unsafe locking scenario:
[   28.557007] 
[   28.557007]        CPU0
[   28.557007]        ----
[   28.557007]   lock(&hdl->lock);
[   28.557007]   lock(&hdl->lock);
[   28.557007] 
[   28.557007]  *** DEADLOCK ***
[   28.557007] 
[   28.557007]  May be due to missing lock nesting notation
[   28.557007] 
[   28.557007] 3 locks held by modprobe/684:
[   28.557007]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81314d0c>] __driver_attach+0x3b/0x82
[   28.557007]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81314d1a>] __driver_attach+0x49/0x82
[   28.557007]  #2:  (&hdl->lock){+.+...}, at: [<ffffffffa029380f>] v4l2_ctrl_add_handler+0x49/0x97 [videodev]
[   28.557007] 
[   28.557007] stack backtrace:
[   28.557007] Pid: 684, comm: modprobe Not tainted 3.1.0-0.rc0.git19.1.fc17.x86_64 #1
[   28.557007] Call Trace:
[   28.557007]  [<ffffffff8108eb06>] __lock_acquire+0x917/0xcf7
[   28.557007]  [<ffffffff81014fbe>] ? sched_clock+0x9/0xd
[   28.557007]  [<ffffffff8108dffc>] ? mark_lock+0x2d/0x220
[   28.557007]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.557007]  [<ffffffff8108f3dc>] lock_acquire+0xf3/0x13e
[   28.584886]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffff814f2523>] __mutex_lock_common+0x5d/0x39a
[   28.585146]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffff8108f6db>] ? mark_held_locks+0x6d/0x95
[   28.585146]  [<ffffffff814f282f>] ? __mutex_lock_common+0x369/0x39a
[   28.585146]  [<ffffffff8108f830>] ? trace_hardirqs_on_caller+0x12d/0x164
[   28.585146]  [<ffffffff814f296f>] mutex_lock_nested+0x40/0x45
[   28.585146]  [<ffffffffa02919ba>] find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffffa029367e>] handler_new_ref+0x42/0x18a [videodev]
[   28.585146]  [<ffffffffa0293833>] v4l2_ctrl_add_handler+0x6d/0x97 [videodev]
[   28.585146]  [<ffffffffa028f71b>] v4l2_device_register_subdev+0x16c/0x257 [videodev]
[   28.585146]  [<ffffffffa02ddfe9>] ivtv_gpio_init+0x14e/0x159 [ivtv]
[   28.585146]  [<ffffffffa02ebd57>] ivtv_probe+0xdc4/0x1662 [ivtv]
[   28.585146]  [<ffffffff8108f6c3>] ? mark_held_locks+0x55/0x95
[   28.585146]  [<ffffffff814f41df>] ? _raw_spin_unlock_irqrestore+0x4d/0x61
[   28.585146]  [<ffffffff8126a12b>] local_pci_probe+0x44/0x75
[   28.585146]  [<ffffffff8126acb1>] pci_device_probe+0xd0/0xff
[   28.585146]  [<ffffffff81314bef>] driver_probe_device+0x131/0x213
[   28.585146]  [<ffffffff81314d2f>] __driver_attach+0x5e/0x82
[   28.585146]  [<ffffffff81314cd1>] ? driver_probe_device+0x213/0x213
[   28.585146]  [<ffffffff81313c30>] bus_for_each_dev+0x59/0x8f
[   28.585146]  [<ffffffff813147c3>] driver_attach+0x1e/0x20
[   28.585146]  [<ffffffff813143db>] bus_add_driver+0xd4/0x22a
[   28.585146]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.585146]  [<ffffffff813151f2>] driver_register+0x98/0x105
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffff8126b584>] __pci_register_driver+0x66/0xd2
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffffa02ff078>] module_start+0x78/0x1000 [ivtv]
[   28.618302]  [<ffffffff81002099>] do_one_initcall+0x7f/0x13a
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffff8109a864>] sys_init_module+0x114/0x267
[   28.618302]  [<ffffffff814fafc2>] system_call_fastpath+0x16/0x1b

Comment 1 Jiri Moskovcak 2011-08-04 17:56:52 UTC
is abrt-dump-oops running?

Comment 2 Mads Kiilerich 2011-08-24 21:51:18 UTC
Yes, it is running and catches other errors.

Comment 3 Denys Vlasenko 2011-10-11 19:42:14 UTC
It's easy to find out why it's not detected:

$ abrt-dump-oops -vvvo oops_recursive_locking.test 
abrt-dump-oops: Opened 'oops_recursive_locking.test'
abrt-dump-oops: File grew by 4118 bytes, from 0 to 4118
abrt-dump-oops: Read 4118 bytes
abrt-dump-oops: Found oops at line 1: '[ INFO: possible recursive locking detected ]'
abrt-dump-oops: Dropped oops, too long
abrt-dump-oops: Found oopses: 0

Comment 4 Denys Vlasenko 2011-10-11 19:44:08 UTC
Fixed in git:

commit 29b3f9e23f90ea3ea92b2d845864d60052ca4081
Author: Denys Vlasenko <dvlasenk@redhat.com>
Date:   Tue Oct 11 21:41:30 2011 +0200

    abrt-dump-oops: add checks for "irq NN: nobody cared" and "IRQ handler type mismatch"

    This closes rhbz#728194.
    Also, increase oops size limit from 60 to 80 lines - closes rhbz#728314

Comment 5 Fedora Update System 2011-11-05 18:57:56 UTC
abrt-2.0.6-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/abrt-2.0.6-1.fc16

Comment 6 Fedora Update System 2011-11-06 23:55:27 UTC
Package abrt-2.0.6-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing abrt-2.0.6-1.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-15513
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2011-11-21 22:51:22 UTC
abrt-2.0.6-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.