Bug 728314

Summary: abrt should capture all kernel problems
Product: [Fedora] Fedora Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: abrtAssignee: Denys Vlasenko <dvlasenk>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: anton, dvlasenk, iprikryl, jmoskovc, joe.christy, kklic, mads, mtoman, npajkovs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: abrt-2.0.6-1.fc16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-11-21 22:51:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Nicolas Mailhot 2011-08-04 16:56:28 UTC
Fedora (and even more fedora-devel) kernels are built with lots of expensive debuging options to help improve the kernel code faster, conforming to Fedora's mission.

However the resulting errors are completely ignored by abrt and as far as I can tell all those computing cycles spent verifying kernel correctness are completely wasted. abrt should capture all kernel traces, even if they are informationnal, and send them to bugzilla/kerneloops for analysis

Example of such an unreported trace:

   28.556610] =============================================
[   28.557007] [ INFO: possible recursive locking detected ]
[   28.557007] 3.1.0-0.rc0.git19.1.fc17.x86_64 #1
[   28.557007] ---------------------------------------------
[   28.557007] modprobe/684 is trying to acquire lock:
[   28.557007]  (&hdl->lock){+.+...}, at: [<ffffffffa02919ba>] find_ref_lock+0x24/0x46 [videodev]
[   28.557007] 
[   28.557007] but task is already holding lock:
[   28.557007]  (&hdl->lock){+.+...}, at: [<ffffffffa029380f>] v4l2_ctrl_add_handler+0x49/0x97 [videodev]
[   28.557007] 
[   28.557007] other info that might help us debug this:
[   28.557007]  Possible unsafe locking scenario:
[   28.557007] 
[   28.557007]        CPU0
[   28.557007]        ----
[   28.557007]   lock(&hdl->lock);
[   28.557007]   lock(&hdl->lock);
[   28.557007] 
[   28.557007]  *** DEADLOCK ***
[   28.557007] 
[   28.557007]  May be due to missing lock nesting notation
[   28.557007] 
[   28.557007] 3 locks held by modprobe/684:
[   28.557007]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81314d0c>] __driver_attach+0x3b/0x82
[   28.557007]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81314d1a>] __driver_attach+0x49/0x82
[   28.557007]  #2:  (&hdl->lock){+.+...}, at: [<ffffffffa029380f>] v4l2_ctrl_add_handler+0x49/0x97 [videodev]
[   28.557007] 
[   28.557007] stack backtrace:
[   28.557007] Pid: 684, comm: modprobe Not tainted 3.1.0-0.rc0.git19.1.fc17.x86_64 #1
[   28.557007] Call Trace:
[   28.557007]  [<ffffffff8108eb06>] __lock_acquire+0x917/0xcf7
[   28.557007]  [<ffffffff81014fbe>] ? sched_clock+0x9/0xd
[   28.557007]  [<ffffffff8108dffc>] ? mark_lock+0x2d/0x220
[   28.557007]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.557007]  [<ffffffff8108f3dc>] lock_acquire+0xf3/0x13e
[   28.584886]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffff814f2523>] __mutex_lock_common+0x5d/0x39a
[   28.585146]  [<ffffffffa02919ba>] ? find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffff8108f6db>] ? mark_held_locks+0x6d/0x95
[   28.585146]  [<ffffffff814f282f>] ? __mutex_lock_common+0x369/0x39a
[   28.585146]  [<ffffffff8108f830>] ? trace_hardirqs_on_caller+0x12d/0x164
[   28.585146]  [<ffffffff814f296f>] mutex_lock_nested+0x40/0x45
[   28.585146]  [<ffffffffa02919ba>] find_ref_lock+0x24/0x46 [videodev]
[   28.585146]  [<ffffffffa029367e>] handler_new_ref+0x42/0x18a [videodev]
[   28.585146]  [<ffffffffa0293833>] v4l2_ctrl_add_handler+0x6d/0x97 [videodev]
[   28.585146]  [<ffffffffa028f71b>] v4l2_device_register_subdev+0x16c/0x257 [videodev]
[   28.585146]  [<ffffffffa02ddfe9>] ivtv_gpio_init+0x14e/0x159 [ivtv]
[   28.585146]  [<ffffffffa02ebd57>] ivtv_probe+0xdc4/0x1662 [ivtv]
[   28.585146]  [<ffffffff8108f6c3>] ? mark_held_locks+0x55/0x95
[   28.585146]  [<ffffffff814f41df>] ? _raw_spin_unlock_irqrestore+0x4d/0x61
[   28.585146]  [<ffffffff8126a12b>] local_pci_probe+0x44/0x75
[   28.585146]  [<ffffffff8126acb1>] pci_device_probe+0xd0/0xff
[   28.585146]  [<ffffffff81314bef>] driver_probe_device+0x131/0x213
[   28.585146]  [<ffffffff81314d2f>] __driver_attach+0x5e/0x82
[   28.585146]  [<ffffffff81314cd1>] ? driver_probe_device+0x213/0x213
[   28.585146]  [<ffffffff81313c30>] bus_for_each_dev+0x59/0x8f
[   28.585146]  [<ffffffff813147c3>] driver_attach+0x1e/0x20
[   28.585146]  [<ffffffff813143db>] bus_add_driver+0xd4/0x22a
[   28.585146]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.585146]  [<ffffffff813151f2>] driver_register+0x98/0x105
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffff8126b584>] __pci_register_driver+0x66/0xd2
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffffa02ff078>] module_start+0x78/0x1000 [ivtv]
[   28.618302]  [<ffffffff81002099>] do_one_initcall+0x7f/0x13a
[   28.618302]  [<ffffffffa02ff000>] ? 0xffffffffa02fefff
[   28.618302]  [<ffffffff8109a864>] sys_init_module+0x114/0x267
[   28.618302]  [<ffffffff814fafc2>] system_call_fastpath+0x16/0x1b

Comment 1 Jiri Moskovcak 2011-08-04 17:56:52 UTC
is abrt-dump-oops running?

Comment 2 Mads Kiilerich 2011-08-24 21:51:18 UTC
Yes, it is running and catches other errors.

Comment 3 Denys Vlasenko 2011-10-11 19:42:14 UTC
It's easy to find out why it's not detected:

$ abrt-dump-oops -vvvo oops_recursive_locking.test 
abrt-dump-oops: Opened 'oops_recursive_locking.test'
abrt-dump-oops: File grew by 4118 bytes, from 0 to 4118
abrt-dump-oops: Read 4118 bytes
abrt-dump-oops: Found oops at line 1: '[ INFO: possible recursive locking detected ]'
abrt-dump-oops: Dropped oops, too long
abrt-dump-oops: Found oopses: 0

Comment 4 Denys Vlasenko 2011-10-11 19:44:08 UTC
Fixed in git:

commit 29b3f9e23f90ea3ea92b2d845864d60052ca4081
Author: Denys Vlasenko <dvlasenk@redhat.com>
Date:   Tue Oct 11 21:41:30 2011 +0200

    abrt-dump-oops: add checks for "irq NN: nobody cared" and "IRQ handler type mismatch"

    This closes rhbz#728194.
    Also, increase oops size limit from 60 to 80 lines - closes rhbz#728314

Comment 5 Fedora Update System 2011-11-05 18:57:56 UTC
abrt-2.0.6-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/abrt-2.0.6-1.fc16

Comment 6 Fedora Update System 2011-11-06 23:55:27 UTC
Package abrt-2.0.6-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing abrt-2.0.6-1.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-15513
then log in and leave karma (feedback).

Comment 7 Fedora Update System 2011-11-21 22:51:22 UTC
abrt-2.0.6-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.