Bug 247975

Summary: list_del corruption BUG.
Product: [Fedora] Fedora
Reporter: Oleg Drokin <green>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 7
CC: chris.brown
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-09-18 20:44:43 UTC

Description Oleg Drokin 2007-07-12 13:54:04 UTC
Description of problem:
I installed a new dual-CPU, dual-core server yesterday with FC7 (Intel(R)
Xeon(R) CPU 5140  @ 2.33GHz).

The kernel crashed this morning. It first hit the BUG, immediately followed by
a NULL pointer dereference and a warning about sleeping in the wrong context.
(After that there were tons of soft lockup reports, presumably because
whatever died was still holding some locks):

list_del corruption. prev->next should be ffff81012a028840, but was 0000000000000000
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:67!
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0 
Modules linked in: tun ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables
ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs
i2c_ec button dock battery ac parport_pc lp parport loop bnx2 sr_mod cdrom
i2c_i801 ata_generic i2c_core pcspkr joydev sg ata_piix libata shpchp aacraid
sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Pid: 14, comm: events/0 Not tainted 2.6.21-1.3228.fc7 #1
RIP: 0010:[<ffffffff8033578e>]  [<ffffffff8033578e>] list_del+0x21/0x5b
RSP: 0018:ffff81013fec9db0  EFLAGS: 00010086
RAX: 0000000000000058 RBX: ffff81012a028840 RCX: ffffffff8055b008
RDX: ffffffff8055b008 RSI: 0000000000000086 RDI: 0000000000000000
RBP: ffff810103824440 R08: ffffffff8055b008 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8101038b9840 R12: ffff81000060b000
R13: ffff810103827100 R14: ffff810103835418 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff8059d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaab4438e0 CR3: 00000001039dc000 CR4: 00000000000006e0
Process events/0 (pid: 14, threadinfo ffff81013fec8000, task ffff81013fb7a7a0)
Stack:  000000003ffaa940 ffffffff802c9547 0000000400000000 ffff810103835418
 0000000000000004 ffff810103835400 0000000000000000 ffff810103824440
 ffff810103827100 ffffffff802c9663 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff802c9547>] free_block+0xb1/0x142
 [<ffffffff802c9663>] drain_array+0x8b/0xbf
 [<ffffffff802ca1a3>] cache_reap+0x9d/0x20a
 [<ffffffff802ca106>] cache_reap+0x0/0x20a
 [<ffffffff802482f7>] run_workqueue+0x8f/0x137
 [<ffffffff80244e7b>] worker_thread+0x0/0x14a
 [<ffffffff80244f8f>] worker_thread+0x114/0x14a
 [<ffffffff802819c3>] default_wake_function+0x0/0xe
 [<ffffffff8023023b>] kthread+0xd0/0xff
 [<ffffffff80257f38>] child_rip+0xa/0x12
 [<ffffffff8023016b>] kthread+0x0/0xff
 [<ffffffff80257f2e>] child_rip+0x0/0x12
Code: 0f 0b eb fe 48 8b 07 48 8b 50 08 48 39 fa 74 12 48 c7 c7 0d 
RIP  [<ffffffff8033578e>] list_del+0x21/0x5b RSP <ffff81013fec9db0>
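
For context on the first message: with list debugging enabled, list_del checks that the neighbours of the entry being removed still point back at it before unlinking. The following is a minimal userspace sketch of that kind of check, not the kernel source itself. The name checked_list_del and the -1 return value are assumptions made for illustration; the kernel calls BUG() at that point instead of returning.

```c
#include <stdio.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Checked removal (hypothetical name). Returns 0 on success, or -1 if a
 * neighbour no longer points back at entry. The kernel would BUG() here
 * rather than return an error.
 */
static int checked_list_del(struct list_head *entry)
{
    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return -1;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return -1;
    }
    /* Neighbours are consistent: unlink the entry. */
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    /* The kernel stores poison values here; NULL is used for simplicity. */
    entry->next = NULL;
    entry->prev = NULL;
    return 0;
}
```

In the report above, prev->next was 0 while the entry being removed (RBX) was ffff81012a028840, so the first check failed. In this sketch the same situation can be reproduced by zeroing the next pointer of the preceding node before calling checked_list_del.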

Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP: 
 [<ffffffff8025d872>] _spin_lock+0x0/0xf
PGD 11cfa3067 PUD 11cfab067 PMD 0 
Oops: 0002 [2] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0 
Modules linked in: tun ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables
ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs
i2c_ec button dock battery ac parport_pc lp parport loop bnx2 sr_mod cdrom
i2c_i801 ata_generic i2c_core pcspkr joydev sg ata_piix libata shpchp aacraid
sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
Pid: 2912, comm: hald-addon-stor Not tainted 2.6.21-1.3228.fc7 #1
RIP: 0010:[<ffffffff8025d872>]  [<ffffffff8025d872>] _spin_lock+0x0/0xf
RSP: 0018:ffff81011cfb1ad0  EFLAGS: 00010002
RAX: 000000000000044c RBX: 000000000000044c RCX: 0000000000000000
RDX: ffff810103835600 RSI: ffff810001000000 RDI: 0000000000000040
RBP: 000000000000044c R08: ffffffff8055fc80 R09: ffff81013e8e0620
R10: ffff8100a7efe800 R11: ffffffff802403d8 R12: ffff81000060be00
R13: ffff810103827100 R14: 0000000000000286 R15: ffff81013eee62f0
FS:  00002aaaaaabf8f0(0000) GS:ffffffff8059d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 000000011d333000 CR4: 00000000000006e0
Process hald-addon-stor (pid: 2912, threadinfo ffff81011cfb0000, task
ffff81011cf927e0)
Stack:  ffffffff8020ade1 000000000000d47d ffff81000060be00 0000000000000000
 0000000000000000 ffff81000060be00 ffff81013eee62c0 00000000ffffff85
 ffffffff88114d39 0000000000000043 000000000000400c ffff81000060be00
Call Trace:
 [<ffffffff8020ade1>] kfree+0x1ab/0x209
 [<ffffffff88114d39>] :sr_mod:sr_cd_check+0x413/0x425
 [<ffffffff88113162>] :sr_mod:sr_media_change+0x7d/0x1fe
 [<ffffffff8021d319>] __pollwait+0x0/0xe1
 [<ffffffff88107076>] :cdrom:media_changed+0x44/0x74
 [<ffffffff802e0426>] check_disk_change+0x1f/0x76
 [<ffffffff8810b358>] :cdrom:cdrom_open+0x92c/0x979
 [<ffffffff8020cd03>] dput+0x3c/0x123
 [<ffffffff80209f42>] __link_path_walk+0xc4c/0xd9c
 [<ffffffff8022af5d>] mntput_no_expire+0x1c/0x92
 [<ffffffff8020e337>] link_path_walk+0xce/0xe0
 [<ffffffff80229325>] iput+0x42/0x7b
 [<ffffffff80233fb5>] __strncpy_from_user+0x28/0x52
 [<ffffffff80251bbb>] kobject_get+0x12/0x17
 [<ffffffff8032b7cd>] get_disk+0x40/0x5b
 [<ffffffff8025443d>] exact_lock+0xc/0x14
 [<ffffffff80251bbb>] kobject_get+0x12/0x17
 [<ffffffff881133e9>] :sr_mod:sr_block_open+0x96/0xa6
 [<ffffffff802e0a66>] do_open+0xa2/0x2b9
 [<ffffffff80210e83>] may_open+0x5b/0x21b
 [<ffffffff802e0eda>] blkdev_open+0x0/0x5d
 [<ffffffff802e0f08>] blkdev_open+0x2e/0x5d
 [<ffffffff8021d20b>] __dentry_open+0xd9/0x1aa
 [<ffffffff80225f8b>] do_filp_open+0x2a/0x38
 [<ffffffff80229325>] iput+0x42/0x7b
 [<ffffffff80233fb5>] __strncpy_from_user+0x28/0x52
 [<ffffffff802182f2>] do_sys_open+0x44/0xc1
 [<ffffffff8025729c>] tracesys+0xdc/0xe1

Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00 
RIP  [<ffffffff8025d872>] _spin_lock+0x0/0xf RSP <ffff81011cfb1ad0>

BUG: sleeping function called from invalid context
 at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [<ffffffff802951d2>] down_read+0x15/0x23
 [<ffffffff802a24c4>] acct_collect+0x42/0x18e
 [<ffffffff80213e4c>] do_exit+0x1fd/0x7e0
 [<ffffffff8025fd0c>] do_page_fault+0x73a/0x7b5
 [<ffffffff8024ab8c>] blk_put_request+0x2a/0x42
 [<ffffffff8025ddfd>] error_exit+0x0/0x84
 [<ffffffff802403d8>] mempool_free_slab+0x0/0xe
 [<ffffffff8025d872>] _spin_lock+0x0/0xf
 [<ffffffff8020ade1>] kfree+0x1ab/0x209
 [<ffffffff88114d39>] :sr_mod:sr_cd_check+0x413/0x425
 [<ffffffff88113162>] :sr_mod:sr_media_change+0x7d/0x1fe
 [<ffffffff8021d319>] __pollwait+0x0/0xe1
 [<ffffffff88107076>] :cdrom:media_changed+0x44/0x74
 [<ffffffff802e0426>] check_disk_change+0x1f/0x76
 [<ffffffff8810b358>] :cdrom:cdrom_open+0x92c/0x979
 [<ffffffff8020cd03>] dput+0x3c/0x123
 [<ffffffff80209f42>] __link_path_walk+0xc4c/0xd9c
 [<ffffffff8022af5d>] mntput_no_expire+0x1c/0x92
 [<ffffffff8020e337>] link_path_walk+0xce/0xe0
 [<ffffffff80229325>] iput+0x42/0x7b
 [<ffffffff80233fb5>] __strncpy_from_user+0x28/0x52
 [<ffffffff80251bbb>] kobject_get+0x12/0x17
 [<ffffffff8032b7cd>] get_disk+0x40/0x5b
 [<ffffffff8025443d>] exact_lock+0xc/0x14
 [<ffffffff80251bbb>] kobject_get+0x12/0x17
 [<ffffffff881133e9>] :sr_mod:sr_block_open+0x96/0xa6
 [<ffffffff802e0a66>] do_open+0xa2/0x2b9
 [<ffffffff80210e83>] may_open+0x5b/0x21b
 [<ffffffff802e0eda>] blkdev_open+0x0/0x5d
 [<ffffffff802e0f08>] blkdev_open+0x2e/0x5d
 [<ffffffff8021d20b>] __dentry_open+0xd9/0x1aa
 [<ffffffff80225f8b>] do_filp_open+0x2a/0x38
 [<ffffffff80229325>] iput+0x42/0x7b
 [<ffffffff80233fb5>] __strncpy_from_user+0x28/0x52
 [<ffffffff802182f2>] do_sys_open+0x44/0xc1
 [<ffffffff8025729c>] tracesys+0xdc/0xe1
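
The third message comes from a debug check that fires when a function that may sleep is entered in atomic context or with interrupts disabled. The trace shows do_exit being reached from do_page_fault, so it is likely that the task that oopsed in _spin_lock was exiting with interrupts still off, which matches irqs_disabled():1. A minimal sketch of the condition follows; the helper name sleep_allowed and its parameters are assumptions for illustration, and the real kernel check has additional conditions not modelled here.

```c
#include <stdio.h>

/*
 * Hypothetical helper modelling the "may this code sleep?" debug check.
 * Returns 1 if sleeping is allowed, 0 if the warning would be printed.
 */
static int sleep_allowed(int in_atomic, int irqs_disabled,
                         const char *file, int line)
{
    if (in_atomic || irqs_disabled) {
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s:%d\n",
                file, line);
        fprintf(stderr, "in_atomic():%d, irqs_disabled():%d\n",
                in_atomic, irqs_disabled);
        return 0;
    }
    return 1;
}
```

Called with the values from the report, sleep_allowed(0, 1, "kernel/rwsem.c", 20) returns 0, because interrupts being disabled is enough on its own to make sleeping invalid.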

Version-Release number of selected component (if applicable):
The kernel is 2.6.21-1.3228.fc7, which I believe is the current release.

Comment 1 Christopher Brown 2007-09-18 15:09:37 UTC
Hello Oleg,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself on this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 2 Oleg Drokin 2007-09-18 15:56:03 UTC
I no longer see this problem, so it may be fixed, or it may not be; read on.
Originally I believed the problem was related to hald polling newly created
devices. My initial workaround was to kill hald, and the crashes stopped.
The device appears when the remote console for the IBM server is attached,
which brings a virtual CD drive with it.
I have been running with hald enabled again for quite some time with no
crashes, but on the other hand nobody has used the console software in all
that time. This is my production machine, so unfortunately I am not very keen
on trying to crash it.
I understand that the information provided is most probably not enough to find
the culprit, but this is all I have.
If you feel like it, you can probably close the bug.

Comment 3 Christopher Brown 2007-09-18 20:44:43 UTC
Rather than killing hald, you can run this command:

hal-disable-polling --device $DEVICE

where $DEVICE is your cdrom drive, e.g. /dev/scd0

Anyway, thanks for initially filing the bug, and sorry it took a while to get
reviewed. I will close this as WORKSFORME. Please re-open it if you feel it
still needs looking into at any point.

Cheers
Chris