739467 – list_add corruption. next->prev should be prev - enqueue_task

Summary: list_add corruption. next->prev should be prev - enqueue_task

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	14
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-09-19 07:30 UTC by Jan Kratochvil
Modified:	2011-09-26 18:41 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-09-26 18:41:26 UTC
Type:	---
Embargoed:
Dependent Products:

Description Jan Kratochvil 2011-09-19 07:30:07 UTC

Description of problem:
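Random kernel crash: a list_add corruption warning in the scheduler enqueue path, followed by a NULL pointer dereference in pick_next_task_fair.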

Version-Release number of selected component (if applicable):
kernel-2.6.35.14-95.fc14.x86_64

How reproducible:
Probably not reproducible; it happened after a month of uptime.

Additional info:
I have the kdump vmcore file for further analysis.

------------[ cut here ]------------
WARNING: at lib/list_debug.c:26 __list_add+0x3f/0x81()
Hardware name: EX58-UD4
list_add corruption. next->prev should be prev (ffff8801a7f80cf8), but was ffff88015e4b97a0. (next=ffff8801a7f80cf8).
Modules linked in: netconsole iptable_mangle nls_utf8 vfat fat usb_storage ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat usblp hidp fuse configfs rfcomm sco bnep l2cap nfsd lockd nfs_acl auth_rpcgss exportfs it87 hwmon_vid coretemp sunrpc tun cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ipv6 kvm_intel kvm uinput snd_hda_intel snd_usb_audio snd_usbmidi_lib snd_hda_codec snd_rawmidi snd_hwdep uvcvideo snd_seq snd_seq_device snd_pcm btusb snd_timer snd_page_alloc videodev v4l2_compat_ioctl32 i2c_i801 snd i7core_edac bluetooth edac_core soundcore r8169 mii iTCO_wdt iTCO_vendor_support microcode rfkill sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt raid1 firewire_ohci firewire_core crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: netconsole]
Pid: 646, comm: kcryptd Not tainted 2.6.35.14-95.fc14.x86_64 #1
Call Trace:
 [<ffffffff8104dd31>] warn_slowpath_common+0x85/0x9d
 [<ffffffff8104ddec>] warn_slowpath_fmt+0x46/0x48
 [<ffffffff812272ca>] __list_add+0x3f/0x81
 [<ffffffff8103da85>] list_add+0x11/0x13
 [<ffffffff81040fed>] enqueue_entity+0x89/0x2e8
 [<ffffffff810414f5>] enqueue_task_fair+0x2a/0x48
 [<ffffffff81042f7d>] enqueue_task+0x5d/0x6d
 [<ffffffff81042fba>] activate_task+0x2d/0x36
 [<ffffffff81046cf4>] try_to_wake_up+0x1f8/0x2c5
 [<ffffffff81046dd3>] default_wake_function+0x12/0x14
 [<ffffffff81066aa1>] autoremove_wake_function+0x16/0x39
 [<ffffffff81066af3>] wake_bit_function+0x2f/0x31
 [<ffffffff81039ddf>] __wake_up_common+0x4e/0x84
 [<ffffffff8103d147>] __wake_up+0x39/0x4d
 [<ffffffff81066a5f>] __wake_up_bit+0x31/0x33
 [<ffffffff810d418f>] unlock_page+0x27/0x2c
 [<ffffffff811404b6>] mpage_end_io_read+0x65/0x7d
 [<ffffffff8113b8ca>] bio_endio+0x2b/0x2d
 [<ffffffff81382cac>] dec_pending+0x153/0x15c
 [<ffffffff81382e6d>] clone_endio+0xaa/0xb7
 [<ffffffff8113b8ca>] bio_endio+0x2b/0x2d
 [<ffffffffa0121775>] crypt_dec_pending+0x5e/0x8c [dm_crypt]
 [<ffffffffa0122a89>] kcryptd_crypt+0x3fa/0x45d [dm_crypt]
 [<ffffffff8146b70f>] ? _raw_spin_unlock_irqrestore+0x17/0x19
 [<ffffffff810bf8d3>] ? probe_workqueue_execution+0xb1/0xcd
 [<ffffffff81062c2d>] worker_thread+0x1c5/0x251
 [<ffffffffa012268f>] ? kcryptd_crypt+0x0/0x45d [dm_crypt]
 [<ffffffff81066a8b>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81062a68>] ? worker_thread+0x0/0x251
 [<ffffffff810665f1>] kthread+0x7f/0x87
 [<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81066572>] ? kthread+0x0/0x87
 [<ffffffff8100aaa0>] ? kernel_thread_helper+0x0/0x10
---[ end trace 39e7b2d4e2073c82 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff81040c70>] pick_next_task_fair+0x89/0x146
PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 2 
Modules linked in: netconsole iptable_mangle nls_utf8 vfat fat usb_storage ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat usblp hidp fuse configfs rfcomm sco bnep l2cap nfsd lockd nfs_acl auth_rpcgss exportfs it87 hwmon_vid coretemp sunrpc tun cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ipv6 kvm_intel kvm uinput snd_hda_intel snd_usb_audio snd_usbmidi_lib snd_hda_codec snd_rawmidi snd_hwdep uvcvideo snd_seq snd_seq_device snd_pcm btusb snd_timer snd_page_alloc videodev v4l2_compat_ioctl32 i2c_i801 snd i7core_edac bluetooth edac_core soundcore r8169 mii iTCO_wdt iTCO_vendor_support microcode rfkill sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt raid1 firewire_ohci firewire_core crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: netconsole]

Pid: 0, comm: swapper Tainted: G        W   2.6.35.14-95.fc14.x86_64 #1 EX58-UD4/EX58-UD4
RIP: 0010:[<ffffffff81040c70>]  [<ffffffff81040c70>] pick_next_task_fair+0x89/0x146
RSP: 0018:ffff8801a8d3bdd8  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000020170f41
RDX: ffff88000a295540 RSI: ffff88000a2955d8 RDI: 0000000000000000
RBP: ffff8801a8d3be18 R08: 0000000000000000 R09: ffff8801a8d3bec8
R10: 0007edd2641c2e09 R11: ffffffff81b81f60 R12: ffff8801a7f80cc0
R13: 0000000000000000 R14: ffff88000a295540 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000128781000 CR4: 00000000000006e0
DR0: 0000000008049bd4 DR1: 00000000080498e0 DR2: 000000000804a2b8
DR3: 000000000804a2bc DR6: 00000000ffff0ff0 DR7: 0000000000000600
Process swapper (pid: 0, threadinfo ffff8801a8d3a000, task ffff8801a8d32e80)
Stack:
 0007edd264912b08 ffffffff817ad7de ffff8801a8d3be18 ffff88000a295540
<0> ffffffff81b81f60 ffff8801a8d33238 0000000000000000 0000000000000002
<0> ffff8801a8d3be38 ffffffff81040d57 ffff88000a295540 ffffffff81b81f60
Call Trace:
 [<ffffffff81040d57>] pick_next_task+0x2a/0x49
 [<ffffffff81469d96>] schedule+0x2c9/0x5c0
 [<ffffffff810732d4>] ? hrtimer_start_expires.clone.5+0x1e/0x20
 [<ffffffff8100832b>] cpu_idle+0xca/0xcc
 [<ffffffff81464500>] start_secondary+0x24d/0x28e
Code: 8b 5c 24 58 49 8b 44 24 60 48 85 c0 74 15 48 8b 78 50 4c 89 ee e8 4c fa ff ff 85 c0 7f 05 49 8b 5c 24 60 48 89 df e8 d4 f5 ff ff <83> 7b 38 00 74 32 49 8d 7c 24 70 48 89 de 4c 8d 6b 10 e8 5a fc 
RIP  [<ffffffff81040c70>] pick_next_task_fair+0x89/0x146
 RSP <ffff8801a8d3bdd8>
CR2: 0000000000000038
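
For readers unfamiliar with the first warning: it comes from a consistency check made while inserting into a doubly linked list. In the log above, prev and next are the same address (ffff8801a7f80cf8), which means the list head's next pointer referred back to the head itself (the list looked empty going forward) while its prev pointer still referred to ffff88015e4b97a0. The following is a minimal userspace sketch of that kind of check, not the kernel source. The names checked_list_add and checked_list_add_head are hypothetical, and the choice to carry on with the insertion after reporting is an assumption about how the check behaves in this kernel version.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make head an empty circular list: both links point at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical analogue of a debug-checked insertion of entry between
 * prev and next. Returns true if both neighbours agreed with each other,
 * false if a corruption message was printed. The insertion is performed
 * either way (assumption: the check only reports, it does not abort).
 */
static bool checked_list_add(struct list_head *entry,
                             struct list_head *prev,
                             struct list_head *next)
{
    bool ok = true;

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        ok = false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        ok = false;
    }

    /* Splice entry in between prev and next. */
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return ok;
}

/* Insert entry directly after head. */
static bool checked_list_add_head(struct list_head *entry,
                                  struct list_head *head)
{
    return checked_list_add(entry, head, head->next);
}
```

To mimic the state seen in the log, initialise a head with init_list_head, point head.prev at some unrelated node, and call checked_list_add_head: prev and next are then both the head, and the first message fires because next->prev is the unrelated node rather than the head.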

Comment 1 Dave Jones 2011-09-26 18:41:26 UTC

This part of the scheduler has been extensively rewritten since 2.6.35. There's no obvious fix for this particular bug, which makes backporting complicated.

Given the late stage of F14 (we're not rebasing it again), and that this isn't easily reproducible, I think we're best off just closing this out. For F15 onwards, we're going back to aggressively rebasing, so we should be able to track such bugs much more easily.
