Description of problem:
The system crashes after running the weekly backups. This did not happen until the recent kernel upgrade. I have rolled back to kernel-smp-2.6.17-1.2187_FC5 as a temporary solution.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.18-1.2200.fc5

The system is our main file server, running both NFS and Samba:
samba-3.0.23c-1.fc5
nfs-utils-1.0.8-3.fc5

How reproducible:
Usually happens on the weekend, during or just after the backups run. Large files are copied via NFS to another system that has the tape drive.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Nov 5 04:10:08 cmxcen kernel: list_del corruption. next->prev should be d164a520, but was 65746e69
Nov 5 04:10:08 cmxcen kernel: ------------[ cut here ]------------
Nov 5 04:10:08 cmxcen kernel: kernel BUG at lib/list_debug.c:70!
Nov 5 04:10:08 cmxcen kernel: invalid opcode: 0000 [#1]
Nov 5 04:10:08 cmxcen kernel: SMP
Nov 5 04:10:08 cmxcen kernel: last sysfs file: /block/hdb/removable
Nov 5 04:10:08 cmxcen kernel: Modules linked in: nls_utf8 cifs nfsd exportfs lockd nfs_acl sunrpc ipv6 dm_mirror dm_mod raid456 xor lp parport_pc parport sg tulip ide_cd serio_raw floppy cdrom pcspkr i2c_i801 i2c_core ext3 jbd BusLogic sd_mod scsi_mod
Nov 5 04:10:08 cmxcen kernel: CPU: 1
Nov 5 04:10:08 cmxcen kernel: EIP: 0060:[<c04e850c>] Not tainted VLI
Nov 5 04:10:08 cmxcen kernel: EFLAGS: 00010096 (2.6.18-1.2200.fc5smp #1)
Nov 5 04:10:08 cmxcen kernel: EIP is at list_del+0x48/0x6c
Nov 5 04:10:08 cmxcen kernel: eax: 00000048 ebx: d164a520 ecx: c06789d0 edx: 00000096
Nov 5 04:10:08 cmxcen kernel: esi: c149d6a0 edi: db6a4000 ebp: c149fca0 esp: c17d8ef8
Nov 5 04:10:08 cmxcen kernel: ds: 007b es: 007b ss: 0068
Nov 5 04:10:08 cmxcen kernel: Process events/1 (pid: 9, ti=c17d8000 task=c1500050 task.ti=c17d8000)
Nov 5 04:10:08 cmxcen kernel: Stack: c063d4a6 d164a520 65746e69 d164a520 c046a3e5 c149cda0 00000005 c14b5340
Nov 5 04:10:08 cmxcen kernel: 00000000 c14b5340 00000005 c14b5320 00000000 c046a4e3 00000000 00000000
Nov 5 04:10:08 cmxcen kernel: c149fca0 c149d6c4 c149d6a0 c149fca0 c14b50a0 00000282 c046b919 00000000
Nov 5 04:10:08 cmxcen kernel: Call Trace:
Nov 5 04:10:08 cmxcen kernel: [<c046a3e5>] free_block+0x68/0xdc
Nov 5 04:10:08 cmxcen kernel: [<c046a4e3>] drain_array+0x8a/0xb5
Nov 5 04:10:08 cmxcen kernel: [<c046b919>] cache_reap+0x53/0x117
Nov 5 04:10:08 cmxcen kernel: [<c04340ef>] run_workqueue+0x86/0xc6
Nov 5 04:10:08 cmxcen kernel: [<c04349dd>] worker_thread+0xd9/0x10c
Nov 5 04:10:08 cmxcen kernel: [<c0436eb3>] kthread+0xc0/0xed
Nov 5 04:10:08 cmxcen kernel: [<c0404ccb>] kernel_thread_helper+0x7/0x10
Nov 5 04:10:08 cmxcen kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Nov 5 04:10:08 cmxcen kernel: Leftover inexact backtrace:
Nov 5 04:10:08 cmxcen kernel: =======================
Nov 5 04:10:09 cmxcen kernel: Code: c0 e8 ab d7 f3 ff 0f 0b 41 00 95 d4 63 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 a6 d4 63 c0 e8 86 d7 f3 ff <0f> 0b 46 00 95 d4 63 c0 8b 13 8b 43 04 89 42 04 89 10 c7 43 04
Nov 5 04:10:09 cmxcen kernel: EIP: [<c04e850c>] list_del+0x48/0x6c SS:ESP 0068:c17d8ef8
Nov 5 04:10:09 cmxcen kernel: <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Nov 5 04:10:09 cmxcen kernel: in_atomic():0, irqs_disabled():1
Nov 5 04:10:09 cmxcen kernel: [<c04050ef>] dump_trace+0x69/0x1af
Nov 5 04:10:09 cmxcen kernel: [<c040524d>] show_trace_log_lvl+0x18/0x2c
Nov 5 04:10:09 cmxcen kernel: [<c0405800>] show_trace+0xf/0x11
Nov 5 04:10:09 cmxcen kernel: [<c04058fa>] dump_stack+0x15/0x17
Nov 5 04:10:09 cmxcen kernel: [<c04398a6>] down_read+0x12/0x20
Nov 5 04:10:09 cmxcen kernel: [<c0431adf>] blocking_notifier_call_chain+0xe/0x29
Nov 5 04:10:09 cmxcen kernel: [<c0427b14>] do_exit+0x1b/0x776
Nov 5 04:10:09 cmxcen kernel: [<c04057a1>] die+0x29d/0x2c2
Nov 5 04:10:09 cmxcen kernel: [<c0405ee3>] do_invalid_op+0xa2/0xab
Nov 5 04:10:09 cmxcen kernel: [<c0404aa5>] error_code+0x39/0x40
Nov 5 04:10:09 cmxcen kernel: DWARF2 unwinder stuck at error_code+0x39/0x40
Nov 5 04:10:09 cmxcen kernel: Leftover inexact backtrace:
Nov 5 04:10:10 cmxcen kernel: [<c04e850c>] list_del+0x48/0x6c
Nov 5 04:10:10 cmxcen kernel: [<c046a3e5>] free_block+0x68/0xdc
Nov 5 04:10:10 cmxcen kernel: [<c046a4e3>] drain_array+0x8a/0xb5
Nov 5 04:10:10 cmxcen kernel: [<c046b919>] cache_reap+0x53/0x117
Nov 5 04:10:10 cmxcen kernel: [<c04340ef>] run_workqueue+0x86/0xc6
Nov 5 04:10:10 cmxcen kernel: [<c046b8c6>] cache_reap+0x0/0x117
Nov 5 04:10:10 cmxcen kernel: [<c04349dd>] worker_thread+0xd9/0x10c
Nov 5 04:10:10 cmxcen kernel: [<c041f568>] default_wake_function+0x0/0xc
Nov 5 04:10:10 cmxcen kernel: [<c0434904>] worker_thread+0x0/0x10c
Nov 5 04:10:10 cmxcen kernel: [<c0436eb3>] kthread+0xc0/0xed
Nov 5 04:10:10 cmxcen kernel: [<c0436df3>] kthread+0x0/0xed
Nov 5 04:10:10 cmxcen kernel: [<c0404ccb>] kernel_thread_helper+0x7/0x10
Nov 5 04:10:10 cmxcen kernel: =======================
Nov 5 11:02:46 cmxcen syslogd 1.4.1: restart (remote reception).

There were also pages of logs on the console, which I could not copy. They seemed to be related to SMP spin locks. Sorry, that is all I recall; I needed to get the system up and running again.
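A side note on the first line of the log: the unexpected value 65746e69 does not look like a kernel address (the other pointers in the trace start with c or d). Read as little-endian bytes, which is how i386 stores a 32-bit word, it is printable ASCII. That would be consistent with string data having been written over the list node, though this is only a guess from the one value and not a diagnosis. The helper name `decode_le_word` below is made up for illustration.

```python
import struct


def decode_le_word(value: int) -> str:
    """Show a 32-bit word as the ASCII characters it occupies in
    little-endian memory; non-printable bytes are shown as '.'."""
    raw = struct.pack("<I", value)  # 4 bytes, least significant first
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The value list_del found in next->prev instead of d164a520.
print(decode_le_word(0x65746E69))  # prints: inte
```

Running the same helper over other suspicious words in a stack dump is a quick way to tell overwritten text apart from stray pointers.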
This is the same bug as bug 216001. In that bug they're doing backups too. Could you move to that bug and post your complete dmesg? I'm interested in what drivers you're using so maybe attach the lsmod output as well.
Good eyes, Dan. I'll dupe this over. It'll be interesting to know whether the test kernels I point to in that bug fix this, as they include the per-cpu bugfix (which got into 2.6.18.3).

*** This bug has been marked as a duplicate of 216001 ***