Description of problem:
I couldn't use kernel 2.6.18-1.2200.fc5 due to problems with CIFS mounting:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211070
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212827

Kernel 2.6.18-1.2239.fc5 appeared to solve the CIFS issues I had seen, but its stability is poor. It crashes intermittently, during different operations.

My machine is basically used as a file backup server. It runs a backup script each night that mounts a CIFS share on a Win2003 server and rsyncs the files to an automounted external USB drive (FAT32). It then mails the log results using mutt and unmounts the CIFS share.

Here is a partial log from one crash:

grump kernel: ------------[ cut here ]------------
grump kernel: kernel BUG at lib/list_debug.c:65!
grump kernel: invalid opcode: 0000 [#1]
grump kernel: CPU: 0
grump kernel: EIP is at list_del+0x23/0x6c
grump kernel: eax: 00000048 ebx: cba9b9a0 ecx: c0652330 edx: cf7df000
grump kernel: esi: cf7ed6a0 edi: c45ce000 ebp: cf7ef600 esp: cf7dfef8
grump kernel: ds: 007b es: 007b ss: 0068
grump kernel: Process events/0 (pid: 4, ti=cf7df000 task=cf6c0590 task.ti=cf7df000)
grump kernel: Stack: c0617c52 cba9b9a0 808ce480 cba9b9a0 c0457367 c04570d1 00000001 cf7edec0
grump kernel: 00000000 cf7edec0 00000001 cf7edea0 00000000 c045745f 00000000 00000000
grump kernel: cf7ef600 cf7ed6c4 cf7ed6a0 cf7ef600 cf6f04a0 00000282 c0458402 00000000
grump kernel: Call Trace:
grump kernel: [<c0457367>] free_block+0x65/0xd3
grump kernel: [<c045745f>] drain_array+0x8a/0xb5
grump kernel: [<c0458402>] cache_reap+0x3f/0xd6
grump kernel: [<c0423edc>] run_workqueue+0x85/0xc5
grump kernel: [<c04243da>] worker_thread+0xe8/0x11a
grump kernel: [<c0426611>] kthread+0xad/0xd8
grump kernel: [<c04032c7>] kernel_thread_helper+0x7/0x10
grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
grump kernel: Leftover inexact backtrace:
grump kernel: =======================
grump kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 52 7c 61 c0 e8 29 3e f4 ff <0f> 0b 41 00 8f 7c 61 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04
grump kernel: EIP: [<c04d289f>] list_del+0x23/0x6c SS:ESP 0068:cf7dfef8

How reproducible:
FC5 crashes intermittently, but not on command.

Additional info:
I have again reverted to kernel 2.6.17-1.2187_FC5 and everything seems to be working fine.

Stability problems with 2.6.18 in general (on FC5) and 2.6.18-1.2239 on FC5 have been reported here:
http://lkml.org/lkml/2006/11/21/116
http://sources.redhat.com/ml/frysk/2006-q4/msg00209.html
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216001
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216247
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216474
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211672
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217044
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217858

Thanks,
Wayne Sherman
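
For context, the nightly job described above (mount the CIFS share, rsync to the USB drive, mail the log with mutt, unmount) can be sketched roughly as follows. This is not the actual script: the share name, mount points, log path, credentials file, and mail address are all placeholders, and the rsync options are an assumption. By default it only prints the commands; nothing is executed unless run=True is passed.

```python
import subprocess

# All values below are hypothetical placeholders, not the real configuration.
SHARE = "//winserver/backup"
CIFS_MOUNT = "/mnt/winserv"
USB_TARGET = "/media/usbdisk/backup/"
LOG_FILE = "/var/log/nightly-backup.log"
CREDENTIALS = "/root/.cifs-credentials"
MAIL_TO = "admin@example.com"


def backup_steps():
    """Return the four steps as (argument list, stdin file or None) pairs."""
    return [
        (["mount", "-t", "cifs", SHARE, CIFS_MOUNT,
          "-o", "credentials=" + CREDENTIALS], None),
        # -rt plus --modify-window=1 is an assumed choice for a FAT32 target,
        # which cannot store Unix ownership and has coarse timestamps.
        (["rsync", "-rt", "--modify-window=1", "--log-file=" + LOG_FILE,
          CIFS_MOUNT + "/", USB_TARGET], None),
        # mutt reads the message body (here, the rsync log) from stdin.
        (["mutt", "-s", "Nightly backup log", MAIL_TO], LOG_FILE),
        (["umount", CIFS_MOUNT], None),
    ]


def run_backup(run=False):
    """Print the steps (dry run) or execute them in order when run=True."""
    steps = backup_steps()
    if not run:
        for cmd, stdin_path in steps:
            print(" ".join(cmd) + (" < " + stdin_path if stdin_path else ""))
        return steps
    mount, rsync, mail, umount = steps
    subprocess.run(mount[0], check=True)
    try:
        # A failed rsync is not fatal here, so the log still gets mailed.
        subprocess.run(rsync[0], check=False)
        with open(mail[1]) as log:
            subprocess.run(mail[0], stdin=log, check=False)
    finally:
        # Always try to unmount the share, even if a step above raised.
        subprocess.run(umount[0], check=False)
    return steps


if __name__ == "__main__":
    run_backup()  # dry run: prints the four commands only
```

The point of the sketch is only the order of operations: the CIFS share stays mounted for the whole rsync and mail phase and is unmounted at the very end.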
On 2006-Dec-19, I updated to the newly released kernel:

uname -a
Linux grump 2.6.18-1.2257.fc5 #1 Fri Dec 15 16:04:33 EST 2006 i686 i686 i386 GNU/Linux

The server ran for a few days and survived 4 nightly backup operations. On Dec 23, the backup operation ran from 2:00am to 2:11am, completed without errors, and reported success via email. At 2:13am these messages were logged from the kernel:

<START MESSAGES>
Dec 23 02:13:04 grump kernel: list_del corruption. next->prev should be ca9804e0, but was e580b0e4
Dec 23 02:13:04 grump kernel: ------------[ cut here ]------------
Dec 23 02:13:04 grump kernel: kernel BUG at lib/list_debug.c:70!
Dec 23 02:13:04 grump kernel: invalid opcode: 0000 [#1]
Dec 23 02:13:04 grump kernel: last sysfs file: /block/hda/hda1/size
Dec 23 02:13:04 grump kernel: Modules linked in: nls_utf8 cifs vfat fat nfsd exportfs lockd nfs_acl sunrpc autofs4 ip_conntrack_netbios_ns ipt_MASQUERADE iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables dm_mirror dm_mod video sbs i2c_ec container button battery ac lp sd_mod sg usb_storage scsi_mod uhci_hcd snd_via82xx gameport snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq serio_raw snd_pcm_oss snd_mixer_oss cyblafb parport_pc parport 8139cp snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device 8139too snd i2c_viapro via686a hwmon mii i2c_isa soundcore i2c_core pcspkr ext3 jbd
Dec 23 02:13:04 grump kernel: CPU: 0
Dec 23 02:13:04 grump kernel: EIP: 0060:[<c04d0514>] Not tainted VLI
Dec 23 02:13:04 grump kernel: EFLAGS: 00010096 (2.6.18-1.2257.fc5 #1)
Dec 23 02:13:04 grump kernel: EIP is at list_del+0x48/0x6c
Dec 23 02:13:04 grump kernel: eax: 00000048 ebx: ca9804e0 ecx: c064f350 edx: cf7df000
Dec 23 02:13:04 grump kernel: esi: cf7ed6a0 edi: c4a9e000 ebp: cf7ef600 esp: cf7dfef8
Dec 23 02:13:04 grump kernel: ds: 007b es: 007b ss: 0068
Dec 23 02:13:04 grump kernel: Process events/0 (pid: 4, ti=cf7df000 task=cf6c05a0 task.ti=cf7df000)
Dec 23 02:13:04 grump kernel: Stack: c0614a53 ca9804e0 e580b0e4 ca9804e0 c0454f7f c0454ce9 00000005 cf7edec0
Dec 23 02:13:04 grump kernel: 00000003 cf7edec0 00000005 cf7edea0 00000000 c0455077 00000000 00000000
Dec 23 02:13:04 grump kernel: cf7ef600 cf7ed6c4 cf7ed6a0 cf7ef600 cf6f04a0 00000282 c045601a 00000000
Dec 23 02:13:04 grump kernel: Call Trace:
Dec 23 02:13:04 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Dec 23 02:13:04 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Dec 23 02:13:04 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Dec 23 02:13:04 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Dec 23 02:13:05 grump kernel: [<c0423e56>] worker_thread+0xe8/0x11a
Dec 23 02:13:05 grump kernel: [<c0426085>] kthread+0xad/0xd8
Dec 23 02:13:05 grump kernel: [<c04032d7>] kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel: Leftover inexact backtrace:
Dec 23 02:13:05 grump kernel: =======================
Dec 23 02:13:05 grump kernel: Code: c0 e8 d5 62 f4 ff 0f 0b 41 00 42 4a 61 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 53 4a 61 c0 e8 b0 62 f4 ff <0f> 0b 46 00 42 4a 61 c0 8b 13 8b 43 04 89 42 04 89 10 c7 43 04
Dec 23 02:13:05 grump kernel: EIP: [<c04d0514>] list_del+0x48/0x6c SS:ESP 0068:cf7dfef8
Dec 23 02:13:05 grump kernel: <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Dec 23 02:13:05 grump kernel: in_atomic():0, irqs_disabled():1
Dec 23 02:13:05 grump kernel: [<c040371f>] dump_trace+0x69/0x1af
Dec 23 02:13:05 grump kernel: [<c040387d>] show_trace_log_lvl+0x18/0x2c
Dec 23 02:13:05 grump kernel: [<c0403df3>] show_trace+0xf/0x11
Dec 23 02:13:05 grump kernel: [<c0403e7d>] dump_stack+0x15/0x17
Dec 23 02:13:05 grump kernel: [<c042890c>] down_read+0x12/0x1f
Dec 23 02:13:05 grump kernel: [<c0421270>] blocking_notifier_call_chain+0xe/0x29
Dec 23 02:13:05 grump kernel: [<c0418437>] do_exit+0x1b/0x796
Dec 23 02:13:05 grump kernel: [<c0403d94>] die+0x266/0x28b
Dec 23 02:13:05 grump kernel: [<c040441b>] do_invalid_op+0xa2/0xab
Dec 23 02:13:05 grump kernel: [<c0403079>] error_code+0x39/0x40
Dec 23 02:13:05 grump kernel: DWARF2 unwinder stuck at error_code+0x39/0x40
Dec 23 02:13:05 grump kernel: Leftover inexact backtrace:
Dec 23 02:13:05 grump kernel: [<c04d0514>] list_del+0x48/0x6c
Dec 23 02:13:05 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Dec 23 02:13:05 grump kernel: [<c0454ce9>] kmem_freepages+0x7d/0x97
Dec 23 02:13:05 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Dec 23 02:13:05 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Dec 23 02:13:05 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Dec 23 02:13:05 grump kernel: [<c0455fdb>] cache_reap+0x0/0xd6
Dec 23 02:13:05 grump kernel: [<c0423e56>] worker_thread+0xe8/0x11a
Dec 23 02:13:05 grump kernel: [<c0412952>] default_wake_function+0x0/0xc
Dec 23 02:13:05 grump kernel: [<c0423d6e>] worker_thread+0x0/0x11a
Dec 23 02:13:05 grump kernel: [<c0426085>] kthread+0xad/0xd8
Dec 23 02:13:05 grump kernel: [<c0425fd8>] kthread+0x0/0xd8
Dec 23 02:13:05 grump kernel: [<c04032d7>] kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel: =======================
<END MESSAGES>

It appears that the system continued to run for almost 2 more hours, at which time the last log message was:

Dec 23 04:01:47 grump nmbd[2204]: This response was from IP 192.168.141.5, reporting an IP address of 192.168.141.5.

It must have crashed soon after. The system was rebooted on the morning of the 26th, when the crash was discovered. We are still running 2.6.18-1.2257.fc5 to see if/when it happens again.
Created attachment 144380
dmesg output on our server

Output of dmesg from our server.
I forgot to mention that we have been using kernel 2.6.17-1.2187_FC5 on that server without problems. All kernels released after that have had issues on our server.

In addition to the dmesg output posted above, here is some more info on the motherboard and CPU:

> cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 7
model name : VIA Ezra
stepping : 8
cpu MHz : 797.961
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips : 1597.16

> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
Update: Kernel 2.6.18-1.2257.fc5 still has intermittent problems. After the first crash incident reported above, the server ran for a few days without noticeable problems. Upon issuing a reboot command, the server hung with a kernel error. It then ran fine for almost 2 weeks before crashing again during a nightly backup. We are reverting to kernel 2.6.17-1.2187_FC5.
For that last crash, here is what the log contained:

Jan 12 02:00:04 grump kernel: BUG: unable to handle kernel paging request at virtual address e580b0e4
Jan 12 02:00:04 grump kernel: printing eip:
Jan 12 02:00:04 grump kernel: c04d04d5
Jan 12 02:00:04 grump kernel: *pde = 00000000
Jan 12 02:00:04 grump kernel: Oops: 0000 [#1]
Jan 12 02:00:04 grump kernel: last sysfs file: /block/hda/hda1/size
Jan 12 02:00:04 grump kernel: Modules linked in: nls_utf8 cifs vfat fat autofs4 dm_mirror dm_mod video sbs i2c_ec container button battery ac lp sd_mod sg usb_storage scsi_mod uhci_hcd snd_via82xx gameport snd_ac97_codec snd_ac97_bus cyblafb snd_seq_dummy serio_raw snd_seq_oss snd_seq_midi_event snd_seq parport_pc 8139cp parport i2c_viapro via686a hwmon snd_pcm_oss 8139too snd_mixer_oss mii i2c_isa i2c_core snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore pcspkr ext3 jbd
Jan 12 02:00:04 grump kernel: CPU: 0
Jan 12 02:00:04 grump kernel: EIP: 0060:[<c04d04d5>] Not tainted VLI
Jan 12 02:00:04 grump kernel: EFLAGS: 00010092 (2.6.18-1.2257.fc5 #1)
Jan 12 02:00:04 grump kernel: EIP is at list_del+0x9/0x6c
Jan 12 02:00:04 grump kernel: eax: e580b0e4 ebx: c519c9a0 ecx: 00000002 edx: c11a5320
Jan 12 02:00:04 grump kernel: esi: cf7eb7c0 edi: cd299000 ebp: cf7eff00 esp: cf7dfef8
Jan 12 02:00:04 grump kernel: ds: 007b es: 007b ss: 0068
Jan 12 02:00:04 grump kernel: Process events/0 (pid: 4, ti=cf7df000 task=cf6c05a0 task.ti=cf7df000)
Jan 12 02:00:04 grump kernel: Stack: 0000000b cd36c000 cf7eb740 c519c9a0 c0454f7f c0454ce9 00000002 cf7eb860
Jan 12 02:00:04 grump kernel: 00000000 cf7eb860 00000002 cf7eb840 00000000 c0455077 00000000 00000000
Jan 12 02:00:04 grump kernel: cf7eff00 cf7eb7e4 cf7eb7c0 cf7eff00 cf6f04a0 00000282 c045601a 00000000
Jan 12 02:00:05 grump kernel: Call Trace:
Jan 12 02:00:05 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Jan 12 02:00:05 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Jan 12 02:00:05 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Jan 12 02:00:05 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Jan 12 02:00:05 grump kernel: [<c0423e56>] worker_thread+0xe8/0x11a
Jan 12 02:00:05 grump kernel: [<c0426085>] kthread+0xad/0xd8
Jan 12 02:00:05 grump kernel: [<c04032d7>] kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel: Leftover inexact backtrace:
Jan 12 02:00:05 grump kernel: =======================
Jan 12 02:00:05 grump kernel: Code: 8d 46 04 e8 86 00 00 00 8d 4b 0c 8b 51 04 8d 46 0c 83 c4 14 5b 5e 5f e9 72 00 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 <8b> 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 05 4a 61 c0
Jan 12 02:00:05 grump kernel: EIP: [<c04d04d5>] list_del+0x9/0x6c SS:ESP 0068:cf7dfef8
Jan 12 02:00:05 grump kernel: <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Jan 12 02:00:05 grump kernel: in_atomic():0, irqs_disabled():1
Jan 12 02:00:05 grump kernel: [<c040371f>] dump_trace+0x69/0x1af
Jan 12 02:00:05 grump kernel: [<c040387d>] show_trace_log_lvl+0x18/0x2c
Jan 12 02:00:05 grump kernel: [<c0403df3>] show_trace+0xf/0x11
Jan 12 02:00:05 grump kernel: [<c0403e7d>] dump_stack+0x15/0x17
Jan 12 02:00:05 grump kernel: [<c042890c>] down_read+0x12/0x1f
Jan 12 02:00:05 grump kernel: [<c0421270>] blocking_notifier_call_chain+0xe/0x29
Jan 12 02:00:05 grump kernel: [<c0418437>] do_exit+0x1b/0x796
Jan 12 02:00:05 grump kernel: [<c0403d94>] die+0x266/0x28b
Jan 12 02:00:05 grump kernel: [<c05eeb09>] do_page_fault+0x425/0x507
Jan 12 02:00:05 grump kernel: [<c0403079>] error_code+0x39/0x40
Jan 12 02:00:05 grump kernel: DWARF2 unwinder stuck at error_code+0x39/0x40
Jan 12 02:00:05 grump kernel: Leftover inexact backtrace:
Jan 12 02:00:05 grump kernel: [<c04d04d5>] list_del+0x9/0x6c
Jan 12 02:00:05 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Jan 12 02:00:05 grump kernel: [<c0454ce9>] kmem_freepages+0x7d/0x97
Jan 12 02:00:05 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Jan 12 02:00:05 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Jan 12 02:00:05 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Jan 12 02:00:05 grump kernel: [<c0455fdb>] cache_reap+0x0/0xd6
Jan 12 02:00:05 grump kernel: [<c0423e56>] worker_thread+0xe8/0x11a
Jan 12 02:00:05 grump kernel: [<c0412952>] default_wake_function+0x0/0xc
Jan 12 02:00:05 grump kernel: [<c0423d6e>] worker_thread+0x0/0x11a
Jan 12 02:00:05 grump kernel: [<c0426085>] kthread+0xad/0xd8
Jan 12 02:00:05 grump kernel: [<c0425fd8>] kthread+0x0/0xd8
Jan 12 02:00:05 grump kernel: [<c04032d7>] kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel: =======================
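
All three oopses in this report fault in list_del, called from free_block under cache_reap in the events/0 worker. To compare traces like these without reading them by eye, a short sketch along the following lines could pull the function names out of the "Call Trace:" section of each log. The regular expression and the helper name are my own, and the sample lines are abbreviated copies of the logs posted above.

```python
import re

# Matches trace entries such as "[<c0454f7f>] free_block+0x65/0xd3".
TRACE_ENTRY = re.compile(r"\[<[0-9a-f]{8}>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace(log_lines):
    """Return the function names in the first 'Call Trace:' section."""
    names = []
    in_trace = False
    for line in log_lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = TRACE_ENTRY.search(line)
            if match is None:
                break  # the first non-entry line ends the section
            names.append(match.group(1))
    return names


# Abbreviated from the first crash (2.6.18-1.2239.fc5).
FIRST = """grump kernel: Call Trace:
grump kernel: [<c0457367>] free_block+0x65/0xd3
grump kernel: [<c045745f>] drain_array+0x8a/0xb5
grump kernel: [<c0458402>] cache_reap+0x3f/0xd6
grump kernel: [<c0423edc>] run_workqueue+0x85/0xc5
grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
""".splitlines()

# Abbreviated from the Dec 23 crash (2.6.18-1.2257.fc5).
DEC23 = """Dec 23 02:13:04 grump kernel: EIP is at list_del+0x48/0x6c
Dec 23 02:13:04 grump kernel: Call Trace:
Dec 23 02:13:04 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Dec 23 02:13:04 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Dec 23 02:13:04 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Dec 23 02:13:04 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Dec 23 02:13:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
""".splitlines()

# Abbreviated from the Jan 12 crash (2.6.18-1.2257.fc5).
JAN12 = """Jan 12 02:00:04 grump kernel: EIP is at list_del+0x9/0x6c
Jan 12 02:00:05 grump kernel: Call Trace:
Jan 12 02:00:05 grump kernel: [<c0454f7f>] free_block+0x65/0xd3
Jan 12 02:00:05 grump kernel: [<c0455077>] drain_array+0x8a/0xb5
Jan 12 02:00:05 grump kernel: [<c045601a>] cache_reap+0x3f/0xd6
Jan 12 02:00:05 grump kernel: [<c0423958>] run_workqueue+0x85/0xc5
Jan 12 02:00:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
""".splitlines()

if __name__ == "__main__":
    for label, lines in (("first", FIRST), ("Dec 23", DEC23), ("Jan 12", JAN12)):
        print(label, "->", " <- ".join(call_trace(lines)))
```

Only the function names are compared, since the addresses differ between the 2239 and 2257 builds.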
OK, I have confirmed one of the kernel components that is causing errors/crashes. On another machine that has been working fine with the latest FC5 updates and kernel (2.6.18-1.2257.fc5), I manually mounted a Windows 2000 share using CIFS:

mount -t cifs //192.168.1.108/d$ /mnt/winserv -o user=a_user,pass=a_pass

While I was browsing the files with mc, I started getting kernel errors. When I tried to reboot, the system crashed.

So I think CIFS in these newer kernels has bugs that are causing crashes and stability issues. I haven't had problems with 2.6.17-1.2187_FC5, so perhaps whatever changed after that version can be examined to find the cause.
Updated kernel to 2.6.19-1.2288.fc5 a couple of weeks ago and everything seems to be running fine now. No crashes. I tested on two machines. Good job.
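
To summarize the builds mentioned in this report, here is a small sketch that sorts the kernel release strings and prints the outcome reported for each. The status labels are my own paraphrase of the comments above, and the parsing helper is hypothetical; it only handles release strings shaped like the ones in this thread.

```python
import re

# Outcome per kernel release, paraphrased from the comments in this report.
REPORTED = {
    "2.6.18-1.2257.fc5": "intermittent crashes",
    "2.6.17-1.2187_FC5": "no problems",
    "2.6.19-1.2288.fc5": "no crashes so far",
    "2.6.18-1.2200.fc5": "CIFS mount problems",
    "2.6.18-1.2239.fc5": "intermittent crashes",
}


def release_key(release):
    """Turn '2.6.18-1.2257.fc5' into (2, 6, 18, 1, 2257) for sorting."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized release string: " + release)
    return tuple(int(part) for part in match.groups())


def summary():
    """Return (release, outcome) pairs ordered from oldest to newest build."""
    return [(rel, REPORTED[rel]) for rel in sorted(REPORTED, key=release_key)]


if __name__ == "__main__":
    for release, outcome in summary():
        print(release.ljust(20), outcome)
```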