Bug 217935 - Kernel 2.6.18-1.2239.fc5 crashing
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386 Linux
Priority: medium  Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2006-11-30 15:23 EST by Wayne Sherman
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: Kernel 2.6.19-1.2288.fc5
Doc Type: Bug Fix
Last Closed: 2007-03-02 14:25:03 EST


Attachments
dmesg output on our server (11.98 KB, text/plain)
2006-12-26 15:50 EST, Wayne Sherman

Description Wayne Sherman 2006-11-30 15:23:52 EST
Description of problem:
  I couldn't use kernel 2.6.18-1.2200.fc5 due to problems with CIFS mounting:

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211070
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212827

After 2.6.18-1.2239.fc5 came out it appeared to solve the CIFS issues I had
seen, but the stability is poor.  It crashes intermittently, while performing
different operations.  My machine is basically being used as a file backup
server.  It runs a backup script each night that mounts a cifs share on a
win2003 server and rsyncs the files to an automounted external USB drive
(FAT32).  It then mails the log results using mutt, and unmounts the cifs share.
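
The nightly job described above can be sketched roughly as follows. This is only an illustration of the order of the steps, not the actual script: the share path, mount point, USB path, credentials file, log file, and mail address are all made-up placeholders, and by default the function only prints the command lines instead of running them.

```python
import subprocess

# All of these values are hypothetical placeholders, not taken from the real server.
SHARE = "//winserver/backup"
MOUNT_POINT = "/mnt/winserv"
USB_TARGET = "/media/usbdisk/backup/"
CREDENTIALS = "/root/.smbcredentials"
LOG_FILE = "/var/log/nightly-backup.log"
MAIL_TO = "admin@example.com"


def build_steps():
    """Return (command, stdin_path, stdout_path) tuples in the order described."""
    return [
        # 1. Mount the CIFS share exported by the Windows server.
        (["mount", "-t", "cifs", SHARE, MOUNT_POINT,
          "-o", "credentials=" + CREDENTIALS], None, None),
        # 2. Copy the files to the USB drive; FAT32 timestamps are coarse,
        #    hence --modify-window. rsync output is captured as the log.
        (["rsync", "-rtv", "--modify-window=1",
          MOUNT_POINT + "/", USB_TARGET], None, LOG_FILE),
        # 3. Mail the log; mutt reads the message body from stdin.
        (["mutt", "-s", "Nightly backup log", MAIL_TO], LOG_FILE, None),
        # 4. Unmount the share again.
        (["umount", MOUNT_POINT], None, None),
    ]


def run(steps, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd, stdin_path, stdout_path in steps:
        if dry_run:
            print(" ".join(cmd))
            continue
        stdin = open(stdin_path) if stdin_path else None
        stdout = open(stdout_path, "w") if stdout_path else None
        try:
            subprocess.run(cmd, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()


run(build_steps())  # dry run: only prints the four commands
```

A real script would also want to unmount the share when an earlier step fails; the sketch stops at the first error.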

Here is a partial log from one crash:

grump kernel: ------------[ cut here ]------------
grump kernel: kernel BUG at lib/list_debug.c:65!
grump kernel: invalid opcode: 0000 [#1]
grump kernel: CPU:    0
grump kernel: EIP is at list_del+0x23/0x6c
grump kernel: eax: 00000048   ebx: cba9b9a0   ecx: c0652330   edx: cf7df000
grump kernel: esi: cf7ed6a0   edi: c45ce000   ebp: cf7ef600   esp: cf7dfef8
grump kernel: ds: 007b   es: 007b   ss: 0068
grump kernel: Process events/0 (pid: 4, ti=cf7df000 task=cf6c0590 task.ti=cf7df000)
grump kernel: Stack: c0617c52 cba9b9a0 808ce480 cba9b9a0 c0457367 c04570d1
00000001 cf7edec0
grump kernel:        00000000 cf7edec0 00000001 cf7edea0 00000000 c045745f
00000000 00000000
grump kernel:        cf7ef600 cf7ed6c4 cf7ed6a0 cf7ef600 cf6f04a0 00000282
c0458402 00000000
grump kernel: Call Trace:
grump kernel:  [<c0457367>] free_block+0x65/0xd3
grump kernel:  [<c045745f>] drain_array+0x8a/0xb5
grump kernel:  [<c0458402>] cache_reap+0x3f/0xd6
grump kernel:  [<c0423edc>] run_workqueue+0x85/0xc5
grump kernel:  [<c04243da>] worker_thread+0xe8/0x11a
grump kernel:  [<c0426611>] kthread+0xad/0xd8
grump kernel:  [<c04032c7>] kernel_thread_helper+0x7/0x10
grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
grump kernel: Leftover inexact backtrace:
grump kernel:  =======================
grump kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39
d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 52 7c 61 c0 e8 29 3e f4 ff <0f> 0b 41
00 8f 7c 61 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04
grump kernel: EIP: [<c04d289f>] list_del+0x23/0x6c SS:ESP 0068:cf7dfef8
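
For comparing traces between crashes, the call trace lines can be pulled apart mechanically. The following is a minimal sketch, assuming the `[<address>] symbol+0xoffset/0xsize` layout seen in the log above; the function name and the sample string are illustrative only.

```python
import re

# Matches entries such as " [<c0457367>] free_block+0x65/0xd3".
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(log_text):
    """Return (address, symbol, offset, size) for each frame found in the text."""
    frames = []
    for addr, symbol, offset, size in FRAME_RE.findall(log_text):
        frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# A few lines copied from the trace above.
SAMPLE = """\
grump kernel: Call Trace:
grump kernel:  [<c0457367>] free_block+0x65/0xd3
grump kernel:  [<c045745f>] drain_array+0x8a/0xb5
grump kernel:  [<c0458402>] cache_reap+0x3f/0xd6
grump kernel:  [<c0423edc>] run_workqueue+0x85/0xc5
"""

for addr, symbol, offset, size in parse_frames(SAMPLE):
    # address minus offset is the start address of the function
    print(f"{symbol:15s} starts at {addr - offset:#010x}, offset {offset:#x} of {size:#x}")
```

Comparing only the symbol names (rather than the addresses) makes traces from different kernel builds comparable, since the addresses shift between builds.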

How reproducible:
  FC5 crashes intermittently; I cannot trigger the crash on demand.

Additional info:
  I have again reverted to kernel 2.6.17-1.2187_FC5 and everything seems to
be working fine.  Stability problems with 2.6.18 in general (on FC5) and
2.6.18-1.2239 on FC5 have been reported here:

  http://lkml.org/lkml/2006/11/21/116
  http://sources.redhat.com/ml/frysk/2006-q4/msg00209.html
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216001
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216247
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=216474
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=211672
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217044
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217858

Thanks,

Wayne Sherman
Comment 1 Wayne Sherman 2006-12-26 15:27:53 EST
On 2006-Dec-19, I updated to the newly released kernel:

uname -a
Linux grump 2.6.18-1.2257.fc5 #1 Fri Dec 15 16:04:33 EST 2006 i686 i686 i386 GNU/Linux

The server ran for a few days and survived 4 nightly backup operations.  On Dec
23, the backup operation ran from 2:00am to 2:11am, completed without errors,
and reported success via email.  At 2:13am these messages were logged by the kernel:

<START MESSAGES>
Dec 23 02:13:04 grump kernel: list_del corruption. next->prev should be
ca9804e0, but was e580b0e4
Dec 23 02:13:04 grump kernel: ------------[ cut here ]------------
Dec 23 02:13:04 grump kernel: kernel BUG at lib/list_debug.c:70!
Dec 23 02:13:04 grump kernel: invalid opcode: 0000 [#1]
Dec 23 02:13:04 grump kernel: last sysfs file: /block/hda/hda1/size
Dec 23 02:13:04 grump kernel: Modules linked in: nls_utf8 cifs vfat fat nfsd
exportfs lockd nfs_acl sunrpc autofs4 ip_conntrack_netbios_ns ipt_MASQUERADE
iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables
dm_mirror dm_mod video sbs i2c_ec container button battery ac lp sd_mod sg
usb_storage scsi_mod uhci_hcd snd_via82xx gameport snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq serio_raw snd_pcm_oss
snd_mixer_oss cyblafb parport_pc parport 8139cp snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd_seq_device 8139too snd i2c_viapro via686a hwmon
mii i2c_isa soundcore i2c_core pcspkr ext3 jbd
Dec 23 02:13:04 grump kernel: CPU:    0
Dec 23 02:13:04 grump kernel: EIP:    0060:[<c04d0514>]    Not tainted VLI
Dec 23 02:13:04 grump kernel: EFLAGS: 00010096   (2.6.18-1.2257.fc5 #1)
Dec 23 02:13:04 grump kernel: EIP is at list_del+0x48/0x6c
Dec 23 02:13:04 grump kernel: eax: 00000048   ebx: ca9804e0   ecx: c064f350  
edx: cf7df000
Dec 23 02:13:04 grump kernel: esi: cf7ed6a0   edi: c4a9e000   ebp: cf7ef600  
esp: cf7dfef8
Dec 23 02:13:04 grump kernel: ds: 007b   es: 007b   ss: 0068
Dec 23 02:13:04 grump kernel: Process events/0 (pid: 4, ti=cf7df000
task=cf6c05a0 task.ti=cf7df000)
Dec 23 02:13:04 grump kernel: Stack: c0614a53 ca9804e0 e580b0e4 ca9804e0
c0454f7f c0454ce9 00000005 cf7edec0
Dec 23 02:13:04 grump kernel:        00000003 cf7edec0 00000005 cf7edea0
00000000 c0455077 00000000 00000000
Dec 23 02:13:04 grump kernel:        cf7ef600 cf7ed6c4 cf7ed6a0 cf7ef600
cf6f04a0 00000282 c045601a 00000000
Dec 23 02:13:04 grump kernel: Call Trace:
Dec 23 02:13:04 grump kernel:  [<c0454f7f>] free_block+0x65/0xd3
Dec 23 02:13:04 grump kernel:  [<c0455077>] drain_array+0x8a/0xb5
Dec 23 02:13:04 grump kernel:  [<c045601a>] cache_reap+0x3f/0xd6
Dec 23 02:13:04 grump kernel:  [<c0423958>] run_workqueue+0x85/0xc5
Dec 23 02:13:05 grump kernel:  [<c0423e56>] worker_thread+0xe8/0x11a
Dec 23 02:13:05 grump kernel:  [<c0426085>] kthread+0xad/0xd8
Dec 23 02:13:05 grump kernel:  [<c04032d7>] kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel: Leftover inexact backtrace:
Dec 23 02:13:05 grump kernel:  =======================
Dec 23 02:13:05 grump kernel: Code: c0 e8 d5 62 f4 ff 0f 0b 41 00 42 4a 61 c0 8b
03 8b 40 04 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 53 4a 61 c0 e8 b0 62 f4
ff <0f> 0b 46 00 42 4a 61 c0 8b 13 8b 43 04 89 42 04 89 10 c7 43 04
Dec 23 02:13:05 grump kernel: EIP: [<c04d0514>] list_del+0x48/0x6c SS:ESP
0068:cf7dfef8
Dec 23 02:13:05 grump kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Dec 23 02:13:05 grump kernel: in_atomic():0, irqs_disabled():1
Dec 23 02:13:05 grump kernel:  [<c040371f>] dump_trace+0x69/0x1af
Dec 23 02:13:05 grump kernel:  [<c040387d>] show_trace_log_lvl+0x18/0x2c
Dec 23 02:13:05 grump kernel:  [<c0403df3>] show_trace+0xf/0x11
Dec 23 02:13:05 grump kernel:  [<c0403e7d>] dump_stack+0x15/0x17
Dec 23 02:13:05 grump kernel:  [<c042890c>] down_read+0x12/0x1f
Dec 23 02:13:05 grump kernel:  [<c0421270>] blocking_notifier_call_chain+0xe/0x29
Dec 23 02:13:05 grump kernel:  [<c0418437>] do_exit+0x1b/0x796
Dec 23 02:13:05 grump kernel:  [<c0403d94>] die+0x266/0x28b
Dec 23 02:13:05 grump kernel:  [<c040441b>] do_invalid_op+0xa2/0xab
Dec 23 02:13:05 grump kernel:  [<c0403079>] error_code+0x39/0x40
Dec 23 02:13:05 grump kernel: DWARF2 unwinder stuck at error_code+0x39/0x40
Dec 23 02:13:05 grump kernel: Leftover inexact backtrace:
Dec 23 02:13:05 grump kernel:  [<c04d0514>] list_del+0x48/0x6c
Dec 23 02:13:05 grump kernel:  [<c0454f7f>] free_block+0x65/0xd3
Dec 23 02:13:05 grump kernel:  [<c0454ce9>] kmem_freepages+0x7d/0x97
Dec 23 02:13:05 grump kernel:  [<c0455077>] drain_array+0x8a/0xb5
Dec 23 02:13:05 grump kernel:  [<c045601a>] cache_reap+0x3f/0xd6
Dec 23 02:13:05 grump kernel:  [<c0423958>] run_workqueue+0x85/0xc5
Dec 23 02:13:05 grump kernel:  [<c0455fdb>] cache_reap+0x0/0xd6
Dec 23 02:13:05 grump kernel:  [<c0423e56>] worker_thread+0xe8/0x11a
Dec 23 02:13:05 grump kernel:  [<c0412952>] default_wake_function+0x0/0xc
Dec 23 02:13:05 grump kernel:  [<c0423d6e>] worker_thread+0x0/0x11a
Dec 23 02:13:05 grump kernel:  [<c0426085>] kthread+0xad/0xd8
Dec 23 02:13:05 grump kernel:  [<c0425fd8>] kthread+0x0/0xd8
Dec 23 02:13:05 grump kernel:  [<c04032d7>] kernel_thread_helper+0x7/0x10
Dec 23 02:13:05 grump kernel:  =======================
<END MESSAGES>
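
The "list_del corruption" line is printed by a sanity check that runs before an entry is unlinked from a doubly linked list: both neighbours must still point back at the entry. The following toy model shows the idea in Python. It is not kernel code; the class and function names are invented for illustration, and node names stand in for the pointer values shown in the log.

```python
class Node:
    """Toy stand-in for a kernel list entry (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


class ListCorruption(Exception):
    pass


def list_add(new, head):
    """Insert new right after head."""
    new.prev = head
    new.next = head.next
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink entry, but only if both neighbours still point back at it."""
    if entry.prev.next is not entry:
        raise ListCorruption(
            f"prev->next should be {entry.name}, but was {entry.prev.next.name}")
    if entry.next.prev is not entry:
        raise ListCorruption(
            f"next->prev should be {entry.name}, but was {entry.next.prev.name}")
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = entry.next = None


head, a, b = Node("head"), Node("a"), Node("b")
list_add(a, head)
list_add(b, head)        # list is now head -> b -> a -> head
a.prev = Node("stray")   # simulate memory corruption of a's back pointer

try:
    list_del(b)
except ListCorruption as err:
    print("list_del corruption.", err)
    # prints: list_del corruption. next->prev should be b, but was stray
```

In the model, as in the log, the check only reports that a neighbour's pointer was overwritten; it says nothing about what overwrote it.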

It appears that the system continued to run for almost 2 more hours, at which
time the last log message was:

Dec 23 04:01:47 grump nmbd[2204]:   This response was from IP 192.168.141.5,
reporting an IP address of 192.168.141.5.

It must have crashed soon after.  The system was rebooted on the morning of the
26th when it was discovered.

We are still running 2.6.18-1.2257.fc5 to see if/when it happens again.
Comment 2 Wayne Sherman 2006-12-26 15:50:51 EST
Created attachment 144380
dmesg output on our server

Output of dmesg from our server.
Comment 3 Wayne Sherman 2006-12-26 15:52:46 EST
Forgot to mention, we have been using kernel 2.6.17-1.2187_FC5 on that server
without problems.  All released kernels after that have had issues on our
server.  In addition to the dmesg output posted above, here is some more info on
the motherboard and cpu:

> cat /proc/cpuinfo

processor	: 0
vendor_id	: CentaurHauls
cpu family	: 6
model		: 7
model name	: VIA Ezra
stepping	: 8
cpu MHz		: 797.961
cache size	: 64 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips	: 1597.16

> lspci 
00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
Comment 4 Wayne Sherman 2007-01-12 13:42:13 EST
Update:  Kernel 2.6.18-1.2257.fc5 still has intermittent problems.  After the
first crash incident reported above, the server ran for a few days without
noticeable problems.  Upon issuing a reboot command, the server hung with a
kernel error.  It then ran again fine for almost 2 weeks and then crashed again
during a nightly backup.

We are reverting to kernel 2.6.17-1.2187_FC5.
Comment 5 Wayne Sherman 2007-01-12 13:51:11 EST
For that last crash, here is what the log contained:

Jan 12 02:00:04 grump kernel: BUG: unable to handle kernel paging request at
virtual address e580b0e4
Jan 12 02:00:04 grump kernel:  printing eip:
Jan 12 02:00:04 grump kernel: c04d04d5
Jan 12 02:00:04 grump kernel: *pde = 00000000
Jan 12 02:00:04 grump kernel: Oops: 0000 [#1]
Jan 12 02:00:04 grump kernel: last sysfs file: /block/hda/hda1/size
Jan 12 02:00:04 grump kernel: Modules linked in: nls_utf8 cifs vfat fat autofs4
dm_mirror dm_mod video sbs i2c_ec container button battery ac lp sd_mod sg
usb_storage scsi_mod uhci_hcd snd_via82xx gameport snd_ac97_codec snd_ac97_bus
cyblafb snd_seq_dummy serio_raw snd_seq_oss snd_seq_midi_event snd_seq
parport_pc 8139cp parport i2c_viapro via686a hwmon snd_pcm_oss 8139too
snd_mixer_oss mii i2c_isa i2c_core snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore pcspkr ext3 jbd
Jan 12 02:00:04 grump kernel: CPU:    0
Jan 12 02:00:04 grump kernel: EIP:    0060:[<c04d04d5>]    Not tainted VLI
Jan 12 02:00:04 grump kernel: EFLAGS: 00010092   (2.6.18-1.2257.fc5 #1)
Jan 12 02:00:04 grump kernel: EIP is at list_del+0x9/0x6c
Jan 12 02:00:04 grump kernel: eax: e580b0e4   ebx: c519c9a0   ecx: 00000002  
edx: c11a5320
Jan 12 02:00:04 grump kernel: esi: cf7eb7c0   edi: cd299000   ebp: cf7eff00  
esp: cf7dfef8
Jan 12 02:00:04 grump kernel: ds: 007b   es: 007b   ss: 0068
Jan 12 02:00:04 grump kernel: Process events/0 (pid: 4, ti=cf7df000
task=cf6c05a0 task.ti=cf7df000)
Jan 12 02:00:04 grump kernel: Stack: 0000000b cd36c000 cf7eb740 c519c9a0
c0454f7f c0454ce9 00000002 cf7eb860
Jan 12 02:00:04 grump kernel:        00000000 cf7eb860 00000002 cf7eb840
00000000 c0455077 00000000 00000000
Jan 12 02:00:04 grump kernel:        cf7eff00 cf7eb7e4 cf7eb7c0 cf7eff00
cf6f04a0 00000282 c045601a 00000000 
Jan 12 02:00:05 grump kernel: Call Trace:
Jan 12 02:00:05 grump kernel:  [<c0454f7f>] free_block+0x65/0xd3
Jan 12 02:00:05 grump kernel:  [<c0455077>] drain_array+0x8a/0xb5
Jan 12 02:00:05 grump kernel:  [<c045601a>] cache_reap+0x3f/0xd6
Jan 12 02:00:05 grump kernel:  [<c0423958>] run_workqueue+0x85/0xc5
Jan 12 02:00:05 grump kernel:  [<c0423e56>] worker_thread+0xe8/0x11a
Jan 12 02:00:05 grump kernel:  [<c0426085>] kthread+0xad/0xd8
Jan 12 02:00:05 grump kernel:  [<c04032d7>] kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel: Leftover inexact backtrace:
Jan 12 02:00:05 grump kernel:  =======================
Jan 12 02:00:05 grump kernel: Code: 8d 46 04 e8 86 00 00 00 8d 4b 0c 8b 51 04 8d
46 0c 83 c4 14 5b 5e 5f e9 72 00 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40
04 <8b> 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 05 4a 61 c0 
Jan 12 02:00:05 grump kernel: EIP: [<c04d04d5>] list_del+0x9/0x6c SS:ESP
0068:cf7dfef8
Jan 12 02:00:05 grump kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Jan 12 02:00:05 grump kernel: in_atomic():0, irqs_disabled():1
Jan 12 02:00:05 grump kernel:  [<c040371f>] dump_trace+0x69/0x1af
Jan 12 02:00:05 grump kernel:  [<c040387d>] show_trace_log_lvl+0x18/0x2c
Jan 12 02:00:05 grump kernel:  [<c0403df3>] show_trace+0xf/0x11
Jan 12 02:00:05 grump kernel:  [<c0403e7d>] dump_stack+0x15/0x17
Jan 12 02:00:05 grump kernel:  [<c042890c>] down_read+0x12/0x1f
Jan 12 02:00:05 grump kernel:  [<c0421270>] blocking_notifier_call_chain+0xe/0x29
Jan 12 02:00:05 grump kernel:  [<c0418437>] do_exit+0x1b/0x796
Jan 12 02:00:05 grump kernel:  [<c0403d94>] die+0x266/0x28b
Jan 12 02:00:05 grump kernel:  [<c05eeb09>] do_page_fault+0x425/0x507
Jan 12 02:00:05 grump kernel:  [<c0403079>] error_code+0x39/0x40
Jan 12 02:00:05 grump kernel: DWARF2 unwinder stuck at error_code+0x39/0x40
Jan 12 02:00:05 grump kernel: Leftover inexact backtrace:
Jan 12 02:00:05 grump kernel:  [<c04d04d5>] list_del+0x9/0x6c
Jan 12 02:00:05 grump kernel:  [<c0454f7f>] free_block+0x65/0xd3
Jan 12 02:00:05 grump kernel:  [<c0454ce9>] kmem_freepages+0x7d/0x97
Jan 12 02:00:05 grump kernel:  [<c0455077>] drain_array+0x8a/0xb5
Jan 12 02:00:05 grump kernel:  [<c045601a>] cache_reap+0x3f/0xd6
Jan 12 02:00:05 grump kernel:  [<c0423958>] run_workqueue+0x85/0xc5
Jan 12 02:00:05 grump kernel:  [<c0455fdb>] cache_reap+0x0/0xd6
Jan 12 02:00:05 grump kernel:  [<c0423e56>] worker_thread+0xe8/0x11a
Jan 12 02:00:05 grump kernel:  [<c0412952>] default_wake_function+0x0/0xc
Jan 12 02:00:05 grump kernel:  [<c0423d6e>] worker_thread+0x0/0x11a
Jan 12 02:00:05 grump kernel:  [<c0426085>] kthread+0xad/0xd8
Jan 12 02:00:05 grump kernel:  [<c0425fd8>] kthread+0x0/0xd8
Jan 12 02:00:05 grump kernel:  [<c04032d7>] kernel_thread_helper+0x7/0x10
Jan 12 02:00:05 grump kernel:  =======================
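
One detail that may be worth noting for whoever debugs this: the address that faulted on Jan 12 is the same value that the Dec 23 log reported as the unexpected `next->prev` pointer, and it is also the value in `eax` at the time of the fault. A small sketch that checks this from the log text (the strings are copied from the logs in this report with the wrapped lines joined; the helper names are invented for illustration):

```python
import re

# Lines copied from the Dec 23 and Jan 12 logs, with wrapped lines joined.
DEC_23 = "list_del corruption. next->prev should be ca9804e0, but was e580b0e4"
JAN_12 = "BUG: unable to handle kernel paging request at virtual address e580b0e4"
JAN_12_REGS = "eax: e580b0e4   ebx: c519c9a0   ecx: 00000002   edx: c11a5320"


def bad_pointer(line):
    """Extract the unexpected pointer from a 'list_del corruption' line."""
    match = re.search(r"but was ([0-9a-f]{8})", line)
    return match.group(1) if match else None


def fault_address(line):
    """Extract the faulting address from a 'paging request' line."""
    match = re.search(r"virtual address ([0-9a-f]{8})", line)
    return match.group(1) if match else None


def registers(line):
    """Parse 'eax: xxxxxxxx   ebx: xxxxxxxx ...' into a dict."""
    return dict(re.findall(r"(e[a-z]{2}): ([0-9a-f]{8})", line))


print(bad_pointer(DEC_23) == fault_address(JAN_12))             # True
print(registers(JAN_12_REGS)["eax"] == fault_address(JAN_12))   # True
```

The match by itself does not identify a cause; it only suggests that the two crashes involved the same bad pointer value.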
Comment 6 Wayne Sherman 2007-02-06 11:33:59 EST
Ok, I have confirmed one of the kernel components that is causing
errors/crashes.  On another machine that has been working fine with the latest
FC5 updates and kernel (2.6.18-1.2257.fc5) I manually mounted a Windows 2000
share using CIFS:

mount -t CIFS //192.168.1.108/d$ /mnt/winserv -o user=a_user,pass=a_pass

While I was browsing the files with mc, I started getting kernel errors.  When I
tried to reboot, the system crashed.

So, I think CIFS in these newer kernels has bugs that are causing crashes and
stability issues.  I haven't had problems with 2.6.17-1.2187_FC5, so perhaps
whatever changed after that version can be examined to find the cause.
Comment 7 Wayne Sherman 2007-03-02 14:25:03 EST
Updated kernel to 2.6.19-1.2288.fc5 a couple of weeks ago and everything seems
to be running fine now. No crashes.  I tested on two machines.  Good job.

