211672 – System crash with the kernel 2.6.18

Summary: System crash with the kernel 2.6.18

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	5
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	221787
Depends On:
Blocks:	224359

Reported:	2006-10-20 18:45 UTC by Akemi Yagi
Modified:	2015-01-04 22:29 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-04-18 05:24:05 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Relevant lines from /var/adm/messages (7.24 KB, text/plain) 2006-10-20 18:45 UTC, Akemi Yagi
1st example (5.39 KB, text/plain) 2006-11-02 05:40 UTC, Akemi Yagi
2nd example (7.09 KB, text/plain) 2006-11-02 05:41 UTC, Akemi Yagi
3rd example (4.71 KB, text/plain) 2006-11-02 05:42 UTC, Akemi Yagi
4th example (5.19 KB, text/plain) 2006-11-02 05:43 UTC, Akemi Yagi
dmesg with cifs debugging enabled (38.98 KB, text/plain) 2006-11-02 16:09 UTC, Akemi Yagi

Description Akemi Yagi 2006-10-20 18:45:13 UTC

Description of problem: I updated to kernel 2.6.18-1.2200.fc5smp.  The system
crashes a few minutes after I start working in X (KDE).  I am using the nvidia
driver.  Crash-related lines from the /var/adm/messages files are attached.


Version-Release number of selected component (if applicable): 2.6.18-1.2200.fc5smp


How reproducible: Booted the system twice; it crashed both times.


Steps to Reproduce:
1. Boot the 2.6.18 kernel
2. Start X
3. Work for a few minutes
  
Actual results:
Total system freeze

Expected results:


Additional info:

Comment 1 Akemi Yagi 2006-10-20 18:45:13 UTC

Created attachment 139017 [details]
Relevant lines from /var/adm/messages

Comment 2 Dave Jones 2006-10-20 19:35:59 UTC

This is a bug in the nvidia module. Only nvidia can fix this.

Comment 3 Akemi Yagi 2006-10-23 16:17:42 UTC

I have uninstalled the Nvidia driver from the system and used the nv driver that
came with the distribution.  I still have exactly the same symptom - a hard lockup.

Comment 4 Akemi Yagi 2006-10-25 15:26:26 UTC

Since FC6 came out yesterday, I did a fresh installation of it.  The system has
not crashed so far (for 1 day).

Comment 5 Akemi Yagi 2006-10-27 22:08:04 UTC

After 3 days of stable operation of freshly installed FC6, the crash problem
came back.  It started happening when/after I created links to remote Windows
shares on my Desktop (KDE).  Those shares are cifs-mounted through autofs.  It
may be possible that my problem is related to bug# 211070 (although mounting
itself is not a problem whether it is done manually or via autofs).

Comment 6 Akemi Yagi 2006-10-31 17:28:46 UTC

I am now more convinced that my system crash is related to cifs.  As long as I
stay away from cifs mounts, the system is stable.

Comment 7 Steve French 2006-11-02 04:10:20 UTC

I did not see cifs symbols in the call stack of the crash.  Do you have another
example of dmesg output we could look at?

Any chance you could reproduce the crash with cifs debugging enabled
("echo 1 > /proc/fs/cifs/cifsFYI") and send the dmesg output?  This can log a
lot to dmesg, so the buffer may wrap even if debugging is enabled just before
the failure, but it might be helpful to see the operation before the list del.

If the target directory or subdirectory has more than 100 files and you are not
running the fixed version of fc6, it might run into the EINVAL on readdir
problem, but that seems unlikely to cause list corruption.

Comment 8 Akemi Yagi 2006-11-02 05:40:40 UTC

Created attachment 140084 [details]
1st example

Comment 9 Akemi Yagi 2006-11-02 05:41:51 UTC

Created attachment 140085 [details]
2nd example

Comment 10 Akemi Yagi 2006-11-02 05:42:40 UTC

Created attachment 140086 [details]
3rd example

Comment 11 Akemi Yagi 2006-11-02 05:43:50 UTC

Created attachment 140087 [details]
4th example

Comment 12 Akemi Yagi 2006-11-02 05:48:31 UTC

I sent 4 other examples from dmesg output.

As for debugging, could you tell me a bit more about how this is done?
Thanks for looking into the problem.

Comment 13 Akemi Yagi 2006-11-02 06:29:36 UTC

If I understand correctly, I just need to set cifsFYI to 1 to enable debugging?
 No need to compile the cifs modules with additional flags?

Comment 14 Akemi Yagi 2006-11-02 16:06:55 UTC

OK, I enabled cifs debugging and "successfully" crashed the system.  The
attached file is from /var/log/messages.

Comment 15 Akemi Yagi 2006-11-02 16:09:37 UTC

Created attachment 140143 [details]
dmesg with cifs debugging enabled

Comment 16 Akemi Yagi 2006-11-09 00:24:21 UTC

Bug#214622 apparently has the same root cause as mine.

Comment 17 Akemi Yagi 2006-11-10 19:10:29 UTC

This is just a confirmation that the 2.6.18-1 kernel in testing has the same
problem, as expected, because the bug in this report (as well as in bug#214622)
has not been addressed.

Comment 18 Akemi Yagi 2006-11-21 15:07:55 UTC

Found this post on the kernel mailing list:

Subject:      Kernel panic in cifs_revalidate
From:         "Chakri n" <chakriin5>
Newsgroups:   gmane.linux.kernel
Date:         Tue, 21 Nov 2006 00:24:40 -0800

Hi,

I am seeing a kernel panic in cifs module. It seems to be a result of
invalid inode entry in dentry for the file it is trying to validate.

The inode->i_ino is set zero and inode->i_mapping is set to NULL in
the inode pointer in the dentry (0xdf8ea200) structure. I went through
the cifs code and could not find any valid case that could trigger
this situation. Is there any case which can lead to this situation?

0xed47fe70 0xc0133b30 filemap_fdatawait+0x20 (0x0, 0xe0e1c780, 0x0,
0xf5b35000, 0x0)
                               kernel .text 0xc0100000 0xc0133b10 0xc0133bc0
0xed47feb8 0xf8b49855 [cifs]cifs_revalidate+0x225 (0xdf8ea200)
                               cifs .text 0xf8b27060 0xf8b49630 0xf8b49af0
0xed47fec4 0xf8b3ec71 [cifs]cifs_d_revalidate+0x11 (0xdf8ea200, 0x0, 0xef47a031)
                               cifs .text 0xf8b27060 0xf8b3ec60 0xf8b3ec7d
0xed47fed8 0xc0151c03 cached_lookup+0x43 (0xe8e03a00, 0xed47fefc, 0x0,
0x1, 0xe7f5b0f8)
                               kernel .text 0xc0100000 0xc0151bc0 0xc0151c20
0xed47ff18 0xc01522a8 link_path_walk+0x3e8
                               kernel .text 0xc0100000 0xc0151ec0 0xc0152610
0xed47ff20 0xc0152629 path_walk+0x19 (0x8002, 0x8003, 0x83141a0)
                               kernel .text 0xc0100000 0xc0152610 0xc0152630
0xed47ff34 0xc015280a path_lookup+0x3a (0x0, 0x0, 0x2, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc01527d0 0xc0152810
0xed47ff64 0xc0152d3a open_namei+0x6a (0xef47a000, 0x8003, 0x0,
0xed47ff7c, 0xe8e03a00)
                               kernel .text 0xc0100000 0xc0152cd0 0xc0153260
0xed47ffa0 0xc01448b1 filp_open+0x41 (0xef47a000, 0x8002, 0x0,
0xed47e000, 0x8002)
                               kernel .text 0xc0100000 0xc0144870 0xc01448e0
0xed47ffbc 0xc0144ca1 sys_open+0x51 (0x83141a0, 0x8002, 0x0, 0x8002, 0x83141a0)

Thanks
--Chakri

Comment 19 Steve French 2006-11-27 21:35:32 UTC

I am not aware of any case in the cifs code where a dentry could reference an
invalid inode, but the crash in the posted attachment was caused by list_del,
which at first glance seems unrelated.  It would be useful to know which list it
was trying to delete from, since there is no cifs code in the call stack there.
It looks like the kernel dmesg log is overflowing.  Any chance you could clear
the message log ("dmesg -c") and use a kernel with a larger dmesg buffer (rebuilt
with different debug config options) so the log does not drop so many entries?
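
For readers puzzled by why a cifs problem would surface in list_del with no cifs
frames on the stack: a debug-enabled list_del (the lib/list_debug.c named in the
later trace) checks that an entry's neighbours still point back at it before
unlinking.  The user-space sketch below only illustrates that idea; the struct
and function names are hypothetical and this is not the kernel source.  If some
other code has overwritten the memory earlier, the check fails later in whatever
thread happens to touch the list, such as the slab reaper.

```c
#include <stddef.h>

/* Minimal doubly linked list node, loosely modeled on a kernel list head.
 * All names here are illustrative assumptions, not kernel identifiers. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Turn head into an empty circular list. */
void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry immediately after head. */
void list_add_after(struct node *entry, struct node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry. Returns 0 on success, or -1 without touching the list when
 * the neighbours no longer point back at entry (a sign of corruption). */
int list_del_checked(struct node *entry)
{
    if (entry->prev->next != entry || entry->next->prev != entry)
        return -1;
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
    return 0;
}
```

In this sketch the corruption is reported at deletion time, not at the moment
the pointers were overwritten, which would be consistent with the culprit being
absent from the call trace.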

Comment 20 Akemi Yagi 2006-11-29 00:43:37 UTC

This is an update of my status.

In an attempt to debug the problem better, I installed FC6 as a guest in VMware.
After many cifs mounts, no kernel panic has occurred.  This made me wonder what
the difference is between the virtual FC6 and the real one.  One obvious
difference is that the VM is running on a single processor, while the two
machines that have the lockup problem are dual-core Athlons.

After searching through postings and mailing lists on the net, I came across
what might be related to my issue.  It concerns AMD processors and the choice of
clocksource.  Following someone's post, I booted the system with the
clocksource=acpi_pm option.  No crash despite cifs mounts.  Then I booted again
without that option.  The machine locked up as soon as I did a cifs mount.  I
went back to the acpi_pm option, and it has been running stable for a few hours
now.

With some more digging I found that the system uses hpet if I do not specify the
clocksource.  The VM uses TSC.
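
To compare machines, the active clocksource can be read back from sysfs.  The
sketch below is a small C helper that reads the first line of such a file.  The
path in the comment is an assumption about where kernels with the generic
clocksource framework expose the value, and the helper name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of the active clocksource; adjust if your kernel
 * exposes it elsewhere:
 *   /sys/devices/system/clocksource/clocksource0/current_clocksource
 */

/* Read the first line of the file at path into buf and strip the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp;

    if (len == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling read_first_line() with the path above would be expected to yield a name
such as hpet, acpi_pm or tsc, matching the values mentioned in this comment.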

But I still do not understand why and how cifs is involved.  Hope this new piece
of info helps resolve the problem.

Comment 21 Need Real Name 2006-11-29 17:10:14 UTC

The problem also seems to be resolved for me when I set clocksource=acpi_pm.
I've been running for a while with that option and remounted the volume several
times with no crashes.  My cpu information:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 2992.602
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 7485.77

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 2992.602
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc up pni monitor ds_cpl cid xtpr
bogomips        : 7485.77

Comment 22 Need Real Name 2006-11-29 17:58:41 UTC

Sorry, I spoke too soon.  After another 45 minutes using the system and having a
CIFS mount, my system had a panic:

Nov 29 12:50:21 x kernel: BUG: unable to handle kernel paging request at virtual
address 0080b4e4
Nov 29 12:50:21 x kernel:  printing eip:
Nov 29 12:50:21 x kernel: c04e0b51
Nov 29 12:50:21 x kernel: 2c0d3000 -> *pde = 00000000:14eb0001
Nov 29 12:50:21 x kernel: 292b0000 -> *pme = 00000000:14ed4067
Nov 29 12:50:21 x kernel: 292d4000 -> *pte = 00000000:00000000
Nov 29 12:50:21 x kernel: Oops: 0000 [#1]
Nov 29 12:50:21 x kernel: SMP
Nov 29 12:50:21 x kernel: last sysfs file: /power/state
Nov 29 12:50:21 x kernel: Modules linked in: nls_utf8 cifs bridge netloop netbk
blktap blkbk hidp l2cap bluetooth sunrpc dm_mirror dm_multipath dm_mod video sbs
i2c_ec i2c_core button battery asus_acpi ac sg ipv6 parport_pc lp parport floppy
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore pcspkr tg3 snd_page_alloc i82875p_edac serio_raw edac_mc
usb_storage ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
Nov 29 12:50:21 x kernel: CPU:    0
Nov 29 12:50:21 x kernel: EIP:    0061:[<c04e0b51>]    Not tainted VLI
Nov 29 12:50:21 x kernel: EFLAGS: 00010096   (2.6.18-1.2849.fc6xen #1)
Nov 29 12:50:21 x kernel: EIP is at list_del+0x9/0x6c
Nov 29 12:50:21 x kernel: eax: 0080b4e4   ebx: e8897f20   ecx: 00000006   edx:
00000000
Nov 29 12:50:21 x kernel: esi: ed7fd7c0   edi: e89b3000   ebp: c0d39dc0   esp:
c0b4eefc
Nov 29 12:50:21 x kernel: ds: 007b   es: 007b   ss: 0069
Nov 29 12:50:21 x kernel: Process events/0 (pid: 8, ti=c0b4e000 task=ed7c25e0
task.ti=c0b4e000)
Nov 29 12:50:21 x kernel: Stack: c14fc400 e7f2d040 ed7fd340 e8897f20 c0462271
c0b1ca80 00000006 00000000
Nov 29 12:50:21 x kernel:        ed7f8220 ed7f8220 00000006 ed7f8200 00000000
c0462374 00000000 00000000
Nov 29 12:50:21 x kernel:        c0d39dc0 ed7fd7e4 ed7fd7c0 c0d39dc0 ed7c9cc0
00000000 c0463814 00000000
Nov 29 12:50:21 x kernel: Call Trace:
Nov 29 12:50:21 x kernel:  [<c0462271>] free_block+0x63/0xdc
Nov 29 12:50:21 x kernel:  [<c0462374>] drain_array+0x8a/0xb5
Nov 29 12:50:21 x kernel:  [<c0463814>] cache_reap+0x85/0x117
Nov 29 12:50:21 x kernel:  [<c042b210>] run_workqueue+0x83/0xc5
Nov 29 12:50:21 x kernel:  [<c042bb00>] worker_thread+0xd9/0x10d
Nov 29 12:50:21 x kernel:  [<c042e013>] kthread+0xc0/0xed
Nov 29 12:50:21 x kernel:  [<c0402a69>] kernel_thread_helper+0x5/0xb
Nov 29 12:50:21 x kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Nov 29 12:50:21 x kernel:
Nov 29 12:50:21 x kernel: Leftover inexact backtrace:
Nov 29 12:50:21 x kernel:
Nov 29 12:50:21 x kernel:  =======================
Nov 29 12:50:21 x kernel: Code: 8d 46 04 e8 86 00 00 00 8d 4b 0c 8b 51 04 8d 46
0c 83 c4 14 5b 5e 5f e9 72 00 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04
<8b> 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 38 0e 63 c0
Nov 29 12:50:21 x kernel: EIP: [<c04e0b51>] list_del+0x9/0x6c SS:ESP 0069:c0b4eefc
Nov 29 12:50:21 x kernel:  <3>BUG: sleeping function called from invalid context
at kernel/rwsem.c:20
Nov 29 12:50:21 x kernel: in_atomic():0, irqs_disabled():1
Nov 29 12:50:21 x kernel:  [<c0405707>] dump_trace+0x69/0x1af
Nov 29 12:50:21 x kernel:  [<c0405865>] show_trace_log_lvl+0x18/0x2c
Nov 29 12:50:21 x kernel:  [<c0405e05>] show_trace+0xf/0x11
Nov 29 12:50:21 x kernel:  [<c0405e34>] dump_stack+0x15/0x17
Nov 29 12:50:21 x kernel:  [<c0430b92>] down_read+0x12/0x20
Nov 29 12:50:21 x kernel:  [<c0428c41>] blocking_notifier_call_chain+0xe/0x29
Nov 29 12:50:21 x kernel:  [<c041ed09>] do_exit+0x1b/0x776
Nov 29 12:50:21 x kernel:  [<c0405da6>] die+0x289/0x2ae
Nov 29 12:50:22 x kernel:  [<c060abf0>] do_page_fault+0xabf/0xc3c
Nov 29 12:50:22 x kernel:  [<c040502b>] error_code+0x2b/0x30
Nov 29 12:50:22 x kernel: DWARF2 unwinder stuck at error_code+0x2b/0x30
Nov 29 12:50:22 x kernel:
Nov 29 12:50:22 x kernel: Leftover inexact backtrace:
Nov 29 12:50:22 x kernel:
Nov 29 12:50:22 x kernel:  [<c04e0b51>] list_del+0x9/0x6c
Nov 29 12:50:22 x kernel:  [<c0462271>] free_block+0x63/0xdc
Nov 29 12:50:22 x kernel:  [<c0462374>] drain_array+0x8a/0xb5
Nov 29 12:50:22 x kernel:  [<c0463814>] cache_reap+0x85/0x117
Nov 29 12:50:22 x kernel:  [<c042b210>] run_workqueue+0x83/0xc5
Nov 29 12:50:22 x kernel:  [<c060936b>] _spin_lock_irqsave+0x12/0x17
Nov 29 12:50:22 x kernel:  [<c046378f>] cache_reap+0x0/0x117
Nov 29 12:50:22 x kernel:  [<c042bb00>] worker_thread+0xd9/0x10d
Nov 29 12:50:22 x kernel:  [<c04178a1>] default_wake_function+0x0/0xc
Nov 29 12:50:22 x kernel:  [<c042ba27>] worker_thread+0x0/0x10d
Nov 29 12:50:22 x kernel:  [<c042e013>] kthread+0xc0/0xed
Nov 29 12:50:22 x kernel:  [<c042df53>] kthread+0x0/0xed
Nov 29 12:50:22 x kernel:  [<c0402a69>] kernel_thread_helper+0x5/0xb
Nov 29 12:50:22 x kernel:  =======================

Comment 23 Akemi Yagi 2006-11-29 19:02:57 UTC

I spoke too soon, too.  My system crashed next morning at 4 AM.  This has
happened before.  One or more cronjobs run at this time, which apparently caused
the panic.  But it looks like the clocksource option makes it a bit harder to
trigger the crash.

Akemi

Comment 24 Greg Trounson 2006-11-30 23:00:48 UTC

I'm having the same problems.  From my post to the Fedora mailing list:

When trying to mount cifs shares with any 2.6.18 kernel on FC5 or FC6 the system
becomes unstable, usually crashing with a kernel panic a few seconds after the
mount.  I've seen this on several machines, including a dual athlon box (i386),
and an athlon64 (x86_64).  A typical log for what goes on is attached below.

I've also been having stability problems with 2.6.18 on my Debian Sid box,
though not related to cifs mounts, so I'm wondering if this kernel release might
have just escaped the barn a bit early.

In every case I've had to revert to 2.6.17 and had no problems since.

Message from syslogd@mgl26 at Fri Dec  1 10:14:06 2006 ...
mgl26 kernel: ------------[ cut here ]------------

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: kernel BUG at lib/list_debug.c:65!

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: invalid opcode: 0000 [#1]

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: SMP

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: CPU:    0

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: EIP is at list_del+0x23/0x6c

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: eax: 00000048   ebx: d0b53da0   ecx: c067e1d0   edx: 00000086

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: esi: c176f944   edi: c176f920   ebp: c176f930   esp: ddfdff20

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: ds: 007b   es: 007b   ss: 0068

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Process events/0 (pid: 5, ti=ddfdf000 task=dde00030 task.ti=ddfdf000)

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Stack: c0641cb6 d0b53da0 d0b50080 d0b53da0 c046b6dc 00000005
ddd413e0 00000003

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:        c176f920 ddd413e0 c146ff40 00000282 c046caf9 00000000
00000000 c13c7aa0

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:        c13c7aa4 c0433c38 00000246 c146ff40 c146ff60 c046ca47
00000000 c146ff60

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Call Trace:

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:  [<c046b6dc>] drain_freelist+0x3b/0x7b

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:  [<c046caf9>] cache_reap+0xb2/0x117

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0433c38>] run_workqueue+0x83/0xc5

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0434528>] worker_thread+0xd9/0x10d

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c04369fb>] kthread+0xc0/0xed

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0404dab>] kernel_thread_helper+0x7/0x10

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: Leftover inexact backtrace:

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  =======================

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39
d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 b6 1c 64 c0 e8 dc bd f3 ff <0f> 0b 41
00 f3 1c 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: EIP: [<c04e99eb>] list_del+0x23/0x6c SS:ESP 0068:ddfdff2

Comment 25 Akemi Yagi 2006-12-20 22:33:17 UTC

Just a quick note for those who are seeing this problem.  Samba programmers have
been working on this and will be posting a fix soon.  I understand it might be a
temporary fix but things are looking good now.

Akemi

Comment 26 Shirish S. Pargaonkar 2006-12-21 18:41:11 UTC

This is a patch for 1.45 version of cifs.  I think this should help.

diff -u sess.c sess.c.mod
--- sess.c      2006-08-02 16:15:17.000000000 -0500
+++ sess.c.mod  2006-12-21 09:43:19.000000000 -0600
@@ -179,10 +179,9 @@
        cFYI(1,("bleft %d",bleft));


-       /* word align, if bytes remaining is not even */
-       if(bleft % 2) {
+       /* word align, if bytes remaining is even */
+       if(!(bleft % 2)) {
                bleft--;
-               data++;
        }
        words_left = bleft / 2;

@@ -506,6 +505,7 @@
        /* and lanman response is 3 */
        bytes_remaining = BCC(smb_buf);
        bcc_ptr = pByteArea(smb_buf);
+       bcc_ptr++;

        if(smb_buf->WordCount == 4) {
                __u16 blob_len;
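
The patch above concerns where 2-byte strings begin in the response's byte area
and how many bytes remain to parse.  As a rough illustration of the general
idea only, the sketch below computes padding from an offset (a different
approach from the patch, which uses the parity of the remaining byte count) and
counts 16-bit units without reading past the end of the buffer.  The function
names are hypothetical; this is neither the cifs code nor the upstream fix.

```c
#include <stddef.h>

/* Pad bytes needed so that offset (measured from the start of the message)
 * lands on a 2-byte boundary: 0 if the offset is even, 1 if it is odd. */
size_t pad_to_word(size_t offset)
{
    return offset & 1u;
}

/* Count 16-bit units before a 16-bit NUL terminator, examining at most
 * bytes_left bytes. A trailing odd byte is ignored, so a terminator that is
 * only one byte long cannot pull the scan past the end of the buffer. */
size_t utf16_len_bounded(const unsigned char *p, size_t bytes_left)
{
    size_t words_left = bytes_left / 2;
    size_t n = 0;

    while (n < words_left && (p[2 * n] != 0 || p[2 * n + 1] != 0))
        n++;
    return n;
}
```

Bounding the scan by words_left mirrors the "words_left = bleft / 2" line in the
diff context; the point of the sketch is that a string parser which trusts the
terminator alone can run off the end of the buffer.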

Comment 27 Akemi Yagi 2006-12-22 17:58:49 UTC

I have two test machines running with the patch provided by Shirish.  Both used
to have system lockups before the patch.  After the patch was applied, I have
not seen a single kernel oops/crash on either machine.  This is with a number of
mounts/umounts/reboots.

The test kernel was 2.6.18-1.2868.fc6 compiled with the above patch.  Later, I
installed the same kernel using rpm's and replaced cifs.ko with my patched
version.  That worked, too.

Akemi

Comment 28 Steve French 2006-12-23 20:01:52 UTC

Has anyone tried this against Win9x (or OS/2) or anything which is ASCII only -
at first glance it looks odd that the bcc is updated even for the ascii case.

Comment 29 Shirish S. Pargaonkar 2007-01-25 15:24:05 UTC

This is the patch

http://www.kernel.org/git/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=8e6f195af0e1f226e9b2e0256af8df46adb9d595

It is slightly different from the one I posted above.
What is the process for getting it into existing distros such as RHEL5?
It is not needed in RHEL4.

Comment 30 Akemi Yagi 2007-01-25 16:05:16 UTC

Just applied the latest patch to FC6 and it worked (although your earlier patch
also worked in my case).

I want to see this patch included in the kernel which would eventually be
propagated to all distros.  But it is more important that Fedora Core gets fixed
now.  With the demise of Fedora Legacy, FC5 will be EOL'd sooner than expected.
 Those who cannot move on to FC6 because of this cifs bug will be in trouble if
the revised kernel is not made available in time.

Akemi

Comment 31 Akemi Yagi 2007-02-09 16:22:07 UTC

Just noticed today that there is a new FC5 kernel in the testing directory
(2.6.19-1.2287.fc5).  Apparently, this does not have the cifs fix.

Dave, can you tell us if/when this cifs patch gets included?

Akemi

Comment 32 Akemi Yagi 2007-02-13 05:04:24 UTC

News! The cifs patch has been included in the latest kernels which are available
from the Fedora testing directory.

FC5 is:
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/5/

FC6 is:
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/6/

Thank you, Chuck and Dave.

Akemi

Comment 33 Dave Bradley 2007-02-20 23:05:44 UTC

*** Bug 221787 has been marked as a duplicate of this bug. ***
