224359 – CIFS: bad session setup packet alignment leads to kernel crash

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (4):	217914 220806 223328 235296
Depends On:	211672
Blocks:

Reported:	2007-01-25 13:06 UTC by Steve Dickson
Modified:	2014-06-18 07:35 UTC
CC List:	9 users
Fixed In Version:	RHBA-2007-0959
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-07 19:22:13 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Upstream patch (1.39 KB, patch) 2007-01-25 13:10 UTC, Steve Dickson

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0959	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5 Update 1	2007-11-08 00:47:37 UTC

Description Steve Dickson 2007-01-25 13:06:47 UTC

+++ This bug was initially created as a clone of Bug #211672 +++

Description of problem: I updated to kernel 2.6.18-1.2200.fc5smp.  The system
crashes a few minutes after I start working in X (KDE).  I am using the nvidia
driver.  Crash-related lines from the /var/adm/messages files are attached.


Version-Release number of selected component (if applicable):2.6.18-1.2200.fc5smp


How reproducible:Booted the system twice and crashed both times


Steps to Reproduce:
1.boot the 2.6.18 kernel
2.start X
3.work for a few minutes
  
Actual results:
Total system freeze

Expected results:


Additional info:

-- Additional comment from amyagi on 2006-10-20 14:45 EST --
Created an attachment (id=139017)
Relevant lines from /var/adm/messages


-- Additional comment from davej on 2006-10-20 15:35 EST --
This is a bug in the nvidia module. Only nvidia can fix this.


-- Additional comment from amyagi on 2006-10-23 12:17 EST --
I have uninstalled the Nvidia driver from the system and used the nv driver that
came with the distribution.  I still have exactly the same symptom - a hard lockup.

-- Additional comment from amyagi on 2006-10-25 11:26 EST --
Since FC6 came out yesterday, I did a fresh installation of it.  The system has
not crashed so far (for 1 day).

-- Additional comment from amyagi on 2006-10-27 18:08 EST --
After 3 days of stable operation of freshly installed FC6, the crash problem
came back.  It started happening when/after I created links to remote Windows
shares on my Desktop (KDE).  Those shares are cifs-mounted through autofs.  It
may be possible that my problem is related to bug# 211070 (although mounting
itself is not a problem whether it is done manually or via autofs).

-- Additional comment from amyagi on 2006-10-31 12:28 EST --
I am now more convinced that my system crash is related to cifs.  As long as I
stay away from cifs mounting, the system is stable.

-- Additional comment from smfrench.com on 2006-11-01 23:10 EST --
I did not see cifs symbols in the call stack of the crash.  Do you have another
example of dmesg output we could look at?

Any chance you could reproduce the crash with cifs debugging enabled
("echo 1 > /proc/fs/cifs/cifsFYI") and send the dmesg output? This can log a
lot to dmesg, so even if it is started just before the failure it may wrap the
buffer, but it might be helpful to see the operation before the list del.

If the target directory or subdirectory has more than 100 files and you are not
running the fixed version of fc6, it might run into the EINVAL on readdir
problem, but that seems unlikely to cause list corruption.

-- Additional comment from amyagi on 2006-11-02 00:40 EST --
Created an attachment (id=140084)
1st example


-- Additional comment from amyagi on 2006-11-02 00:41 EST --
Created an attachment (id=140085)
2nd example


-- Additional comment from amyagi on 2006-11-02 00:42 EST --
Created an attachment (id=140086)
3rd example


-- Additional comment from amyagi on 2006-11-02 00:43 EST --
Created an attachment (id=140087)
4th example


-- Additional comment from amyagi on 2006-11-02 00:48 EST --
I sent 4 other examples from dmesg output.

As for debugging, could you tell me a bit more about how this is done?
Thanks for looking into the problem.

-- Additional comment from amyagi on 2006-11-02 01:29 EST --
If I understand correctly, I just need to set cifsFYI to 1 to enable debugging?
 No need to compile the cifs modules with additional flags?

-- Additional comment from amyagi on 2006-11-02 11:06 EST --
OK, I enabled cifs debugging and "successfully" crashed the system.  The
attached file is from /var/log/messages.

-- Additional comment from amyagi on 2006-11-02 11:09 EST --
Created an attachment (id=140143)
dmesg with cifs debugging enabled


-- Additional comment from amyagi on 2006-11-08 19:24 EST --
Bug#214622 apparently has the same root cause as mine.

-- Additional comment from amyagi on 2006-11-10 14:10 EST --
This is just a confirmation that the 2.6.18-1 kernel in testing has the same
problem as expected because the bug in this report (as well as in bug#214622)
has not been addressed. 

-- Additional comment from amyagi on 2006-11-21 10:07 EST --
Found this post on the kernel maillist:

Subject:      Kernel panic in cifs_revalidate
From:         "Chakri n" <chakriin5>
Newsgroups:   gmane.linux.kernel
Date:         Tue, 21 Nov 2006 00:24:40 -0800

Hi,

I am seeing a kernel panic in cifs module. It seems to be a result of
invalid inode entry in dentry for the file it is trying to validate.

The inode->i_ino is set zero and inode->i_mapping is set to NULL in
the inode pointer in the dentry (0xdf8ea200) structure. I went through
the cifs code and could not find any valid case that could trigger
this situation. Is there any case which can lead to this situation?

0xed47fe70 0xc0133b30 filemap_fdatawait+0x20 (0x0, 0xe0e1c780, 0x0,
0xf5b35000, 0x0)
                               kernel .text 0xc0100000 0xc0133b10 0xc0133bc0
0xed47feb8 0xf8b49855 [cifs]cifs_revalidate+0x225 (0xdf8ea200)
                               cifs .text 0xf8b27060 0xf8b49630 0xf8b49af0
0xed47fec4 0xf8b3ec71 [cifs]cifs_d_revalidate+0x11 (0xdf8ea200, 0x0, 0xef47a031)
                               cifs .text 0xf8b27060 0xf8b3ec60 0xf8b3ec7d
0xed47fed8 0xc0151c03 cached_lookup+0x43 (0xe8e03a00, 0xed47fefc, 0x0,
0x1, 0xe7f5b0f8)
                               kernel .text 0xc0100000 0xc0151bc0 0xc0151c20
0xed47ff18 0xc01522a8 link_path_walk+0x3e8
                               kernel .text 0xc0100000 0xc0151ec0 0xc0152610
0xed47ff20 0xc0152629 path_walk+0x19 (0x8002, 0x8003, 0x83141a0)
                               kernel .text 0xc0100000 0xc0152610 0xc0152630
0xed47ff34 0xc015280a path_lookup+0x3a (0x0, 0x0, 0x2, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc01527d0 0xc0152810
0xed47ff64 0xc0152d3a open_namei+0x6a (0xef47a000, 0x8003, 0x0,
0xed47ff7c, 0xe8e03a00)
                               kernel .text 0xc0100000 0xc0152cd0 0xc0153260
0xed47ffa0 0xc01448b1 filp_open+0x41 (0xef47a000, 0x8002, 0x0,
0xed47e000, 0x8002)
                               kernel .text 0xc0100000 0xc0144870 0xc01448e0
0xed47ffbc 0xc0144ca1 sys_open+0x51 (0x83141a0, 0x8002, 0x0, 0x8002, 0x83141a0)

Thanks
--Chakri

-- Additional comment from smfrench.com on 2006-11-27 16:35 EST --
I am not aware of any cases in the cifs code where you could reference from a
dentry to an invalid inode, but the attachment posted was caused by list_del ...
(which seems at first glance to be unrelated).  It would be useful to know what
list it was trying to delete from (since there is no cifs code in the call stack
there).  It looks like the kernel dmesg log is overflowing.  Any chance you could
clear the message log ("dmesg -c") and use a kernel (rebuilt with different debug
config options) with a larger dmesg buffer so the log does not drop so many
entries?



-- Additional comment from amyagi on 2006-11-28 19:43 EST --
This is an update of my status.

In an attempt to debug the problem better, I installed FC6 as a guest in VMware.
 After many cifs mounts, no kernel panic has happened.  This made me
wonder what the difference is between the virtual FC6 and the real one.  One
obvious difference is that the VM is running on a single processor and the two
machines that have the lockup problem are dual-core Athlons.

After searching through postings/maillists on the net, I came across what might
be related to my issue.  It is regarding the AMD processors and the choice of
clocksource.  Following someone's post, I booted the system with the
clocksource=acpi_pm option.  No crash despite cifs-mounts.  Then I booted again
without that option.  The machine locked up as soon as I did a cifs-mount.  I
went back with the acpi_pm option again, and it has been running stable for a
few hours now.

With some more digging I found that the system uses hpet if I do not specify the
clocksource.  The VM uses TSC.

But I still do not understand why and how cifs is involved.  Hope this new piece
of info helps resolve the problem.

-- Additional comment from jon on 2006-11-29 12:10 EST --
The problem also seems to be resolved for me when I set clocksource=acpi_pm.
I've been running for a while with that option and remounted the volume several
times with no crashes.  My cpu information:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 2992.602
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 7485.77

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 2992.602
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc up pni monitor ds_cpl cid xtpr
bogomips        : 7485.77


-- Additional comment from jon on 2006-11-29 12:58 EST --
sorry, I spoke too soon.  After another 45 minutes using the system and having a
CIFS mount, my system had a panic:

Nov 29 12:50:21 x kernel: BUG: unable to handle kernel paging request at virtual
address 0080b4e4
Nov 29 12:50:21 x kernel:  printing eip:
Nov 29 12:50:21 x kernel: c04e0b51
Nov 29 12:50:21 x kernel: 2c0d3000 -> *pde = 00000000:14eb0001
Nov 29 12:50:21 x kernel: 292b0000 -> *pme = 00000000:14ed4067
Nov 29 12:50:21 x kernel: 292d4000 -> *pte = 00000000:00000000
Nov 29 12:50:21 x kernel: Oops: 0000 [#1]
Nov 29 12:50:21 x kernel: SMP
Nov 29 12:50:21 x kernel: last sysfs file: /power/state
Nov 29 12:50:21 x kernel: Modules linked in: nls_utf8 cifs bridge netloop netbk
blktap blkbk hidp l2cap bluetooth sunrpc dm_mirror dm_multipath dm_mod video sbs
i2c_ec i2c_core button battery asus_acpi ac sg ipv6 parport_pc lp parport floppy
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore pcspkr tg3 snd_page_alloc i82875p_edac serio_raw edac_mc
usb_storage ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd
ohci_hcd uhci_hcd
Nov 29 12:50:21 x kernel: CPU:    0
Nov 29 12:50:21 x kernel: EIP:    0061:[<c04e0b51>]    Not tainted VLI
Nov 29 12:50:21 x kernel: EFLAGS: 00010096   (2.6.18-1.2849.fc6xen #1)
Nov 29 12:50:21 x kernel: EIP is at list_del+0x9/0x6c
Nov 29 12:50:21 x kernel: eax: 0080b4e4   ebx: e8897f20   ecx: 00000006   edx:
00000000
Nov 29 12:50:21 x kernel: esi: ed7fd7c0   edi: e89b3000   ebp: c0d39dc0   esp:
c0b4eefc
Nov 29 12:50:21 x kernel: ds: 007b   es: 007b   ss: 0069
Nov 29 12:50:21 x kernel: Process events/0 (pid: 8, ti=c0b4e000 task=ed7c25e0
task.ti=c0b4e000)
Nov 29 12:50:21 x kernel: Stack: c14fc400 e7f2d040 ed7fd340 e8897f20 c0462271
c0b1ca80 00000006 00000000
Nov 29 12:50:21 x kernel:        ed7f8220 ed7f8220 00000006 ed7f8200 00000000
c0462374 00000000 00000000
Nov 29 12:50:21 x kernel:        c0d39dc0 ed7fd7e4 ed7fd7c0 c0d39dc0 ed7c9cc0
00000000 c0463814 00000000
Nov 29 12:50:21 x kernel: Call Trace:
Nov 29 12:50:21 x kernel:  [<c0462271>] free_block+0x63/0xdc
Nov 29 12:50:21 x kernel:  [<c0462374>] drain_array+0x8a/0xb5
Nov 29 12:50:21 x kernel:  [<c0463814>] cache_reap+0x85/0x117
Nov 29 12:50:21 x kernel:  [<c042b210>] run_workqueue+0x83/0xc5
Nov 29 12:50:21 x kernel:  [<c042bb00>] worker_thread+0xd9/0x10d
Nov 29 12:50:21 x kernel:  [<c042e013>] kthread+0xc0/0xed
Nov 29 12:50:21 x kernel:  [<c0402a69>] kernel_thread_helper+0x5/0xb
Nov 29 12:50:21 x kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Nov 29 12:50:21 x kernel:
Nov 29 12:50:21 x kernel: Leftover inexact backtrace:
Nov 29 12:50:21 x kernel:
Nov 29 12:50:21 x kernel:  =======================
Nov 29 12:50:21 x kernel: Code: 8d 46 04 e8 86 00 00 00 8d 4b 0c 8b 51 04 8d 46
0c 83 c4 14 5b 5e 5f e9 72 00 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04
<8b> 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 38 0e 63 c0
Nov 29 12:50:21 x kernel: EIP: [<c04e0b51>] list_del+0x9/0x6c SS:ESP 0069:c0b4eefc
Nov 29 12:50:21 x kernel:  <3>BUG: sleeping function called from invalid context
at kernel/rwsem.c:20
Nov 29 12:50:21 x kernel: in_atomic():0, irqs_disabled():1
Nov 29 12:50:21 x kernel:  [<c0405707>] dump_trace+0x69/0x1af
Nov 29 12:50:21 x kernel:  [<c0405865>] show_trace_log_lvl+0x18/0x2c
Nov 29 12:50:21 x kernel:  [<c0405e05>] show_trace+0xf/0x11
Nov 29 12:50:21 x kernel:  [<c0405e34>] dump_stack+0x15/0x17
Nov 29 12:50:21 x kernel:  [<c0430b92>] down_read+0x12/0x20
Nov 29 12:50:21 x kernel:  [<c0428c41>] blocking_notifier_call_chain+0xe/0x29
Nov 29 12:50:21 x kernel:  [<c041ed09>] do_exit+0x1b/0x776
Nov 29 12:50:21 x kernel:  [<c0405da6>] die+0x289/0x2ae
Nov 29 12:50:22 x kernel:  [<c060abf0>] do_page_fault+0xabf/0xc3c
Nov 29 12:50:22 x kernel:  [<c040502b>] error_code+0x2b/0x30
Nov 29 12:50:22 x kernel: DWARF2 unwinder stuck at error_code+0x2b/0x30
Nov 29 12:50:22 x kernel:
Nov 29 12:50:22 x kernel: Leftover inexact backtrace:
Nov 29 12:50:22 x kernel:
Nov 29 12:50:22 x kernel:  [<c04e0b51>] list_del+0x9/0x6c
Nov 29 12:50:22 x kernel:  [<c0462271>] free_block+0x63/0xdc
Nov 29 12:50:22 x kernel:  [<c0462374>] drain_array+0x8a/0xb5
Nov 29 12:50:22 x kernel:  [<c0463814>] cache_reap+0x85/0x117
Nov 29 12:50:22 x kernel:  [<c042b210>] run_workqueue+0x83/0xc5
Nov 29 12:50:22 x kernel:  [<c060936b>] _spin_lock_irqsave+0x12/0x17
Nov 29 12:50:22 x kernel:  [<c046378f>] cache_reap+0x0/0x117
Nov 29 12:50:22 x kernel:  [<c042bb00>] worker_thread+0xd9/0x10d
Nov 29 12:50:22 x kernel:  [<c04178a1>] default_wake_function+0x0/0xc
Nov 29 12:50:22 x kernel:  [<c042ba27>] worker_thread+0x0/0x10d
Nov 29 12:50:22 x kernel:  [<c042e013>] kthread+0xc0/0xed
Nov 29 12:50:22 x kernel:  [<c042df53>] kthread+0x0/0xed
Nov 29 12:50:22 x kernel:  [<c0402a69>] kernel_thread_helper+0x5/0xb
Nov 29 12:50:22 x kernel:  =======================


-- Additional comment from amyagi on 2006-11-29 14:02 EST --
I spoke too soon, too.  My system crashed next morning at 4 AM.  This has
happened before.  One or more cronjobs run at this time, which apparently caused
the panic.  But it looks like the clocksource option makes it a bit harder to
trigger the crash.

Akemi

-- Additional comment from admin.ac.nz on 2006-11-30 18:00 EST --
I'm having the same problems.  From my post to the Fedora mailing list:

When trying to mount cifs shares with any 2.6.18 kernel on FC5 or FC6 the system
becomes unstable, usually crashing with a kernel panic a few seconds after the
mount.  I've seen this on several machines, including a dual athlon box (i386),
and an athlon64 (x86_64).  A typical log for what goes on is attached below.

I've also been having stability problems with 2.6.18 on my Debian Sid box,
though not related to cifs mounts, so I'm wondering if this kernel release might
have just escaped the barn a bit early.

In every case I've had to revert to 2.6.17 and had no problems since.

Message from syslogd@mgl26 at Fri Dec  1 10:14:06 2006 ...
mgl26 kernel: ------------[ cut here ]------------

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: kernel BUG at lib/list_debug.c:65!

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: invalid opcode: 0000 [#1]

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: SMP

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: CPU:    0

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: EIP is at list_del+0x23/0x6c

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: eax: 00000048   ebx: d0b53da0   ecx: c067e1d0   edx: 00000086

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: esi: c176f944   edi: c176f920   ebp: c176f930   esp: ddfdff20

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: ds: 007b   es: 007b   ss: 0068

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Process events/0 (pid: 5, ti=ddfdf000 task=dde00030 task.ti=ddfdf000)

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Stack: c0641cb6 d0b53da0 d0b50080 d0b53da0 c046b6dc 00000005
ddd413e0 00000003

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:        c176f920 ddd413e0 c146ff40 00000282 c046caf9 00000000
00000000 c13c7aa0

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:        c13c7aa4 c0433c38 00000246 c146ff40 c146ff60 c046ca47
00000000 c146ff60

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel: Call Trace:

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:  [<c046b6dc>] drain_freelist+0x3b/0x7b

Message from syslogd@mgl26 at Fri Dec  1 10:14:07 2006 ...
mgl26 kernel:  [<c046caf9>] cache_reap+0xb2/0x117

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0433c38>] run_workqueue+0x83/0xc5

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0434528>] worker_thread+0xd9/0x10d

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c04369fb>] kthread+0xc0/0xed

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  [<c0404dab>] kernel_thread_helper+0x7/0x10

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: Leftover inexact backtrace:

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel:  =======================

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c 8b 40 04 8b 00 39
d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 b6 1c 64 c0 e8 dc bd f3 ff <0f> 0b 41
00 f3 1c 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04

Message from syslogd@mgl26 at Fri Dec  1 10:14:08 2006 ...
mgl26 kernel: EIP: [<c04e99eb>] list_del+0x23/0x6c SS:ESP 0068:ddfdff2


-- Additional comment from amyagi on 2006-12-20 17:33 EST --
Just a quick note for those who are seeing this problem.  Samba programmers have
been working on this and will be posting a fix soon.  I understand it might be a
temporary fix but things are looking good now.

Akemi

-- Additional comment from shirishp.com on 2006-12-21 13:41 EST --

This is a patch for 1.45 version of cifs.  I think this should help.

diff -u sess.c sess.c.mod
--- sess.c      2006-08-02 16:15:17.000000000 -0500
+++ sess.c.mod  2006-12-21 09:43:19.000000000 -0600
@@ -179,10 +179,9 @@
        cFYI(1,("bleft %d",bleft));


-       /* word align, if bytes remaining is not even */
-       if(bleft % 2) {
+       /* word align, if bytes remaining is even */
+       if(!(bleft % 2)) {
                bleft--;
-               data++;
        }
        words_left = bleft / 2;

@@ -506,6 +505,7 @@
        /* and lanman response is 3 */
        bytes_remaining = BCC(smb_buf);
        bcc_ptr = pByteArea(smb_buf);
+       bcc_ptr++;

        if(smb_buf->WordCount == 4) {
                __u16 blob_len;


-- Additional comment from amyagi on 2006-12-22 12:58 EST --
I have two test machines running with the patch provided by Shirish.  Both used
to have system lockups before the patch.  After the patch was applied, I have
not seen a single kernel oops/crash on either machine.  This is with a number of
mounts/umounts/reboots.

The test kernel was 2.6.18-1.2868.fc6 compiled with the above patch.  Later, I
installed the same kernel using rpm's and replaced cifs.ko with my patched
version.  That worked, too.

Akemi

-- Additional comment from smfrench.com on 2006-12-23 15:01 EST --
Has anyone tried this against Win9x (or OS/2) or anything which is ASCII only -
at first glance it looks odd that the bcc is updated even for the ascii case.

Comment 1 Steve Dickson 2007-01-25 13:10:27 UTC

Created attachment 146535
Upstream patch 

Gitweb:    
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e6f195af0e1f226e9b2e0256af8df46adb9d595

Commit:     8e6f195af0e1f226e9b2e0256af8df46adb9d595
Parent:     bd2abf177b3384375c43906be551d976e4c18166
Author:     Steve French <sfrench.com>
AuthorDate: Mon Jan 22 01:19:30 2007 +0000
Committer:  Steve French <sfrench.com>
CommitDate: Mon Jan 22 01:19:30 2007 +0000

    [CIFS] Fix oops when Windows server sent bad domain name null terminator
    
    Fixes RedHat bug 211672
    
    Windows sends one byte (instead of two) of null to terminate final Unicode
    string (domain name) in session setup response in some cases - this caused
    cifs to misalign some informational strings (making it hard to convert
    from UCS16 to UTF8).
    
    Thanks to Shaggy for his help and Akemi Yagi for debugging/testing
    
    Signed-off-by: Shirish Pargaonkar <shirishp.com>
    Signed-off-by: Steve French <sfrench.com>

Comment 2 RHEL Program Management 2007-01-25 13:26:16 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Jeff Layton 2007-04-19 15:42:17 UTC

I'm looking at this case in the context of RHEL5, and I'll need to do a bit more
explaining about why this patch should be included...

Reading over the reports in the fedora BZ, it sounds pretty clearly like this
patch fixed the problem. Is there a way to reproduce this? Alternately, is there
an upstream discussion somewhere that describes how this bad session setup
packet leads to the apparent list corruption that people were seeing?

I'd like to understand how this patch corrects the panic people were seeing...

Comment 5 Jeff Layton 2007-04-23 17:20:00 UTC

While I still don't follow exactly how these panics follow from this issue, I
think I understand the code well enough to propose it for inclusion into RHEL5.
I've gone ahead and done that.

Comment 6 RHEL Program Management 2007-04-23 17:21:41 UTC

This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 7 Jeff Layton 2007-04-24 11:30:56 UTC

*** Bug 235296 has been marked as a duplicate of this bug. ***

Comment 8 Don Zickus 2007-05-01 18:03:38 UTC

in 2.6.18-17.el5

Comment 11 errata-xmlrpc 2007-11-07 19:22:13 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html

Comment 12 Jeff Layton 2007-12-29 12:15:18 UTC

*** Bug 220806 has been marked as a duplicate of this bug. ***

Comment 13 Jeff Layton 2007-12-29 12:16:11 UTC

*** Bug 223328 has been marked as a duplicate of this bug. ***

Comment 14 Jeff Layton 2007-12-29 12:17:48 UTC

*** Bug 217914 has been marked as a duplicate of this bug. ***
