Red Hat Bugzilla – Bug 444865
oops in cifs module while trying to stop a thread (kthread_stop) during filesystem mount
Last modified: 2011-01-24 17:58:11 EST
+++ This bug was initially created as a clone of Bug #442789 +++
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
System Panic with this stack trace
cpu 0x0: Vector: 300 (Data Access) at [c00000002e50b390]
pc: c0000000000a1f74: .kfree+0x8c/0xfc
lr: c00000000005d8d8: .free_task+0x30/0x60
current = 0xc0000000a07ef520
paca = 0xc000000000404800
pid = 23924, comm = mount.cifs
[c00000002e50b6b0] c00000000005d8d8 .free_task+0x30/0x60
[c00000002e50b740] c00000000007f1f4 .kthread_stop+0xf0/0x168
[c00000002e50b7e0] d00000000058dad0 .cifs_mount+0xde0/0x1070 [cifs]
[c00000002e50b990] d00000000057908c .cifs_read_super+0x8c/0x1fc [cifs]
[c00000002e50ba30] d000000000579918 .cifs_get_sb+0x9c/0x124 [cifs]
[c00000002e50bad0] c0000000000ce7d8 .do_kern_mount+0xfc/0x29c
[c00000002e50bb80] c0000000000ef074 .do_new_mount+0x90/0xf0
[c00000002e50bc30] c0000000000efb88 .do_mount+0x1e4/0x22c
[c00000002e50bd60] c0000000000ff4a8 .compat_sys_mount+0x188/0x258
[c00000002e50be30] c000000000011280 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 000000000ff53758
SP (ffffe680) is in userspace
Systems tests/testsuite keeps running.
This happens in connect.c, in the cifs_mount function, at this piece of code:
tsk = srvTcp->tsk;
-- Additional comment from email@example.com on 2008-04-16 16:20 EST --
Created an attachment (id=302668)
proposed upstream patch
Proposed patch -- only lightly tested.
I think that the problem here is that cifs_demultiplex_thread is allowed to
exit when signalled or if kthread_should_stop returns true. It should actually
only be allowed to exit when kthread_should_stop returns true. That should
prevent this panic.
Shaggy asked whether this patch might cause us to hang on the second pass into
kernel_recvmsg. I don't think that it will since the signal should still be
pending when we return from the first kernel_recvmsg call, so the next call
into it should return quickly.
The light testing I've done seems to indicate that that is the case. A umount
proceeded quickly and didn't hang.
-- Additional comment from firstname.lastname@example.org on 2008-04-17 15:11 EST --
I don't think that patch is what we want. We want to allow the thread to
start coming down in some cases, but not to actually exit until after
kthread_stop is called.
I'm working on a patchset for upstream that should (hopefully) close these races.
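The behaviour described in this comment (start coming down on an error, but do not actually exit until the owner asks for the stop) can be sketched in userspace. This is only an analogy built on POSIX threads, not the CIFS patch itself: struct worker, worker_main, worker_stop and run_stop_demo are hypothetical names, and the should_stop flag merely stands in for what kthread_should_stop() reports in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel thread and its owner. */
struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool should_stop;   /* set only by worker_stop(); models kthread_should_stop() */
    bool winding_down;  /* set by the worker when it decides to come down */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    /* Simulate a fatal error or signal: begin coming down ... */
    w->winding_down = true;
    pthread_cond_broadcast(&w->cond);
    /* ... but park here until the owner has asked us to stop. */
    while (!w->should_stop)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Models the stop request: ask the worker to exit, then wait for it. */
static void worker_stop(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
}

/* Returns 1 if the worker was already winding down when it was stopped. */
int run_stop_demo(void)
{
    struct worker *w = calloc(1, sizeof(*w));
    int saw_winding_down;

    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    if (pthread_create(&w->thread, NULL, worker_main, w) != 0) {
        free(w);
        return -1;
    }

    /* Wait until the worker has hit its simulated error. */
    pthread_mutex_lock(&w->lock);
    while (!w->winding_down)
        pthread_cond_wait(&w->cond, &w->lock);
    saw_winding_down = w->winding_down ? 1 : 0;
    pthread_mutex_unlock(&w->lock);

    /* The worker is still alive here, so stopping and joining it is safe. */
    worker_stop(w);

    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
    free(w);    /* freed only after the worker has been joined */
    return saw_winding_down;
}
```

Calling run_stop_demo() from a main() function exercises the sequence. In this model the worker's state is released only after the stop request has completed, which is the ordering the comment above asks for.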
-- Additional comment from email@example.com on 2008-04-17 15:15 EST --
This doesn't really appear to be a regression, AFAICT. This looks to be a
long-standing problem that Shirish just now happened to hit.
-- Additional comment from firstname.lastname@example.org on 2008-04-18 06:50 EST --
I've sent an initial patchset upstream that I think will fix this, awaiting
comments on it there...
*** Bug 446932 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
A few hours after I posted this patch internally, Steve French found a problem
with it. If the server just closes the connection on a Negotiate Protocol error,
then the thread can hang indefinitely without coming down. There's a one line
fix that's been pushed upstream to Linus, and we should probably also take it
for RHEL. I plan to repost this in the next day or so.
Created attachment 309854 [details]
Updated patch. Wake up the response_q before going to sleep. This prevents a
deadlock when a server just closes the connection during session setup.
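A rough userspace sketch of why the wake-up matters, again using POSIX threads as a stand-in for the kernel primitives. The names struct conn, demux_main and run_wake_demo are hypothetical, and the two condition variables only loosely model the response_q wait queue and the thread's wait for its stop request; this is not the code from the attachment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one server connection. */
struct conn {
    pthread_mutex_t lock;
    pthread_cond_t response_q;  /* requesters sleep here waiting for a reply */
    pthread_cond_t stop_q;      /* the demultiplex thread parks here */
    bool got_response;
    bool exiting;               /* connection is coming down */
    bool should_stop;           /* owner has requested the stop */
};

static void *demux_main(void *arg)
{
    struct conn *c = arg;

    pthread_mutex_lock(&c->lock);
    /* Simulate the server closing the socket: no reply will ever arrive. */
    c->exiting = true;
    /* Wake requesters first, so one of them can notice and request the stop. */
    pthread_cond_broadcast(&c->response_q);
    /* Only then go to sleep until the stop request arrives. */
    while (!c->should_stop)
        pthread_cond_wait(&c->stop_q, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Returns 1 if the requester saw the connection fail and stopped the thread. */
int run_wake_demo(void)
{
    struct conn c = {0};
    pthread_t t;
    int failed;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.response_q, NULL);
    pthread_cond_init(&c.stop_q, NULL);
    if (pthread_create(&t, NULL, demux_main, &c) != 0)
        return -1;

    /* Requester: wait for a reply or for the connection to come down. */
    pthread_mutex_lock(&c.lock);
    while (!c.got_response && !c.exiting)
        pthread_cond_wait(&c.response_q, &c.lock);
    failed = (c.exiting && !c.got_response) ? 1 : 0;

    /* The request failed, so tear the thread down. */
    c.should_stop = true;
    pthread_cond_broadcast(&c.stop_q);
    pthread_mutex_unlock(&c.lock);
    pthread_join(t, NULL);

    pthread_cond_destroy(&c.stop_q);
    pthread_cond_destroy(&c.response_q);
    pthread_mutex_destroy(&c.lock);
    return failed;
}
```

If the broadcast on response_q were omitted, a requester that was already asleep would never wake to request the stop, and the parked thread would never see should_stop set: each side would be waiting on the other.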
Jeff, please let me know if anything is needed from IBM to get this patch into
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.