+++ This bug was initially created as a clone of Bug #442789 +++

Description of problem:
System panic

Version-Release number of selected component (if applicable):
cifs 1.50cRH

How reproducible:
Happened once.

Steps to Reproduce:
1.
2.
3.

Actual results:
System panic with this stack trace:

0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000002e50b390]
    pc: c0000000000a1f74: .kfree+0x8c/0xfc
    lr: c00000000005d8d8: .free_task+0x30/0x60
    sp: c00000002e50b610
   msr: 8000000000001032
   dar: 100100
 dsisr: 40000000
  current = 0xc0000000a07ef520
  paca    = 0xc000000000404800
    pid   = 23924, comm = mount.cifs
0:mon> t
[c00000002e50b6b0] c00000000005d8d8 .free_task+0x30/0x60
[c00000002e50b740] c00000000007f1f4 .kthread_stop+0xf0/0x168
[c00000002e50b7e0] d00000000058dad0 .cifs_mount+0xde0/0x1070 [cifs]
[c00000002e50b990] d00000000057908c .cifs_read_super+0x8c/0x1fc [cifs]
[c00000002e50ba30] d000000000579918 .cifs_get_sb+0x9c/0x124 [cifs]
[c00000002e50bad0] c0000000000ce7d8 .do_kern_mount+0xfc/0x29c
[c00000002e50bb80] c0000000000ef074 .do_new_mount+0x90/0xf0
[c00000002e50bc30] c0000000000efb88 .do_mount+0x1e4/0x22c
[c00000002e50bd60] c0000000000ff4a8 .compat_sys_mount+0x188/0x258
[c00000002e50be30] c000000000011280 syscall_exit+0x0/0x18
--- Exception: c01 (System Call) at 000000000ff53758
SP (ffffe680) is in userspace

Expected results:
System tests/test suite keep running.

Additional info:
This happens in connect.c, in the cifs_mount function, at this piece of code:

    force_sig(SIGKILL, srvTcp->tsk);
    tsk = srvTcp->tsk;
    if (tsk)
        kthread_stop(tsk);   <---

-- Additional comment from jlayton on 2008-04-16 16:20 EST --
Created an attachment (id=302668)
proposed upstream patch

Proposed patch -- only lightly tested.

I think that the problem here is that cifs_demultiplex_thread is allowed to exit when signalled or if kthread_should_stop returns true. It should only be allowed to exit when kthread_should_stop returns true. If it exits on its own after the SIGKILL, kthread_stop may end up operating on a task that has already been freed, which fits the crash in kfree under free_task in the trace above. Restricting the exit condition should prevent this panic.
Shaggy asked whether this patch might cause us to hang on the second pass into kernel_recvmsg. I don't think it will, since the signal should still be pending when we return from the first kernel_recvmsg call, so the next call into it should return quickly. The light testing I've done seems to confirm that: a umount proceeded quickly and didn't hang.

-- Additional comment from jlayton on 2008-04-17 15:11 EST --
I don't think that patch is what we want. We want to allow the thread to start coming down in some cases, but not to actually exit until after kthread_stop is called. I'm working on a patchset for upstream that should (hopefully) close these races.

-- Additional comment from jlayton on 2008-04-17 15:15 EST --
This doesn't really appear to be a regression, AFAICT. This looks to be a long-standing problem that Shirish just now happened to hit.

-- Additional comment from jlayton on 2008-04-18 06:50 EST --
I've sent an initial patchset upstream that I think will fix this; awaiting comments on it there.
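As a rough illustration of the approach in the 2008-04-17 comment (let the demultiplex thread start coming down when signalled, but not exit until kthread_stop is called), here is a userspace sketch using POSIX threads. It is not the actual cifs patch and not kernel code: struct demux, its flags, and demux_signal_then_stop() are hypothetical names, a mutex and condition variable stand in for the kthread machinery, `signalled` stands in for the pending SIGKILL, and `should_stop` stands in for kthread_should_stop().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code:
 *   signalled   ~ force_sig(SIGKILL, tsk) has been sent
 *   should_stop ~ kthread_should_stop() would return true */
struct demux {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signalled;
    bool should_stop;
    bool wound_down;   /* thread has stopped servicing the socket */
    bool exited;       /* thread function has returned */
};

static void *demux_thread(void *arg)
{
    struct demux *d = arg;

    pthread_mutex_lock(&d->lock);
    /* A signal (or a stop request) lets the thread start coming down... */
    while (!d->signalled && !d->should_stop)
        pthread_cond_wait(&d->cond, &d->lock);
    d->wound_down = true;
    pthread_cond_broadcast(&d->cond);

    /* ...but it does not return until a stop has actually been requested. */
    while (!d->should_stop)
        pthread_cond_wait(&d->cond, &d->lock);
    d->exited = true;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Signal the worker, wait until it has wound down, then stop it.
 * Returns true if the worker was still alive when the stop was requested. */
bool demux_signal_then_stop(void)
{
    struct demux d = { .signalled = false };   /* zero the flags */
    pthread_t tid;
    bool alive_at_stop;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.cond, NULL);
    if (pthread_create(&tid, NULL, demux_thread, &d) != 0) {
        pthread_cond_destroy(&d.cond);
        pthread_mutex_destroy(&d.lock);
        return false;
    }

    pthread_mutex_lock(&d.lock);
    d.signalled = true;                 /* like force_sig(SIGKILL, tsk) */
    pthread_cond_broadcast(&d.cond);
    while (!d.wound_down)               /* let the thread react to the signal */
        pthread_cond_wait(&d.cond, &d.lock);
    alive_at_stop = !d.exited;          /* parked, not gone */
    d.should_stop = true;               /* like kthread_stop(tsk) */
    pthread_cond_broadcast(&d.cond);
    pthread_mutex_unlock(&d.lock);

    pthread_join(tid, NULL);            /* the thread waited for us to stop it */
    pthread_cond_destroy(&d.cond);
    pthread_mutex_destroy(&d.lock);
    return alive_at_stop;
}
```

The two-phase wait is the point of the sketch: the signal only moves the thread into a parked state, and the stop request is the only thing that lets it return, so whoever stops it never works on a thread that has already gone away. Build with -pthread.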
*** Bug 446932 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
A few hours after I posted this patch internally, Steve French found a problem with it. If the server just closes the connection on a Negotiate Protocol error, then the thread can hang indefinitely without coming down. There's a one-line fix that's been pushed upstream to Linus, and we should probably also take it for RHEL. I plan to repost this in the next day or so.
Created attachment 309854
updated patch

Updated patch. Wake up the response_q before going to sleep. This prevents a deadlock when the server just closes the connection during session setup.
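The deadlock addressed by waking response_q can be modelled in userspace to show why the order matters. In this hypothetical sketch (POSIX threads, not kernel code, not the actual patch), `response_cond` plays the role of response_q, `conn_dead` marks that the server hung up, and session_setup() stands in for the mount path that waits for a server reply before stopping the thread. The demultiplex stand-in notices the hang-up, wakes the response waiters, and only then sleeps waiting to be stopped; if it skipped that broadcast while the mount side was already asleep, each side would wait on the other forever.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: response_cond ~ response_q, conn_dead ~ server hung up */
struct conn {
    pthread_mutex_t lock;
    pthread_cond_t response_cond;  /* waiters for a server reply */
    pthread_cond_t stop_cond;      /* demux stand-in waits here to be stopped */
    bool conn_dead;
    bool should_stop;
};

static void *demux_on_hangup(void *arg)
{
    struct conn *c = arg;

    pthread_mutex_lock(&c->lock);
    /* Pretend the server just closed the connection during session setup. */
    c->conn_dead = true;
    /* Wake anyone waiting for a response BEFORE going to sleep; otherwise a
     * sleeping waiter never notices and never asks this thread to stop. */
    pthread_cond_broadcast(&c->response_cond);
    while (!c->should_stop)
        pthread_cond_wait(&c->stop_cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Models the mount path: wait for a reply, then stop the thread.
 * Returns -1 when the connection died instead of answering, -2 on setup failure. */
int session_setup(void)
{
    struct conn c = { .conn_dead = false, .should_stop = false };
    pthread_t tid;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.response_cond, NULL);
    pthread_cond_init(&c.stop_cond, NULL);
    if (pthread_create(&tid, NULL, demux_on_hangup, &c) != 0) {
        pthread_cond_destroy(&c.stop_cond);
        pthread_cond_destroy(&c.response_cond);
        pthread_mutex_destroy(&c.lock);
        return -2;
    }

    pthread_mutex_lock(&c.lock);
    while (!c.conn_dead)            /* could sleep forever without the wake-up */
        pthread_cond_wait(&c.response_cond, &c.lock);
    c.should_stop = true;           /* like kthread_stop() on the demux thread */
    pthread_cond_signal(&c.stop_cond);
    pthread_mutex_unlock(&c.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&c.stop_cond);
    pthread_cond_destroy(&c.response_cond);
    pthread_mutex_destroy(&c.lock);
    return -1;                      /* no reply arrived, only a hang-up */
}
```

Build with -pthread. The sketch never models a successful reply; it only shows that the hang-up path terminates because the waiter is woken before the other thread parks itself.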
Jeff, please let me know if anything is needed from IBM to get this patch into 5.3. Thanks!
In kernel-2.6.18-99.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html