Description from samba BZ: https://bugzilla.samba.org/show_bug.cgi?id=5720 --------------------[snip]-------------------- Running scripts to mimick the behaviour of our server (but speeded up), results in either hanging umount processes or crashing the kernel. Setup: 1 windows machine, 2 shares (SAMBARAPE3 and SAMBARAPE4). 1 Linux machine mounting cifs shares. Running the scripts: while true ; do mount //win2000machine/SAMBARAPE3 /mnt/SAMBA3/ -o username=XXX,password=YYY && echo 2 ls /mnt/SAMBA3 umount -f /mnt/SAMBA3 && echo -2 ls /mnt/SAMBA3 done and Running scripts to mimick the behaviour of our server (but speeded up), results in either hanging umount processes or crashing the kernel. Setup: 1 windows machine, 2 shares (SAMBARAPE3 and SAMBARAPE4). 1 Linux machine mounting cifs shares. Running the scripts: while true ; do mount //win2000machine/SAMBARAPE4 /mnt/SAMBA4/ -o username=XXX,password=YYY && echo 2 ls /mnt/SAMBA4 umount -f /mnt/SAMBA4 && echo -2 ls /mnt/SAMBA4 done and while true ; do mount //win2000machine/SAMBARAPE3 /mnt/SAMBA3/ -o username=XXX,password=YYY && echo 1 ls /mnt/SAMBA3 umount -f /mnt/SAMBA3 && echo -1 ls /mnt/SAMBA3 done Concurrently either crashes the kernel or result in hanging umount processes. In the last instance the kernel module unloading gets buggy. The echo statements and ls commands can be omitted. Playing aroung with options, -i and -f works better (takes longer time to crash, but do not prevent the problem). This is extreme usage of mount/umount processes, and we have minimised the problem by reducing the risk of these happening at the same time. It may be of interest for the SAMBA team (and maybe Kernel devs) to investigate anyway. It happens on RHEL 5 and Ubuntu Hardy Heron and probably other distro's Best Regards Esben Bjerrum
Created attachment 316764 [details] ...also adds locking arround CIFSTCon() Hi Jeff, This patch is based on your previous patch and adds down(&pSesInfo->sesSem); / up(&pSesInfo->sesSem); arround CIFSTCon() in connect.c in cifs_mount() and get_dfs_path(() With this the system seems to be handling several mount/umounts in parallel for a few minutes. Probably not perfect yet, but I hope this helps. Cheers, Olivier.
Created attachment 316776 [details] lockdep warning and memory corruption detected during reproducer run Thanks Olivier. I forward-ported your patch to rawhide and gave it a run there. It definitely helps keep things moving along. I see a number of places where we get this: CIFS VFS: cifs_read_super: get root inode failed ...which may indicate another race -- instrumenting that to give a return code might be interesting, but it probably means that cifs_get_inode_info_unix failed for some reason (probably due to races with the session getting torn down). After this ran for a few minutes though, I got the attached message. It looks pretty clearly like a use-after-free. This isn't terribly surprising since the tcon struct doesn't have any sort of refcounting. I think fixing this is going to take some careful thought and redesign to make sure that we aren't tearing down the sessions while another one is being brought up. That will probably mean adding some reference counting to some of these data structures.
Created attachment 316932 [details] Updated patch I believe the problems comes from a race condition when doing mount/umounts in parallel. The effect are various memory and slab corruption. This updated version gives some very good results here, no crash anymore for almost 2 hours (used to crash within a few minutes at best previously). What this does is to actually nullify the server->tsk structure after force_sig(SIGKILL,...) to avoid a second potention kill of the same task if a umount occurs immediately after. It also protects pSesInfo->status change in cifs_setup_session().
Nice work. Zeroing out the tsk pointer makes sense. There is one problem though -- we don't seem to hold any sort of lock while sending the signal so there's nothing to prevent another thread from racing in with a second signal before you can set the tsk to NULL. The best thing might be a new helper function or inline that: 1) saves off the tsk pointer to a local variable 2) NULLs out the server->tsk 3) sends the signal Parts 1 and 2 should probably be done while holding a spinlock, or maybe doing it with xchg() would be atomic enough...
The race condition still persists. After instrumenting the code, this is the last few lines preceeding a crash: fs/cifs/connect.c: CIFS VFS: in cifs_umount as Xid: 1360 with uid: 0 fs/cifs/connect.c: XID=1360: Disconnecting tcon 0xccef3e34 fs/cifs/cifssmb.c: In tree disconnect fs/cifs/transport.c: For smb_command 113 fs/cifs/transport.c: Sending smb: total_len 39 fs/cifs/connect.c: XID=1359: Freeing tcon 0xccef36e0 fs/cifs/connect.c: XID=1359: About to do SMBLogoff fs/cifs/connect.c: XID=1359: Mount active on server: 2, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/cifssmb.c: In SMBLogoff for session disconnect fs/cifs/cifssmb.c: Currently in use: 1 fs/cifs/connect.c: CIFS VFS: leaving cifs_umount (xid = 1359) rc = -16 fs/cifs/connect.c: rfc1002 length 0x27 fs/cifs/connect.c: XID=1360: Freeing tcon 0xccef3e34 fs/cifs/connect.c: XID=1360: About to do SMBLogoff fs/cifs/connect.c: XID=1360: Mount active on server: 1, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/cifssmb.c: In SMBLogoff for session disconnect fs/cifs/cifssmb.c: Currently in use: 0 fs/cifs/transport.c: For smb_command 116 fs/cifs/transport.c: Sending smb: total_len 43 fs/cifs/connect.c: rfc1002 length 0x2b fs/cifs/cifsfs.c: Devname: //172.16.239.132/samba4 flags: 64 fs/cifs/connect.c: CIFS VFS: in cifs_mount as Xid: 1361 with uid: 0 fs/cifs/connect.c: Entering cifs_mount. Xid: 1361 with: unc=//172.16.239.132\samba4,ip=172.16.239.132,ver=1,rw,username=root,password=redhat fs/cifs/connect.c: XID=1361: Username: root fs/cifs/connect.c: XID=1361: UNC: \\172.16.239.132\samba4 ip: 172.16.239.132 fs/cifs/connect.c: cifs_find_tcp_session fs/cifs/connect.c: Found exact match server, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/connect.c: XID=1361: Existing tcp session with server found fs/cifs/connect.c: XID=1361: Existing smb sess found, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/connect.c: file mode: 0x7f7 dir mode: 0x1ff fs/cifs/connect.c: XID=1361: Allocating tcon 0xccef3e34 fs/cifs/connect.c: XID=1361: CIFSTCon tcon 0xccef3e34, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/connect.c: XID=1360: Waking up socket by sending signal, tcon 0xccef3e34, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 fs/cifs/connect.c: XID=1361: CIFS Tcon rc = -2 fs/cifs/connect.c: XID=1361: session 0xccef3a04, server 0xcf0b4288, task 0x0 BUG: unable to handle kernel NULL pointer dereference at virtual address 00000450 XID 1359: cifs_umount XID 1360: cifs_umount XID 1361: cifs_mount - So what seems to happen here is: 1) umount is called (XID 1360) and checks the use on the server, only 1 active mount 2) mount is called (XID 1361) and looks for existing server 3) mount finds one that matches, session 0xccef3a04, server 0xcf0b4288, task 0xccd6c070 4) umount kill the server 5) mount now tries to use a dead server - How can we fix the issue? There is fairly huge gap between the time the existing session is found and the time it's actially marked as "in use". The field "inUse" of the CIFS session is incremented fairly late in cifs_mount() and this field is used in CIFSSMBLogoff() to check if the session is in use. A possible solution would be to mark the session "in use" as soon as it's been selected by cifs_find_tcp_session() so a concurrent umount() would not kill the server in the mean time. I am working on a patch that implements this, but when adding the write/read locks required, I end up with lockups after a few concurrent mounts/umounts which is odd.
Agreed on the last idea. Once we find the session in cifs_find_tcp_session, we should be taking a reference to it. The best thing though might be to try and overhaul the refcounting on the sessions altogether, possibly even convert it to use krefs.
Created attachment 319217 [details] Patch for current status Hi Jeff, Following our discussion on IRC, here is the patch as it is today. This patch is against RHEL5 kernel and it does not work (well, it builds, but you'll get a lockup fairly quickly after a few mounts/umounts and I've not had time to investigate why yet). Hope this helps, Cheers, Olivier.
Thanks Olivier, I didn't try out your patch directly, but did snarf a few pieces from it. I'll post my patchset in a little while. I ran a test for a while on the one I've been testing and saw this (this is on a rawhide kernel so it catches accesses to uninitialized memory). It looks like we probably tried to use a tcon struct that had already been freed. I think the patchset I have now does a passable job with making sure that the server tsk pointer doesn't get used when it shouldn't, but I suspect that we need to go through the session and tcon handling and convert them to more sane refcounting. ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2609 __lock_acquire+0xb33/0xc01() (Not tainted) Modules linked in: nls_utf8 cifs bridge stp bnep l2cap bluetooth autofs4 sunrpc ipv6 xfs dm_multipath floppy pcspkr 8139cp 8139too mii i2c_piix4 i2c_core dm_snapshot dm_zero dm_mirror dm_log pata_acpi ata_generic [last unloaded: freq_table] Pid: 4628, comm: mount.cifs Not tainted 2.6.27-0.372.rc8.fc10.x86_64 #1 Call Trace: [<ffffffff81045916>] warn_on_slowpath+0x65/0x9a [<ffffffff810133e8>] ? dump_trace+0x24f/0x25e [<ffffffff81067c7c>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8101b616>] ? save_stack_trace+0x2f/0x4d [<ffffffff81066ad7>] ? save_trace+0x3f/0x95 [<ffffffff810659c1>] ? graph_unlock+0x5e/0x6a [<ffffffff81067a11>] ? mark_lock+0x38d/0x3a2 [<ffffffff81068f15>] __lock_acquire+0xb33/0xc01 [<ffffffff81066c39>] ? trace_hardirqs_off_caller+0x21/0xa4 [<ffffffff81069070>] lock_acquire+0x8d/0xba [<ffffffff8137ae62>] ? __down_common+0x98/0xcf [<ffffffff8137c1bf>] _spin_lock_irq+0x44/0x74 [<ffffffff8137ae62>] ? __down_common+0x98/0xcf [<ffffffff8137bebe>] ? _spin_unlock_irq+0x30/0x3c [<ffffffff8137ae62>] __down_common+0x98/0xcf [<ffffffff8137af0c>] __down+0x1d/0x1f [<ffffffff8105e4b7>] down+0x2d/0x3e [<ffffffffa02e1210>] CIFSSMBTDis+0x4c/0x11a [cifs] [<ffffffffa02e67b3>] cifs_umount+0x6a/0x1ba [cifs] [<ffffffffa02db86a>] cifs_get_sb+0x1f8/0x26f [cifs] [<ffffffff810d4b11>] vfs_kern_mount+0xa2/0x131 [<ffffffff810d4c07>] do_kern_mount+0x4c/0xe7 [<ffffffff810ea97a>] do_new_mount+0x64/0x97 [<ffffffff810eb77f>] do_mount+0x20c/0x23b [<ffffffff810e98f2>] ? copy_mount_options+0xd4/0x138 [<ffffffff8118e9c9>] ? _raw_spin_lock+0x68/0x116 [<ffffffff810eb83c>] sys_mount+0x8e/0xd6 [<ffffffff8137ba76>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8101139a>] system_call_fastpath+0x16/0x1b ---[ end trace 91b2b9d8ad1ebe92 ]--- general protection fault: 0000 [1] SMP DEBUG_PAGEALLOC CPU 1 Modules linked in: nls_utf8 cifs bridge stp bnep l2cap bluetooth autofs4 sunrpc ipv6 xfs dm_multipath floppy pcspkr 8139cp 8139too mii i2c_piix4 i2c_core dm_snapshot dm_zero dm_mirror dm_log pata_acpi ata_generic [last unloaded: freq_table] Pid: 4628, comm: mount.cifs Tainted: G W 2.6.27-0.372.rc8.fc10.x86_64 #1 RIP: 0010:[<ffffffff8118ee80>] [<ffffffff8118ee80>] list_del+0xc/0x85 RSP: 0018:ffff880010d23c48 EFLAGS: 00010096 RAX: ffff88001d9b4bc8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff88001d402a00 RDX: ffffffff8105e36d RSI: ffff88001d9b4ba0 RDI: 6b6b6b6b6b6b6b6b RBP: ffff880010d23c58 R08: 0000000000000002 R09: 0000000000000001 R10: ffffffff81b37370 R11: 0000000000100100 R12: 0000000000000292 R13: ffff88001d9b4b88 R14: 00000000000009fb R15: ffff8800131cc000 FS: 00007f371d2506f0(0000) GS:ffff88001f8117d0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fb8c35eb000 CR3: 000000001d44f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mount.cifs (pid: 4628, threadinfo ffff880010d22000, task ffff88001d4022d0) Stack: ffff880010d23c78 6b6b6b6b6b6b6b6b ffff880010d23c78 ffffffff8137adb6 00000000fffffff0 ffff88001d9b4b88 ffff880010d23c98 ffffffff8105e387 ffff88001d9b4b68 00000000fffffff0 ffff880010d23cd8 ffffffffa02e122e Call Trace: [<ffffffff8137adb6>] __up+0x1a/0x2e [<ffffffff8105e387>] up+0x2e/0x3e [<ffffffffa02e122e>] CIFSSMBTDis+0x6a/0x11a [cifs] [<ffffffffa02e67b3>] cifs_umount+0x6a/0x1ba [cifs] [<ffffffffa02db86a>] cifs_get_sb+0x1f8/0x26f [cifs] [<ffffffff810d4b11>] vfs_kern_mount+0xa2/0x131 [<ffffffff810d4c07>] do_kern_mount+0x4c/0xe7 [<ffffffff810ea97a>] do_new_mount+0x64/0x97 [<ffffffff810eb77f>] do_mount+0x20c/0x23b [<ffffffff810e98f2>] ? copy_mount_options+0xd4/0x138 [<ffffffff8118e9c9>] ? _raw_spin_lock+0x68/0x116 [<ffffffff810eb83c>] sys_mount+0x8e/0xd6 [<ffffffff8137ba76>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8101139a>] system_call_fastpath+0x16/0x1b Code: 48 8d 53 18 49 8d 7c 24 18 e8 96 00 00 00 4c 89 ef e8 85 fe ff ff 59 5b 41 5c 41 5d c9 c3 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> 8b 47 08 48 8b 00 48 39 f8 74 20 49 89 c0 48 89 f9 48 c7 c2 RIP [<ffffffff8118ee80>] list_del+0xc/0x85 RSP <ffff880010d23c48> ---[ end trace 91b2b9d8ad1ebe92 ]--- ============================================================================= BUG kmalloc-512 (Tainted: G D W): Poison overwritten ----------------------------------------------------------------------------- INFO: 0xffff88001d9b4b88-0xffff88001d9b4c48. First byte 0x6c instead of 0x6b INFO: Allocated in tconInfoAlloc+0x1f/0xd3 [cifs] age=775 cpu=0 pid=4619 INFO: Freed in tconInfoFree+0x6a/0x6e [cifs] age=280 cpu=0 pid=4630 INFO: Slab 0xffffe20000b1a380 objects=14 used=5 fp=0xffff88001d9b4b68 flags=0x200000000020c3 INFO: Object 0xffff88001d9b4b68 @offset=2920 fp=0xffff88001d9b4920 Bytes b4 0xffff88001d9b4b58: 0f e5 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a .���....ZZZZZZZZ Object 0xffff88001d9b4b68: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4b78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4b88: 6c 6b 6d 6b 6b 6b 6b 6b 01 00 00 00 6b 6b 6b 6b lkmkkkkk....kkkk Object 0xffff88001d9b4b98: d0 22 40 1d 00 88 ff ff 6b 6b 6b 6b 6b 6b 6b 6b �"@...��kkkkkkkk Object 0xffff88001d9b4ba8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4bb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4bc8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4bd8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4be8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4bf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c38: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c48: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk Object 0xffff88001d9b4c58: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c68: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4c98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4ca8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4cb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4cc8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4cd8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4ce8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d38: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d48: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xffff88001d9b4d58: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk� Redzone 0xffff88001d9b4d68: bb bb bb bb bb bb bb bb �������� Padding 0xffff88001d9b4da8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Pid: 4632, comm: mountumount1 Tainted: G D W 2.6.27-0.372.rc8.fc10.x86_64 #1 Call Trace: [<ffffffff810c9e9c>] print_trailer+0x140/0x149 [<ffffffff810ca505>] check_bytes_and_report+0xb7/0xf7 [<ffffffff81109900>] ? load_elf_binary+0xf7/0x16c7 [<ffffffff810ca616>] check_object+0xd1/0x1b4 [<ffffffff810cc529>] __slab_alloc+0x33a/0x3ed [<ffffffff81109900>] ? load_elf_binary+0xf7/0x16c7 [<ffffffff81109900>] ? load_elf_binary+0xf7/0x16c7 [<ffffffff810cd78c>] __kmalloc+0xbc/0x111 [<ffffffff81109809>] ? load_elf_binary+0x0/0x16c7 [<ffffffff81109900>] load_elf_binary+0xf7/0x16c7 [<ffffffff81109809>] ? load_elf_binary+0x0/0x16c7 [<ffffffff810d6f39>] search_binary_handler+0xcf/0x255 [<ffffffff810d8228>] do_execve+0x1c8/0x27a [<ffffffff810dcff1>] ? getname+0x171/0x1d7 [<ffffffff8100f68e>] sys_execve+0x43/0x5e [<ffffffff8101196a>] stub_execve+0x6a/0xc0 FIX kmalloc-512: Restoring 0xffff88001d9b4b88-0xffff88001d9b4c48=0x6b FIX kmalloc-512: Marking all objects used
Did an initial patch to try and add some sanity to the tcon struct reference counting and it totally crashed and burned. I've gone over it several times and don't see why my patch would cause more problems than it solves, so I think it must just be changing the timing such that another existing race is exposed. I'll plan to go over this more closely and do some more incremental testing to see if I can figure out the nature of the race here.
Created attachment 319417 [details] cifs: eliminate usage of kthread_stop for cifsd Patch 1 -- eliminate usage of kthread_stop for cifsd
Created attachment 319418 [details] cifs: make sure to do CIFSTCon while holding session semaphore patch 2
Created attachment 319419 [details] cifs: handle the TCP_Server_Info->tsk field more carefully patch 3
Created attachment 319420 [details] cifs: take tcon->useCount reference as soon as it's found patch 4
Here's what I have so far. This doesn't seem to cure the problems, however. I still get the occasional use-after-free of the tcon struct or the session. Right now, I'm testing a patch that eliminates the sharing of the tcon and session structs (and sockets, for that matter). So far, while running that I see no oopses and it's been 15-20 minutes or so. I may try to propose a patch that does this for upstream and plan to work from that to overhaul the tcon and session sharing code. It's probably easier to work from a working state than to try and fix the code that's broken. There's another somewhat related problem here, occasionally we end up with extra entries in the mtab with this reproducer. Either they're getting added when the mount didn't really work or we're not removing them on a successful umount. This is almost certainly a problem with the mount.cifs and umount.cifs code. Once we get the kernel-based races patched up, we can have a look and fix that too.
So far so good with the non-sharing patch. I'll probably let this run for a while overnight to make sure we don't hit any races. Once we do that, I'll have a closer look at what's really required to share tcon/ses/server structs. It might not even hurt to really do some hatchet work with this code and start with some heavily simplified cifs_mount/cifs_umount functions. We can then go back and try to reimplement sharing the various structures.
I spent some time over the weekend looking at how to fix this. I started by trying add back in code to share TCP_Server_Info structs between mounts. This also means that we're sharing cifsd kernel threads and sockets. I took great care to try and make this race free, but it ended up still oopsing with this: CIFS VFS: cifs_mount failed w/return code = -88 CIFS VFS: Send error in SessSetup = -88 (tcpStatus = 1) CIFS VFS: cifs_mount failed w/return code = -88 NULL pointer dereference at 00000000000004a8 IP: [<ffffffff811579c9>] socket_has_perm+0xd/0x60 PGD 1786d067 PUD 17946067 PMD 1610c067 PTE 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: cifs nls_utf8 sco bridge stp bnep l2cap bluetooth autofs4 sunrpc ipv6 xfs dm_multipath floppy pcspkr 8139cp 8139too mii i2c_piix4 i2c_core pata_acpi ata_generic [last unloaded: cifs] Pid: 15266, comm: cifsd Not tainted 2.6.27-0.391.rc8.git7.fc10.x86_64.debug #1 RIP: 0010:[<ffffffff811579c9>] [<ffffffff811579c9>] socket_has_perm+0xd/0x60 RSP: 0018:ffff88000b421bd0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff880013d422d0 RBP: ffff88000b421c30 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000004 R11: ffff88000dd11198 R12: 0000000000000000 R13: 0000000000000004 R14: ffff88000b421ea0 R15: ffff88000b421ca0 FS: 0000000000000000(0000) GS:ffff88001f8047d0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000000004a8 CR3: 0000000016116000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process cifsd (pid: 15266, threadinfo ffff88000b420000, task ffff880013d422d0) Stack: ffff88000b421c10 ffffffff810674fe ffff88000b421c00 ffff880013d422d0 ffff88001f950c68 0000000000000002 ffff880013d422d0 ffff88000b421c20 ffffffff810c8fac ffffe2000043c8c0 5aff88000b421c70 0000000000000000 Call Trace: [<ffffffff810674fe>] ? mark_lock+0x22/0x3a2 [<ffffffff810c8fac>] ? compound_order+0x1a/0x2b [<ffffffff81157ace>] selinux_socket_recvmsg+0x22/0x24 [<ffffffff811536c5>] security_socket_recvmsg+0x16/0x18 [<ffffffff812c8530>] __sock_recvmsg+0x54/0x7f [<ffffffff812c8c2d>] sock_recvmsg+0xcf/0xe8 [<ffffffff810674fe>] ? mark_lock+0x22/0x3a2 [<ffffffff8105a241>] ? autoremove_wake_function+0x0/0x3d [<ffffffff81066b21>] ? trace_hardirqs_off+0xd/0xf [<ffffffff81017e89>] ? native_sched_clock+0x8e/0xa8 [<ffffffff81017d26>] ? sched_clock+0x9/0xc [<ffffffff81066009>] ? lock_release_holdtime+0x2c/0x111 [<ffffffff81067ad4>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81036295>] ? __wake_up+0x43/0x4f [<ffffffff812c8c8a>] kernel_recvmsg+0x44/0x59 [<ffffffffa02aa73a>] cifs_demultiplex_thread+0x1d0/0xc25 [cifs] [<ffffffffa02aa56a>] ? cifs_demultiplex_thread+0x0/0xc25 [cifs] [<ffffffff81059edb>] kthread+0x4e/0x7b [<ffffffff810128f9>] child_rip+0xa/0x11 [<ffffffff81011c0e>] ? restore_args+0x0/0x30 [<ffffffff81059e68>] ? kthreadd+0x17b/0x1a0 [<ffffffff81059e8d>] ? kthread+0x0/0x7b [<ffffffff810128ef>] ? child_rip+0x0/0x11 Code: 45 e0 e8 7f 4c f7 ff eb 06 41 bc a4 ff ff ff 5b 44 89 e0 41 5c 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 48 83 ec 60 0f 1f 44 00 00 <4c> 8b 96 a8 04 00 00 31 c0 4c 8b 87 18 06 00 00 41 89 d1 41 83 RIP [<ffffffff811579c9>] socket_has_perm+0xd/0x60 RSP <ffff88000b421bd0> CR2: 00000000000004a8 ---[ end trace 5a038d2cae9f6247 ]--- ...I think we ended up calling into kernel_recvmsg while ipv4_connect was being run. I don't see anything that prevents this, so it seems plausible. So in addition to fixing the code that shares the data structures mentioned, we probably will also need to overhaul the cifs_reconnect code to make sure we aren't trying to use the socket while it's being reconnected. I've gone ahead and set a set of patches upstream to remove the kthread_stop usage, and to just disable the sharing code. This allows the reproducer to run for a long time (at least overnight).
*** Bug 458210 has been marked as a duplicate of this bug. ***
Created attachment 320000 [details] disable sharing of mount structures + remove kthread_stop usage uhhh...sorry... Those patches are definitely not a complete solution (not even close, actually). This patchset actually seems to prevent the panics with the reproducer, but the upstream maintainer has some concerns. Most notably, this will probably break multi-user mounts. Still, it might be something to consider. The other idea I have for a stopgap fix is to just wrap all of cifs_mount and cifs_umount with a global mutex. It's an ugly "solution" but would probably prevent most of these crashes. In the meantime, I'm working on a better fix, but the changes required are substantial. It'll probably take some time to get it right.
Created attachment 320011 [details] serialize cifs mounts + remove kthread_stop usage Ugh... I'm really not fond of this patch, but it should allow the existing code to limp along while preventing races between mounts and unmounts. It just adds a global mutex that we use to serialize the cifs_mount and cifs_umount functions. Essentially, we're papering over the bugs with this... I'm testing it now and it seems to be working. I still want to clean up the structure sharing code to make this unnecessary (and to make the code simpler and easier to debug). It might be reasonable to consider this for RHEL as a stopgap measure until a better fix is in place.
Created attachment 320186 [details] patch -- fix mount races in RHEL5 First, I notice that you're running a -108.el5 based kernel. That kernel is pretty old by now and doesn't contain the update to the CIFS code that's slated for 5.3. The 5.3 kernel should alreadu include the patch to clean up error handling in cifs_mount. My guess is that you're running into a different, unrelated bug of some sort. I'd recommend testing this on something closer to what will be released in 5.3 (unless you feel like chasing bugs that have already been fixed). Here's the RHEL5 patchset I've been testing today. It's based on the patchset that I sent upstream over the weekend. I think it should fix this without needing to wrap everything in a new semaphore. There are some other patches in the kernel I've been testing too, but I don't think they have any bearing on this problem. This patchset should apply reasonably cleanly to the current RHEL5.3 kernels.
A patchset is now in mainline kernels for this. I've backported it to RHEL5 and put it in the test kernels on my people.redhat.com page: http://people.redhat.com/jlayton/ ...if you're able to test them, then please do and let me know how they fare...
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
*** This bug has been marked as a duplicate of bug 465143 ***