Bug 711400 - panic in cifsd code after unexpected lookup error -88.
Summary: panic in cifsd code after unexpected lookup error -88.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
: 6.2
Assignee: Jeff Layton
QA Contact: Jian Li
URL:
Whiteboard:
Depends On:
Blocks: 704921
 
Reported: 2011-06-07 12:07 UTC by Jeff Layton
Modified: 2014-03-04 00:07 UTC (History)

Fixed In Version: kernel-2.6.32-170.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 704921
Environment:
Last Closed: 2011-12-06 13:33:12 UTC
Target Upstream Version:


Attachments
The patch crash kernel-2.6.32-169 to reproduce the bug. (1.78 KB, patch)
2011-09-29 02:09 UTC, Jian Li


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1530 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2011-12-06 01:45:35 UTC

Description Jeff Layton 2011-06-07 12:07:21 UTC
+++ This bug was initially created as a clone of Bug #704921 +++

Description of problem:

The kernel panics in cifsd after logging:

| >  CIFS VFS: Send error in SessSetup = -88
| >  CIFS VFS: Unexpected lookup error -88


Version-Release number of selected component (if applicable):

kernel-2.6.18-194

How reproducible:

Rarely


Steps to Reproduce:
1. Use cifsd
2. Wait ~280 days..


Additional info:

I've attempted to trace the code

crash-5.1.2> bt
PID: 11263 TASK: e68ad000 CPU: 1 COMMAND: "cifsd"
#0 [d5c7bd80] crash_kexec at c04426da
#1 [d5c7bdc4] die at c040649f
#2 [d5c7bdf4] do_page_fault at c061e611
#3 [d5c7be44] error_code (via page_fault) at c0405a87
EAX: 00000000 EBX: c069c9a0 ECX: 00000004 EDX: 00000000 EBP: 00000004
DS: 007b ESI: 00000000 ES: 007b EDI: 00000000
CS: 0060 EIP: c05b5efe ERR: ffffffff EFLAGS: 00010246
#4 [d5c7be78] sock_recvmsg at c05b5efe
#5 [d5c7bf54] kernel_recvmsg at c05b7d6d
#6 [d5c7bf64] cifs_demultiplex_thread at e8cf99de [cifs]
#7 [d5c7bfcc] kthread at c0436339
#8 [d5c7bfe4] kernel_thread_helper at c0405c51


This is odd: the tcpStatus of this TCP_Server_Info is CifsGood, yet its ssocket is NULL, which I guess is why there is an error 88 (ENOTSOCK 88 /* Socket operation on non-socket */).

crash-5.1.2> struct TCP_Server_Info.srv_count,hostname,ssocket,tcpStatus,lstrp 0xdc5b2000
srv_count = 4,
hostname = 0xd62b2ec0 "localhost",
ssocket = 0x0, <====== here
tcpStatus = CifsGood,
lstrp = 2768191999
crash-5.1.2> 
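
For reference, the same error number can be seen from user space. The following is only a minimal sketch, not the kernel code path: it assumes a Linux system and uses a hypothetical helper, recv_errno_on_pipe(), that calls recv() on a pipe descriptor, which is not a socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of cifs): call recv() on a descriptor
 * that is not a socket and report the errno that comes back.
 * Returns -1 if the pipe cannot be created, 0 if recv() did not fail.
 */
static int recv_errno_on_pipe(void)
{
    int fds[2];
    char buf[4];
    ssize_t n;
    int saved;

    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    n = recv(fds[0], buf, sizeof(buf), MSG_DONTWAIT);
    saved = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

On Linux the helper is expected to return ENOTSOCK, the same value that appears negated as -88 in the CIFS VFS messages.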

Jlayton asked me to create this bug and let him know.

--- Additional comment from jlayton@redhat.com on 2011-05-16 09:39:54 EDT ---

I've looked over the code but simply don't see it. This code is really too complicated for anyone's good, but basically the recreation and reconnection of the socket is supposed to be done by cifsd. When a reconnect event occurs, then cifsd will close down the socket and set ssocket to NULL, and then try to create a new socket and connect it.

It shouldn't return until that has successfully occurred. The above stack trace, though, makes it look like it returned anyway, with ssocket still NULL.

There is one possibility -- it could be that there was a flurry of reconnect/disconnect activity, cifs_setup_session raced in and reset the tcpStatus to CifsGood while cifsd was trying (and failing) to reconnect the socket. That would probably explain what happened...

The fundamental problem here though is that the tcpStatus has no clear locking rules around it. This will probably require a fairly fundamental overhaul to fix it correctly.
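
The suspected interleaving can be modelled in user space. This is only a sketch under stated assumptions: every name below (model_server, model_reconnect_mark, and so on) is hypothetical, the three status values are simplified stand-ins for tcpStatus, and the "guarded" and "hardened" variants illustrate one way such a race could be narrowed. They are not the patch that was applied for this bug.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's tcpStatus values (illustrative names). */
enum model_status { MODEL_GOOD, MODEL_NEED_RECONNECT, MODEL_NEED_NEGOTIATE };

struct model_server {
    enum model_status status; /* plays the role of tcpStatus */
    void *sock;               /* plays the role of ssocket */
};

static int fake_socket; /* any non-NULL address stands for a connected socket */

/* Reconnect, phase 1: flag the connection as needing a reconnect. */
static void model_reconnect_mark(struct model_server *s)
{
    s->status = MODEL_NEED_RECONNECT;
}

/* Session setup finishing late: unconditionally declares the connection good. */
static void model_session_done_unguarded(struct model_server *s)
{
    s->status = MODEL_GOOD;
}

/* Guarded variant: only a connection awaiting negotiation may become good. */
static void model_session_done_guarded(struct model_server *s)
{
    if (s->status == MODEL_NEED_NEGOTIATE)
        s->status = MODEL_GOOD;
}

/* Reconnect, phase 2 (racy): drop the socket, reconnect only while flagged. */
static void model_reconnect_finish_racy(struct model_server *s)
{
    s->sock = NULL;
    while (s->status == MODEL_NEED_RECONNECT) {
        s->sock = &fake_socket;
        s->status = MODEL_NEED_NEGOTIATE;
    }
}

/* Reconnect, phase 2 (hardened): always connect at least once. */
static void model_reconnect_finish_hardened(struct model_server *s)
{
    s->sock = NULL;
    do {
        s->sock = &fake_socket;
        s->status = MODEL_NEED_NEGOTIATE;
    } while (s->status == MODEL_NEED_RECONNECT);
}

/* Receive path: report -ENOTSOCK instead of dereferencing a NULL socket. */
static int model_recv(const struct model_server *s)
{
    return s->sock ? 0 : -ENOTSOCK;
}
```

Calling model_reconnect_mark(), then model_session_done_unguarded(), then model_reconnect_finish_racy() leaves the model with status MODEL_GOOD and a NULL socket, which matches the state in the crash dump. With the guarded and hardened variants in the same order, the model ends with a non-NULL socket and status MODEL_NEED_NEGOTIATE.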

--- Additional comment from jlayton@redhat.com on 2011-06-06 10:11:35 EDT ---

Note that there is a discussion about a very similar problem going on upstream. I think I have a patch that may fix this there, but it will need to be backported for RHEL5:

http://article.gmane.org/gmane.linux.kernel.cifs/3402

Comment 1 Jeff Layton 2011-06-07 12:08:59 UTC
Going ahead and cloning this for RHEL6 as I'm fairly certain it's a bug there too. I have a patch that should prevent the panics, but it needs further testing. It would be nice if we could come up with a reproducer, but it may be difficult for this one.

Comment 2 RHEL Product and Program Management 2011-06-08 18:20:20 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 3 yanfu,wang 2011-06-14 09:58:49 UTC
As per comment #1, I will do a code review if no reproducer is provided.

Comment 5 Kyle McMartin 2011-07-20 14:33:34 UTC
Patch(es) available on kernel-2.6.32-170.el6

Comment 8 Jian Li 2011-09-29 02:09:56 UTC
Created attachment 525428 [details]
The patch crash kernel-2.6.32-169 to reproduce the bug.

Comment 9 Jian Li 2011-09-29 02:11:10 UTC
The bug can be reproduced by applying a manual patch to the kernel that makes 'cifs_reconnect' wait until 'cifs_negotiate_protocol' has changed tcpserver->status. The patch is based on kernel-2.6.32-169.el6 and is attached.

Steps to crash with the patch:
1. Start the smb service and create an smb user (root/redhat)
2. modprobe cifs
3. echo 8 > /proc/fs/cifs/cifsFYI
4. mount.cifs //localhost/test /mnt/test -o user=root,password=redhat

Crash output:

fs/cifs/cifssmb.c: Dialect: 2
fs/cifs/cifssmb.c: negprot rc 0
fs/cifs/connect.c: bug test 1 cifs-8000000f
**snip**
BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
**snip**
fs/cifs/transport.c: For smb_command 115
fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
CIFS VFS: Send error in SessSetup = -88
fs/cifs/connect.c: CIFS VFS: leaving cifs_get_smb_ses (xid = 1) rc = -88
fs/cifs/connect.c: CIFS VFS: leaving cifs_mount (xid = 0) rc = -88
CIFS VFS: cifs_mount failed w/return code = -88
 [last unloaded: scsi_wait_scan]
Call Trace:
 [<ffffffff81079bcc>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8126d150>] ? string+0x40/0x100
 [<ffffffff8120e5af>] selinux_socket_recvmsg+0x1f/0x30
 [<ffffffff812067e6>] security_socket_recvmsg+0x16/0x20
 [<ffffffff8140eae0>] sock_recvmsg+0xe0/0x160
 [<ffffffff810943ef>] ? up+0x2f/0x50
 [<ffffffff8108e6d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814dd145>] ? printk+0x41/0x44
 [<ffffffff8118ccb2>] ? iput+0x62/0x70
 [<ffffffff8140eba4>] kernel_recvmsg+0x44/0x60
 [<ffffffffa040ca0e>] cifs_demultiplex_thread+0x1ce/0x1070 [cifs]

On kernel-2.6.32-203, the same patch does not crash the kernel.

Comment 10 errata-xmlrpc 2011-12-06 13:33:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

