Bug 503192

Summary: CIFS crashes server after windows share is unavailable
Product: Red Hat Enterprise Linux 5
Reporter: Jonathan Schwehm <jmschwehm>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.0
CC: gdeschner, hturesson, ijc, jlayton, rwheeler, ssorce, steved
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clone Of: 452028
Last Closed: 2010-01-13 21:31:49 UTC
Attachments:
Contents of messages file through beginning of reboot (flags: none)

Description Jonathan Schwehm 2009-05-29 14:40:22 UTC
+++ This bug was initially created as a clone of Bug #452028 +++

Description of problem:
I found bug #452028 through a Google search.  We experienced a similar issue last night when a share on a Windows Server 2003 box was unavailable for 7 hours.  The original bug mentioned an updated kernel for a CentOS system (2.6.18-129.el5.jtltest.60), but did not indicate whether the updated kernel resolved the issue.

After logging more than 2200 lines of CIFS VFS kernel messages, the server stopped responding and wrote nothing further to its logs (including messages, secure, and the log files for our applications).  The server required a hard reboot and did not report any hardware failures.

Software Versions:
uname -a 
Linux webserve2 2.6.18-92.1.10.el5 #1 SMP Wed Jul 23 03:55:54 EDT 2008 i686 i686 i386 GNU/Linux

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5 (Tikanga)

samba-3.0.23c-2.el5.2.0.2

Sample from /var/log/messages:
May 28 20:47:26 webserve2 kernel:  CIFS VFS: close with pending writes
May 28 20:47:58 webserve2 kernel:  CIFS VFS: server not responding
May 28 20:47:58 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 30854
May 28 20:48:01 webserve2 kernel:  CIFS VFS: No response to cmd 47 mid 30853
May 28 20:48:01 webserve2 kernel:  CIFS VFS: Write2 ret -11, wrote 0
May 28 20:48:01 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 30855
May 28 20:48:09 webserve2 kernel:  CIFS VFS: Write2 ret -11, wrote 0
May 28 20:48:14 webserve2 kernel:  CIFS VFS: Send error in Close = -9
May 28 20:48:45 webserve2 kernel:  CIFS VFS: close with pending writes
May 28 20:48:58 webserve2 kernel:  CIFS VFS: server not responding
May 28 20:48:58 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 30886
May 28 20:48:58 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 30887
May 28 20:49:00 webserve2 kernel:  CIFS VFS: No response to cmd 47 mid 30885
May 28 20:49:00 webserve2 kernel:  CIFS VFS: No response to cmd 4 mid 30888
May 28 20:49:00 webserve2 kernel:  CIFS VFS: Write2 ret -11, wrote 0
May 28 20:49:00 webserve2 kernel:  CIFS VFS: Send error in Close = -11
May 28 20:49:52 webserve2 kernel:  CIFS VFS: close with pending writes
May 28 20:51:20 webserve2 last message repeated 2 times
May 28 20:52:29 webserve2 kernel:  CIFS VFS: close with pending writes
...
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44712
May 29 01:59:23 webserve2 kernel:  CIFS VFS: server not responding
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44717
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44711
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44713
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44710
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44714
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44719
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44715
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44716
May 29 01:59:23 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 44718
May 29 01:59:33 webserve2 kernel:  CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of \output\njinvoice226769_702403.pdf
May 29 01:59:43 webserve2 kernel:  CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of \upload_docs\Correspondence\2009
...
May 29 03:12:37 webserve2 kernel:  CIFS VFS: server not responding
May 29 03:12:37 webserve2 kernel:  CIFS VFS: server not responding
May 29 03:12:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47303
May 29 03:12:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47304
May 29 03:12:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47305
May 29 03:12:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47306
May 29 03:12:42 webserve2 kernel:  CIFS VFS: No response to cmd 47 mid 47302
May 29 03:12:42 webserve2 kernel:  CIFS VFS: Write2 ret -11, wrote 0
May 29 03:12:42 webserve2 kernel:  CIFS VFS: writes pending, delay free of handle
May 29 03:12:48 webserve2 last message repeated 4 times
May 29 03:13:37 webserve2 kernel:  CIFS VFS: server not responding
May 29 03:13:37 webserve2 kernel:  CIFS VFS: server not responding
May 29 03:13:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47314
May 29 03:13:37 webserve2 kernel:  CIFS VFS: server not responding
May 29 03:13:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47317
May 29 03:13:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47315
May 29 03:13:37 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47316
May 29 03:13:43 webserve2 kernel:  CIFS VFS: No response to cmd 47 mid 47313
May 29 03:13:43 webserve2 kernel:  CIFS VFS: Write2 ret -11, wrote 0
May 29 03:13:43 webserve2 kernel:  CIFS VFS: No response for cmd 50 mid 47318
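
A note on reading the numeric codes in the sample above: they look like negated Linux errno values, with 0xffffff90 being the same kind of value printed as an unsigned 32-bit hex number. The following is a minimal sketch of the conversion, assuming standard Linux errno numbering. The helper names to_signed32 and errno_name are hypothetical, and the table covers only the three codes that appear in this log.

```shell
# Convert a 32-bit value (hex or decimal) to a signed integer.
# Assumes the shell does 64-bit arithmetic, as bash and dash do.
to_signed32() {
    val=$(( $1 ))
    if [ "$val" -ge 2147483648 ]; then
        val=$(( val - 4294967296 ))
    fi
    echo "$val"
}

# Map the negative return codes seen in this log to Linux errno names.
# Assumption: standard Linux numbering; only codes from this log are listed.
errno_name() {
    case "$1" in
        -9)   echo "EBADF" ;;      # bad file descriptor
        -11)  echo "EAGAIN" ;;     # try again
        -112) echo "EHOSTDOWN" ;;  # host is down
        *)    echo "unknown" ;;
    esac
}

errno_name "$(to_signed32 0xffffff90)"
```

Read this way, "Write2 ret -11" and "Send error in Close = -11" would be EAGAIN, "Send error in Close = -9" would be EBADF, and the cifs_get_inode_info lookups would be failing with EHOSTDOWN, which is consistent with the share being unreachable.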

Comment 1 Jonathan Schwehm 2009-05-29 14:42:34 UTC
Created attachment 345912
Contents of messages file through beginning of reboot

Comment 2 Jeff Layton 2009-06-02 13:13:47 UTC
Unfortunately, I can't tell much from that log. The box may have oopsed, or it may simply have hung. It may even have had nothing at all to do with CIFS. If it happens again, it would be helpful to capture some sysrq-t data while the box is in this state, or to capture a vmcore.

Also, when you say the box "went unresponsive", can you elaborate on what you mean? Did it respond to pings, for instance?

Comment 3 Jonathan Schwehm 2009-06-02 15:45:04 UTC
The box did not respond to pings when it became unresponsive.  The box was still powered on, but the tech who rebooted it did not connect a console to see what it displayed.

If this happens again and it's responsive through the console, how can I capture some sysrq-t data or capture a vmcore?

Comment 4 Jeff Layton 2009-06-23 14:21:05 UTC
Sysrq info:
http://kbase.redhat.com/faq/docs/DOC-2024

How to force a core:
http://kbase.redhat.com/faq/docs/DOC-4264

How to set up netdump:
http://kbase.redhat.com/faq/docs/DOC-6855

Please open a support case if you require more detailed help.
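
As a rough sketch of the sysrq-t step only, assuming a root shell is still reachable on the box: the two /proc files below are the standard kernel sysrq interfaces, while the function name capture_task_dump and the SYSRQ_ENABLE / SYSRQ_TRIGGER overrides are hypothetical and exist only so the helper can be tried against scratch files. This is not taken from the kbase articles above.

```shell
# Hypothetical helper: request a dump of all task states (sysrq-t).
# The paths default to the real kernel interfaces; override them to dry-run.
capture_task_dump() {
    enable="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

    # Enable all sysrq functions so Alt+SysRq+T also works from the console.
    echo 1 > "$enable" || return 1

    # Ask the kernel to write every task's state and stack to the kernel log.
    echo t > "$trigger"
}
```

Running capture_task_dump as root should place the task list in the kernel log, where it can be read with dmesg or from /var/log/messages if syslog is still writing. If no shell responds, the Alt+SysRq+T key combination on the console is the usual fallback, provided sysrq was enabled beforehand.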

Comment 5 Jeff Layton 2010-01-13 13:45:58 UTC
Is this still a problem with more recent kernels, in particular the RHEL 5.5 test kernels on jwilson's homepage?

http://people.redhat.com/jwilson/

Comment 6 Jonathan Schwehm 2010-01-13 21:24:54 UTC
We haven't experienced this problem since June 2009, nor at any point since we upgraded the kernel.  We're currently running 2.6.18-164.10.1.

Thanks for checking in; feel free to close this report, since the current version of the kernel doesn't exhibit this behavior.

Comment 7 Jeff Layton 2010-01-13 21:31:49 UTC
Thanks, closing case.