Bug 247464 - soft lockup in cifs mountpoint leads to unresponsive server
Summary: soft lockup in cifs mountpoint leads to unresponsive server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL: http://git.kernel.org/?p=linux/kernel...
Whiteboard:
Depends On:
Blocks:
Reported: 2007-07-09 14:23 UTC by Arenas Belon, Carlo Marcelo
Modified: 2008-01-15 11:24 UTC

Fixed In Version: RHEL5.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-15 11:24:07 UTC
Target Upstream Version:
Embargoed:


Links
System        ID    Private  Priority  Status  Summary  Last Updated
Linux Kernel  7903  0        None      None    None     Never

Description Arenas Belon, Carlo Marcelo 2007-07-09 14:23:05 UTC
Description of problem:
soft lockup in the cifs module when calling "stat", as shown by:

kernel: BUG: soft lockup detected on CPU#1!
kernel: [<c0447f3f>] softlockup_tick+0x98/0xa6
kernel: [<c042d138>] update_process_times+0x39/0x5c
kernel: [<c04176f0>] smp_apic_timer_interrupt+0x5c/0x64
kernel: [<c04049bf>] apic_timer_interrupt+0x1f/0x24
kernel: [<c0470f5e>] generic_fillattr+0x76/0xa4
kernel: [<f8d9cce8>] cifs_getattr+0x1e/0x2b [cifs]
kernel: [<f8d9ccca>] cifs_getattr+0x0/0x2b [cifs]
kernel: [<c047132e>] vfs_getattr+0x40/0x9b
kernel: [<c0471445>] vfs_stat_fd+0x2a/0x3c
kernel: [<c04714e4>] sys_stat64+0xf/0x23
kernel: [<c04dec11>] copy_to_user+0x31/0x48
kernel: [<c0403eff>] syscall_call+0x7/0xb
kernel: ======================= 

Version-Release number of selected component (if applicable):
2.6.18-8.1.6.el5PAE

How reproducible:
regularly (4 cases reported in the last 2 days from a universe of 336 boxes)

Steps to Reproduce:
1. create a read-only mount point to a samba server
2. on an SMP server with a PAE-enabled kernel, run 2 processes (more is better)
that stat files in that mount point
3. wait for it to lock up (ping will still work, but ssh logins will not)
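
The steps above can be sketched as a small script. This is only an illustration of the kind of load described, not the workload that actually triggered the lockup: the mount path /mnt/cifs-ro, the share name in the comment, the worker count, and the duration are all hypothetical, and the script assumes Python 3 on the client.

```python
import os
import sys
import time
from multiprocessing import Process

# Assumed setup (hypothetical server, share and path), done beforehand as root:
#   mount -t cifs -o ro //server/share /mnt/cifs-ro


def stat_loop(root, seconds):
    """Repeatedly stat every file under root until the deadline passes.

    Returns the number of successful stat calls.
    """
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    os.stat(os.path.join(dirpath, name))
                    count += 1
                except OSError:
                    # Files may vanish or be unreadable; keep hammering.
                    pass
    return count


def run_workers(root, workers, seconds):
    """Start several processes that stat the same tree concurrently."""
    procs = [Process(target=stat_loop, args=(root, seconds))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


# Only runs when a mount point is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    run_workers(sys.argv[1], workers=4, seconds=600.0)
```

Run it as, for example, `python3 statloop.py /mnt/cifs-ro` (the script name is arbitrary). Separate processes are used rather than threads because the report describes multiple processes calling stat on the same mount.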
  
Actual results:
the system locks up and requires a manual reboot (not even sysrq responds on
the console)

Expected results:
no lockup

Additional info:
on the samba server, the smb process serving the defunct server is still up and
connected, and still holds an exclusive oplock on the file that was being
stat'ed.

the TCP/IP stack of the defunct server seems to be operative, but no process
that requires a fork/exec works (rsync/sshd); TCP connections to them hang
after the 3-way handshake is completed.

probably already fixed upstream in cifs version 1.48, as shown by the linked
bug (Linux Kernel 7903); the commit is linked in the URL field

Comment 1 Jeff Layton 2007-07-10 12:01:44 UTC
There's a pending update of the CIFS code already slated for inclusion into 5.1.
If you have a non-critical place to do so, could you test the kernels on my
people page and see if they help this issue:

http://people.redhat.com/jlayton

Comment 2 Jeff Layton 2008-01-15 11:24:07 UTC
No response to my query from July. I believe this is already fixed in the 5.1
release. Please reopen if that is not the case.
