Bug 249394 - CIFS deadlocks in cifs_get_inode_info_unix
Summary: CIFS deadlocks in cifs_get_inode_info_unix
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel   
Version: 5.0
Hardware: All Linux
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
Keywords: Regression
Depends On:
Blocks: 252315
Reported: 2007-07-24 11:54 UTC by Bryn M. Reeves
Modified: 2018-10-19 19:43 UTC (History)

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-11-07 19:56:33 UTC

Attachments
avoid sleeping inside is_size_safe_to_change (1.63 KB, patch)
2007-07-25 12:07 UTC, Bryn M. Reeves
patch -- Amit's patch ported to 5.1 beta kernels (1.81 KB, patch)
2007-08-15 11:38 UTC, Jeff Layton

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0959 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5 Update 1 2007-11-08 00:47:37 UTC

Description Bryn M. Reeves 2007-07-24 11:54:19 UTC
Description of problem:
This is related to the following kernel.org bugzillas:


A change was introduced in February to fix hangs in i_size_read caused by calling
i_size_write without holding i_mutex. The fix worked out in KBZ#7903 was to use
the i_lock spinlock to synchronize cifs' use of i_size_read()/i_size_write().
This introduces another deadlock, because the following check is made with
i_lock held:

  if (is_size_safe_to_change(cifsInfo, end_of_file)) {

is_size_safe_to_change() can end up sleeping via the following sequence of calls:

is_size_safe_to_change+0x24/0x90 [cifs]
find_writable_file+0xe4/0x184 [cifs]
cifs_reopen_file+0x2e4/0x524 [cifs]
CIFSSMBOpen+0x2d8/0x518 [cifs]
SendReceive+0x2dc/0x598 [cifs]
wait_for_response+0xe8/0x1bc [cifs]

If one thread ends up sleeping in this code path while holding i_lock, the
system can deadlock once all other CPUs enter CIFS and spin on this inode's
i_lock. This has been seen on 2-CPU POWER5 systems.
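
The pattern described above can be illustrated with a small userspace sketch.
This is not the attached patch and not the cifs code: every name in it
(sketch_inode, may_sleep_has_writer, update_size_buggy, update_size_fixed) is
hypothetical, the spinlock is a C11 atomic_flag standing in for i_lock, and the
"may sleep" call only records that it was entered with a spinlock held rather
than actually blocking. It shows one general way to avoid the problem, assuming
the blocking lookup can be done before the lock is taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; NOT the kernel's struct inode. */
struct sketch_inode {
    atomic_flag i_lock;   /* plays the role of the i_lock spinlock */
    int64_t i_size;       /* protected by i_lock */
    bool open_for_write;  /* what the blocking lookup would report */
};

static _Thread_local int spin_depth;  /* spinlocks held by this thread */
static atomic_int sleeps_under_lock;  /* times we "slept" with a lock held */

static void sketch_spin_lock(atomic_flag *l)
{
    /* Busy-wait until the flag is free; other CPUs would spin here. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
    spin_depth++;
}

static void sketch_spin_unlock(atomic_flag *l)
{
    assert(spin_depth > 0);
    spin_depth--;
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Stand-in for a lookup that may block on the server. A real version
 * could sleep here; this one only counts calls made under a spinlock.
 */
static bool may_sleep_has_writer(const struct sketch_inode *ino)
{
    if (spin_depth > 0)
        atomic_fetch_add(&sleeps_under_lock, 1);
    return ino->open_for_write;
}

/* Problem pattern: the possibly-blocking check runs with the lock held. */
static void update_size_buggy(struct sketch_inode *ino, int64_t server_size)
{
    sketch_spin_lock(&ino->i_lock);
    if (!may_sleep_has_writer(ino) || ino->i_size < server_size)
        ino->i_size = server_size;
    sketch_spin_unlock(&ino->i_lock);
}

/* One possible restructuring: block first, then lock only for the update. */
static void update_size_fixed(struct sketch_inode *ino, int64_t server_size)
{
    bool has_writer = may_sleep_has_writer(ino);  /* no lock held here */

    sketch_spin_lock(&ino->i_lock);
    if (!has_writer || ino->i_size < server_size)
        ino->i_size = server_size;
    sketch_spin_unlock(&ino->i_lock);
}
```

Calling update_size_buggy() increments sleeps_under_lock, while
update_size_fixed() leaves it unchanged. The trade-off in the second form is
that the writer check can be stale by the time the lock is taken, so whether it
is acceptable depends on the races the real code has to tolerate.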

Version-Release number of selected component (if applicable):

How reproducible:
Moderate - stress test of several hours duration required.

Steps to Reproduce:
1. Run LTP locktest case (testcases/network/nfsv4/locks/) for several hours:


Actual results:
After some time the system deadlocks and reports soft lockups pointing at the
above CIFS code:

<3>BUG: soft lockup detected on CPU#1!.
<4>Call Trace:.
<4>[C0000000E4D3F290] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable).
<4>[C0000000E4D3F330] [C0000000000A68C4] .softlockup_tick+0xf0/0x13c.
<4>[C0000000E4D3F3E0] [C000000000075BF8] .run_local_timers+0x1c/0x30.
<4>[C0000000E4D3F460] [C0000000000235F0] .timer_interrupt+0xa8/0x498.
<4>[C0000000E4D3F540] [C0000000000034F4] decrementer_common+0xf4/0x100.
<4>--- Exception: 901 at ._spin_lock+0x40/0x88.
<4>    LR = .cifs_get_inode_info_unix+0x784/0x9c0 [cifs].
<4>[C0000000E4D3F830] [C0000000EA44EE40] 0xc0000000ea44ee40 (unreliable).
<4>[C0000000E4D3F8B0] [D000000000B4BE78] .cifs_get_inode_info_unix+0x784/0x9c0
<4>[C0000000E4D3FA20] [D000000000B4A908] .cifs_open+0x72c/0x8d0 [cifs].
<4>[C0000000E4D3FB30] [C0000000000E8754] .__dentry_open+0x13c/0x2bc.
<4>[C0000000E4D3FBE0] [C0000000000E8A48] .do_filp_open+0x50/0x70.
<4>[C0000000E4D3FD00] [C0000000000E8ADC] .do_sys_open+0x74/0x130.
<4>[C0000000E4D3FDB0] [C000000000128210] .compat_sys_open+0x24/0x38.
<4>[C0000000E4D3FE30] [C0000000000086A4] syscall_exit+0x0/0x40.
<3>BUG: soft lockup detected on CPU#0!.
<4>Call Trace:.
<4>[C0000000B307B290] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable).
<4>[C0000000B307B330] [C0000000000A68C4] .softlockup_tick+0xf0/0x13c.
<4>[C0000000B307B3E0] [C000000000075BF8] .run_local_timers+0x1c/0x30.
<4>[C0000000B307B460] [C0000000000235F0] .timer_interrupt+0xa8/0x498.
<4>[C0000000B307B540] [C0000000000034F4] decrementer_common+0xf4/0x100.
<4>--- Exception: 901 at ._spin_lock+0x50/0x88.
<4>    LR = .cifs_get_inode_info_unix+0x784/0x9c0 [cifs].
<4>[C0000000B307B830] [D000000000B889F8] 0xd000000000b889f8 (unreliable).
<4>[C0000000B307B8B0] [D000000000B4BE78] .cifs_get_inode_info_unix+0x784/0x9c0
<4>[C0000000B307BA20] [D000000000B4A908] .cifs_open+0x72c/0x8d0 [cifs].
<4>[C0000000B307BB30] [C0000000000E8754] .__dentry_open+0x13c/0x2bc.
<4>[C0000000B307BBE0] [C0000000000E8A48] .do_filp_open+0x50/0x70.
<4>[C0000000B307BD00] [C0000000000E8ADC] .do_sys_open+0x74/0x130.
<4>[C0000000B307BDB0] [C000000000128210] .compat_sys_open+0x24/0x38.
<4>[C0000000B307BE30] [C0000000000086A4] syscall_exit+0x0/0x40.

Expected results:
No deadlock, fs test runs to completion.

Additional info:
The original kernel.org bugzilla was triggered via multiple instances of cp/find
etc. It's not clear yet if this problem can also be triggered by these simple
tests but is reported to be reliably triggered by the LTP testcase.

Comment 2 Bryn M. Reeves 2007-07-25 12:07:02 UTC
Created attachment 159920 [details]
avoid sleeping inside is_size_safe_to_change

Amit's 5.1 backport of Steve French's upstream patch

Comment 5 RHEL Product and Program Management 2007-08-15 10:37:46 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 8 Jeff Layton 2007-08-15 11:38:26 UTC
Created attachment 161345 [details]
patch -- Amit's patch ported to 5.1 beta kernels

Same patch as Amit's patch but fixed up to apply to 5.1 beta kernels.

Comment 9 Jeff Layton 2007-08-15 15:28:00 UTC
For QA purposes, could you elaborate on exactly how you reproduce this issue?
i.e. how are you running locktest here?

Comment 11 Jeff Layton 2007-08-15 16:11:17 UTC
Was able to reproduce this pretty quickly by running the ltp locktest like this:

# locktest -n 50 -f /file/on/cifs

...while running "service smb restart" in a loop on the server. I want to
reproduce it a couple of times to make sure I can do it reliably and then I'll
test whether the patch fixes it.

Comment 15 Don Zickus 2007-08-21 18:35:38 UTC
in 2.6.18-42.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 17 Mike Gahagan 2007-09-13 16:10:32 UTC
Reproduced the problem with the testcase in comment 11 on the -40 kernel; was
unable to reproduce it on the -45 kernel.

Comment 19 errata-xmlrpc 2007-11-07 19:56:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

