Bug 433249 - [EMC 4.7 bug] nfs_access_cache_shrinker() race with umount
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Peter Staubach
QA Contact: Martin Jenner
Keywords: OtherQA
Depends On:
Blocks: 241692
 
Reported: 2008-02-17 22:26 EST by Niu Yawei
Modified: 2010-11-22 23:28 EST (History)
CC List: 13 users

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 15:26:36 EDT


Attachments
test for reproduce nfs_access_cache_shrinker() race with umount (501 bytes, application/x-gzip-compressed)
2008-03-05 21:19 EST, Niu Yawei
Patch from commit 6f23e3872cff238589f9bf39c71db2ea880c9a26 (1.26 KB, patch)
2008-03-13 14:28 EDT, Bryn M. Reeves
Patch rediffed for RHEL4 (55.0.12) (1006 bytes, patch)
2008-03-14 09:49 EDT, Bryn M. Reeves
Proposed patch (847 bytes, patch)
2008-03-26 15:55 EDT, Peter Staubach

Description Niu Yawei 2008-02-17 22:26:58 EST
Description of problem:

nfs_access_cache_shrinker() calls igrab() without holding s_umount, so it can
race with umount. If a user runs umount while the system is reclaiming the NFS
access slab cache, the result can be busy inodes after umount.

Trond has posted a fix for the race; please refer to
http://www.mail-archive.com/linux-nfs@vger.kernel.org/msg01104.html
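
For readers unfamiliar with this class of bug, here is a minimal userspace sketch of the general idea: a reclaim path should only take a reference while holding a lock that also excludes teardown. This is a model under stated assumptions, not the kernel code and not the patch attached to this bug. All names (model_mount, model_shrinker_grab, model_umount, and so on) are hypothetical, and a pthread mutex merely stands in for the locking that umount relies on.

```c
/* Userspace model only. Every struct and function name here is
 * hypothetical; none of this is the kernel's NFS or VFS API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_mount {
    pthread_mutex_t lock;   /* stands in for the lock held during umount */
    bool unmounting;        /* set once teardown has started */
    int inode_refs;         /* references currently held on inodes */
};

void model_mount_init(struct model_mount *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->unmounting = false;
    m->inode_refs = 0;
}

/* Reclaim path: take a reference only if teardown has not begun.
 * The check and the increment happen under the same lock, so umount
 * cannot slip in between them. */
bool model_shrinker_grab(struct model_mount *m)
{
    bool ok;

    pthread_mutex_lock(&m->lock);
    ok = !m->unmounting;
    if (ok)
        m->inode_refs++;
    pthread_mutex_unlock(&m->lock);
    return ok;
}

/* Reclaim path: drop a reference taken by model_shrinker_grab(). */
void model_shrinker_put(struct model_mount *m)
{
    pthread_mutex_lock(&m->lock);
    assert(m->inode_refs > 0);
    m->inode_refs--;
    pthread_mutex_unlock(&m->lock);
}

/* Umount path: mark teardown as started and report how many inode
 * references are still outstanding ("busy inodes"). */
int model_umount(struct model_mount *m)
{
    int busy;

    pthread_mutex_lock(&m->lock);
    m->unmounting = true;
    busy = m->inode_refs;
    pthread_mutex_unlock(&m->lock);
    return busy;
}
```

In this model, a grab that completes before model_umount() shows up in the busy count, and any grab attempted afterwards is refused. Without the shared lock, the reclaim path could take its reference after teardown had already decided the mount was idle, which is the situation the report above describes.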


Version-Release number of selected component (if applicable):

All RHEL4 and RHEL5 versions.

How reproducible:

It's hard to reproduce; however, we saw the "busy inode after umount" message
several times while running our tests on RHEL4.


  
Actual results:

Feb 10 04:24:51 dvt15687 kernel: Unable to handle kernel paging request at virtual address 47aec2ca
Feb 10 04:24:51 dvt15687 kernel:  printing eip:
Feb 10 04:24:51 dvt15687 kernel: c01721c2
Feb 10 04:24:51 dvt15687 kernel: *pde = 00007001
Feb 10 04:24:51 dvt15687 kernel: Oops: 0000 [#1]
Feb 10 04:24:51 dvt15687 kernel: SMP
Feb 10 04:24:51 dvt15687 kernel: Modules linked in: mpfs(U) emchrdp(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy sg ext3 jbd qla2300 ata_piix libata mptscsih mptsas mptspi mptscsi mptbase qla2xxx scsi_transport_fc sd_mod scsi_mod
Feb 10 04:24:51 dvt15687 kernel: CPU:    2
Feb 10 04:24:51 dvt15687 kernel: EIP:    0060:[<c01721c2>]    Tainted: PF     VLI
Feb 10 04:24:51 dvt15687 kernel: EFLAGS: 00010202   (2.6.9-55.ELsmp)
Feb 10 04:24:51 dvt15687 kernel: EIP is at iput+0x25/0x61
Feb 10 04:24:51 dvt15687 kernel: eax: 47aec2b6   ebx: f66aa40c   ecx: f8c33d00   edx: f66aa474
Feb 10 04:24:51 dvt15687 kernel: esi: f66aa2c4   edi: f66aa40c   ebp: f66aa380   esp: f7d9aee0
Feb 10 04:24:51 dvt15687 kernel: ds: 007b   es: 007b   ss: 0068
Feb 10 04:24:51 dvt15687 kernel: Process kswapd0 (pid: 70, threadinfo=f7d9a000 task=f7dc38b0)
Feb 10 04:24:51 dvt15687 kernel: Stack: d222c290 f8c03641 00000073 d222c290 ca26d990 00000000 00000080 00000000
Feb 10 04:24:51 dvt15687 kernel:        f70f0400 c0149a04 0017a200 00000000 00000007 00000000 00035037 000000d0
Feb 10 04:24:51 dvt15687 kernel:        00000020 c032af80 00000000 c032af80 00000002 c014ad03 c02d3d86 00035037
Feb 10 04:24:51 dvt15687 kernel: Call Trace:
Feb 10 04:24:51 dvt15687 kernel:  [<f8c03641>] nfs_access_cache_shrinker+0x115/0x182 [nfs]
Feb 10 04:24:51 dvt15687 kernel:  [<c0149a04>] shrink_slab+0xf8/0x161
Feb 10 04:24:51 dvt15687 kernel:  [<c014ad03>] balance_pgdat+0x1e1/0x30e
Feb 10 04:24:51 dvt15687 kernel:  [<c02d3d86>] schedule+0x87e/0x8ec
Feb 10 04:24:51 dvt15687 kernel:  [<c0120458>] prepare_to_wait+0x12/0x4c
Feb 10 04:24:51 dvt15687 kernel:  [<c014aefa>] kswapd+0xca/0xcc
Feb 10 04:24:51 dvt15687 kernel:  [<c012052d>] autoremove_wake_function+0x0/0x2d
Feb 10 04:24:52 dvt15687 kernel:  [<c02d5dfe>] ret_from_fork+0x6/0x14
Feb 10 04:24:52 dvt15687 kernel:  [<c012052d>] autoremove_wake_function+0x0/0x2d
Feb 10 04:24:52 dvt15687 kernel:  [<c014ae30>] kswapd+0x0/0xcc
Feb 10 04:24:52 dvt15687 kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Feb 10 04:24:52 dvt15687 kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 56 04 63 d8 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba 70 fe 32 c0 e8 8e


Expected results:

NFS is unmounted successfully.

Additional info:
Comment 1 Wayne Berthiaume 2008-02-19 09:03:25 EST
Add EMC team to CC list...
Comment 2 Andrius Benokraitis 2008-02-21 13:06:03 EST
Wayne, this is severity of LOW still from you... is this still the case?
Comment 3 Wayne Berthiaume 2008-02-21 15:34:49 EST
Hi Andrius.

     After reviewing the concerns of the MPFS team, and given that RHEL 4 is
getting close to EOL, I'd like to move this to HIGH. EMC will release note this
for now and we'll plan for a fix in RHEL4.7.
     As a race condition it is difficult to reproduce. Thus far, they have not
observed the same issue in RHEL 5, but if the code stream is very similar then
there is always the possibility. Can Red Hat determine if this change may be
warranted in RHEL 5 as well?

Regards,
Wayne.
Comment 5 Peter Staubach 2008-02-26 16:43:23 EST
Is the reproducer test available for use on our own systems?

Since this is a client side issue, is it possible to reproduce this
against any available server, ie. not just Celerra?
Comment 6 Niu Yawei 2008-03-05 21:19:24 EST
Created attachment 296980 [details]
test for reproduce nfs_access_cache_shrinker() race with umount

Try the test to reproduce the nfs umount race.
Comment 7 Niu Yawei 2008-03-05 21:23:57 EST
(In reply to comment #5)
> Is the reproducer test available for use on our own systems?
> Since this is a client side issue, is it possible to reproduce this
> against any available server, ie. not just Celerra?

Hi, Peter

There isn't a working reproducer yet. I tried to reproduce it, but failed. I
uploaded my test case as attachment #296980; maybe someone else can try it
in their own setup or improve it.

As described in comment #1, the race is an NFS client issue and is not related
to the server.
Comment 8 Peter Staubach 2008-03-06 07:47:56 EST
Thanx!  I will look at the test case and see what I can find.
Comment 9 Bikash 2008-03-13 13:43:11 EDT
Any update on this one?

Thanks,
Comment 10 Peter Staubach 2008-03-13 14:19:52 EDT
I haven't had a chance to look at this one enough yet.  I've got some
other high priority bugs on my plate that I need to get completed and
then I will have time to look into this one.
Comment 18 Bryn M. Reeves 2008-03-14 09:49:24 EDT
Created attachment 298048 [details]
Patch rediffed for RHEL4 (55.0.12)
Comment 19 Peter Staubach 2008-03-26 15:55:43 EDT
Created attachment 299221 [details]
Proposed patch

Here is the patch ported to RHEL-4 68.26.
Comment 21 Bryn M. Reeves 2008-03-27 06:05:16 EDT
Patches seem identical so I assume the testing results from the patch in comment
#18 are still valid.
Comment 22 Vivek Goyal 2008-03-31 10:08:59 EDT
Committed in 68.28.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 23 Niu Yawei 2008-04-02 21:58:39 EDT
(In reply to comment #22)
> Committed in 68.28.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Our QA engineer tested the patched RHEL4 kernel for several days, and
the "busy inode after umount" message has not been seen so far. Is there any
plan to include the patch in the next RHEL4 release, and if so, which release
version? Thanks.
Comment 24 Andrius Benokraitis 2008-04-02 22:42:24 EDT
Yawei - yes, this is slated to be in the forthcoming RHEL 4.7 Beta.
Comment 25 Tommi Kaarmela 2008-04-10 07:25:50 EDT
Hello. In which version or kernel will this fix be available for RHEL5? Thanks.
Comment 26 Bryn M. Reeves 2008-04-10 08:34:12 EDT
In reply to comment #25 - this is a RHEL4 bugzilla. If you are encountering this
problem on RHEL5 also please open a support request with Red Hat GSS so that any
necessary changes can be tracked appropriately.
Comment 29 Chris Ward 2008-06-05 11:55:25 EDT
~~~~~~~~~~~~~~
~ Attention: ~ Feedback requested regarding this **High Priority** bug. 
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest packages contained in
RHEL4.7-Snapshot1--available now on partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed,
submit a comment describing the passing results of your test in appropriate
detail, along with which snapshot and package version tested. The bugzilla will
be updated by Red Hat Quality Engineering for you when this information has been
received.

If you believe this issue has not been properly fixed or you are unable to verify the
issue for any reason, please add a comment describing the most recent issues you
are experiencing, along with which snapshot and package version tested. 

If you believe the bug has not been fixed, change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and bugzilla will be updated for you. 

If you need assistance accessing ftp://partners.redhat.com, please contact your
Partner Manager.

Thank you
Red Hat QE Partner Management
Comment 30 Chris Ward 2008-06-06 04:30:09 EDT
Correction: This bug is **Medium Priority**. Sorry.
Comment 31 Chris Ward 2008-06-19 09:07:23 EDT
~~~~~~~~~~~~~~
~ Attention: ~ 
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest kernel packages contained
in **kernel 2.6.9-73.EL**, accessible now on http://partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed,
submit a comment describing the results of your test in appropriate detail,
along with which snapshot and package version tested. The bugzilla will be
updated by Red Hat Quality Engineering for you when this information has been
received.

If this issue has not been properly fixed or you are unable to verify the issue
for any reason, please add a comment describing the most recent issues you are
experiencing, along with which snapshot and package version tested. If you are
sure the bug has not been fixed, change the status of the bug to ASSIGNED.

For IssueTracker users, submit verification results as usual; Bugzilla will be
updated by Red Hat Quality Engineering for you.

For additional information, contact your Partner Manager.

Thank you,
Red Hat QE Partner Management
Comment 35 errata-xmlrpc 2008-07-24 15:26:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
Comment 36 Wayne Berthiaume 2008-07-28 14:55:42 EDT
Sorry for the late response. Testing has been performed and the bug resolved.
Comment 37 Chris Ward 2008-07-29 03:30:28 EDT
Partners, I would like to thank you all for your participation in assuring the
quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.
