Description of problem:
nfs_access_cache_shrinker() calls igrab() without holding s_umount, so it can race with umount. When a user performs umount while the system is reclaiming the nfs access slab cache, it will probably result in a busy inode after umount. Trond has posted a fix for the race; please refer to http://www.mail-archive.com/linux-nfs.org/msg01104.html

Version-Release number of selected component (if applicable):
All RHEL4 and RHEL5 versions.

How reproducible:
It's hard to reproduce; however, we found the "busy inode after umount" several times while running our tests on RHEL4.

Steps to Reproduce:
1.
2.
3.

Actual results:
Feb 10 04:24:51 dvt15687 kernel: Unable to handle kernel paging request at virtual address 47aec2ca
Feb 10 04:24:51 dvt15687 kernel: printing eip:
Feb 10 04:24:51 dvt15687 kernel: c01721c2
Feb 10 04:24:51 dvt15687 kernel: *pde = 00007001
Feb 10 04:24:51 dvt15687 kernel: Oops: 0000 [#1]
Feb 10 04:24:51 dvt15687 kernel: SMP
Feb 10 04:24:51 dvt15687 kernel: Modules linked in: mpfs(U) emchrdp(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy sg ext3 jbd qla2300 ata_piix libata mptscsih mptsas mptspi mptscsi mptbase qla2xxx scsi_transport_fc sd_mod scsi_mod
Feb 10 04:24:51 dvt15687 kernel: CPU: 2
Feb 10 04:24:51 dvt15687 kernel: EIP: 0060:[<c01721c2>] Tainted: PF VLI
Feb 10 04:24:51 dvt15687 kernel: EFLAGS: 00010202 (2.6.9-55.ELsmp)
Feb 10 04:24:51 dvt15687 kernel: EIP is at iput+0x25/0x61
Feb 10 04:24:51 dvt15687 kernel: eax: 47aec2b6 ebx: f66aa40c ecx: f8c33d00 edx: f66aa474
Feb 10 04:24:51 dvt15687 kernel: esi: f66aa2c4 edi: f66aa40c ebp: f66aa380 esp: f7d9aee0
Feb 10 04:24:51 dvt15687 kernel: ds: 007b es: 007b ss: 0068
Feb 10 04:24:51 dvt15687 kernel: Process kswapd0 (pid: 70, threadinfo=f7d9a000 task=f7dc38b0)
Feb 10 04:24:51 dvt15687 kernel: Stack: d222c290 f8c03641 00000073 d222c290 ca26d990 00000000 00000080 00000000
Feb 10 04:24:51 dvt15687 kernel: f70f0400 c0149a04 0017a200 00000000 00000007 00000000 00035037 000000d0
Feb 10 04:24:51 dvt15687 kernel: 00000020 c032af80 00000000 c032af80 00000002 c014ad03 c02d3d86 00035037
Feb 10 04:24:51 dvt15687 kernel: Call Trace:
Feb 10 04:24:51 dvt15687 kernel: [<f8c03641>] nfs_access_cache_shrinker+0x115/0x182 [nfs]
Feb 10 04:24:51 dvt15687 kernel: [<c0149a04>] shrink_slab+0xf8/0x161
Feb 10 04:24:51 dvt15687 kernel: [<c014ad03>] balance_pgdat+0x1e1/0x30e
Feb 10 04:24:51 dvt15687 kernel: [<c02d3d86>] schedule+0x87e/0x8ec
Feb 10 04:24:51 dvt15687 kernel: [<c0120458>] prepare_to_wait+0x12/0x4c
Feb 10 04:24:51 dvt15687 kernel: [<c014aefa>] kswapd+0xca/0xcc
Feb 10 04:24:51 dvt15687 kernel: [<c012052d>] autoremove_wake_function+0x0/0x2d
Feb 10 04:24:52 dvt15687 kernel: [<c02d5dfe>] ret_from_fork+0x6/0x14
Feb 10 04:24:52 dvt15687 kernel: [<c012052d>] autoremove_wake_function+0x0/0x2d
Feb 10 04:24:52 dvt15687 kernel: [<c014ae30>] kswapd+0x0/0xcc
Feb 10 04:24:52 dvt15687 kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
Feb 10 04:24:52 dvt15687 kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 56 04 63 d8 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba 70 fe 32 c0 e8 8e

Expected results:
NFS is umounted successfully.

Additional info:
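For readers following the description above, the interleaving can be illustrated with a small single-threaded toy model. This is only a sketch under stated assumptions: the names (Inode, SuperBlock, shrinker, try_unmount, run) are hypothetical, the model is plain Python rather than kernel code, and holding s_umount across the reference is assumed here purely for illustration; it is not necessarily what the referenced patch does.

```python
# Toy model only. All names below are hypothetical; this is neither the
# kernel's data structures nor the patch referenced in this report.
from dataclasses import dataclass, field


@dataclass
class Inode:
    refcount: int = 0


@dataclass
class SuperBlock:
    inodes: list = field(default_factory=list)
    readers: int = 0          # stand-in for read holders of s_umount
    unmounted: bool = False


def shrinker(sb, inode, hold_s_umount):
    """Generator modelling one shrinker pass: take a ref, pause, drop it."""
    if hold_s_umount:
        sb.readers += 1       # models a successful read-trylock on s_umount
    inode.refcount += 1       # models igrab()
    yield                     # window in which umount may try to run
    inode.refcount -= 1       # models iput()
    if hold_s_umount:
        sb.readers -= 1


def try_unmount(sb):
    """Return None if umount would have to wait, else the busy inodes."""
    if sb.readers:
        return None           # s_umount is held, so umount cannot proceed
    busy = [i for i in sb.inodes if i.refcount > 0]
    sb.unmounted = True
    return busy


def run(hold_s_umount):
    """Interleave one shrinker pass with an umount attempt.

    Returns the number of inodes that were still busy at umount time.
    """
    inode = Inode()
    sb = SuperBlock(inodes=[inode])
    task = shrinker(sb, inode, hold_s_umount)
    next(task)                    # shrinker now holds its reference
    busy = try_unmount(sb)        # umount arrives inside the window
    if busy is None:              # blocked: let the shrinker finish first
        next(task, None)
        busy = try_unmount(sb)
    return len(busy)


if __name__ == "__main__":
    print("busy inodes without s_umount:", run(False))
    print("busy inodes with s_umount:   ", run(True))
```

In the unguarded run, umount completes while the shrinker still holds a reference, which corresponds to the "busy inode after umount" condition; in the real kernel the shrinker's later iput() would then touch an inode whose superblock is already gone, which would be consistent with the fault at iput+0x25 in the trace.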
Add EMC team to CC list...
Wayne, this is still at LOW severity from your side... is this still the case?
Hi Andirus. After reviewing the concerns of the MPFS team, and given that RHEL 4 is getting close to EOL, I'd like to move this to HIGH. EMC will release-note this for now and we'll plan for a fix in RHEL4.7. As a race condition, it is difficult to reproduce. Thus far, they have not observed the same issue in RHEL 5, but if the code stream is very similar then there is always the possibility. Can Red Hat determine if this change may be warranted in RHEL 5 as well? Regards, Wayne.
Is the reproducer test available for use on our own systems? Since this is a client-side issue, is it possible to reproduce this against any available server, i.e. not just Celerra?
Created attachment 296980 [details]
test for reproduce nfs_access_cache_shrinker() race with umount

Try the test to reproduce the nfs umount race.
(In reply to comment #5)
> Is the reproducer test available for use on our own systems?
> Since this is a client side issue, is it possible to reproduce this
> against any available server, ie. not just Celerra?

Hi Peter,

There isn't a working reproducer yet; I tried to reproduce it but failed. I uploaded my test case as attachment #296980; maybe someone else can try it on their own setup or improve it. As described in comment #1, the race is an NFS client issue and is not related to the server.
Thanx! I will look at the test case and see what I can find.
Any update on this one? Thanks,
I haven't had a chance to look at this one enough yet. I've got some other high priority bugs on my plate that I need to get completed and then I will have time to look into this one.
Created attachment 298048 [details] Patch rediffed for RHEL4 (55.0.12)
Created attachment 299221 [details]
Proposed patch

Here is the patch ported to RHEL-4 68.26.
Patches seem identical so I assume the testing results from the patch in comment #18 are still valid.
Committed in 68.28.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
(In reply to comment #22)
> Committed in 68.28.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Our QA engineer tested the patched rhel4 kernel for several days, and the "busy inode after umount" has not been seen so far. Is there any plan to include the patch in the next rhel4 release? And what is the release version? Thanks.
Yawei - yes, this is slated to be in the forthcoming RHEL 4.7 Beta.
Hello. In which version or kernel will this fix be available for RHEL5? Thanks.
In reply to comment #25 - this is a RHEL4 bugzilla. If you are encountering this problem on RHEL5 also please open a support request with Red Hat GSS so that any necessary changes can be tracked appropriately.
~~~~~~~~~~~~~~
~ Attention: ~ Feedback requested regarding this **High Priority** bug.
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest packages contained in RHEL4.7-Snapshot1, available now on partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed, submit a comment describing the passing results of your test in appropriate detail, along with which snapshot and package version tested. The bugzilla will be updated by Red Hat Quality Engineering for you when this information has been received.

If you believe this issue has not been properly fixed or you are unable to verify the issue for any reason, please add a comment describing the most recent issues you are experiencing, along with which snapshot and package version tested. If you believe the bug has not been fixed, change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and bugzilla will be updated for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

Thank you
Red Hat QE Partner Management
Correction: This bug is **Medium Priority**. Sorry.
~~~~~~~~~~~~~~
~ Attention: ~
~~~~~~~~~~~~~~

A fix for this issue should be included in the latest kernel packages contained in **kernel 2.6.9-73.EL**, accessible now on http://partners.redhat.com.

After you (Red Hat Partner) have verified that this issue has been addressed, submit a comment describing the results of your test in appropriate detail, along with which snapshot and package version tested. The bugzilla will be updated by Red Hat Quality Engineering for you when this information has been received.

If this issue has not been properly fixed or you are unable to verify the issue for any reason, please add a comment describing the most recent issues you are experiencing, along with which snapshot and package version tested. If you are sure the bug has not been fixed, change the status of the bug to ASSIGNED.

For IssueTracker users, submit verification results as usual; Bugzilla will be updated by Red Hat Quality Engineering for you. For additional information, contact your Partner Manager.

Thank you,
Red Hat QE Partner Management
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html
Sorry for the late response. Testing has been performed and the bug resolved.
Partners, I would like to thank you all for your participation in assuring the quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.