Red Hat Bugzilla – Bug 489931
NFS umount deadlock in rpciod with rpc_shutdown_client()
Last modified: 2016-06-20 07:39:30 EDT
The mail thread below appears to cover the important issues
related to this problem (copied from bug 487699#45). An attempt
at a backport of all the relevant patches for RHEL-4 seems a
little too risky as some of the dependent infrastructure is
Created attachment 343779 [details]
patchset #1 (bad, leads to oops)
This is my first stab at a patchset for this. When I run the cthon04 tests on a kernel with this set, it oopses fairly quickly:
general protection fault: 0000  SMP
last sysfs file: /block/dm-0/range
Modules linked in: nfs(FU) lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth rpcsec_gss_krb5(FU) auth_rpcgss(FU) testmgr_cipher testmgr aead crypto_blkcipher crypto_algapi des sunrpc(FU) ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy xen_vnif xen_balloon i2c_piix4 xen_vbd i2c_core xen_platform_pci serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2771, comm: nfsiod Tainted: GF 2.6.18-144.el5debug #1
RIP: 0010:[<ffffffff88501261>] [<ffffffff88501261>] :nfs:nfs_inode_remove_request+0x15/0x9f
RSP: 0018:ffff81002938bdc0 EFLAGS: 00010286
RAX: 6b6b6b6b6b6b6b6b RBX: ffff81002961b658 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810029b75e74 RDI: ffff81002961b658
RBP: ffff810029b75c20 R08: ffff81003ffee1c0 R09: ffff810000012c00
R10: ffff81002961b6d0 R11: 0000000000000060 R12: ffff81002961b658
R13: 0000000000000282 R14: ffff810029b75c28 R15: ffffffff883a5fb6
FS: 0000000000000000(0000) GS:ffffffff80424000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003bab48b610 CR3: 000000003a3d0000 CR4: 00000000000006e0
Process nfsiod (pid: 2771, threadinfo ffff81002938a000, task ffff81002a138800)
Stack: 0000000000000297 ffff81002961b658 ffff810029b75c20 ffff810029b75c28
0000000000000282 ffffffff88503613 ffff810029b75c28 ffff810029b75cf8
0000000000000000 ffffffff883a5bb4 ffff810029b75c28 ffffffff883a5df8
...still looking at the cause, but it seems like this set is uncovering a race of some sort.
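One detail in the register dump may help narrow this down: RAX holds 6b6b6b6b6b6b6b6b, and 0x6b is the byte that slab debugging writes over freed objects. That would be consistent with nfs_inode_remove_request() touching a request that has already been freed, although this is only an inference from the dump and is not confirmed anywhere in this report. The sketch below is a hypothetical helper (the function name and regular expression are made up for illustration, not an existing tool) that scans pasted oops text for registers filled entirely with a known poison byte:

```python
import re

# Byte values used by kernel slab debugging to fill objects.
POISON_BYTES = {
    0x6b: "POISON_FREE (object already freed)",
    0x5a: "POISON_INUSE (allocated but never initialized)",
}

# Matches "RAX: 6b6b6b6b6b6b6b6b" style fields from an x86_64 oops.
# Assumption: registers are printed as NAME: <16 hex digits>.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def find_poisoned_registers(oops_text):
    """Return {register: description} for registers holding a poison pattern."""
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        raw = bytes.fromhex(value)
        # A poisoned pointer is the same poison byte repeated eight times.
        if len(set(raw)) == 1 and raw[0] in POISON_BYTES:
            hits[name] = POISON_BYTES[raw[0]]
    return hits


sample = """\
RAX: 6b6b6b6b6b6b6b6b RBX: ffff81002961b658 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810029b75e74 RDI: ffff81002961b658
"""

if __name__ == "__main__":
    print(find_poisoned_registers(sample))
```

A hit only says that a freed or uninitialized object was read; it does not say which code path freed it, so the race itself still has to be tracked down separately.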
Created attachment 367564 [details]
Upstream patch #1
Created attachment 367565 [details]
Backport fix for upstream patch #1
Created attachment 367566 [details]
Upstream patch #2
Created attachment 367567 [details]
Upstream patch #3
Created attachment 367568 [details]
Upstream patch #4
The above upstream patch series (originally from the post in
comment #2) allegedly resolves the deadlock issue reported
here. An additional patch is included to account for differences
between this kernel and the one the patch series was originally
written against. I've run the NFS connectathon suite to check
basic functionality, but I can't really test the actual
deadlock problem.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
A kernel build with the above patches can be found at:
Could we test this out, please?
If the kernel packages needed aren't present please let me know
and I'll upload the needed packages.
Do we have a reproducer available? Could the reporter provide reproduction steps so that QE can verify this?
I don't believe we have a reproducer. I only cloned this from the RHEL4 bz based on an analysis of the RHEL4 core that indicated that this problem was also present in RHEL5.
We need to confirm that there is third-party commitment to
test for the resolution of this request during the RHEL 5.5
Beta Test Phase before we can approve it for acceptance
into the release.
RHEL 5.5 Beta Test Phase is expected to begin around February
In order to avoid any unnecessary delays, please post a
confirmation as soon as possible, including the contact
information for testing engineers.
Any additional information about alternative testing variations we
could use to reproduce this issue in-house would be appreciated.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~
RHEL 5.5 Beta has been released! There should be a fix present in this
release that addresses your request. Please test and report back results
here, by March 3rd 2010 (2010-03-03) or sooner.
Upon successful verification of this request, post your results and update
the Verified field in Bugzilla with the appropriate value.
If you encounter any issues while testing, please describe them and set
this bug into NEED_INFO. If you encounter new defects or have additional
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
Any chance that some future release of RHEL5.4 will include the patches for
this from RHEL5.5?
*** Bug 488063 has been marked as a duplicate of this bug. ***