Bug 489931 - NFS umount deadlock in rpciod with rpc_shutdown_client()
Summary: NFS umount deadlock in rpciod with rpc_shutdown_client()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ian Kent
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 488063
Depends On:
Blocks: 499522 526950 533192
Reported: 2009-03-12 15:42 UTC by Jeff Layton
Modified: 2018-10-27 14:23 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 487699
Environment:
Last Closed: 2010-03-30 07:32:51 UTC
Target Upstream Version:
Embargoed:


Attachments
patchset #1 (bad, leads to oops) (22.05 KB, patch)
2009-05-13 14:42 UTC, Jeff Layton
Upstream patch #1 (4.67 KB, patch)
2009-11-05 03:18 UTC, Ian Kent
Backport fix for upstream patch #1 (1.85 KB, patch)
2009-11-05 03:18 UTC, Ian Kent
Upstream patch #2 (7.49 KB, patch)
2009-11-05 03:19 UTC, Ian Kent
Upstream patch #3 (6.79 KB, patch)
2009-11-05 03:20 UTC, Ian Kent
Upstream patch #4 (2.41 KB, patch)
2009-11-05 03:20 UTC, Ian Kent


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Comment 1 Ian Kent 2009-03-17 11:55:30 UTC
The mail thread below appears to cover the important issues
related to this problem (copied from bug 487699#45). Attempting
a backport of all the relevant patches for RHEL-4 seems a
little too risky, as some of the dependent infrastructure is
quite different.

http://marc.info/?l=linux-nfs&m=120000214806703&w=2

Comment 2 Jeff Layton 2009-05-13 14:42:50 UTC
Created attachment 343779
patchset #1 (bad, leads to oops)

This is my first stab at a patchset for this. When I run the cthon04 tests on a kernel with this set applied, it oopses fairly quickly:

general protection fault: 0000 [1] SMP 
last sysfs file: /block/dm-0/range
CPU 0 
Modules linked in: nfs(FU) lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth rpcsec_gss_krb5(FU) auth_rpcgss(FU) testmgr_cipher testmgr aead crypto_blkcipher crypto_algapi des sunrpc(FU) ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy xen_vnif xen_balloon i2c_piix4 xen_vbd i2c_core xen_platform_pci serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2771, comm: nfsiod Tainted: GF     2.6.18-144.el5debug #1
RIP: 0010:[<ffffffff88501261>]  [<ffffffff88501261>] :nfs:nfs_inode_remove_request+0x15/0x9f
RSP: 0018:ffff81002938bdc0  EFLAGS: 00010286
RAX: 6b6b6b6b6b6b6b6b RBX: ffff81002961b658 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810029b75e74 RDI: ffff81002961b658
RBP: ffff810029b75c20 R08: ffff81003ffee1c0 R09: ffff810000012c00
R10: ffff81002961b6d0 R11: 0000000000000060 R12: ffff81002961b658
R13: 0000000000000282 R14: ffff810029b75c28 R15: ffffffff883a5fb6
FS:  0000000000000000(0000) GS:ffffffff80424000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003bab48b610 CR3: 000000003a3d0000 CR4: 00000000000006e0
Process nfsiod (pid: 2771, threadinfo ffff81002938a000, task ffff81002a138800)
Stack:  0000000000000297 ffff81002961b658 ffff810029b75c20 ffff810029b75c28
 0000000000000282 ffffffff88503613 ffff810029b75c28 ffff810029b75cf8
 0000000000000000 ffffffff883a5bb4 ffff810029b75c28 ffffffff883a5df8
Call Trace:
 [<ffffffff88503613>] :nfs:nfs_commit_done+0x138/0x195
 [<ffffffff883a5bb4>] :sunrpc:rpc_exit_task+0x25/0x6e
 [<ffffffff883a5df8>] :sunrpc:__rpc_execute+0x92/0x250
 [<ffffffff800509c6>] run_workqueue+0x9a/0xf4
 [<ffffffff8004d1c5>] worker_thread+0x0/0x122
 [<ffffffff800a3a80>] keventd_create_kthread+0x0/0xc9
 [<ffffffff8004d2b5>] worker_thread+0xf0/0x122
 [<ffffffff8008ffe6>] default_wake_function+0x0/0xe
 [<ffffffff800a3a80>] keventd_create_kthread+0x0/0xc9
 [<ffffffff800353b2>] kthread+0xfe/0x132

Still looking at the cause, but it seems like this set is uncovering a race of some sort.
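
One detail in the oops above may help narrow the cause: RAX holds 6b6b6b6b6b6b6b6b, which matches the slab free-poison byte (POISON_FREE, 0x6b) that debug kernels such as 2.6.18-144.el5debug write over freed objects. That would be consistent with a use-after-free, although the trace alone does not prove it. A minimal sketch of a helper that flags such register values in oops text follows; the function name and regular expression are hypothetical, not part of any existing tool.

```python
import re

# Slab free-poison byte, as defined in the kernel's include/linux/poison.h.
POISON_FREE = 0x6B

# Matches x86_64 general-purpose register dumps such as "RAX: 6b6b6b6b6b6b6b6b".
# Lines like "RSP: 0018:ffff..." are skipped because 16 hex digits must
# follow the register name directly.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return the names of registers whose 8 bytes all equal POISON_FREE."""
    poison = bytes([POISON_FREE]) * 8
    return [name for name, value in REG_RE.findall(oops_text)
            if bytes.fromhex(value) == poison]


sample = """RAX: 6b6b6b6b6b6b6b6b RBX: ffff81002961b658 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810029b75e74 RDI: ffff81002961b658"""
print(poisoned_registers(sample))  # ['RAX']
```

A hit only suggests that a pointer was loaded from freed memory; which object was freed still has to be worked out from the faulting function, here nfs_inode_remove_request.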

Comment 10 Ian Kent 2009-11-05 03:18:05 UTC
Created attachment 367564
Upstream patch #1

Comment 11 Ian Kent 2009-11-05 03:18:54 UTC
Created attachment 367565
Backport fix for upstream patch #1

Comment 12 Ian Kent 2009-11-05 03:19:34 UTC
Created attachment 367566
Upstream patch #2

Comment 13 Ian Kent 2009-11-05 03:20:19 UTC
Created attachment 367567
Upstream patch #3

Comment 14 Ian Kent 2009-11-05 03:20:58 UTC
Created attachment 367568
Upstream patch #4

Comment 15 Ian Kent 2009-11-05 03:27:46 UTC
The above upstream patch series (originally from the post in
comment #2) allegedly resolves the deadlock issue reported
here. An additional patch is included to account for
differences from the kernel that the patch series was
originally created against. I've run the NFS connectathon
tests to check basic functionality but can't really test
the actual deadlock problem.
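
The comments in this report do not spell out the deadlock mechanism. Assuming it is the usual self-wait pattern (a work item running on the single rpciod thread blocks in rpc_shutdown_client() waiting for RPC tasks that can only complete on that same thread), the Python analogy below sketches it. Everything here is illustrative: shutdown_from_worker and pending_task are hypothetical names, the thread pool stands in for the workqueue, and a timeout is used so the demo reports the stall instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def shutdown_from_worker(pool, timeout=0.5):
    """Run a 'shutdown' job on the pool that waits for another job on the same pool."""

    def pending_task():
        # Stands in for an outstanding RPC task that still needs a worker.
        return "done"

    def shutdown():
        # Stands in for the shutdown path: queue the pending work on the
        # same pool, then block until it finishes.
        inner = pool.submit(pending_task)
        try:
            return inner.result(timeout=timeout)
        except FutureTimeout:
            # Without the timeout this wait would never return when the
            # pool has no free worker left to run pending_task.
            return "would deadlock"

    return pool.submit(shutdown).result()


with ThreadPoolExecutor(max_workers=1) as single:
    print(shutdown_from_worker(single))  # would deadlock

with ThreadPoolExecutor(max_workers=2) as separate:
    print(shutdown_from_worker(separate))  # done
```

With a second worker the pending task can run while the shutdown job waits. That is loosely the idea of handing such work to a different thread; the oops in comment 2 does show an nfsiod thread, but whether the upstream patches work exactly this way is not stated in this report.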

Comment 16 RHEL Program Management 2009-11-05 03:31:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 17 Ian Kent 2009-11-05 03:45:20 UTC
A kernel build with the above patches can be found at:
http://people.redhat.com/~ikent/kernel-2.6.18-172.el5.bz489931.1

Could we test this out, please?

If the kernel packages you need aren't present, please let me
know and I'll upload them.

Comment 18 Jan Tluka 2009-11-18 15:37:40 UTC
Do we have a reproducer available? Could the reporter provide reproduction steps so that QE can verify this?

Comment 19 Jeff Layton 2009-11-18 17:12:33 UTC
I don't believe we have a reproducer. I only cloned this from the RHEL4 bz based on an analysis of the RHEL4 core that indicated that this problem was also present in RHEL5.

Comment 21 Chris Ward 2009-11-19 15:05:47 UTC
@Jeff, @GSS

We need to confirm that there is third-party commitment to 
test for the resolution of this request during the RHEL 5.5 
Beta Test Phase before we can approve it for acceptance 
into the release.

RHEL 5.5 Beta Test Phase is expected to begin around February
2010.

In order to avoid any unnecessary delays, please post a 
confirmation as soon as possible, including the contact 
information for testing engineers.

Any additional information about alternative testing variations we 
could use to reproduce this issue in-house would be appreciated.

Comment 29 Don Zickus 2009-12-09 18:11:21 UTC
in kernel-2.6.18-178.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 39 Chris Ward 2010-02-11 10:26:08 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set
this bug to NEED_INFO. If you encounter new defects or have additional
patch(es) to request for inclusion, please clone this bug for each request
and escalate through your support representative.

Comment 43 errata-xmlrpc 2010-03-30 07:32:51 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 45 David Bein 2010-05-03 03:04:25 UTC
Any chance that some future release of RHEL5.4 will include the patches for
this from RHEL5.5?

Comment 46 Jeff Layton 2013-07-02 14:23:01 UTC
*** Bug 488063 has been marked as a duplicate of this bug. ***

