Bug 669204 - Repeatable NFS mount hang
Summary: Repeatable NFS mount hang
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 672305 692315
 
Reported: 2011-01-12 21:14 UTC by Neal Kim
Modified: 2011-10-12 14:42 UTC
CC List: 17 users

Fixed In Version:
Clone Of:
Cloned to: 672305
Environment:
Last Closed: 2011-10-12 14:42:49 UTC
Type: ---
Embargoed:


Attachments
The system back trace of the hang (311.14 KB, text/plain)
2011-01-14 22:59 UTC, Steve Dickson
rpc_debug output from Fedora (37.75 KB, application/octet-stream)
2011-02-04 21:02 UTC, Neal Kim
The systemtap probe used to debug the problem at Pixar (1.37 KB, text/plain)
2011-02-24 23:01 UTC, Steve Dickson
patch -- never reuse the socket port after an xs_close (2.85 KB, patch)
2011-02-28 12:34 UTC, Jeff Layton

Description Neal Kim 2011-01-12 21:14:37 UTC
Description of problem:

We have a relatively repeatable case where a decent NFS load will cause an NFS mount to get stuck on a client, and any new process that then tries to access that mount also gets stuck. These machines are running Fedora 11, but we've caused it to happen with Fedora 13 as well. 

Currently running a Fedora 14 kernel 2.6.35.9-64 backported to Fedora 11.

One machine got stuck on a mount, and I found a task in the following state:

[154061.428023] flush-0:188   D 0000000000000002     0 17043      2 0x00000080
[154061.428023]  ffff88025fd21c40 0000000000000046 0000000000000400 0000000000000000
[154061.428023]  ffff88025fd21fd8 ffff88042ace5d00 0000000000015500 ffff88025fd21fd8
[154061.428023]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[154061.428023] Call Trace:
[154061.428023]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[154061.428023]  [<ffffffff81129602>] inode_wait+0xe/0x12
[154061.428023]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[154061.428023]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[154061.428023]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[154061.428023]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[154061.428023]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[154061.428023]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[154061.428023]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[154061.428023]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[154061.428023]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[154061.428023]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[154061.428023]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

I hadn't seen this before, so I looked online and found this post:

    http://kerneltrap.org/mailarchive/linux-kernel/2010/5/22/4573770/thread

The traceback looks almost the same, but it looks like 2.6.35.9-64 already includes the patch Trond posted. That patch is 0522f6ad in the linux-stable git repo. Could our issues be another case of that problem?

Version-Release number of selected component (if applicable):

kernel 2.6.35.9-64


How reproducible:

Relatively repeatable under a high NFS workload.

Comment 2 Steve Dickson 2011-01-14 22:59:10 UTC
Created attachment 473610 [details]
The system back trace of the hang

It appears the following two processes are waiting on the same lock:

[958985.445126] dfa           D 0000000000000000     0 11141      1 0x00000084
[958985.445126]  ffff8802b5f43c38 0000000000000082 ffff8802b5f43ba8 ffff880119c8a018
[958985.445126]  ffff8802b5f43fd8 ffff8801e86b5d00 0000000000015500 ffff8802b5f43fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff8146706d>] io_schedule+0x43/0x5d
[958985.445126]  [<ffffffff810d3433>] sync_page+0x46/0x4a
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff810d3605>] wait_on_page_bit+0x72/0x79
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff810db921>] ? pagevec_lookup_tag+0x25/0x2e
[958985.445126]  [<ffffffff810d3f0a>] filemap_fdatawait_range+0xa4/0x178
[958985.445126]  [<ffffffff810d3fff>] filemap_fdatawait+0x21/0x23
[958985.445126]  [<ffffffff811326dc>] writeback_single_inode+0xc6/0x1f8
[958985.445126]  [<ffffffff8113283b>] sync_inode+0x2d/0x3b
[958985.445126]  [<ffffffffa0417fe8>] nfs_wb_all+0x42/0x44 [nfs]
[958985.445126]  [<ffffffffa040b63a>] nfs_do_fsync+0x20/0x3d [nfs]
[958985.445126]  [<ffffffffa040b823>] nfs_file_flush+0x75/0x7d [nfs]
[958985.445126]  [<ffffffff81115a22>] filp_close+0x43/0x72
[958985.445126]  [<ffffffff81115aee>] sys_close+0x9d/0xd2
[958985.445126]  [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
[958985.445126] flush-0:202   D ffff8801004b2d80     0 11142      2 0x00000080
[958985.445126]  ffff880216045c40 0000000000000046 0000000000000400 0000000000000800
[958985.445126]  ffff880216045fd8 ffff88042b2d0000 0000000000015500 ffff880216045fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff81065f61>] ? spin_unlock_irqrestore+0xe/0x10
[958985.445126]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[958985.445126]  [<ffffffff81129602>] inode_wait+0xe/0x12
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[958985.445126]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[958985.445126]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[958985.445126]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[958985.445126]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[958985.445126]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[958985.445126]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[958985.445126]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

Comment 3 Jon Masters 2011-01-16 02:03:51 UTC
FWIW, I am seeing a similar problem.

Comment 4 Chuck Ebbert 2011-01-17 22:55:46 UTC
Can someone get the held locks when this happens? (sysrq-D)

Comment 5 Neal Kim 2011-01-24 18:57:02 UTC
I have some great news.

Customer was able to reproduce this issue on RHEL6 and has captured the full sysrq-t traceback.

kernel-2.6.32-71.el6.x86_64

sosreport and traceback will be attached shortly.

Comment 6 Neal Kim 2011-01-24 19:02:46 UTC
I have made a clone of this BZ to RHEL6.

https://bugzilla.redhat.com/show_bug.cgi?id=672305

sosreport and sysrq-t attached.

Comment 7 Neal Kim 2011-02-04 21:02:52 UTC
Created attachment 477117 [details]
rpc_debug output from Fedora

Comment 8 Steve Dickson 2011-02-07 13:10:47 UTC
It appears this is showing up upstream as well and is being reproduced with iozone:
    http://marc.info/?t=129685592100001&r=1&w=2

I'll see if I can reproduce it using one of our NetApps...

Comment 9 Jon Masters 2011-02-16 08:36:47 UTC
Keep us in the loop :)

Comment 10 Steve Dickson 2011-02-24 23:01:15 UTC
Created attachment 480877 [details]
The systemtap probe used to debug the problem at Pixar

Comment 11 Jeff Layton 2011-02-28 12:34:51 UTC
Created attachment 481359 [details]
patch -- never reuse the socket port after an xs_close

Our best analysis was that the sunrpc layer was reusing local ports
inappropriately. The main symptom was that the RPC engine would get
into a state where the kernel_connect call would immediately return
EADDRNOTAVAIL. I *think* this was due to the connection already being
hashed, but I'm not as well versed in the tcp connection code.

If the socket becomes disconnected, the sunrpc state machine is
designed to try and reuse the port. Unfortunately, when it got that
error back, it would treat it as temporary and would try to keep
reusing that port even though it wasn't possible. This patch from
Trond simply zeroes out the port once it has been through xs_close.

What I'm less clear on is why this situation never seems to rectify
itself. It seems like maybe we have something sitting on this hashtable
indefinitely, but Trond seems to think that this patch is necessary
anyway to prevent unwanted delays in reconnect.
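
For reference, a minimal sketch of the approach described above -- not the attached patch itself. It assumes a sock_xprt-style transport that caches its bound local port in a srcport field (as net/sunrpc/xprtsock.c does) and simply forgets that port when the socket is closed, so the next connect asks the stack for a fresh ephemeral port; the name xs_close_sketch is illustrative.

    /* Illustrative sketch only -- not the attachment. Assumes a
     * sock_xprt-like transport that remembers the last bound local
     * port in ->srcport and normally re-binds to it on reconnect.
     */
    static void xs_close_sketch(struct sock_xprt *transport)
    {
            xs_reset_transport(transport);  /* tear down the old socket */

            /*
             * Forget the cached source port. The next connect will then
             * request a fresh ephemeral port instead of retrying one
             * whose previous 4-tuple may still be hashed, which is what
             * left kernel_connect() returning -EADDRNOTAVAIL in a loop.
             */
            transport->srcport = 0;
    }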

Comment 12 Jeff Layton 2011-02-28 12:44:38 UTC
The error seemed to be coming from __inet_hash_connect(). It looks like
it's searching for a slot to hash the new connection but that's failing,
probably due to something already being in that spot. I still have a
feeling that something is not quite right at the socket level -- it doesn't
seem like we should be getting this error back indefinitely.

That said, this was against a 2.6.35.9 kernel so it's possible that it's
already fixed in more recent kernels.
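
To make the EADDRNOTAVAIL mechanism concrete, here is a small user-space analogue (not the kernel path through __inet_hash_connect; the addresses and ports are made up): binding an explicit source port and reconnecting to the same destination while the previous connection's 4-tuple is still hashed (e.g. in TIME_WAIT) typically fails with exactly this error, and keeps failing until that old entry goes away.

    /* User-space analogue of the failure mode (not the kernel code path).
     * DEST_IP, DEST_PORT and SRC_PORT are hypothetical; point them at a
     * reachable TCP service to see the effect.
     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define DEST_IP   "192.0.2.10"   /* example server address */
    #define DEST_PORT 2049
    #define SRC_PORT  40000          /* the "remembered" source port */

    static int connect_from_fixed_port(void)
    {
            struct sockaddr_in src = { .sin_family = AF_INET,
                                       .sin_port = htons(SRC_PORT) };
            struct sockaddr_in dst = { .sin_family = AF_INET,
                                       .sin_port = htons(DEST_PORT) };
            int one = 1;
            int fd = socket(AF_INET, SOCK_STREAM, 0);

            inet_pton(AF_INET, DEST_IP, &dst.sin_addr);
            /* SO_REUSEADDR lets bind() succeed despite the TIME_WAIT entry */
            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
            if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
                connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                    perror("bind/connect");  /* 2nd pass: EADDRNOTAVAIL */
            return fd;
    }

    int main(void)
    {
            close(connect_from_fixed_port());  /* connect, then active close */
            close(connect_from_fixed_port());  /* immediate retry, same 4-tuple */
            return 0;
    }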

Comment 13 Bruno Faccini 2011-07-05 13:38:25 UTC
We are hitting the same kind of problem and tried applying "patch -- never reuse the socket port after an xs_close" as a workaround, but it does not seem to help: the same port is still reused across the failing reconnect attempts, in a loop.

By the way, this BZ has been duplicated to 672305, itself duplicated to 692315, itself duplicated to 701788, itself ... (I cannot follow the chain any further since access to that last BZ has been denied to me). Anyway, is there finally a known, reliable fix for this problem?

Comment 14 Dave Jones 2011-10-11 18:45:18 UTC
f14 isn't going to get much beyond security fixes at this point in its
lifecycle.
Is this bug still relevant on f15 or f16 beta?

