Bug 669204

Summary: Repeatable NFS mount hang
Product: Fedora
Reporter: Neal Kim <nkim>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: bergwolf, bruno.faccini, edgar.hoch, e.grammatico, francois, gansalmon, itamar, jcm, jlayton, jonathan, kernel-maint, kevin.constantine, madhu.chinakonda, nhorman, nkim, sprabhu, steved
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Cloned to: 672305 (view as bug list)
Last Closed: 2011-10-12 14:42:49 UTC
Bug Blocks: 672305, 692315
Attachments:
  The system back trace of the hang
  rpc_debug output from Fedora
  The SystemTap probe used to debug the problem at Pixar
  patch -- never reuse the socket port after an xs_close

Description Neal Kim 2011-01-12 21:14:37 UTC
Description of problem:

We have a relatively repeatable case where a decent NFS load will cause an NFS mount to get stuck on a client, and any new process that then tries to access that mount also gets stuck. These machines are running Fedora 11, but we've caused it to happen with Fedora 13 as well. 

We are currently running a Fedora 14 kernel (2.6.35.9-64) backported to Fedora 11.

One machine got stuck on a mount, and I found a task in the following state:

[154061.428023] flush-0:188   D 0000000000000002     0 17043      2 0x00000080
[154061.428023]  ffff88025fd21c40 0000000000000046 0000000000000400 0000000000000000
[154061.428023]  ffff88025fd21fd8 ffff88042ace5d00 0000000000015500 ffff88025fd21fd8
[154061.428023]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[154061.428023] Call Trace:
[154061.428023]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[154061.428023]  [<ffffffff81129602>] inode_wait+0xe/0x12
[154061.428023]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[154061.428023]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[154061.428023]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[154061.428023]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[154061.428023]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[154061.428023]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[154061.428023]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[154061.428023]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[154061.428023]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[154061.428023]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[154061.428023]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

I hadn't seen this before, so I looked online and found this post:

    http://kerneltrap.org/mailarchive/linux-kernel/2010/5/22/4573770/thread

The traceback looks almost the same, but it appears that 2.6.35.9-64 already includes the patch Trond posted. That patch is 0522f6ad in the linux-stable git repo. Could our issue be another instance of that problem?

Version-Release number of selected component (if applicable):

kernel 2.6.35.9-64


How reproducible:

Reproducible under a high NFS workload.

Comment 2 Steve Dickson 2011-01-14 22:59:10 UTC
Created attachment 473610 [details]
The system back trace of the hang

It appears the following two processes are waiting on the same lock:

[958985.445126] dfa           D 0000000000000000     0 11141      1 0x00000084
[958985.445126]  ffff8802b5f43c38 0000000000000082 ffff8802b5f43ba8 ffff880119c8a018
[958985.445126]  ffff8802b5f43fd8 ffff8801e86b5d00 0000000000015500 ffff8802b5f43fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff8146706d>] io_schedule+0x43/0x5d
[958985.445126]  [<ffffffff810d3433>] sync_page+0x46/0x4a
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff810d3605>] wait_on_page_bit+0x72/0x79
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff810db921>] ? pagevec_lookup_tag+0x25/0x2e
[958985.445126]  [<ffffffff810d3f0a>] filemap_fdatawait_range+0xa4/0x178
[958985.445126]  [<ffffffff810d3fff>] filemap_fdatawait+0x21/0x23
[958985.445126]  [<ffffffff811326dc>] writeback_single_inode+0xc6/0x1f8
[958985.445126]  [<ffffffff8113283b>] sync_inode+0x2d/0x3b
[958985.445126]  [<ffffffffa0417fe8>] nfs_wb_all+0x42/0x44 [nfs]
[958985.445126]  [<ffffffffa040b63a>] nfs_do_fsync+0x20/0x3d [nfs]
[958985.445126]  [<ffffffffa040b823>] nfs_file_flush+0x75/0x7d [nfs]
[958985.445126]  [<ffffffff81115a22>] filp_close+0x43/0x72
[958985.445126]  [<ffffffff81115aee>] sys_close+0x9d/0xd2
[958985.445126]  [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
[958985.445126] flush-0:202   D ffff8801004b2d80     0 11142      2 0x00000080
[958985.445126]  ffff880216045c40 0000000000000046 0000000000000400 0000000000000800
[958985.445126]  ffff880216045fd8 ffff88042b2d0000 0000000000015500 ffff880216045fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff81065f61>] ? spin_unlock_irqrestore+0xe/0x10
[958985.445126]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[958985.445126]  [<ffffffff81129602>] inode_wait+0xe/0x12
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[958985.445126]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[958985.445126]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[958985.445126]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[958985.445126]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[958985.445126]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[958985.445126]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[958985.445126]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

Comment 3 Jon Masters 2011-01-16 02:03:51 UTC
FWIW, I am seeing a similar problem.

Comment 4 Chuck Ebbert 2011-01-17 22:55:46 UTC
Can someone get the held locks when this happens? (sysrq-D)

Comment 5 Neal Kim 2011-01-24 18:57:02 UTC
I have some great news.

The customer was able to reproduce this issue on RHEL6 and has captured the full sysrq-t traceback.

kernel-2.6.32-71.el6.x86_64

sosreport and traceback will be attached shortly.

Comment 6 Neal Kim 2011-01-24 19:02:46 UTC
I have made a clone of this BZ to RHEL6.

https://bugzilla.redhat.com/show_bug.cgi?id=672305

sosreport and sysrq-t attached.

Comment 7 Neal Kim 2011-02-04 21:02:52 UTC
Created attachment 477117 [details]
rpc_debug output from Fedora

Comment 8 Steve Dickson 2011-02-07 13:10:47 UTC
It appears this is showing up upstream and is being reproduced with iozone:
    http://marc.info/?t=129685592100001&r=1&w=2

I'll see if I can reproduce it using one of our NetApps...

Comment 9 Jon Masters 2011-02-16 08:36:47 UTC
Keep us in the loop :)

Comment 10 Steve Dickson 2011-02-24 23:01:15 UTC
Created attachment 480877 [details]
The SystemTap probe used to debug the problem at Pixar

Comment 11 Jeff Layton 2011-02-28 12:34:51 UTC
Created attachment 481359 [details]
patch -- never reuse the socket port after an xs_close

Our best analysis was that the sunrpc layer was reusing local ports
inappropriately. The main symptom was that the RPC engine would get
into a state where the kernel_connect call would immediately return
EADDRNOTAVAIL. I *think* this was due to the connection already being
hashed, but I'm not that well versed in the TCP connection code.

If the socket becomes disconnected, the sunrpc state machine is
designed to try to reuse the same source port. Unfortunately, when it
got that error back, it treated it as temporary and kept trying to
reuse that port even though that was never going to succeed. This
patch from Trond simply zeroes out the port once the socket has been
through xs_close.

What I'm less clear on is why this situation never seems to rectify
itself. It seems like maybe we have something sitting on this hashtable
indefinitely, but Trond seems to think that this patch is necessary
anyway to prevent unwanted delays in reconnect.
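
To make the mechanics concrete, the shape of the fix is roughly the following. This is only a simplified sketch, not the literal patch; the struct and function names here are made-up stand-ins for the real sunrpc transport code:

#include <unistd.h>

/* Simplified stand-in for the sunrpc transport state (hypothetical names). */
struct fake_xprt {
	int            sockfd;   /* the transport's socket                  */
	unsigned short srcport;  /* cached source port, reused on reconnect */
};

/* On close, drop the socket and, crucially, forget the cached source port
 * so the next reconnect asks the kernel for a fresh ephemeral port instead
 * of looping on EADDRNOTAVAIL with the old one. */
static void fake_xs_close(struct fake_xprt *xprt)
{
	if (xprt->sockfd >= 0) {
		close(xprt->sockfd);
		xprt->sockfd = -1;
	}
	xprt->srcport = 0;	/* never reuse the old port after a close */
}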

Comment 12 Jeff Layton 2011-02-28 12:44:38 UTC
The error seemed to be coming from __inet_hash_connect(). It looks like
it's searching for a slot to hash the new connection but that's failing,
probably due to something already being in that spot. I still have a
feeling that something is not quite right at the socket level -- it doesn't
seem like we should be getting this error back indefinitely.

That said, this was against a 2.6.35.9 kernel so it's possible that it's
already fixed in more recent kernels.
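
For illustration, here is a user-space analogue of what we think is going on (not the kernel path; the server address 192.0.2.10:2049 and the "remembered" source port 40123 are placeholders). Binding to a fixed source port and connecting can fail with EADDRNOTAVAIL while the old four-tuple is still hashed, whereas binding to port 0 lets the kernel pick a fresh ephemeral port:

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to the placeholder server from a specific source port
 * (0 = let the kernel choose). Returns a connected fd, or -errno. */
static int try_connect(unsigned short src_port)
{
	struct sockaddr_in src, dst;
	int fd, one = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	memset(&src, 0, sizeof(src));
	src.sin_family = AF_INET;
	src.sin_addr.s_addr = htonl(INADDR_ANY);
	src.sin_port = htons(src_port);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(2049);
	inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);

	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		int err = errno;
		close(fd);
		return -err;
	}
	return fd;
}

int main(void)
{
	unsigned short cached_port = 40123;	/* the "remembered" port */
	int fd = try_connect(cached_port);

	if (fd == -EADDRNOTAVAIL) {
		/* The sunrpc code kept retrying the cached port here; the
		 * sane recovery is to give it up and take a fresh one. */
		fprintf(stderr, "port %u unavailable, retrying with a fresh port\n",
			(unsigned int)cached_port);
		fd = try_connect(0);
	}
	if (fd < 0) {
		fprintf(stderr, "connect failed: %s\n", strerror(-fd));
		return 1;
	}
	close(fd);
	return 0;
}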

Comment 13 Bruno Faccini 2011-07-05 13:38:25 UTC
We are hitting the same kind of problem and tried to apply "patch -- never reuse the socket port after an xs_close" as a workaround, but it does not seem to help: the same port is still reused, in a loop, on each failing reconnect attempt.

BTW, this BZ has been cloned to 672305, itself cloned to 692315, itself cloned to 701788, itself ... (I cannot follow the chain any further since access to that last BZ has been denied to me). Is there finally a known, reliable fix for this problem/these BZs?

Comment 14 Dave Jones 2011-10-11 18:45:18 UTC
f14 isn't going to get much beyond security fixes at this point in its
lifecycle.
Is this bug still relevant on f15 or the f16 beta?