Bug 669204

Summary: Repeatable NFS mount hang
Product: Fedora
Reporter: Neal Kim <nkim>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: bergwolf, bruno.faccini, edgar.hoch, e.grammatico, francois, gansalmon, itamar, jcm, jlayton, jonathan, kernel-maint, kevin.constantine, madhu.chinakonda, nhorman, nkim, sprabhu, steved
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Cloned to: 672305 (view as bug list)
Last Closed: 2011-10-12 14:42:49 UTC
Bug Blocks: 672305, 692315
Attachments:
  The system back trace of the hang
  rpc_debug output from Fedora
  The SystemTap probe used to debug the problem at Pixar
  patch -- never reuse the socket port after an xs_close

Description Neal Kim 2011-01-12 21:14:37 UTC
Description of problem:

We have a relatively repeatable case where a decent NFS load will cause an NFS mount to get stuck on a client, and any new process that then tries to access that mount also gets stuck. These machines are running Fedora 11, but we've caused it to happen with Fedora 13 as well. 

We are currently running a Fedora 14 kernel (2.6.35.9-64) backported to Fedora 11.

One machine got stuck on a mount, and I found a task in the following state:

[154061.428023] flush-0:188   D 0000000000000002     0 17043      2 0x00000080
[154061.428023]  ffff88025fd21c40 0000000000000046 0000000000000400 0000000000000000
[154061.428023]  ffff88025fd21fd8 ffff88042ace5d00 0000000000015500 ffff88025fd21fd8
[154061.428023]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[154061.428023] Call Trace:
[154061.428023]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[154061.428023]  [<ffffffff81129602>] inode_wait+0xe/0x12
[154061.428023]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[154061.428023]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[154061.428023]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[154061.428023]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[154061.428023]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[154061.428023]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[154061.428023]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[154061.428023]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[154061.428023]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[154061.428023]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[154061.428023]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[154061.428023]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[154061.428023]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

I hadn't seen this before, so I looked online and found this post:

    http://kerneltrap.org/mailarchive/linux-kernel/2010/5/22/4573770/thread

The traceback looks almost the same, but it appears that 2.6.35.9-64 already includes the patch Trond posted. That patch is 0522f6ad in the linux-stable git repo. Could our issue be another instance of that problem?

Version-Release number of selected component (if applicable):

kernel 2.6.35.9-64


How reproducible:

Reproducible under a high NFS workload.

Comment 2 Steve Dickson 2011-01-14 22:59:10 UTC
Created attachment 473610 [details]
The system back trace of the hang

It appears the following two processes are waiting on the same lock:

[958985.445126] dfa           D 0000000000000000     0 11141      1 0x00000084
[958985.445126]  ffff8802b5f43c38 0000000000000082 ffff8802b5f43ba8 ffff880119c8a018
[958985.445126]  ffff8802b5f43fd8 ffff8801e86b5d00 0000000000015500 ffff8802b5f43fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff810d33ed>] ? sync_page+0x0/0x4a
[958985.445126]  [<ffffffff8146706d>] io_schedule+0x43/0x5d
[958985.445126]  [<ffffffff810d3433>] sync_page+0x46/0x4a
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff810d3605>] wait_on_page_bit+0x72/0x79
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff810db921>] ? pagevec_lookup_tag+0x25/0x2e
[958985.445126]  [<ffffffff810d3f0a>] filemap_fdatawait_range+0xa4/0x178
[958985.445126]  [<ffffffff810d3fff>] filemap_fdatawait+0x21/0x23
[958985.445126]  [<ffffffff811326dc>] writeback_single_inode+0xc6/0x1f8
[958985.445126]  [<ffffffff8113283b>] sync_inode+0x2d/0x3b
[958985.445126]  [<ffffffffa0417fe8>] nfs_wb_all+0x42/0x44 [nfs]
[958985.445126]  [<ffffffffa040b63a>] nfs_do_fsync+0x20/0x3d [nfs]
[958985.445126]  [<ffffffffa040b823>] nfs_file_flush+0x75/0x7d [nfs]
[958985.445126]  [<ffffffff81115a22>] filp_close+0x43/0x72
[958985.445126]  [<ffffffff81115aee>] sys_close+0x9d/0xd2
[958985.445126]  [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
[958985.445126] flush-0:202   D ffff8801004b2d80     0 11142      2 0x00000080
[958985.445126]  ffff880216045c40 0000000000000046 0000000000000400 0000000000000800
[958985.445126]  ffff880216045fd8 ffff88042b2d0000 0000000000015500 ffff880216045fd8
[958985.445126]  0000000000015500 0000000000015500 0000000000015500 0000000000015500
[958985.445126] Call Trace:
[958985.445126]  [<ffffffff81065f61>] ? spin_unlock_irqrestore+0xe/0x10
[958985.445126]  [<ffffffff811295f4>] ? inode_wait+0x0/0x12
[958985.445126]  [<ffffffff81129602>] inode_wait+0xe/0x12
[958985.445126]  [<ffffffff814675eb>] __wait_on_bit+0x48/0x7b
[958985.445126]  [<ffffffff81132b75>] ? writeback_sb_inodes+0x10b/0x142
[958985.445126]  [<ffffffff81132585>] inode_wait_for_writeback+0x8a/0xaa
[958985.445126]  [<ffffffff81065f20>] ? wake_bit_function+0x0/0x33
[958985.445126]  [<ffffffff81133378>] wb_writeback+0x1f4/0x212
[958985.445126]  [<ffffffff811334c7>] wb_do_writeback+0x131/0x147
[958985.445126]  [<ffffffff81133520>] bdi_writeback_task+0x43/0x11b
[958985.445126]  [<ffffffff81065df8>] ? bit_waitqueue+0x17/0xa9
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff810e74d0>] bdi_start_fn+0x63/0xca
[958985.445126]  [<ffffffff810e746d>] ? bdi_start_fn+0x0/0xca
[958985.445126]  [<ffffffff81065a4d>] kthread+0x7f/0x87
[958985.445126]  [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[958985.445126]  [<ffffffff810659ce>] ? kthread+0x0/0x87
[958985.445126]  [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10

Comment 3 Jon Masters 2011-01-16 02:03:51 UTC
FWIW, I am seeing a similar problem.

Comment 4 Chuck Ebbert 2011-01-17 22:55:46 UTC
Can someone get the held locks when this happens? (sysrq-D)

Comment 5 Neal Kim 2011-01-24 18:57:02 UTC
I have some great news.

The customer was able to reproduce this issue on RHEL6 and has captured the full sysrq-t traceback.

kernel-2.6.32-71.el6.x86_64

sosreport and traceback will be attached shortly.

Comment 6 Neal Kim 2011-01-24 19:02:46 UTC
I have made a clone of this BZ to RHEL6.

https://bugzilla.redhat.com/show_bug.cgi?id=672305

sosreport and sysrq-t attached.

Comment 7 Neal Kim 2011-02-04 21:02:52 UTC
Created attachment 477117 [details]
rpc_debug output from Fedora

Comment 8 Steve Dickson 2011-02-07 13:10:47 UTC
It appears this is showing up upstream and is being reproduced with iozone:
    http://marc.info/?t=129685592100001&r=1&w=2

I'll see if I can reproduce it using one of our NetApps...

Comment 9 Jon Masters 2011-02-16 08:36:47 UTC
Keep us in the loop :)

Comment 10 Steve Dickson 2011-02-24 23:01:15 UTC
Created attachment 480877 [details]
The SystemTap probe used to debug the problem at Pixar

Comment 11 Jeff Layton 2011-02-28 12:34:51 UTC
Created attachment 481359 [details]
patch -- never reuse the socket port after an xs_close

Our best analysis was that the sunrpc layer was reusing local ports
inappropriately. The main symptom was that the RPC engine would get
into a state where the kernel_connect call would immediately return
EADDRNOTAVAIL. I *think* this was due to the connection already being
hashed, but I'm not that well versed in the TCP connection code.

If the socket becomes disconnected, the sunrpc state machine is
designed to try to reuse the same source port. Unfortunately, when it
got that error back, it treated it as temporary and kept trying to
reuse that port even though that was never going to succeed. This
patch from Trond simply zeroes out the port once the socket has been
through xs_close.

What I'm less clear on is why this situation never seems to rectify
itself. It seems like maybe we have something sitting on this hashtable
indefinitely, but Trond seems to think that this patch is necessary
anyway to prevent unwanted delays in reconnect.
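
To make the mechanics concrete, the shape of the fix is roughly the following. This is only a simplified sketch, not the literal patch; the struct and function names here are made-up stand-ins for the real sunrpc transport code:

#include <unistd.h>

/* Simplified stand-in for the sunrpc transport state (hypothetical names). */
struct fake_xprt {
	int            sockfd;   /* the transport's socket                  */
	unsigned short srcport;  /* cached source port, reused on reconnect */
};

/* On close, drop the socket and, crucially, forget the cached source port
 * so the next reconnect asks the kernel for a fresh ephemeral port instead
 * of looping on EADDRNOTAVAIL with the old one. */
static void fake_xs_close(struct fake_xprt *xprt)
{
	if (xprt->sockfd >= 0) {
		close(xprt->sockfd);
		xprt->sockfd = -1;
	}
	xprt->srcport = 0;	/* never reuse the old port after a close */
}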

Comment 12 Jeff Layton 2011-02-28 12:44:38 UTC
The error seemed to be coming from __inet_hash_connect(). It looks like
it's searching for a slot to hash the new connection but that's failing,
probably due to something already being in that spot. I still have a
feeling that something is not quite right at the socket level -- it doesn't
seem like we should be getting this error back indefinitely.

That said, this was against a 2.6.35.9 kernel so it's possible that it's
already fixed in more recent kernels.
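
For illustration, here is a user-space analogue of what we think is going on (not the kernel path; the server address 192.0.2.10:2049 and the "remembered" source port 40123 are placeholders). Binding to a fixed source port and connecting can fail with EADDRNOTAVAIL while the old four-tuple is still hashed, whereas binding to port 0 lets the kernel pick a fresh ephemeral port:

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to the placeholder server from a specific source port
 * (0 = let the kernel choose). Returns a connected fd, or -errno. */
static int try_connect(unsigned short src_port)
{
	struct sockaddr_in src, dst;
	int fd, one = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	memset(&src, 0, sizeof(src));
	src.sin_family = AF_INET;
	src.sin_addr.s_addr = htonl(INADDR_ANY);
	src.sin_port = htons(src_port);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(2049);
	inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);

	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		int err = errno;
		close(fd);
		return -err;
	}
	return fd;
}

int main(void)
{
	unsigned short cached_port = 40123;	/* the "remembered" port */
	int fd = try_connect(cached_port);

	if (fd == -EADDRNOTAVAIL) {
		/* The sunrpc code kept retrying the cached port here; the
		 * sane recovery is to give it up and take a fresh one. */
		fprintf(stderr, "port %u unavailable, retrying with a fresh port\n",
			(unsigned int)cached_port);
		fd = try_connect(0);
	}
	if (fd < 0) {
		fprintf(stderr, "connect failed: %s\n", strerror(-fd));
		return 1;
	}
	close(fd);
	return 0;
}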

Comment 13 Bruno Faccini 2011-07-05 13:38:25 UTC
We are hitting the same kind of problem and tried to apply "patch -- never reuse the socket port after an xs_close" as a workaround, but it does not seem to help: the same port is still reused, in a loop, on each failing reconnect attempt.

BTW, this BZ has been cloned to 672305, itself cloned to 692315, itself cloned to 701788, itself ... (I cannot follow the chain any further since access to that last BZ has been denied to me). Is there finally a known, reliable fix for this problem/these BZs?

Comment 14 Dave Jones 2011-10-11 18:45:18 UTC
f14 isn't going to get much beyond security fixes at this point in its
lifecycle.
Is this bug still relevant on f15 or the f16 beta?