Bug 851673 - [regression] nfs client activity leads to hung tasks
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Layton
QA Contact: Filesystem QE
Reported: 2012-08-24 12:36 EDT by Orion Poplawski
Modified: 2014-06-18 03:42 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2013-12-20 11:33:11 EST
Type: Bug

Description Orion Poplawski 2012-08-24 12:36:15 EDT
Description of problem:

Twice now since updating one of our file servers to EL6.3, its kjournald and nfsd processes have hung.  We have also seen this once on another (64-bit) EL6.3 server.

Aug 24 09:54:11 alexandria kernel: INFO: task kjournald:1585 blocked for more than 120 seconds.
Aug 24 09:54:11 alexandria kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 09:54:11 alexandria kernel: kjournald     D 00000086     0  1585      2 0x00000000
Aug 24 09:54:11 alexandria kernel: efcde550 00000046 d0008680 00000086 c044e285 00000000 00000000 000065eb
Aug 24 09:54:11 alexandria kernel: 00000000 c15ccd00 00022968 ca13836f 00022968 c0b25680 c0b25680 efcde7f8
Aug 24 09:54:11 alexandria kernel: c0b25680 c0b21024 c0b25680 efcde7f8 f7106000 ef88e2ec 00000003 00000001
Aug 24 09:54:11 alexandria kernel: Call Trace:
Aug 24 09:54:11 alexandria kernel: [<c044e285>] ? try_to_wake_up+0x205/0x3a0
Aug 24 09:54:11 alexandria kernel: [<c0440372>] ? __wake_up+0x42/0x60
Aug 24 09:54:11 alexandria kernel: [<c04811e0>] ? ktime_get_ts+0xd0/0x100
Aug 24 09:54:11 alexandria kernel: [<c083ce39>] ? io_schedule+0x59/0xa0
Aug 24 09:54:11 alexandria kernel: [<c05581f0>] ? sync_buffer+0x30/0x40
Aug 24 09:54:11 alexandria kernel: [<c083d625>] ? __wait_on_bit+0x45/0x70
Aug 24 09:54:11 alexandria kernel: [<c05581c0>] ? sync_buffer+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c05581c0>] ? sync_buffer+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c083d6b8>] ? out_of_line_wait_on_bit+0x68/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476a50>] ? wake_bit_function+0x0/0x60
Aug 24 09:54:11 alexandria kernel: [<c05581ae>] ? __wait_on_buffer+0x1e/0x30
Aug 24 09:54:11 alexandria kernel: [<f929f89f>] ? journal_commit_transaction+0x8cf/0x1150 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c0466137>] ? lock_timer_base+0x27/0x50
Aug 24 09:54:11 alexandria kernel: [<c0466b32>] ? try_to_del_timer_sync+0x62/0xb0
Aug 24 09:54:11 alexandria kernel: [<f92a49ad>] ? kjournald+0xad/0x1f0 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<f92a4900>] ? kjournald+0x0/0x1f0 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c04767d4>] ? kthread+0x74/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476760>] ? kthread+0x0/0x80
Aug 24 09:54:11 alexandria kernel: [<c0409fff>] ? kernel_thread_helper+0x7/0x10
Aug 24 09:54:11 alexandria kernel: INFO: task nfsd:6721 blocked for more than 120 seconds.
Aug 24 09:54:11 alexandria kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 09:54:11 alexandria kernel: nfsd          D eb605c8c     0  6721      2 0x00000080
Aug 24 09:54:11 alexandria kernel: edcc0000 00000046 00000002 eb605c8c d0104024 00000000 00000000 00000000
Aug 24 09:54:11 alexandria kernel: 00000000 c15ccd00 00022968 578bb23e 00022968 c0b25680 c0b25680 edcc02a8
Aug 24 09:54:11 alexandria kernel: c0b25680 c0b21024 c0b25680 edcc02a8 24400f9b 00000000 00000000 edcc0000
Aug 24 09:54:11 alexandria kernel: Call Trace:
Aug 24 09:54:11 alexandria kernel: [<c083da38>] ? __mutex_lock_slowpath+0xd8/0x140
Aug 24 09:54:11 alexandria kernel: [<c083d93d>] ? mutex_lock+0x1d/0x40
Aug 24 09:54:11 alexandria kernel: [<c04e2eb9>] ? generic_file_aio_write+0x49/0xc0
Aug 24 09:54:11 alexandria kernel: [<c0544a54>] ? iget_locked+0x34/0x130
Aug 24 09:54:11 alexandria kernel: [<c04e2e70>] ? generic_file_aio_write+0x0/0xc0
Aug 24 09:54:11 alexandria kernel: [<c052d17e>] ? do_sync_readv_writev+0xce/0x110
Aug 24 09:54:11 alexandria kernel: [<f9b081a0>] ? nfsd_acceptable+0x0/0xf0 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c05b13b7>] ? selinux_file_permission+0xe7/0x130
Aug 24 09:54:11 alexandria kernel: [<c05a663c>] ? security_file_permission+0xc/0x10
Aug 24 09:54:11 alexandria kernel: [<c052d466>] ? rw_verify_area+0x66/0xe0
Aug 24 09:54:11 alexandria kernel: [<c052d835>] ? rw_copy_check_uvector+0x85/0x100
Aug 24 09:54:11 alexandria kernel: [<c052e1b6>] ? do_readv_writev+0xa6/0x1b0
Aug 24 09:54:11 alexandria kernel: [<c04e2e70>] ? generic_file_aio_write+0x0/0xc0
Aug 24 09:54:11 alexandria kernel: [<c04fdb7b>] ? kmemdup+0x1b/0x40
Aug 24 09:54:11 alexandria kernel: [<c05a680c>] ? security_task_setgroups+0xc/0x10
Aug 24 09:54:11 alexandria kernel: [<c047ec14>] ? set_groups+0x14/0x180
Aug 24 09:54:11 alexandria kernel: [<c052e2fe>] ? vfs_writev+0x3e/0x50
Aug 24 09:54:11 alexandria kernel: [<f9b09b64>] ? nfsd_vfs_write+0xb4/0x390 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b082de>] ? nfsd_setuser_and_check_port+0x4e/0x90 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c04fdb8a>] ? kmemdup+0x2a/0x40
Aug 24 09:54:11 alexandria kernel: [<f9b2158c>] ? renew_client+0x5c/0xb0 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b0b958>] ? nfsd_write+0x98/0x110 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15487>] ? nfsd4_write+0xf7/0x120 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15b67>] ? nfsd4_proc_compound+0x357/0x410 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b1b8bf>] ? nfs4svc_decode_compoundargs+0x26f/0x340 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15390>] ? nfsd4_write+0x0/0x120 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b05341>] ? nfsd_dispatch+0xd1/0x200 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f992a60c>] ? svc_process_common+0x2ec/0x5a0 [sunrpc]
Aug 24 09:54:11 alexandria kernel: [<f9937179>] ? svc_recv+0x3b9/0x780 [sunrpc]
Aug 24 09:54:11 alexandria kernel: [<f9b0598c>] ? nfsd+0xac/0x140 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b058e0>] ? nfsd+0x0/0x140 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c04767d4>] ? kthread+0x74/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476760>] ? kthread+0x0/0x80
Aug 24 09:54:11 alexandria kernel: [<c0409fff>] ? kernel_thread_helper+0x7/0x10

There don't appear to be any other error messages.
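
When triaging a log like the one above, it can help to list which tasks the hung-task detector flagged. The following is a minimal Python sketch, not something used in the original report. It assumes syslog-style lines in the format shown above, and the function name find_hung_tasks is made up for illustration.

```python
import re

# Matches the hung-task detector line as it appears in the log above, e.g.
# "Aug 24 09:54:11 alexandria kernel: INFO: task kjournald:1585 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return one dict per hung-task report found in syslog-style lines.

    Hypothetical helper: the assumed line layout is taken from this report only.
    """
    reports = []
    for line in lines:
        match = HUNG_TASK_RE.match(line)
        if match:
            reports.append({
                "time": match.group("stamp"),
                "host": match.group("host"),
                "task": match.group("comm"),
                "pid": int(match.group("pid")),
                "seconds": int(match.group("secs")),
            })
    return reports


if __name__ == "__main__":
    sample = [
        "Aug 24 09:54:11 alexandria kernel: INFO: task kjournald:1585 blocked for more than 120 seconds.",
        "Aug 24 09:54:11 alexandria kernel: Call Trace:",
        "Aug 24 09:54:11 alexandria kernel: INFO: task nfsd:6721 blocked for more than 120 seconds.",
    ]
    for report in find_hung_tasks(sample):
        print(report["task"], report["pid"], report["seconds"])
```

Lines that do not match, such as the register dump and call trace entries, are skipped, so the whole log can be passed in unchanged.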

The most problematic mount appears to be an nfs4 mount of a locally exported filesystem, /data/cora3:

/dev/md5 on /export/cora3 type ext3 (rw,noatime,usrquota,acl)
alexandriag:/cora3 on /data/cora3 type nfs4 (rw,noatime,intr,rsize=32768,wsize=32768,sloppy,addr=192.168.1.13,clientaddr=192.168.1.13)
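
In the nfs4 line above, addr and clientaddr are the same address, which is consistent with the server mounting its own export. A minimal Python sketch for spotting such mounts in `mount` output follows. The addr == clientaddr comparison is a heuristic assumed here, not something the report claims is reliable, and it only applies where a clientaddr option is present. The function names are hypothetical.

```python
import re

# Layout of a `mount` output line, as in the two lines quoted above:
#   <source> on <target> type <fstype> (<comma-separated options>)
MOUNT_RE = re.compile(
    r"^(?P<src>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>.*)\)$"
)


def parse_mount_line(line):
    """Parse one `mount` output line into a dict, or return None if it does not match."""
    match = MOUNT_RE.match(line.strip())
    if not match:
        return None
    options = {}
    for item in match.group("opts").split(","):
        # Flags such as "rw" have no "=", so their value is the empty string.
        key, _, value = item.partition("=")
        options[key] = value
    return {
        "source": match.group("src"),
        "target": match.group("target"),
        "fstype": match.group("fstype"),
        "options": options,
    }


def is_loopback_nfs(entry):
    """Heuristic (an assumption): an NFS mount whose addr equals its clientaddr."""
    if entry is None or not entry["fstype"].startswith("nfs"):
        return False
    addr = entry["options"].get("addr")
    return bool(addr) and addr == entry["options"].get("clientaddr")


if __name__ == "__main__":
    lines = [
        "/dev/md5 on /export/cora3 type ext3 (rw,noatime,usrquota,acl)",
        "alexandriag:/cora3 on /data/cora3 type nfs4 (rw,noatime,intr,rsize=32768,"
        "wsize=32768,sloppy,addr=192.168.1.13,clientaddr=192.168.1.13)",
    ]
    for line in lines:
        entry = parse_mount_line(line)
        print(entry["target"], is_loopback_nfs(entry))
```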

We run a cron process that walks the following mounts:

alexandriag:/cora3    6.8T  6.5T  317G  96% /data/cora3
alexandria2g:/cora1   9.0T  6.5T  2.5T  73% /data/cora1
alexandriag:/cora2    9.0T  4.2T  4.8T  47% /data/cora2
alexandria2g:/cora6   6.8T  4.9T  1.9T  73% /data/cora6
alexandriag:/cora5    8.4T  5.5T  2.9T  66% /data/cora5


Version-Release number of selected component (if applicable):
2.6.32-279.5.1.el6.i686
2.6.32-279.2.1.el6.i686

How reproducible:
Twice now

This last time I managed a reboot by unmounting some filesystems with umount -l.  It also appears that at some point / was remounted read-only, but there is no indication of why or when.
Comment 2 Orion Poplawski 2012-08-24 13:29:20 EDT
I rebooted right after this happened, and it happened again immediately.  This time the server appears to have been under heavy external load.  I've booted back to 2.6.32-220.23.1.el6.i686 for now to see if that helps.
Comment 3 Ric Wheeler 2012-08-24 15:05:17 EDT
Hi Orion,

Can you please work with Red Hat GSS and open a ticket for this? They are essential to the way we work these issues.

thanks!
Comment 4 Orion Poplawski 2012-08-24 16:24:36 EDT
Sorry, no support subscription.  I guess more of a put it on your radar kind of thing.
Comment 5 Orion Poplawski 2012-09-24 17:25:45 EDT
After further experience, this appears to be triggered by nfs client activity.  Running the -220 series kernels prevents it.
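
The kernel builds named in this report are 2.6.32-279.2.1 and 2.6.32-279.5.1 (hangs seen) and 2.6.32-220.23.1 (no hangs). A minimal Python sketch for checking which series a host is running follows. It assumes the release string has the form shown above, with the build series as the first number after the hyphen. The names are hypothetical, and the two sets only reflect what this report states, not a complete list of affected kernels.

```python
import platform
import re

# Build series taken from this report only (assumption: not exhaustive).
REPORTED_AFFECTED = {279}  # 2.6.32-279.2.1 and 2.6.32-279.5.1
REPORTED_OK = {220}        # 2.6.32-220.23.1


def rhel_kernel_series(release):
    """Return the build series from a release string like '2.6.32-279.5.1.el6.i686'.

    Returns None when the string does not have the assumed layout.
    """
    match = re.match(r"^\d+\.\d+\.\d+-(\d+)", release)
    return int(match.group(1)) if match else None


def classify(release):
    """Label a release string according to the builds named in the report."""
    series = rhel_kernel_series(release)
    if series in REPORTED_AFFECTED:
        return "reported-affected"
    if series in REPORTED_OK:
        return "reported-ok"
    return "unknown"


if __name__ == "__main__":
    running = platform.release()
    print(running, classify(running))
```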
Comment 6 RHEL Product and Program Management 2012-12-14 03:14:43 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 10 Jeff Layton 2013-12-20 07:33:55 EST
Orion, is this still a problem on more current RHEL6 kernels (something 6.5-ish)?
Comment 11 Orion Poplawski 2013-12-20 10:55:20 EST
I no longer seem to be seeing this behavior. Feel free to close, thanks.
Comment 12 Jeff Layton 2013-12-20 11:33:11 EST
Thanks!
