Bug 851673 - [regression] nfs client activity leads to hung tasks
Summary: [regression] nfs client activity leads to hung tasks
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jeff Layton
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-24 16:36 UTC by Orion Poplawski
Modified: 2014-06-18 07:42 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-20 16:33:11 UTC
Target Upstream Version:
Embargoed:



Description Orion Poplawski 2012-08-24 16:36:15 UTC
Description of problem:

Twice now since updating one of our file servers to EL6.3, it has ended up with hung kjournald and nfsd processes. We have also seen this once on another (64-bit) EL6.3 server.

Aug 24 09:54:11 alexandria kernel: INFO: task kjournald:1585 blocked for more than 120 seconds.
Aug 24 09:54:11 alexandria kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 09:54:11 alexandria kernel: kjournald     D 00000086     0  1585      2 0x00000000
Aug 24 09:54:11 alexandria kernel: efcde550 00000046 d0008680 00000086 c044e285 00000000 00000000 000065eb
Aug 24 09:54:11 alexandria kernel: 00000000 c15ccd00 00022968 ca13836f 00022968 c0b25680 c0b25680 efcde7f8
Aug 24 09:54:11 alexandria kernel: c0b25680 c0b21024 c0b25680 efcde7f8 f7106000 ef88e2ec 00000003 00000001
Aug 24 09:54:11 alexandria kernel: Call Trace:
Aug 24 09:54:11 alexandria kernel: [<c044e285>] ? try_to_wake_up+0x205/0x3a0
Aug 24 09:54:11 alexandria kernel: [<c0440372>] ? __wake_up+0x42/0x60
Aug 24 09:54:11 alexandria kernel: [<c04811e0>] ? ktime_get_ts+0xd0/0x100
Aug 24 09:54:11 alexandria kernel: [<c083ce39>] ? io_schedule+0x59/0xa0
Aug 24 09:54:11 alexandria kernel: [<c05581f0>] ? sync_buffer+0x30/0x40
Aug 24 09:54:11 alexandria kernel: [<c083d625>] ? __wait_on_bit+0x45/0x70
Aug 24 09:54:11 alexandria kernel: [<c05581c0>] ? sync_buffer+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c05581c0>] ? sync_buffer+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c083d6b8>] ? out_of_line_wait_on_bit+0x68/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476a50>] ? wake_bit_function+0x0/0x60
Aug 24 09:54:11 alexandria kernel: [<c05581ae>] ? __wait_on_buffer+0x1e/0x30
Aug 24 09:54:11 alexandria kernel: [<f929f89f>] ? journal_commit_transaction+0x8cf/0x1150 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c0466137>] ? lock_timer_base+0x27/0x50
Aug 24 09:54:11 alexandria kernel: [<c0466b32>] ? try_to_del_timer_sync+0x62/0xb0
Aug 24 09:54:11 alexandria kernel: [<f92a49ad>] ? kjournald+0xad/0x1f0 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<f92a4900>] ? kjournald+0x0/0x1f0 [jbd]
Aug 24 09:54:11 alexandria kernel: [<c04767d4>] ? kthread+0x74/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476760>] ? kthread+0x0/0x80
Aug 24 09:54:11 alexandria kernel: [<c0409fff>] ? kernel_thread_helper+0x7/0x10
Aug 24 09:54:11 alexandria kernel: INFO: task nfsd:6721 blocked for more than 120 seconds.
Aug 24 09:54:11 alexandria kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 09:54:11 alexandria kernel: nfsd          D eb605c8c     0  6721      2 0x00000080
Aug 24 09:54:11 alexandria kernel: edcc0000 00000046 00000002 eb605c8c d0104024 00000000 00000000 00000000
Aug 24 09:54:11 alexandria kernel: 00000000 c15ccd00 00022968 578bb23e 00022968 c0b25680 c0b25680 edcc02a8
Aug 24 09:54:11 alexandria kernel: c0b25680 c0b21024 c0b25680 edcc02a8 24400f9b 00000000 00000000 edcc0000
Aug 24 09:54:11 alexandria kernel: Call Trace:
Aug 24 09:54:11 alexandria kernel: [<c083da38>] ? __mutex_lock_slowpath+0xd8/0x140
Aug 24 09:54:11 alexandria kernel: [<c083d93d>] ? mutex_lock+0x1d/0x40
Aug 24 09:54:11 alexandria kernel: [<c04e2eb9>] ? generic_file_aio_write+0x49/0xc0
Aug 24 09:54:11 alexandria kernel: [<c0544a54>] ? iget_locked+0x34/0x130
Aug 24 09:54:11 alexandria kernel: [<c04e2e70>] ? generic_file_aio_write+0x0/0xc0
Aug 24 09:54:11 alexandria kernel: [<c052d17e>] ? do_sync_readv_writev+0xce/0x110
Aug 24 09:54:11 alexandria kernel: [<f9b081a0>] ? nfsd_acceptable+0x0/0xf0 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c0476a10>] ? autoremove_wake_function+0x0/0x40
Aug 24 09:54:11 alexandria kernel: [<c05b13b7>] ? selinux_file_permission+0xe7/0x130
Aug 24 09:54:11 alexandria kernel: [<c05a663c>] ? security_file_permission+0xc/0x10
Aug 24 09:54:11 alexandria kernel: [<c052d466>] ? rw_verify_area+0x66/0xe0
Aug 24 09:54:11 alexandria kernel: [<c052d835>] ? rw_copy_check_uvector+0x85/0x100
Aug 24 09:54:11 alexandria kernel: [<c052e1b6>] ? do_readv_writev+0xa6/0x1b0
Aug 24 09:54:11 alexandria kernel: [<c04e2e70>] ? generic_file_aio_write+0x0/0xc0
Aug 24 09:54:11 alexandria kernel: [<c04fdb7b>] ? kmemdup+0x1b/0x40
Aug 24 09:54:11 alexandria kernel: [<c05a680c>] ? security_task_setgroups+0xc/0x10
Aug 24 09:54:11 alexandria kernel: [<c047ec14>] ? set_groups+0x14/0x180
Aug 24 09:54:11 alexandria kernel: [<c052e2fe>] ? vfs_writev+0x3e/0x50
Aug 24 09:54:11 alexandria kernel: [<f9b09b64>] ? nfsd_vfs_write+0xb4/0x390 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b082de>] ? nfsd_setuser_and_check_port+0x4e/0x90 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c04fdb8a>] ? kmemdup+0x2a/0x40
Aug 24 09:54:11 alexandria kernel: [<f9b2158c>] ? renew_client+0x5c/0xb0 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b0b958>] ? nfsd_write+0x98/0x110 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15487>] ? nfsd4_write+0xf7/0x120 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15b67>] ? nfsd4_proc_compound+0x357/0x410 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b1b8bf>] ? nfs4svc_decode_compoundargs+0x26f/0x340 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b15390>] ? nfsd4_write+0x0/0x120 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b05341>] ? nfsd_dispatch+0xd1/0x200 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f992a60c>] ? svc_process_common+0x2ec/0x5a0 [sunrpc]
Aug 24 09:54:11 alexandria kernel: [<f9937179>] ? svc_recv+0x3b9/0x780 [sunrpc]
Aug 24 09:54:11 alexandria kernel: [<f9b0598c>] ? nfsd+0xac/0x140 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<f9b058e0>] ? nfsd+0x0/0x140 [nfsd]
Aug 24 09:54:11 alexandria kernel: [<c04767d4>] ? kthread+0x74/0x80
Aug 24 09:54:11 alexandria kernel: [<c0476760>] ? kthread+0x0/0x80
Aug 24 09:54:11 alexandria kernel: [<c0409fff>] ? kernel_thread_helper+0x7/0x10

There don't appear to be any other error messages.
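To check whether the hang has recurred on a given server, the "blocked for more than N seconds" lines can be pulled out of syslog. This is a minimal sketch, assuming the message layout shown above ("INFO: task COMM:PID blocked for more than N seconds"); the names `find_hung_tasks` and `HungTask` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Header line printed by the kernel's hung-task detector, for example:
# "INFO: task kjournald:1585 blocked for more than 120 seconds."
# The greedy (?P<comm>.+) backtracks to the last ":<pid> blocked", so
# task names that themselves contain a colon (e.g. "kworker/0:1") still parse.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    comm: str     # task name as reported by the kernel
    pid: int      # process ID of the blocked task
    seconds: int  # hung_task_timeout_secs value in effect when the report fired


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return one HungTask per hung-task report found in the given syslog lines."""
    tasks = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            tasks.append(HungTask(m["comm"], int(m["pid"]), int(m["secs"])))
    return tasks
```

For example, `find_hung_tasks(open("/var/log/messages"))` on this server would be expected to list the kjournald (pid 1585) and nfsd (pid 6721) reports quoted above.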

The most problematic mount appears to be an NFSv4 mount of a locally exported filesystem, /data/cora3:

/dev/md5 on /export/cora3 type ext3 (rw,noatime,usrquota,acl)
alexandriag:/cora3 on /data/cora3 type nfs4 (rw,noatime,intr,rsize=32768,wsize=32768,sloppy,addr=192.168.1.13,clientaddr=192.168.1.13)
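In the NFS line above, `addr=` and `clientaddr=` are both 192.168.1.13, which is what makes /data/cora3 a mount of the server's own export. A minimal sketch for flagging such loopback NFS mounts on other hosts, assuming the `mount` command output format shown here (not the field layout of /proc/mounts); `parse_mount_line` and `is_loopback_nfs` are hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple

# One line of `mount` output: "SOURCE on TARGET type FSTYPE (OPTIONS)".
MOUNT_RE = re.compile(
    r"^(?P<src>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mount_line(line: str) -> Tuple[str, str, str, Dict[str, Optional[str]]]:
    """Split a `mount` output line into source, target, fstype and an options dict."""
    m = MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised mount line: {line!r}")
    opts: Dict[str, Optional[str]] = {}
    for item in m["opts"].split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else None  # bare flags such as 'rw' carry no value
    return m["src"], m["target"], m["fstype"], opts


def is_loopback_nfs(line: str) -> bool:
    """True if an NFS mount's server address (addr=) equals its own clientaddr=."""
    _, _, fstype, opts = parse_mount_line(line)
    addr = opts.get("addr")
    return fstype.startswith("nfs") and addr is not None and addr == opts.get("clientaddr")
```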

We run a cron process that walks the following mounts:

Filesystem            Size  Used Avail Use% Mounted on
alexandriag:/cora3    6.8T  6.5T  317G  96% /data/cora3
alexandria2g:/cora1   9.0T  6.5T  2.5T  73% /data/cora1
alexandriag:/cora2    9.0T  4.2T  4.8T  47% /data/cora2
alexandria2g:/cora6   6.8T  4.9T  1.9T  73% /data/cora6
alexandriag:/cora5    8.4T  5.5T  2.9T  66% /data/cora5
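The cron job itself is not included in this report. As an illustration of the kind of NFS client metadata traffic such a walk generates, here is a hypothetical stand-in (`walk_tree` is an invented name, not the reporter's script) that descends a tree without following symlinks and calls lstat() on every entry. It is a sketch only, not a known reproducer for this bug.

```python
import os
from typing import Tuple


def walk_tree(root: str) -> Tuple[int, int]:
    """Visit every entry below root without following symlinks.

    Hypothetical stand-in for the cron job described in this report.
    Returns (directories, non-directories) counted below root.
    """
    dirs = others = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        # lstat() each entry, as a metadata-scanning job typically would
                        entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished between readdir and lstat
                    if entry.is_dir(follow_symlinks=False):
                        dirs += 1
                        stack.append(entry.path)
                    else:
                        others += 1
        except OSError:
            # Unreadable or vanished directory: skip it and keep walking.
            continue
    return dirs, others
```

Pointing it at one of the mounts above, e.g. `walk_tree("/data/cora3")`, would issue one directory read per directory and one lstat() per entry over NFSv4.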


Version-Release number of selected component (if applicable):
2.6.32-279.5.1.el6.i686
2.6.32-279.2.1.el6.i686

How reproducible:
Twice now

This last time I managed to reboot by first unmounting some filesystems with umount -l.  It also appears that at some point / was remounted read-only, but there is no indication of why or when.

Comment 2 Orion Poplawski 2012-08-24 17:29:20 UTC
I rebooted after this just happened and it happened again immediately.  This time it appears to be under heavy external load.  I've booted back to 2.6.32-220.23.1.el6.i686 for now to see if that helps.

Comment 3 Ric Wheeler 2012-08-24 19:05:17 UTC
Hi Orion,

Can you please work with Red Hat GSS and open a ticket for this? They are essential to the way we work these issues.

thanks!

Comment 4 Orion Poplawski 2012-08-24 20:24:36 UTC
Sorry, no support subscription.  I guess more of a put it on your radar kind of thing.

Comment 5 Orion Poplawski 2012-09-24 21:25:45 UTC
After further experience, this appears to be triggered by nfs client activity.  Running the -220 series kernels prevents it.
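The report distinguishes kernels by "series": the affected 2.6.32-279.x builds versus the 2.6.32-220.x builds that do not show the hang. A small sketch for extracting that build number, assuming the release-string layout seen in this report (e.g. "2.6.32-279.5.1.el6.i686"); `parse_release`, `KernelRelease` and `same_series` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple

# RHEL 6 kernel releases as seen in this report, e.g. "2.6.32-279.5.1.el6.i686":
# upstream base version, RHEL build number (279 or 220 here), any further
# update components, the ".elN" dist tag, and an optional architecture.
RELEASE_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+)-(?P<build>\d+)(?P<update>(?:\.\d+)*)"
    r"\.el(?P<el>\d+)(?:\.(?P<arch>\w+))?$"
)


class KernelRelease(NamedTuple):
    base: str
    build: int
    update: Tuple[int, ...]
    arch: str


def parse_release(release: str) -> KernelRelease:
    """Split a RHEL-style kernel release string into its components."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"not a RHEL-style kernel release: {release!r}")
    update = tuple(int(part) for part in m["update"].split(".") if part)
    return KernelRelease(m["base"], int(m["build"]), update, m["arch"] or "")


def same_series(a: str, b: str) -> bool:
    """True if both releases share the base version and RHEL build number."""
    ra, rb = parse_release(a), parse_release(b)
    return (ra.base, ra.build) == (rb.base, rb.build)
```

On each server, `parse_release(os.uname().release).build` would then report whether it is running a -220 or a -279 kernel.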

Comment 6 RHEL Program Management 2012-12-14 08:14:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 10 Jeff Layton 2013-12-20 12:33:55 UTC
Orion, is this still a problem on more current RHEL6 kernels (something 6.5-ish)?

Comment 11 Orion Poplawski 2013-12-20 15:55:20 UTC
I do not seem to be seeing this behavior any more; feel free to close. Thanks.

Comment 12 Jeff Layton 2013-12-20 16:33:11 UTC
Thanks!

