Bug 633540 - issues when attempting nfsv4 recovery with running client I/O
Status: CLOSED DUPLICATE of bug 663448
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 6.1
Assigned To: J. Bruce Fields
QA Contact: Red Hat Kernel QE team
Keywords: Reopened, TestBlocker
Duplicates: 600152
Depends On:
Blocks: 595547
Reported: 2010-09-13 18:09 EDT by Corey Marthaler
Modified: 2011-04-15 12:38 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2011-04-15 12:38:51 EDT


Attachments
dmesg from grant-01 (64.27 KB, text/plain)
2010-09-13 18:12 EDT, Corey Marthaler
dmesg from grant-02 (70.04 KB, text/plain)
2010-09-13 18:17 EDT, Corey Marthaler
dmesg from grant-03 (76.85 KB, text/plain)
2010-09-13 18:17 EDT, Corey Marthaler
backport of upstream open-file-handling patches from 2.6.36 (31.73 KB, patch)
2010-09-14 19:06 EDT, J. Bruce Fields

Description Corey Marthaler 2010-09-13 18:09:31 EDT
Description of problem:
I'm seeing errors/call traces while running the derringer test. The test starts an I/O load on an NFS client and then relocates the HA NFS service around the cluster.

NFSD: Using /mnt/grant0/.clumanager/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
IMA: unmeasured files on fsmagic: EF53
ima_dec_counts: open/free imbalance (r:1 w:-1 o:0)
Pid: 2671, comm: nfsd Not tainted 2.6.32-70.el6.x86_64 #1
Call Trace:
 [<ffffffff812227e8>] ima_file_free+0x1a8/0x260
 [<ffffffff8116e105>] __fput+0x105/0x210
 [<ffffffff8116e235>] fput+0x25/0x30
 [<ffffffffa050274e>] nfsd_close+0xe/0x10 [nfsd]
 [<ffffffffa051e02a>] release_open_stateid+0x4a/0x60 [nfsd]
 [<ffffffffa051e089>] release_openowner+0x49/0x70 [nfsd]
 [<ffffffffa051f629>] expire_client+0xf9/0x230 [nfsd]
 [<ffffffffa051f80c>] nfs4_state_shutdown+0xac/0x1c0 [nfsd]
 [<ffffffffa0370040>] ? svc_close_xprt+0x40/0x50 [sunrpc]
 [<ffffffffa04fd5ad>] nfsd_shutdown+0x1d/0x40 [nfsd]
 [<ffffffffa04fdc39>] nfsd_last_thread+0x19/0x30 [nfsd]
 [<ffffffffa0361fc1>] svc_destroy+0x61/0x180 [sunrpc]
 [<ffffffffa0362186>] svc_exit_thread+0xa6/0xb0 [sunrpc]
 [<ffffffffa04fdbd1>] nfsd+0x111/0x160 [nfsd]
 [<ffffffffa04fdac0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff81091936>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810918a0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
nfsd: last server has exited, flushing export cache
iint_free: readcount: 1
iint_free: writecount: -1


On the NFS client:
Sep 13 16:29:11 taft-02 kernel: nfs4_reclaim_open_state: Lock reclaim failed!
Sep 13 16:29:11 taft-02 kernel: nfs4_reclaim_open_state: Lock reclaim failed!
Sep 13 16:29:11 taft-02 kernel: nfs4_reclaim_open_state: Lock reclaim failed!
Sep 13 16:29:11 taft-02 kernel: nfs4_reclaim_open_state: Lock reclaim failed!


I'll attach the dmesg output from each of the servers in the cluster.


Version-Release number of selected component (if applicable):
Linux grant-01 2.6.32-70.el6.x86_64 #1 SMP Wed Aug 25 10:17:53 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Corey Marthaler 2010-09-13 18:12:44 EDT
Created attachment 447070 [details]
dmesg from grant-01
Comment 2 Corey Marthaler 2010-09-13 18:17:14 EDT
Created attachment 447073 [details]
dmesg from grant-02
Comment 3 Corey Marthaler 2010-09-13 18:17:59 EDT
Created attachment 447074 [details]
dmesg from grant-03
Comment 4 J. Bruce Fields 2010-09-14 19:05:23 EDT
I wonder whether fs/nfsd/nfs4state.c:nfs4_upgrade_open() needs a call into the ima code?

Upstream I removed the need for nfs4_upgrade_open() entirely; a backported patch is attached (compile-tested only on RHEL 6).
Comment 5 J. Bruce Fields 2010-09-14 19:06:55 EDT
Created attachment 447353 [details]
backport of upstream open-file-handling patches from 2.6.36
Comment 6 Eric Paris 2010-09-15 14:56:49 EDT
This indicates that when the struct file was created (everything in the kernel uses alloc_file these days, right?) its fmode was set read-only, but the struct file was later switched to be writable.

I'm not sure that is valid behaviour, is it?
Comment 7 J. Bruce Fields 2010-09-15 15:26:58 EDT
It's not really valid, but the nfsv4 server code has gotten away with it for a long time (take a look at fs/nfsd/nfs4state.c:nfs4_upgrade_open()).

I recently fixed upstream so it doesn't do that any more.  So (if that's the cause of this problem) our choices are: (1) add whatever ima hook we need to nfs4_upgrade_open() to fix up the accounting (you'd probably know better than me what that would be), or (2) backport those upstream changes.

I'd actually prefer (2), though it's more intrusive: those changes fix some other problems as well, and we already seem to be following upstream fairly aggressively to make it easier to backport new features during RHEL 6.
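
To make this concrete, here is a minimal sketch of the pattern being described. This is not the actual nfsd code path; the function name and arguments are invented for illustration. The idea is: the file is opened read-only (so IMA accounts for one reader of the inode), the server later ORs FMODE_WRITE into f_mode without any matching IMA accounting (roughly what nfs4_upgrade_open() does on the 2.6.32 code base), and the final fput()/ima_file_free() then decrements a write count that was never taken, giving the "open/free imbalance (r:1 w:-1 o:0)" message seen above.

#include <linux/fs.h>
#include <linux/file.h>
#include <linux/dcache.h>
#include <linux/mount.h>
#include <linux/cred.h>
#include <linux/err.h>

/*
 * Illustration only: a read-only open that is later widened to
 * read/write behind IMA's back, leaving the counters imbalanced.
 */
static void ima_imbalance_example(struct dentry *dentry, struct vfsmount *mnt)
{
	/* Opened O_RDONLY; in the real nfsd path IMA's read count for
	 * the inode is taken at open time. */
	struct file *filp = dentry_open(dget(dentry), mntget(mnt),
					O_RDONLY, current_cred());
	if (IS_ERR(filp))
		return;

	/* An NFSv4 OPEN upgrade later wants write access, and the
	 * existing struct file is simply flipped to writable; IMA is
	 * never told about the new writer. */
	filp->f_mode |= FMODE_WRITE;

	/* On release, ima_file_free() decrements based on the final
	 * f_mode, so the write count that was never taken goes to -1. */
	fput(filp);
}

(With the upstream rework backported in comment 5, the server no longer needs to widen an existing struct file this way, which is why option (2) is preferred.)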
Comment 8 Corey Marthaler 2010-09-21 12:12:51 EDT
This is fairly easy to reproduce with our tests. Bumping the priority...
Comment 9 J. Bruce Fields 2010-09-21 12:17:54 EDT
Would it be easy for you to run your tests on a kernel with the attached backport applied?
Comment 10 J. Bruce Fields 2010-10-12 10:38:51 EDT
Just to get an idea of the severity: how bad are the consequences of an ima imbalance (beyond just getting the logs spammed)?
Comment 11 Eric Paris 2010-10-12 11:08:56 EDT
The spam is mostly harmless.  It means that IMA isn't correct over NFS, but I doubt anyone is going to care about that, since I don't know of anyone actually using IMA functionality at the moment.  Even if they were, IMA is unlikely to be interested in files exported via NFS to other machines (it's typically only interested in system files).
Comment 12 J. Bruce Fields 2010-10-12 15:29:20 EDT
*** Bug 600152 has been marked as a duplicate of this bug. ***
Comment 13 J. Bruce Fields 2010-10-12 15:49:07 EDT
As a workaround, is there a simple way to just keep the warnings out of the logs?
Comment 14 Ric Wheeler 2011-01-17 15:13:54 EST
Eric, is this something that we can do (https://bugzilla.redhat.com/show_bug.cgi?id=633540#c13)?

Should we just close this?
Comment 15 Eric Paris 2011-01-17 15:38:25 EST
No, there is going to be one set of spam per boot, and we can't get rid of it other than by fixing NFS....
Comment 16 J. Bruce Fields 2011-01-17 15:43:31 EST
6.1 kernels now include revisions of the attached patches, so these messages should be gone; reopen if not.
Comment 19 RHEL Product and Program Management 2011-01-17 18:22:29 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 21 Corey Marthaler 2011-02-04 18:29:24 EST
Reopening after seeing the following issues on the NFS client.

[root@flea-10 ~]# uname -ar
Linux flea-10 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Slow work thread pool: Starting up
Slow work thread pool: Ready
FS-Cache: Loaded
FS-Cache: Netfs 'nfs' registered for caching


nfs4_reclaim_open_state: Lock reclaim failed!
nfs4_reclaim_open_state: Lock reclaim failed!
INFO: task xdoio:1734 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xdoio         D 0000000000000002     0  1734   1723 0x00000080
 ffff88003c861a88 0000000000000086 ffff8800ffffffff 00000aa0fed4a3ce
 ffff88003c861a18 ffff88003c86cf30 0000000000029b78 000000010041a2be
 ffff88003cdbd0a8 ffff88003c861fd8 0000000000010518 ffff88003cdbd0a8
Call Trace:
 [<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8110c060>] ? sync_page+0x0/0x50
 [<ffffffff814c8a23>] io_schedule+0x73/0xc0
 [<ffffffff8110c09d>] sync_page+0x3d/0x50
 [<ffffffff814c929f>] __wait_on_bit+0x5f/0x90
 [<ffffffff8110c253>] wait_on_page_bit+0x73/0x80
 [<ffffffff81091ce0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa02c7e64>] nfs_wb_page+0xb4/0xd0 [nfs]
 [<ffffffffa02b68de>] nfs_launder_page+0x4e/0x80 [nfs]
 [<ffffffff811233c6>] invalidate_inode_pages2_range+0x2f6/0x380
 [<ffffffff81123467>] invalidate_inode_pages2+0x17/0x20
 [<ffffffffa02baa48>] nfs_revalidate_mapping+0x128/0x170 [nfs]
 [<ffffffffa02b7897>] nfs_file_read+0x77/0x130 [nfs]
 [<ffffffff8116c65a>] do_sync_read+0xfa/0x140
 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81167349>] ? do_huge_pmd_anonymous_page+0xc9/0x360
 [<ffffffff8120bf4b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20
 [<ffffffff8116d085>] vfs_read+0xb5/0x1a0
 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff8116d1c1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
INFO: task xdoio:1734 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xdoio         D 0000000000000002     0  1734   1723 0x00000080
 ffff88003c861a88 0000000000000086 ffff8800ffffffff 00000aa0fed4a3ce
 ffff88003c861a18 ffff88003c86cf30 0000000000029b78 000000010041a2be
 ffff88003cdbd0a8 ffff88003c861fd8 0000000000010518 ffff88003cdbd0a8
Call Trace:
 [<ffffffff8109b9a9>] ? ktime_get_ts+0xa9/0xe0
 [<ffffffff8110c060>] ? sync_page+0x0/0x50
 [<ffffffff814c8a23>] io_schedule+0x73/0xc0
 [<ffffffff8110c09d>] sync_page+0x3d/0x50
 [<ffffffff814c929f>] __wait_on_bit+0x5f/0x90
 [<ffffffff8110c253>] wait_on_page_bit+0x73/0x80
 [<ffffffff81091ce0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa02c7e64>] nfs_wb_page+0xb4/0xd0 [nfs]
 [<ffffffffa02b68de>] nfs_launder_page+0x4e/0x80 [nfs]
 [<ffffffff811233c6>] invalidate_inode_pages2_range+0x2f6/0x380
 [<ffffffff81123467>] invalidate_inode_pages2+0x17/0x20
 [<ffffffffa02baa48>] nfs_revalidate_mapping+0x128/0x170 [nfs]
 [<ffffffffa02b7897>] nfs_file_read+0x77/0x130 [nfs]
 [<ffffffff8116c65a>] do_sync_read+0xfa/0x140
 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81167349>] ? do_huge_pmd_anonymous_page+0xc9/0x360
 [<ffffffff8120bf4b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20
 [<ffffffff8116d085>] vfs_read+0xb5/0x1a0
 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff8116d1c1>] sys_read+0x51/0x90
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Comment 22 Corey Marthaler 2011-04-14 19:06:02 EDT
Still seeing issues on multiple clusters during NFSv4 recovery.

Apr 14 16:52:24 grant-03 rgmanager[2237]: Service service:nfs1 started
------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:390!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:08.0/0000:01:00.0/net/eth0/broadcast
CPU 0
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ext3 jbd dlm configfs sunrpc c]

Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ext3 jbd dlm configfs sunrpc c]
Pid: 3555, comm: nfsd Not tainted 2.6.32-130.el6.x86_64 #1 PowerEdge SC1435
RIP: 0010:[<ffffffffa04d32b5>]  [<ffffffffa04d32b5>] free_generic_stateid+0x35/0xb0 [nfsd]
RSP: 0018:ffff8801198d9b00  EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff880119f85f38 RCX: ffff8801198d9ae8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801198d9b0c
RBP: ffff8801198d9b20 R08: ffff880119f85f58 R09: 0000000000000000
R10: 0000000000000010 R11: 0000000000000000 R12: ffff8801198d7dd8
R13: ffff8801198d7e10 R14: ffff8801198d7dd8 R15: ffff880119f85740
FS:  00007f6ce4131700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f6ce413c000 CR3: 0000000218e85000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 3555, threadinfo ffff8801198d8000, task ffff88011a580a80)
Stack:
 ffff8801198d9b20 00000000a04cee88 ffff880119f85f38 ffff8801198d7dd8
<0> ffff8801198d9b50 ffffffffa04d3389 0000000000000000 ffff88021a1591a0
<0> 000000001d270000 ffff88021a15a040 ffff8801198d9d80 ffffffffa04d3a5d
Call Trace:
 [<ffffffffa04d3389>] release_lockowner+0x59/0xb0 [nfsd]
 [<ffffffffa04d3a5d>] nfsd4_lock+0x4cd/0x7e0 [nfsd]
 [<ffffffffa04bda06>] ? nfsd_setuser+0x126/0x2c0 [nfsd]
 [<ffffffffa04b5852>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
 [<ffffffffa04b5a07>] ? fh_verify+0x167/0x650 [nfsd]
 [<ffffffffa04c4f01>] nfsd4_proc_compound+0x3d1/0x490 [nfsd]
 [<ffffffffa04b243e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa03bf4d4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105d710>] ? default_wake_function+0x0/0x20
 [<ffffffffa03bfb10>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa04b2b62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa04b2aa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff8108de16>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 10 0f 1f 44 00 00 48 8b 77 60 48 89 fb 48 8d 7d ec e8 b0 c1 ff ff 8b 45 ec 83 e0 03
RIP  [<ffffffffa04d32b5>] free_generic_stateid+0x35/0xb0 [nfsd]
 RSP <ffff8801198d9b00>
---[ end trace 536ae40f35a0eb0d ]---
Apr 14 16:52:27 grant-03 kernel: Kernel panic - not syncing: Fatal exception
Pid: 3555, comm: nfsd Tainted: G      D    ----------------   2.6.32-130.1
------------[ cut here ]------------
Call Trace:
 [<ffffffff814da981>] ? panic+0x78/0x143
 [<ffffffff814de9c4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f2fb>] ? die+0x5b/0x90
 [<ffffffff814de294>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffffa04d32b5>] ? free_generic_stateid+0x35/0xb0 [nfsd]
Comment 23 J. Bruce Fields 2011-04-14 22:19:34 EDT
Bugs reported in comments 21 and 22 are very different from the original bug, and from each other (#21 looks like a client bug, and #22 like a server bug).  Please open new bugs for those rather than reopening this one.

*** This bug has been marked as a duplicate of bug 663448 ***
Comment 24 Corey Marthaler 2011-04-15 12:36:56 EDT
FYI - I opened 697029 and 697032 for the issues listed in comments #21 and #22.
Comment 25 Corey Marthaler 2011-04-15 12:38:51 EDT
dang bugzilla, I didn't mean to re-open this.

*** This bug has been marked as a duplicate of bug 663448 ***
