|Summary:||RHEL6.6: NFS client has kernel panic after seeing 'VFS: Busy inodes after unmount ... Self-destruct in 5 seconds. Have a nice day'|
|Product:||Red Hat Enterprise Linux 6||Reporter:||Dave Wysochanski <dwysocha>|
|Component:||kernel||Assignee:||Benjamin Coddington <bcodding>|
|kernel sub component:||NFS||QA Contact:||Yongcheng Yang <yoyang>|
|Status:||CLOSED ERRATA||Docs Contact:||Jana Heves <jsvarova>|
|Priority:||urgent||CC:||bcodding, dhoward, djeffery, eguan, fsorenso, hbarcomb, ikent, jbainbri, jiyin, jsvarova, nmurray, pbokoc, plambri, rkothiya, salmy, smayhew, swhiteho, yoyang|
|Fixed In Version:||kernel-2.6.32-582.el6||Doc Type:||Bug Fix|
Loop checks added to VFS to prevent kernel crashes

The NFS client previously failed to detect a directory loop in some NFS server directory structures. This failure could cause NFS inodes to remain referenced after an attempt to unmount the file system, leading to a kernel crash. This update adds loop checks to the VFS, which prevents this problem from occurring.
|:||1272858 (view as bug list)||Environment:|
|Last Closed:||2016-05-10 23:08:15 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
|Bug Blocks:||1172231, 1268411, 1272858|
Description Dave Wysochanski 2015-08-16 20:19:10 UTC
Description of problem:
Customer reports multiple machines running 2.6.32-504.3.3.el6 crashing repeatedly with a backtrace similar to the one below. There are a few variations of the crash, but all of them are kswapd doing a shrink_icache_memory and ending up crashing due to:
1. an inode pointer which is allocated and looks valid
2. an inode.i_sb pointer which points at an address in an allocated size-NNNN slab (we've seen size-1024 and size-64), but the memory being pointed to is not a valid super_block structure
3. some field of the super_block structure which is supposed to be a pointer being used / dereferenced (such as s_op->clear_inode), causing the kernel crash

Prior to the crash, we see a lot of these:

VFS: Busy inodes after unmount of 0:8c. Self-destruct in 5 seconds. Have a nice day...

Then the crash comes, with an oops similar to this:

BUG: unable to handle kernel paging request at 0000000000a31658
IP: [<ffffffff811ab95e>] clear_inode+0x9e/0x140
PGD 435baa067 PUD 4373d1067 PMD 435bb3067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 0
Modules linked in: tcp_diag inet_diag nfs lockd fscache auth_rpcgss nfs_acl sunrpc autofs4 vsock(U) ipv6 ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables microcode ppdev vmware_balloon parport_pc parport sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmw_pvscsi vmxnet(U) pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Pid: 40, comm: kswapd0 Not tainted 2.6.32-504.3.3.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffff811ab95e>]  [<ffffffff811ab95e>] clear_inode+0x9e/0x140
RSP: 0018:ffff880434f8dbf0  EFLAGS: 00010246
RAX: 0000000000a315f0 RBX: ffff8802beb79408 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff88028db29718 RDI: ffff8802beb79408
RBP: ffff880434f8dc00 R08: ffff8802beb79418 R09: fd544e08748abe07
R10: ffff88014a913458 R11: ffff88014a913040 R12: ffff8802beb79540
R13: 0000000000000044 R14: ffffffff81fd5820 R15: ffff8802beb79418
FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000a31658 CR3: 00000004378eb000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 40, threadinfo ffff880434f8c000, task ffff880434f3aae0)
Stack:
 ffff8802beb79408 ffff880434f8dc60 ffff880434f8dc40 ffffffff811aba40
<d> ffffc90001812000 0000000000000080 ffff880112f1fd38 0000000000000080
<d> ffff880434f8dc60 0000000000000080 ffff880434f8dca0 ffffffff811abd94
Call Trace:
 [<ffffffff811aba40>] dispose_list+0x40/0x120
 [<ffffffff811abd94>] shrink_icache_memory+0x274/0x2e0
 [<ffffffff8113d4ba>] shrink_slab+0x12a/0x1a0
 [<ffffffff8114082a>] balance_pgdat+0x57a/0x800
 [<ffffffff81140be4>] kswapd+0x134/0x3b0
 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81140ab0>] ? kswapd+0x0/0x3b0
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 00 00 b9 02 00 00 00 48 c7 c2 80 ac 1a 81 be 08 00 00 00 e8 25 f3 37 00 48 89 df e8 8d 45 04 00 48 8b 83 08 01 00 00 48 8b 40 38 <48> 8b 40 68 48 85 c0 74 05 48 89 df ff d0 0f b7 83 ae 00 00 00
RIP  [<ffffffff811ab95e>] clear_inode+0x9e/0x140
 RSP <ffff880434f8dbf0>
CR2: 0000000000a31658

Version-Release number of selected component (if applicable):
2.6.32-504.3.3.el6.x86_64

How reproducible:
One customer has seen it 4 times on the same machine (twice in May, twice in June), and once on a second machine (Aug 13).

Steps to Reproduce:
1. TBD. All machines which crash are using automount, and it seems the automounter is auto-unmounting NFS shares which have active inodes on them.

Actual results:
We see a lot of these messages:
VFS: Busy inodes after unmount of 0:8c. Self-destruct in 5 seconds. Have a nice day...
Then sometime after the messages (which may be hours or days), the kernel crashes.

Expected results:
No messages and no kernel crash.

Additional info:
We have vmcores and have given the customer one round of systemtap, which I can attach here for reference, since you will see some output in one of the vmcores if you look at all of them. I think this looks like a reference counting issue on the super_block structure, i.e. an overput or underget of super_block. Customer claims they were running fine on RHEL6.5 kernels until upgrading to the RHEL6.6 kernel series. As of yet I do not have definitive proof of how long they were running fine on the RHEL6.5 kernel, but I've no reason to disbelieve their claim, and I could find no similar bugs reported in the RHEL6 kernel series, so for now I think we should assume this is a regression. Customer is running automount, but the /etc/auto.master looks straightforward, with just an /etc/auto.direct mounting NFSv3 shares. I'll attach a sosreport and point at all of the vmcores we have so far.
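Since the 'Busy inodes' warning can precede the crash by hours or days, scanning the client's kernel log for it is a cheap early-warning check. A minimal sketch, assuming an RHEL6-style syslog where kernel messages land in /var/log/messages (pass another path as the first argument if yours differs):

```shell
#!/bin/sh
# Scan a kernel log for the precursor warning seen before these crashes.
# The log path is an assumption (RHEL6 default); pass another file as $1.
LOG="${1:-/var/log/messages}"
if grep -F 'Busy inodes after unmount' "$LOG" >/dev/null 2>&1; then
    echo "precursor warning found in $LOG"
else
    echo "no busy-inode warnings in $LOG"
fi
```

Run periodically (e.g. from cron), this would flag affected clients well before kswapd trips over the stale super_block.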
Comment 4 Dave Wysochanski 2015-08-16 20:58:44 UTC
Looking at the commits between RHEL6.5 and RHEL6.6, I wonder if either of these could have introduced a regression:

[fs] vfs: fix ref count leak in path_mountpoint()
[fs] Fix mountpoint reference leakage in linkat
Comment 9 Dave Wysochanski 2015-08-16 21:39:38 UTC
Created attachment 1063571 [details]
bash shell script with systemtap which was used for partial debugging

The attached bash + systemtap script produces output like the following, as seen in some of the vmcores' kernel logs:

VFS: Busy inodes after unmount of 0:b4. Self-destruct in 5 seconds. Have a nice day...
Mon Jul 6 23:15:18 2015 PDT: automount(2406) => umount(4784) => umount.nfs(4785) (/sbin/umount.nfs /mnt/nfs): invalidate_inodes returning 1 for 'nfs' super_block 0xffff880434d7f000 (0:b3)
Comment 10 Ian Kent 2015-08-17 01:46:35 UTC
(In reply to Dave Wysochanski from comment #4)
> Looking at the commits between RHEL6.5 and RHEL6.6
> I wonder if either of these could have possibly introduced a regression:
> [fs] vfs: fix ref count leak in path_mountpoint()

Maybe not this one specifically. It applies only when the last component of the umount path is a symlink, and that doesn't appear to be the case here. That doesn't mean a corner case for the associated series hasn't been missed due to differences between upstream and our kernel. It's worth considering, but there was quite a bit of testing surrounding the series.

> [fs] Fix mountpoint reference leakage in linkat

This looks like a straightforward bug fix. The get and put on the path, old_path, are entirely local, so it looks like the path_put() was in fact missing on the ESTALE error.
Comment 21 Dave Wysochanski 2015-08-17 22:24:02 UTC
Created attachment 1064086 [details] Analysis of code paths which touch super_block.s_active and why we want to do this
Comment 22 Dave Wysochanski 2015-08-17 22:25:12 UTC
Created attachment 1064087 [details]
WIP systemtap script to trace super_block.s_active - the bulk still needs to be filled in and this may not work - it may need a kernel patch
Comment 23 Ian Kent 2015-08-18 03:26:50 UTC
Looking through the VFS-related changes from rhel-6.5 to 6.6, there aren't many at all.

The most significant change was the addition of the path_mountpoint() series, which is meant to avoid revalidation of the last path component during path resolution. This was mostly needed to prevent hangs on NFS umount in cases where the server is unresponsive (I believe). But it was also required for a similar hang with autofs direct mounts at startup.

So we need to confirm that the problem hasn't been seen on kernel revisions prior to revision 451 (ie. on rhel-6.5 kernels, rev 431). If we can confirm that, then we can focus instrumentation on the functions that patch series adds or changes, and there are only a few.

As far as s_active goes, I think that's a symptom and not a cause, because the code where the VFS message is issued is invoked when the mount reference count goes to 0 to clean up the in-memory mount tree and discard the super block. If the mount reference count isn't right, then all sorts of bad things happen on the final mntput(). So it's probably better to assume this is a mount reference count problem.

In rhel-6 the struct vfsmount still holds the mount reference count, so that's where we should look for reference count accounting problems, specifically on entry and exit of the functions I mentioned above. The difficult part will be working out what the counts should be (there are so many special cases) so we know a problem when we see it.

I'll have a look at the functions, see if I can spot anything, and try to work out what we should expect. But without a reproducer we have an uphill battle, since the customer will get sick of testing very quickly.

Ian
Comment 24 Ian Kent 2015-08-18 07:24:19 UTC
(In reply to Ian Kent from comment #23)
> Looking through the vfs related changes from rhel-6.5 to
> 6.6 there aren't many at all.

snip ...

> I'll have a look at the functions and see if I can spot
> anything and try and work out what we should expect.

After looking through the path_mountpoint() change I can't see a problem with the mount ref count.

It occurs to me that autofs wouldn't be selecting these mounts to expire unless the mount ref count indicated they weren't in use, so the problem likely occurred before the path_mountpoint() code was called.

The autofs expire code hasn't changed for a long, long time, so I doubt the problem lies there. There are a couple of calls that lead to path_mountpoint() via autofs ioctls, but an initial check indicates they are ok too.

I'll work through this again, but I'm not hopeful. Then it will be back to the changelog.

Ian
Comment 25 Ian Kent 2015-08-18 10:26:16 UTC
(In reply to Ian Kent from comment #24)
> After looking through the path_mountpoint() change I can't
> see a problem with the mount ref count.
>
> It occurs to me that autofs wouldn't be selecting these
> mounts to expire unless the mount ref count indicated they
> weren't in use so the problem likely occurred before the
> path_mountpoint() code was called.
>
> The autofs expire code hasn't changed for a long long time
> so I doubt the problem lies there.

Actually, that's not right. There was a change that went into rhel-6.6 for expiry of symlinks. If there's a symlink present in an autofs file system, only the last access time can be checked, and when it expires it amounts to an unlink() done by the daemon, not a umount.

I don't see how that could be a problem here, but a debug log and some knowledge of what's in the filesystem should tell us if confusion has crept in somewhere.

Ian
Comment 28 Frank Sorenson 2015-08-18 16:34:47 UTC
Created attachment 1064376 [details]
verbose automount debugging

Here is verbose automount debugging, beginning with the first mount of a filesystem that will result in the 'Busy inodes', until all the filesystems that produce the 'Busy inodes' are unmounted.

This is just one time through; however, each cycle of mount/unmount for these 6 particular /users/datadomain mounts looks about the same.
Comment 30 Ian Kent 2015-08-19 02:01:46 UTC
(In reply to Frank Sorenson from comment #28)
> Created attachment 1064376 [details]
> verbose automount debugging
>
> Here is verbose automount debugging, beginning with the first mount of a
> filesystem that will result in the 'Busy inodes', until all the filesystems
> that produce the 'Busy inodes' are unmounted.
>
> This is just one time through, however each cycle of mount/unmount for these
> 6 particular /users/datadomain mounts looks about the same.

When you post debug logs, please post the entire log rather than fragments, and give time stamps of any likely problem activity if possible. Fragments are harder for me to work with, and often something has happened long before the problem is actually seen.

Ian
Comment 63 Dave Wysochanski 2015-08-31 13:56:58 UTC
Created attachment 1068648 [details] case 01441432 - find plus tcpdump
Comment 65 Benjamin Coddington 2015-08-31 14:17:59 UTC
I can reproduce on RHEL6 2.6.32-504.el6.x86_64:

[root@rhel6 ~]# cd /exports
[root@rhel6 exports]# mkdir exports_loop
[root@rhel6 exports]# mount -o bind ../exports exports_loop
[root@rhel6 exports]# cd
[root@rhel6 ~]# /etc/init.d/nfs start
Starting RPC svcgssd: [ OK ]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
[root@rhel6 ~]# exportfs -ua
[root@rhel6 ~]# exportfs -o crossmnt *:/exports
[root@rhel6 ~]# mount rhel6:/exports /manmnt/rhel6
[root@rhel6 ~]# stat /manmnt/rhel6/exports_loop/.
  File: `/manmnt/rhel6/exports_loop/.'
  Size: 4096 Blocks: 8 IO Block: 262144 directory
Device: 18h/24d Inode: 275493 Links: 6
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-08-31 10:08:20.883717287 -0400
Modify: 2015-08-31 10:08:14.451617433 -0400
Change: 2015-08-31 10:08:14.451617433 -0400
[root@rhel6 ~]# umount /manmnt/rhel6
[root@rhel6 ~]# tail -1 /var/log/messages
Aug 31 10:11:13 rhel6 kernel: VFS: Busy inodes after unmount of 0:18. Self-destruct in 5 seconds. Have a nice day...

I think it's safe to rule out autofs at this point. An upstream Fedora server does not seem to create this problem.
Comment 66 Dave Wysochanski 2015-08-31 14:53:17 UTC
Nice work, Ben! I tried your simple reproducer on a few NFS client versions. Here are the results. NOTE that "FAIL" means we were able to mount the share, and after umount I saw "VFS: Busy inodes after unmount of 0:17. Self-destruct in 5 seconds." in /var/log/messages.

RHEL5u11 (2.6.18-402.el5): PASS - mount fails with "mount.nfs: Stale NFS file handle"
RHEL6u0 (2.6.32-71.40.1.el6): FAIL
RHEL6u1 (2.6.32-131.4.1.el6): FAIL
  * NOTE: Interestingly, the 'find' returned a lot of these:
    Aug 31 09:27:18 rhel6u1-node1 kernel: fsid 0:16: expected fileid 0x54fe3, got 0x409d6
    Aug 31 09:27:18 rhel6u1-node1 kernel: NFS: server 192.168.122.49 error: fileid changed
RHEL6u2 (2.6.32-220.2.1.el6): FAIL
RHEL6u3 (2.6.32-279.11.1.el6): FAIL
RHEL6u4 (2.6.32-358.el6): FAIL
RHEL6u5 (2.6.32-431.32.1.el6): FAIL
RHEL7u0 (3.10.0-123.el7): PASS - mount fails with "Too many levels of symbolic links"

Assuming Ben's reproducer is the same as the customer's problem, I think we can remove the Regression keyword, since it looks like RHEL6 has always had this bug.
Comment 67 Dave Wysochanski 2015-08-31 14:57:30 UTC
One small correction: all my tests used the default NFS version for the release, so for RHEL6-RHEL7 it was an NFSv4 mount, and for RHEL5 it was NFSv3. I also tried an NFSv4 mount on RHEL5; I could mount it, but I did not get the "busy inodes after umount" message.
Comment 69 Yongcheng Yang 2015-09-01 02:09:51 UTC
Reproducer test case created as in Comment 65. Thanks, Ben!
Comment 86 Frank Sorenson 2015-09-03 16:17:00 UTC
I'm seeing some strange differences in behavior, depending on what exactly I'm exporting and mounting.

NFS server side:
# mkdir -p /loop_test/dir_loop /exports/loop_test/dir_loop
# mount -o bind /loop_test /loop_test/dir_loop
# mount -o bind /exports/loop_test /exports/loop_test/dir_loop
# echo -e "/loop_test\t*(rw,crossmnt)" >> /etc/exports
# echo -e "/exports/loop_test\t*(rw,crossmnt)" >> /etc/exports
# exportfs -arv
exporting *:/exports/loop_test
exporting *:/loop_test
# find /loop_test /exports/loop_test
/loop_test
find: File system loop detected; `/loop_test/dir_loop' is part of the same file system loop as `/loop_test'.
/exports/loop_test
find: File system loop detected; `/exports/loop_test/dir_loop' is part of the same file system loop as `/exports/loop_test'.

NFS client side:
# mkdir -p /mnt/loop_test
# mount vm17:/loop_test /mnt/loop_test/
# find /mnt/loop_test/
/mnt/loop_test/
find: File system loop detected; `/mnt/loop_test/dir_loop' is part of the same file system loop as `/mnt/loop_test/'.
# umount /mnt/loop_test
# dmesg -c
[47853.323277] VFS: Busy inodes after unmount of 0:19. Self-destruct in 5 seconds. Have a nice day...
# mount vm17:/exports/loop_test /mnt/loop_test/
# find /mnt/loop_test
/mnt/loop_test
find: `/mnt/loop_test/dir_loop': Too many levels of symbolic links
# umount /mnt/loop_test/
# dmesg -c

So:
* when mounting server:/export, we get the 'File system loop detected', followed by the 'Busy inodes after unmount'
* when mounting server:/directory/export, we get the 'Too many levels of symbolic links' message instead, and no 'Busy inodes after unmount'
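A side note on the setup above: the bind-mount loop requires root on the server. For anyone who just wants to see find's loop diagnostic, a symlink loop exercises the same userspace detection without privileges. This is a hedged sketch, not part of the original reproducer, and it does not touch the kernel-side NFS problem:

```shell
#!/bin/sh
# Unprivileged stand-in for the bind-mount loop above: a symlink that
# points back at its own directory. With -L, GNU find follows symlinks
# and reports the cycle instead of recursing forever.
tmp=$(mktemp -d)
ln -s . "$tmp/dir_loop"
find -L "$tmp" 2>&1
rm -rf "$tmp"
```

With GNU findutils this prints the listing followed by a "File system loop detected" diagnostic, matching the message in the server-side transcript. Reproducing this bug's kernel behavior still needs the NFS bind-mount arrangement shown above.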
Comment 92 Frank Sorenson 2015-09-09 16:53:13 UTC
*** Bug 1236695 has been marked as a duplicate of this bug. ***
Comment 97 Aristeu Rozanski 2015-10-20 19:09:23 UTC
Patch(es) available on kernel-2.6.32-582.el6
Comment 107 errata-xmlrpc 2016-05-10 23:08:15 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-0855.html