We have a report of a system crashing while waiting for a spinlock in prune_dcache. The kernel log leading up to the crash:

NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
VxVM VVR vxio V-5-0-330 Unable to connect to rlink fipsprod_to_hayward on rvg fipsprod_rvg: Timeout on remote
<5>VxVM VVR vxio V-5-0-265 Rlink fipsprod_to_hayward connected to remote
VFS: Busy inodes after unmount of 0:19. Self-destruct in 5 seconds. Have a nice day...
VxVM VVR vxio V-5-0-267 Rlink fipsprod_to_hayward disconnecting due to ack timeout on update message
VxVM VVR vxio V-5-0-266 Rlink fipsprod_to_hayward disconnected from remote
VxVM VVR vxio V-5-0-330 Unable to connect to rlink fipsprod_to_hayward on rvg fipsprod_rvg: Rlink already connected on remote
<4>VxVM VVR vxio V-5-0-233 Node 192.168.14.80 disconnected from node 192.168.15.39
VxVM VVR vxio V-5-0-91 Connected from node 192.168.14.80 to node 192.168.15.39
<5>VxVM VVR vxio V-5-0-265 Rlink fipsprod_to_hayward connected to remote
<5>VxVM VVR vxio V-5-0-101 DCM replay complete on rlink fipsprod_to_hayward
<3>NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x1193a1, got 0x2
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x1193a1, got 0x2
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x1193a1, got 0x2
VxVM VVR vxio V-5-0-0 Disconnecting rlink fipsprod_to_hayward due to excessive retries.
VxVM VVR vxio V-5-0-0 Disconnecting rlink fipsprod_to_hayward due to excessive retries.
VxVM VVR vxio V-5-0-0 Disconnecting rlink fipsprod_to_hayward due to excessive retries.
VxVM VVR vxio V-5-0-0 Disconnecting rlink fipsprod_to_hayward due to excessive retries.
VxVM VVR vxio V-5-0-0 Disconnecting rlink fipsprod_to_hayward due to excessive retries.
VxVM VVR vxio V-5-0-266 Rlink fipsprod_to_hayward disconnected from remote
VxVM VVR vxio V-5-0-233 Node 192.168.14.80 disconnected from node 192.168.15.39
VxVM VVR vxio V-5-0-91 Connected from node 192.168.14.80 to node 192.168.15.39
<5>VxVM VVR vxio V-5-0-330 Unable to connect to rlink fipsprod_to_hayward on rvg fipsprod_rvg: Timeout on remote
<5>VxVM VVR vxio V-5-0-265 Rlink fipsprod_to_hayward connected to remote

NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: iptable_filter ip_tables x_tables vxfen(PFU) gab(PU) llt(PU) nfsd exportfs auth_rpcgss nfs lockd fscache nfs_acl sunrpc dmpjbod(PU) dmpap(PU) dmpaa(PU) vxspec(PFU) vxio(PFU) vxdmp(PU) fdd(PFU) vxportal(PFU) vxfs(PU) dm_mirror dm_log dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i5000_edac edac_mc sg shpchp bnx2 pcspkr tg3 serio_raw hpilo lpfc scsi_transport_fc cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 503, comm: kswapd0 Tainted: PF 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff80064bd8>]  [<ffffffff80064bd8>] .text.lock.spinlock+0xe/0x30
RSP: 0018:ffff81052f331d38  EFLAGS: 00000086
RAX: 0000000000000282 RBX: ffff81051cc3e870 RCX: 0000000000000001
RDX: ffffffff803114e0 RSI: 0000000000000286 RDI: ffff81051cc3e874
RBP: ffff81051cc3e874 R08: ffff8104c9068738 R09: ffff81017bfc9078
R10: ffff81017bfc98e8 R11: ffffffff80103948 R12: 0000000000000000
R13: 0000000000000022 R14: 0000000000000000 R15: 0000000000000580
FS:  0000000000000000(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003c32e98640 CR3: 000000032ce85000 CR4: 00000000000006e0
Process kswapd0 (pid: 503, threadinfo ffff81052f330000, task ffff81052f4857a0)
Stack:  ffffffff8000b9dc ffff81017bfc9078 ffff81051cc3e870 ffff81022ec08b70
 ffffffff8002e92f 0000000000000282 ffff81052ffece00 000000000002d94c
 000000000000008d 00000000000000d0 ffffffff800ea1dd 0000000000000160
Call Trace:
 [<ffffffff8000b9dc>] __down_read_trylock+0x15/0x44
 [<ffffffff8002e92f>] prune_dcache+0xf5/0x149
 [<ffffffff800ea1dd>] shrink_dcache_memory+0x17/0x30
 [<ffffffff8003f3b6>] shrink_slab+0xdc/0x153
 [<ffffffff80057da5>] kswapd+0x343/0x46c
 [<ffffffff8009fc08>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80057a62>] kswapd+0x0/0x46c
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003297c>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003287e>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 83 3f 00 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe
BUG: warning at arch/x86_64/kernel/crash.c:163/nmi_shootdown_cpus() (Tainted: PF )

Note the following messages:

NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
VFS: Busy inodes after unmount of 0:19. Self-destruct in 5 seconds. Have a nice day...

as well as a number of VxVM/VVR-related messages.

The CPU appears to have been waiting for a spinlock at the following location:

static void prune_dcache(int count, struct super_block *sb)
{
        ..
        s_umount = &dentry->d_sb->s_umount;
        if (down_read_trylock(s_umount)) {   <-- tries to grab the lock here and hangs
        ..
}

The memory location where the super_block referenced by that dentry should have been appears to have been freed and reused by some other part of the code. The dentry being pruned is as follows:

crash> dentry 0xffff81022ec08b70
struct dentry {
  d_count = {
    counter = 0
  },
  ..
  d_inode = 0x0,
  ..
  d_parent = 0xffff810515f84b70,
  ..
  d_op = 0xffffffff88774ee0,
  d_sb = 0xffff81051cc3e800,
  ..
  d_iname = "cd\000hrc\000\067\061]\000y\000\062\064-16:14:32\000\000\000\000\000\000\000\000\000\000\000"
}

This looks like a negative dentry.
The parent dentry, which also appears to be the root dentry, is as follows:

crash> dentry 0xffff810515f84b70
struct dentry {
  d_count = {
    counter = 1
  },
  d_flags = 20,   --> DCACHE_DISCONNECTED | DCACHE_UNHASHED
  ..
  d_inode = 0xffff810514f57260,
  ..
  d_parent = 0xffff810515f84b70,
  ..
  d_op = 0xffffffff88774ee0,
  d_sb = 0xffff81051cc3e800,
  ..
  d_iname = "\000dmap\000\000\000\000\000\000\000/lib64/libc.so.6\000\000\000\000\000\000\000"
}

This has a valid inode, which probably triggered the "VFS: Busy inodes" message:

crash> inode.i_count 0xffff810514f57260
  i_count = {
    counter = 1
  },

System details: 2.6.18-164.6.1.el5 SMP x86_64
Event posted on 04-14-2010 05:35pm BRT by fbl

Hi,

More interesting data from this vmcore:

...
Looking at the parent:

crash> dentry 0xffff810515f84b70
struct dentry {
  d_count = {
    counter = 1
  },
  d_flags = 20,   --> DCACHE_DISCONNECTED | DCACHE_UNHASHED
  ..
  d_inode = 0xffff810514f57260,
  ...

crash> eval 0xffff810514f57260 - 0x1e0
hexadecimal: ffff810514f57080

struct nfs_inode {
  fileid = 0x2,
  fh = {
    size = 0x8,
    data = "\001\000\001\000,\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  },
  flags = 0xe,
  cache_validity = 0x3b,
  read_cache_jiffies = 0x107ee4cba,
  attrtimeo = 0x7530,
  attrtimeo_timestamp = 0x10851de52,
  change_attr = 0x36827f90,
  last_updated = 0x107ee4cbc,
  cache_change_attribute = 0x107ee4cb7,
  data_updates = {
    counter = 0x0
  },
  access_cache = {
    rb_node = 0xffff8103c34ca900
  },
  access_cache_entry_lru = {
    next = 0xffff8103c34ca918,
    prev = 0xffff8103c34ca918
  },
  access_cache_inode_lru = {
    next = 0xffff81035dce1990,
    prev = 0xffff810274811b50
  },
  acl_access = 0xfffffffffffffff5,
  acl_default = 0x0,
  cookieverf = {0x0, 0x0},
  req_lock = {
    raw_lock = {
      slock = 0x1
    }
  },
  dirty = {
    next = 0xffff810514f571a0,
    prev = 0xffff810514f571a0
  },
  commit = {
    next = 0xffff810514f571b0,
    prev = 0xffff810514f571b0
  },
  nfs_page_tree = {
    height = 0x0,
    gfp_mask = 0x220,
    rnode = 0x0
  },
  ndirty = 0x0,
  ncommit = 0x0,
  npages = 0x0,
  open_files = {
    next = 0xffff810514f571e0,
    prev = 0xffff810514f571e0
  },
  silly_count = {
    counter = 0x1
  },
  silly_list = {
    first = 0x0
  },
  waitqueue = {
    lock = {
      raw_lock = {
        slock = 0x1
      }
    },
    task_list = {
      next = 0xffff810514f57208,
      prev = 0xffff810514f57208
    }
  },
  nfs4_acl = 0x0,
  open_states = {
    next = 0xffff810514f57220,
    prev = 0xffff810514f57220
  },
  delegation = 0x0,
  delegation_state = 0x0,
  rwsem = {
    activity = 0x0,
    wait_lock = {
      raw_lock = {
        slock = 0x1
      }
    },
    wait_list = {
      next = 0xffff810514f57248,
      prev = 0xffff810514f57248
    }
  },
  fscache = 0x0,
  vfs_inode = {
    i_hash = {
      next = 0x0,
      pprev = 0xffff81000b49a460
    },
    ...
<all remaining data matches the previous inode dump>

So, the parent inode fsid is 0x2 and it matches the following log entries right before the crash:

...
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed ...

and it seems to come from:

/*
 * Many nfs protocol calls return the new file attributes after
 * an operation. Here we update the inode to reflect the state
 * of the server's inode.
 *
 * This is a bit tricky because we have to make sure all dirty pages
 * have been sent off to the server before calling invalidate_inode_pages.
 * To make sure no other process adds more write requests while we try
 * our best to flush them, we make them sleep during the attribute refresh.
 *
 * A very similar scenario holds for the dir cache.
 */
static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)

fbl

This event sent from IssueTracker by fbl
issue 759533
Event posted on 04-14-2010 05:38pm BRT by fbl

Sorry, s/fsid/fileid/ in my previous post.

fbl

This event sent from IssueTracker by fbl
issue 759533
It does look like an NFS inode here... and the root cause of the problem does appear to be the "Busy inodes after umount" issue. Here's basically what I think happened: an NFS filesystem was unmounted, but at least one of the inodes in it couldn't be freed. That caused the "Busy inodes after umount" message to pop at umount time. The superblock was freed and the memory was recycled. The inodes and dentries associated with it were still sitting in the cache, however, and when the VM tried to prune the dcache under memory pressure, it ended up dereferencing the now-recycled superblock. The question, of course, is: why weren't the inodes and dentries pruned at umount?
The flags field in the root nfs_inode is 0xe, which decodes against:

#define NFS_INO_ADVISE_RDPLUS   (1)     /* advise readdirplus */
#define NFS_INO_STALE           (2)     /* possible stale inode */
#define NFS_INO_ACL_LRU_SET     (3)     /* Inode is on the LRU list */

...which indicates that it was marked as stale at some point (as expected). The i_ino is also "2", which matches these messages:

NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x1193a1, got 0x2

...so I suspect that the problem is likely related to the fact that the fileid for the root inode of the mount changed.
(In reply to comment #0)
> The parent dentry which also appears to be the root dentry is as follows
>
> crash> dentry 0xffff810515f84b70
> struct dentry {
>   d_count = {
>     counter = 1
>   },
>   d_flags = 20,   --> DCACHE_DISCONNECTED | DCACHE_UNHASHED
>   ..
>   d_inode = 0xffff810514f57260,
>   ..
>   d_parent = 0xffff810515f84b70,
>   ..
>   d_op = 0xffffffff88774ee0,
>   d_sb = 0xffff81051cc3e800,
>   ..
>   d_iname = "\000dmap\000\000\000\000\000\000\000/lib64/libc.so.6\000\000\000\000\000\000\000"
> }
>
> This has a valid inode which probably triggered the VFS: Busy inode message

Maybe my comment will be helpful, not sure, but here goes anyway.

Probably not, unless it has already been successfully unmounted. The root dentry should remain valid until it is dput (dentry_iput() does the real business, I think) during umount. The busy-inodes message usually results from dentries within the root still being active in some way, but the child above is either well and truly "gone" or is in the process of being created. If it is really "gone", then any process that tries to use it is in error. Except for the dcache garbage collection routines (that we see here), of course.

The other thing that comes to mind is that the VFS will not "see" unhashed dentries during lookups, so they shouldn't be available to modules unless they have socked an address away somewhere, and they shouldn't become negative unless a reference was not taken. Is something playing funny games with dentries here? Is VxFS trying to use an invalid dentry?

> crash> inode.i_count 0xffff810514f57260
>   i_count = {
>     counter = 1
>   },

Yes, as it should be if it hasn't finished being unmounted.

Ian
Well... I can make these messages pop with the hypothetical scenario I came up with:

1. Export a filesystem from a server.
2. Mount it on a client.
3. cd into it and do some stuff (ls's and such).
4. Unexport it from the server.
5. Unmount the server filesystem.
6. Export the directory with the same fsid.
7. stat the root of the share.

...and I get this:

NFS: server dantu.rdu.redhat.com error: fileid changed fsid 0:18: expected fileid 0x2, got 0x378001

At that point, I get stale NFS filehandles on the client. Unmounting it doesn't seem to cause problems with busy inodes, however, so perhaps there's another factor that contributes to this?
If the hypothetical situation that I've laid out is suspected to be part of the problem here, one thing they may want to consider in their clusters is using the "mp" export option, which will only export a directory if the filesystem is mounted. See the exports(5) manpage for info. That might help prevent the issue from occurring while we try to determine the cause.
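For illustration, an /etc/exports entry using the "mp" option might look like the following. The path, client spec, and fsid value below are placeholders, not the customer's actual configuration:

```
# /etc/exports -- the "mp" (mountpoint) option makes the server refuse
# to export this directory unless a filesystem is actually mounted on
# it, so a failed cluster mount cannot silently export the bare
# mountpoint directory underneath.
/vcs/fipsprod   *.example.com(rw,sync,fsid=1,mp)
```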
(In reply to comment #20)
> (In reply to comment #15)
> > Hypothetical situation:
> >
> > The service failed over and the underlying filesystem wasn't mounted, but the
> > export was set up. Because it was exported with an explicit fsid= value, it
> > looks like the same filesystem but the server ended up exporting the mountpoint
> > of the directory instead.
>
> That would mean that the VCS configuration is grossly wrong,
> which is unlikely. The VCS configuration will consist of
> groups of resources that depend on other groups. For example,
> the NFS service group will depend on the file systems group
> and will not be able to start until VCS is happy the filesystems
> are successfully set up.

On second thought, maybe not. IIRC, the failover activity depends on site-supplied scripts. If the ones responsible for the file system mounts returned an incorrect result, that could cause this. If there is a failure, then the whole process stops to wait for admin intervention, so that is another opportunity for out-of-order actions.

What happens if, in your test above, you mount the filesystem over the top of the mount point after it is exported?
Created attachment 407030 [details]
suggested change to prevent dropping a disconnected dentry
The patch seems to fix the problem shown by the reproducer. What is less clear is whether this is the actual problem that the customer hit; there's no way to say with 100% certainty that it is. What we can say is that the problem they hit is consistent with the one the reproducer shows, and that's probably the best we can hope for here. At this point, I'm just waiting for Al to push the patch upstream. Once that's done, we can propose it for RHEL5. I also need to see whether the problem is reproducible on rawhide/RHEL6 as well.
I'm unable to reproduce this so far on rawhide kernels. This commit seems like it's probably papering over the bug there:

commit 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
Author: Trond Myklebust <Trond.Myklebust>
Date:   Thu Mar 6 12:34:59 2008 -0500

    NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings

...that said, it seems a little fragile to depend on this. The code could change at some point, or there may be some way to avoid this check in the current code (exclusive create?). Either way, I think Al's proposed patch is reasonable.
I'll add my ACK on Al's patch.
in kernel-2.6.18-199.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When an NFS server exported a file system with an explicit "fsid=[file_system_ID]" and an NFS client mounted that file system on one mount point and a subdirectory of it on a separate mount point, then, if the server re-exported that file system after un-exporting and unmounting it, the NFS client could receive the error message "VFS: Busy inodes after unmount..." when unmounting those mount points. Additionally, it was possible for the NFS client's kernel to crash in this situation.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html