Description of problem: After I do some NFS operations, I am unable to umount my source file system. Version-Release number of selected component (if applicable): RHEL4.6 The problem does not recreate in the 2.6.9-78.19 kernel, but it does recreate in the 2.6.9-78.20 thru .25 kernels How reproducible: Easily Steps to Reproduce: 1. mkfs ext3 or gfs file system 2. mount ext3 or gfs file system 3. service nfs start 4. mount nfs over the ext3 or gfs file system 5. do some I/O 6. stop all I/O 7. umount nfs 8. service nfs stop 9. umount ext3 or gfs file system Actual results: Unable to unmount source file system: File system busy Expected results: Should be able to unmount Additional info: lsof shows no files in use. All tasks relating to the fs have ended.
Hi Peter, Can you or Steve D, by any chance, post a link to the patch that was added between .19 and .20. I want to try reverting the patch (by hand if necessary) in a newer (2.6.9-79 or 2.6.9-80) kernel to see if I can get the problem to go away for some of my bugzilla work.
You should be able to find the patch by looking at the changelog section of the spec file? The bugzilla was 278291.
I verified that reverting the patch for 278291 allows me to unmount the file system properly on an otherwise 2.6.9-80 kernel.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 332285 [details] Proposed patch
Josef Bacik code reviewed the patch for bz278291 and discovered that a reference counting problem had been added. The pathname structs were getting get()'d without being put()'d in the routine, exp_export(). This was leading to active inodes on the referenced file system(s). I reviewed the entire routine and corrected two reference counting problems. One was the pathname reference counting problem and the other was a problem with the references to the auth_domain. With this patch, I am unable to so easily reproduce the busy file systems that I could see before.
I tried the attached patch and I hope I did everything correctly. The patch did not fix my problem. Not only was I unable to unmount my underlying gfs file system, my test also took a lot longer to run for some reason. The test normally runs in 15 minutes, but took 25 minutes with the patch. The performance thing may be unrelated, but it was an oddity.
AFAICT, I applied and built the patch correctly.
Created attachment 332577 [details] Bob's patch Looking at the patch, it seems to me that the problem might be related to the switching of tmp with new and the cache_put that happens later in svc_expkey_lookup(). I tried making the expkey_put conditionally put tmp or new depending on whether the old switcheroo happened. It didn't work, but I may have screwed it up. It caused problems. Next, I tried this patch, but I still can't unmount after using it.
finally got down to the basics to try and figure out what was going wrong. Put some debug printk's in, one for every new export we added in svc_export_lookup and one for every new expkey we added in svc_expkey_lookup. Also printed out which export we were grabbing a ref for in svc_expkey_update(?), and then when we free the expkey and export. This is what I got doing a new export, 00000101be71e600 new expkey 00000101be71e580 getting ref for export 00000101be71e600 new expkey 00000101be71e500 getting ref for export 00000101be71e600 doing a new export, 00000101be71e400 new expkey 00000101be71e380 getting ref for export 00000101be71e400 new expkey 00000101be71e300 getting ref for export 00000101be71e400 doing a new export, 00000101be71e200 new expkey 00000101be71e180 getting ref for export 00000101be71e200 new expkey 00000101be71e100 getting ref for export 00000101be71e200 doing a new export, 00000101be71e280 new expkey 000001021812f400 getting ref for export 00000101be71e280 new expkey 0000010184a81080 getting ref for export 00000101be71e280 nfsd: last server has exited nfsd: unexporting all filesystems freeing expkey 00000101be71e500 putting ref for export 00000101be71e600 freeing expkey 00000101be71e100 putting ref for export 00000101be71e200 freeing expkey 0000010184a81080 putting ref for export 00000101be71e280 freeing expkey 00000101be71e380 putting ref for export 00000101be71e400 freeing expkey 00000101be71e180 putting ref for export 00000101be71e200 freeing expkey 00000101be71e580 putting ref for export 00000101be71e600 freeing expkey 00000101be71e300 putting ref for export 00000101be71e400 freeing export 00000101be71e400 freeing export 00000101be71e600 freeing export 00000101be71e200 Exports 00000101be71e600 00000101be71e400 00000101be71e200 00000101be71e280 Expkeys -> export 00000101be71e580 -> 00000101be71e600 freed/cleared 00000101be71e500 -> 00000101be71e600 freed/cleared 00000101be71e380 -> 00000101be71e400 freed/cleared 00000101be71e300 -> 00000101be71e400 freed/cleared 00000101be71e180 -> 00000101be71e200 freed/cleared 00000101be71e100 -> 00000101be71e200 freed/cleared 000001021812f400 -> 00000101be71e280 0000010184a81080 -> 00000101be71e280 freed/cleared Sooo this is an expkey accounting problem, which is leading to one of the exports not being freed, which is leading to a dangling ref on the vfsmnt, which keeps us from unmounting. So now to find the dangling expkey ref.
doing a new export, 0000010215e98580 cache 0000010215e98580 expire_time new expkey 0000010215e98500 getting ref for export 0000010215e98580 new expkey 0000010215e98480 getting ref for export 0000010215e98580 doing a new export, 0000010215e98380 new expkey 0000010215e98300 getting ref for export 0000010215e98380 new expkey 0000010215e98280 getting ref for export 0000010215e98380 doing a new export, 0000010215e98180 new expkey 0000010215e98100 getting ref for export 0000010215e98180 new expkey 0000010215e98080 getting ref for export 0000010215e98180 cache 0000010215e98200 expire_time cache 0000010215e98400 expire_time cache 0000010215e98600 expire_time cache 0000010215e98080 expire_time cache 0000010215e98280 expire_time cache 0000010215e98500 expire_time cache 0000010215e98300 expire_time cache 0000010215e98100 expire_time cache 0000010215e98480 expire_time cache 0000010215e98180 expire_time cache 0000010215e98380 expire_time cache 0000010215e98580 expire_time cache 0000010215e98200 expire_time cache 0000010215e98400 expire_time cache 0000010215e98600 expire_time cache 0000010215e98080 expire_time cache 0000010215e98280 expire_time cache 0000010215e98500 expire_time cache 0000010215e98300 expire_time cache 0000010215e98100 expire_time cache 0000010215e98480 expire_time cache 0000010215e98180 expire_time cache 0000010215e98380 expire_time cache 0000010215e98580 expire_time cache 0000010217f13e00 expire_time cache 0000010215e98400 expire_time cache 0000010215e98600 expire_time doing a new export, 0000010215e98200 new expkey 0000010217276680 getting ref for export 0000010215e98200 cache 00000102168b4f00 expire_time cache 0000010215e98400 expire_time cache 0000010215e98600 expire_time new expkey 2 00000101b07b2880 getting ref for export 0000010215e98200 cache 0000010215e98080 expire_time cache 0000010215e98280 expire_time cache 00000101b07b2880 expire_time cache 0000010215e98500 expire_time cache 0000010215e98300 expire_time cache 0000010215e98100 expire_time cache 0000010215e98480 expire_time nfsd: last server has exited nfsd: unexporting all filesystems freeing expkey 0000010215e98080 putting ref for export 0000010215e98180 freeing expkey 0000010215e98280 putting ref for export 0000010215e98380 freeing expkey 00000101b07b2880 putting ref for export 0000010215e98200 freeing expkey 0000010215e98500 putting ref for export 0000010215e98580 freeing expkey 0000010215e98300 putting ref for export 0000010215e98380 freeing expkey 0000010215e98100 putting ref for export 0000010215e98180 freeing expkey 0000010215e98480 putting ref for export 0000010215e98580 freeing export 0000010215e98180 freeing export 0000010215e98380 cache 0000010215e98200 not being flushed b/c of refcnt, which is 2 freeing export 0000010215e98580 rpciod: active tasks at shutdown?! RPC: failed to contact portmap (errno -5). cache 0000010215e98200 not being flushed b/c of refcnt, which is 2 Going to have to look at this later, but here's what we know. Since we cannot free 0000010215e98200, which is an svc_export, and nothing else, we _know_ that the expkey that is holding the ref on the svc_export is not on the list, because we didn't get any other "not being flushed" messages. We know this expkey was not free'ed new expkey 2 00000101b07b2880 getting ref for export 0000010215e98200 the new expkey 2 message means it was added to the list here if (set) { if (test_bit(CACHE_VALID, &tmp->h.flags)) { /* need to swap in new */ struct svc_expkey *t2; printk(KERN_ERR "new expkey 2 %p\n", new); new->h.next = tmp->h.next; *hp = &new->h; tmp->h.next = NULL; t2 = tmp; tmp = new; new = t2; } if (test_bit(CACHE_NEGATIVE, &item->h.flags)) { WARN_ON(test_bit(CACHE_HAS_REF, &tmp->h.flags)); set_bit(CACHE_NEGATIVE, &tmp->h.flags); } else { WARN_ON(test_bit(CACHE_HAS_REF, &tmp->h.flags)); svc_expkey_update(tmp, item); clear_bit(CACHE_NEGATIVE, &tmp->h.flags); } } Now I cannot figure out why thats wrong, and I think it has to do with staring at this screen for too long and I'm terribly hungry, so I'm going to take a fresh look at it on monday.
This is precisely the algorithm I was tweaking on Thursday. I made four different attempts to fix it and each of them caused kernel panic problems in odd places. :7) Now I'm leaving it to the experts.
Now that i've had a moment to think, I'm pretty sure I know the solution. Since we've removed the old expkey from the cache list, it will never get culled by cache_flush(), so it never gets freed. What we need to do is after t2 = tmp; tmp = new; new = t2; do svc_expkey_put(&new->h, &svc_expkey_cache); and then at the end it will get freed. I will test this now.
ok I tested this morning and it seems that fixed the problem. I'm going to test some more on monday, but that should fix it.
Ok that fix isn't right, we should only have one ref on that thing atm, so when we do the put before we return it should be freed, so we need to figure out who is grabbing the extra ref on the expkey and not putting it.
Ok I've figured out whats going on. So we have two people do a svc_expkey_lookup at the same time via exp_find_key. Process A goes through, can't find what we're looking for, creates a new svc_expkey and restarts the loop, Process B does the same thing. Process A then still doesn't find the svc_expkey and adds the new one in there, and comes back which does a cache_check, which in turn gets another ref on the svc_expkey via cache_make_upcall. Process B comes through its loop again, and finds the svc_expkey that Process A added. So it rips the old key out and puts the new one in, and does a put on what Process A did, but unfortunately because of the cache_make_upcall, we are still holding a ref. The old way this was taken care of was by doing a cache_fresh() on new, which would kill the pending cache update and the ref on the cache, so the final svc_expkey_put would destroy the old expkey, but we don't do that anymore, we just update the expirey time, which doesn't make a difference because we've removed the thing from the hashtable. So what we need to do is do this if (test_and_clear_bit(CACHE_PENDING, &new->h.flags)) queue_loose(&svc_expkey_cache, head); after the new->h.next = tmp->h.next; *hp = &new->h; tmp->h.next = NULL; t2 = tmp; tmp = new; new = t2; part of svc_expkey_lookup, which will clear everything out and clear that pending ref. I will test with this tomorrow and make sure it works.
Created attachment 333056 [details] proposed patch Ok heres the final patch that fixes the problem. It is a combination of ps's fix and the stuff needed to kill the extra ref on tmp. Basically we have to set CACHE_VALID on new so when we call cache_fresh() it doesn't do any of the cache_revisit_request() crap, it just does the queue_loose(), which will free it up. I'm not an NFS person so theres probably a better way to do this, but this is the best I've come up with. I tested it and it worked, I'm going to build a scratch build and run that kernel to make doubly sure.
Regardless of what anyone says about the merits or correctness of the patch, Josef's version from comment #17 does, in fact, solve the problem for me.
Reverted following patch in 82.EL "sunrpc: fix kernel crash in sunrpc cache_clean" This was the patch which caused this regression. This is an stop gap measure and a good fix (which fixes the reverted patch) should go in later.
I have a concern over this patch. I may have found another code path that makes it impossible to unmount. This was an NFS error path. I did this set of commands (by accident): NFS server: ----------- mkfs.gfs -O -t "bobs_roth:roth1" -p lock_dlm -j3 /dev/roth_vg/roth1 mount -tgfs /dev/roth_vg/roth1 /mnt/gfs service nfs start NFS client: ----------- mount roth-01:/mnt/gfs /mnt/gfs cd /mnt/gfs (a bunch of NFS IO--the "genesis" test) cd - NFS Server: ----------- service nfs stop (Then I realized I forgot to umount on the client BEFORE I stopped the NFS server--so I went to do that) NFS Client: ----------- umount /mnt/gfs umount.nfs: roth-01:/mnt/gfs: not found / mounted or server not reachable umount.nfs: roth-01:/mnt/gfs: not found / mounted or server not reachable (makes sense--nfs can't contact the NFS server because the server was stopped. So I restarted it so that I could cleanly unmount.) NFS Server: ----------- service nfs start NFS Client: ----------- umount /mnt/gfs (this time it was successful) NFS Server: ----------- service nfs stop umount /mnt/gfs /sbin/umount.gfs: /mnt/gfs: device is busy. /sbin/umount.gfs: /mnt/gfs: device is busy. (Now there's no way to unmount the source file system).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html