This bug was initially reported for RHEL 4 and also affects RHEL 5. It can potentially impact RHCS setups exporting NFS shares.

+++ This bug was initially created as a clone of Bug #505591 +++

This bug was first noticed by NFS users sharing filesystems through the Red Hat Cluster Suite. When a user wants to reboot a node in a cluster, the NFS services on that node fail to relocate cleanly onto other nodes, and the node then gets fenced. Nodes that receive services from the first node may also have problems relocating those services back to the original node when it comes back. The cause is locks which have not been freed on the filesystem.

--- Additional comment from sprabhu on 2009-06-12 10:42:24 EDT ---

Setup:
1) A filesystem mounted on /test1.
2) /test1 is exported to other clients.
3) A client has mounted server:/test1 and holds a lock on the filesystem.

To reproduce:
1) Mount the share server:/test1 on a client and grab a lock on it.
2) Run the command
     service nfslock restart
   This resets the timer for the garbage collector which clears killed nlm_host entries.
3) Run the command again
     service nfslock restart
   This marks the current nlm_host->h_killed to 1.
4) Unexport /test1:
     exportfs -u '*:/test1'
   This simulates RHCS stopping the NFS service.
5) Run the command
     service nfslock restart
   to free all locks held on /test1.
6) Try unmounting /test1:
     umount /test1
   Expected result: the file share is unmounted. Actual result: the umount command fails.

Explanation:
1) The nfslock restart sends a SIGKILL to lockd, asking it to invalidate all locks held by the server. nlmsvc_invalidate_all() invalidates all locks on the server. At this time, nlm_host->h_killed is set to 1. A notify is sent to the clients indicating that they should reclaim their locks, and the clients reclaim them.
When the client reclaims the lock, the garbage collector is run first:

  nlmclnt_proc() -> nlmclnt_lookup_host() -> nlm_lookup_host()

  struct nlm_host *
  nlm_lookup_host(int server, struct sockaddr_in *sin, int proto, int version)
  {
  	..
  	if (time_after_eq(jiffies, next_gc))
  		nlm_gc_hosts();
  	..
  }

At this time the garbage collector nlm_gc_hosts() is pending and is run. nlm_gc_hosts() reaps all nlm_hosts with h_killed set to 1, which clears the old nlm_hosts on the machine. A new nlm_host is then created for the client with h_killed set to 0. The first nfslock restart is used to reset next_gc so that the garbage collector will not run for the next 120 seconds, within which the test can take place.

2) When the second service nfslock restart is run, the lockd module again invalidates all locks held on the server. nlmsvc_invalidate_all() goes through all nlm_hosts and sets nlm_host->h_killed to 1. The clients again reclaim all locks. However, this time the garbage collector is not called in nlm_lookup_host(), since next_gc > jiffies. The older nlm_host, still having h_killed set to 1, is reused. So the locks are now held by a host with nlm_host->h_killed set to 1.

3) On the server side, the filesystem is now unexported and a service nfslock restart is issued to invalidate the locks. However, this time nlmsvc_invalidate_all() skips the lock held by the client, because h_killed on the nlm_host is set to 1 and lockd assumes that this host should not hold any locks.

  void
  nlmsvc_invalidate_all(void)
  {
  	struct nlm_host *host;

  	while ((host = nlm_find_client()) != NULL) {
  		nlmsvc_free_host_resources(host);
  		host->h_expires = 0;
  		host->h_killed = 1;
  		nlm_release_host(host);
  	}
  }

  struct nlm_host *
  nlm_find_client(void)
  {
  	..
  	for (hp = &nlm_hosts[hash]; (host = *hp) != 0; hp = &host->h_next) {
  		if (host->h_server && host->h_killed == 0) {	/* <-- checks h_killed */
  			nlm_get_host(host);
  			up(&nlm_host_sema);
  			return host;
  		}
  	..
  	}

4) Now, when attempting to unmount the filesystem, the umount fails because the lock is still held on the filesystem.

Notes:
1) nlm_host->h_killed set to 1 indicates that the nlm_host has been killed and should not be used.
2) next_gc is a global which contains the number of jiffies after which the garbage collector that reaps nlm_hosts should be run.

--- Additional comment from sprabhu on 2009-06-12 10:47:00 EDT ---

Created an attachment (id=347579)
proposed patch

The proposed patch simply adds a check in nlm_lookup_host() so that nlm_host entries which have been killed earlier are not reused. This patch does not match the fixes upstream: a number of changes have been made upstream to nlmsvc_invalidate_all(), and nlm_host no longer has the h_killed member. Since the number of changes required is quite high and carries a risk of regression, I decided to use this approach, which is relatively risk free.
in kernel-2.6.18-165.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
Reproduced on 5.4:

1. Set up an NFS server whose kernel is 5.4 (2.6.18-164.el5):

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63547176   2837968  57429132   5% /
/dev/sda1               101086     12594     83273  14% /boot
tmpfs                  2022036         0   2022036   0% /dev/shm
# mount /dev/sda1 /test1
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63547176   2837968  57429132   5% /
/dev/sda1               101086     12594     83273  14% /boot
tmpfs                  2022036         0   2022036   0% /dev/shm
/dev/sda1               101086     12594     83273  14% /test1
# grep test1 /etc/exports
/test1 *(no_root_squash,rw,no_subtree_check)
# /etc/init.d/nfs restart

2. Mount the share server:/test1 on a client (2.6.18-164.el5) and grab a lock on it:

client# mount server:/test1 /mnt
client# cd /mnt/
client# touch a

Use the test program below to grab a lock on a file:
https://bugzilla.redhat.com/attachment.cgi?id=363937

client# ./a.out a 1
Opening file a -- Done
Locking file a -- Done

3. Run the following on the NFS server:

1) service nfslock restart
   This resets the timer for the garbage collector which clears killed nlm_host entries.
2) Run the command again: service nfslock restart
   This marks the current nlm_host->h_killed to 1.
3) Unexport /test1: exportfs -u '*:/test1'
   This simulates RHCS stopping the NFS service.
4) Run the command service nfslock restart to free all locks held on /test1.
5) Try unmounting /test1:

# umount /test1
umount: /test1: device is busy
umount: /test1: device is busy

Set up the new 5.5 kernel (2.6.18-192.el5) on the client, follow the above steps, and the umount succeeds.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html