Bug 507549 - Bug in lockd prevents locks from being freed.
Summary: Bug in lockd prevents locks from being freed.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Assignee: Sachin Prabhu
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 505591
Blocks: 517967 526950
 
Reported: 2009-06-23 09:40 UTC by Sachin Prabhu
Modified: 2012-02-07 12:13 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 505591
Environment:
Last Closed: 2010-03-30 06:58:18 UTC
Target Upstream Version:
Embargoed:
sprabhu: needinfo+




Links
Red Hat Product Errata RHSA-2010:0178 (SHIPPED_LIVE): Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update (last updated 2010-03-29 12:18:21 UTC)

Description Sachin Prabhu 2009-06-23 09:40:46 UTC
This bug was initially reported for RHEL 4. This also affects RHEL 5. This can potentially impact RHCS setups exporting NFS shares.

+++ This bug was initially created as a clone of Bug #505591 +++

This bug was first noticed by NFS users sharing filesystems using the Red Hat Cluster Suite.

When a user wants to reboot a node in a cluster, the NFS services on that node fail to relocate cleanly onto other nodes, and the node is then fenced. Nodes that received services from the first node may also have problems relocating those services back to the original node when it comes back up.

This is caused by locks which have not been freed on the filesystem.

--- Additional comment from sprabhu on 2009-06-12 10:42:24 EDT ---

Setup:

1) a filesystem mounted on /test1
2) /test1 is exported to other clients.
3) Client has mounted server:/test1 and has a lock on the filesystem.

To reproduce:

1) mount the share server:/test1 on a client and grab a lock on it.
2) Run the command
service nfslock restart
This is used to reset the timer for the garbage collector, which clears killed nlm_host entries.
3) Run the command
service nfslock restart
This is used to set the current nlm_host->h_killed to 1.
4) Unexport /test1:
exportfs -u '*:/test1'
This is used to simulate RHCS stopping the NFS service.
5) Run the command
service nfslock restart
to free all locks held on /test1
6) Try unmounting /test1:
umount /test1
Expected result: the file share is unmounted.

Actual result: the umount command fails.

Explanation

1) The nfslock restart sends a SIGKILL to lockd, asking it to invalidate all locks held by the server. nlmsvc_invalidate_all() invalidates all locks on the server. At this time, nlm_host->h_killed is set to 1. A notify is sent to the clients indicating that they should reclaim their locks. The clients reclaim their locks.

When a client reclaims its lock, the garbage collector is run first, via the call chain:
nlmclnt_proc() -> nlmclnt_lookup_host() -> nlm_lookup_host()

struct nlm_host *
nlm_lookup_host(int server, struct sockaddr_in *sin,
                                       int proto, int version)
{
..
       if (time_after_eq(jiffies, next_gc))
               nlm_gc_hosts();
..
}

At this point garbage collection is due, so nlm_gc_hosts() is run.
nlm_gc_hosts() clears all nlm_host entries with h_killed set to 1, removing the old nlm_hosts on the machine.
A new nlm_host is then created for the client with h_killed set to 0.

The first nfslock restart is used to reset next_gc so that the garbage collector will not run again for the next 120 seconds, within which the test can take place.
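
In other words, the reclaim path only runs the garbage collector once the next_gc deadline has passed, and each run pushes the deadline forward. A simplified sketch of that gating, assuming a collection interval of roughly 120 seconds as described above (the actual constant in the RHEL 5 sources may differ):

/* in nlm_lookup_host(): GC only runs once the deadline has passed */
if (time_after_eq(jiffies, next_gc))
        nlm_gc_hosts();         /* reaps nlm_host entries marked killed */

/* at the end of nlm_gc_hosts(): push the deadline forward, so no
 * further GC happens for roughly the next 120 seconds (assumed value) */
next_gc = jiffies + 120 * HZ;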

2) When the second service nfslock restart is run, the lockd module again invalidates all locks held on the server. nlmsvc_invalidate_all() goes through all nlm_hosts and sets nlm_host->h_killed to 1.
The clients again reclaim all their locks.

However, this time the garbage collector is not called in nlm_lookup_host(), since next_gc > jiffies. The old nlm_host, which still has h_killed set to 1, is reused. So the locks are now held by a host with nlm_host->h_killed set to 1.

3) On the server side, the filesystem is now unexported and a service nfslock restart is issued to invalidate the locks. However, this time nlmsvc_invalidate_all() skips over the lock held by the client, since h_killed on the nlm_host is set to 1 and lockd assumes that this host should not be holding any locks.

void
nlmsvc_invalidate_all(void)
{
       struct nlm_host *host;
       while ((host = nlm_find_client()) != NULL) {
               nlmsvc_free_host_resources(host);
               host->h_expires = 0;
               host->h_killed = 1;
               nlm_release_host(host);
       }
}

struct nlm_host *
nlm_find_client(void)
{
..
               for (hp = &nlm_hosts[hash]; (host = *hp) != 0; hp = &host->h_next) {
                       if (host->h_server &&
                           host->h_killed == 0) { <-- checks for h_killed
                               nlm_get_host(host);
                               up(&nlm_host_sema);
                               return host;
                       }
..
}

4) Now, when attempting to unmount the filesystem, the umount fails since the lock is still held on the filesystem.

Notes:
1) nlm_host->h_killed set to 1 indicates that the nlm_host has been killed and should not be used.
2) next_gc is a global that holds the jiffies value after which the garbage collector that reaps nlm_host entries should next be run.
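
For reference, the pieces involved in the excerpts above can be pictured roughly as follows. This is an illustrative sketch based only on the members referenced in this report, not the exact RHEL 5 definition:

struct nlm_host {
        struct nlm_host *h_next;        /* next entry in the hash chain */
        unsigned int     h_server;      /* checked by nlm_find_client() above */
        unsigned int     h_killed;      /* host was invalidated; must not be reused */
        unsigned long    h_expires;     /* jiffies after which the GC may reap it */
        /* ... */
};

static unsigned long next_gc;           /* jiffies value of the next GC run */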

--- Additional comment from sprabhu on 2009-06-12 10:47:00 EDT ---

Created an attachment (id=347579)
proposed patch

The proposed patch simply adds a check in nlm_lookup_host so that nlm_host entries which have been killed earlier are not used.

This patch doesn't match the fixes upstream. A number of changes have been made upstream to nlmsvc_invalidate_all(), and nlm_host no longer has the h_killed field. Since the number of changes required is quite high and carries a risk of regression, I decided to use this approach, which is relatively risk-free.
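
For illustration only, the approach described here amounts to something like the following extra check in the hash-chain walk of nlm_lookup_host(). This is a sketch of the idea, not the actual attachment, and the real patch may differ:

/*
 * Sketch of the described approach: while walking the hash chain in
 * nlm_lookup_host(), skip any nlm_host that has already been marked
 * killed, so a fresh entry is allocated for the reclaiming client
 * instead of reusing one with h_killed set to 1.
 */
for (hp = &nlm_hosts[hash]; (host = *hp) != NULL; hp = &host->h_next) {
        if (host->h_killed)     /* assumed new check */
                continue;       /* never reuse a killed nlm_host */
        /* ... existing matching logic ... */
}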

Comment 2 Don Zickus 2009-09-04 18:45:38 UTC
in kernel-2.6.18-165.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However, feel free
to provide a comment indicating that this fix has been verified.

Comment 7 yanfu,wang 2010-03-11 03:51:59 UTC
Reproduced on 5.4:
1. Set up an NFS server whose kernel is 5.4 (2.6.18-164.el5):
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63547176   2837968  57429132   5% /
/dev/sda1               101086     12594     83273  14% /boot
tmpfs                  2022036         0   2022036   0% /dev/shm

# mount /dev/sda1 /test1

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      63547176   2837968  57429132   5% /
/dev/sda1               101086     12594     83273  14% /boot
tmpfs                  2022036         0   2022036   0% /dev/shm
/dev/sda1               101086     12594     83273  14% /test1

# grep test1 /etc/exports 
/test1 *(no_root_squash,rw,no_subtree_check)

# /etc/init.d/nfs restart


2. Mount the share server:/test1 on a client (2.6.18-164.el5) and grab a lock on it.
client# mount server:/test1 /mnt
client# cd /mnt/
client# touch a
Use the test program below to grab a lock on a file (an illustrative sketch of such a program is included at the end of this comment):
https://bugzilla.redhat.com/attachment.cgi?id=363937
client# ./a.out a 1
Opening file a    -- Done

Locking file a    -- Done

3. Run the following commands on the NFS server:
1)service nfslock restart
This is used to reset the timer for the garbage collector, which clears killed nlm_host entries.
2)Run the command again:
service nfslock restart
This is used to set the current nlm_host->h_killed to 1.
3) Unexport /test1:
exportfs -u '*:/test1'
This is used to simulate RHCS stopping the NFS service.
4) Run the command
service nfslock restart
to free all locks held on /test1
5) Try unmounting /test1
# umount /test1
umount: /test1: device is busy
umount: /test1: device is busy

Set up the new 5.5 kernel (2.6.18-192.el5) on the client and followed the above steps; the umount succeeded.
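
For reference, a program that grabs a lock as in step 2 can be as small as the sketch below. This is a minimal illustration assuming whole-file fcntl() write locking; the actual attachment 363937 may differ, and the meaning of its second command-line argument is not reproduced here.

/*
 * Minimal sketch of a lock-grabbing test program: open the file named
 * in argv[1], take a whole-file write lock with fcntl(), and keep the
 * process alive so the lock stays registered with lockd on the server.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        printf("Opening file %s", argv[1]);
        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        printf("    -- Done\n\n");

        printf("Locking file %s", argv[1]);
        if (fcntl(fd, F_SETLKW, &fl) < 0) {     /* block until granted */
                perror("fcntl");
                return 1;
        }
        printf("    -- Done\n\n");

        pause();        /* hold the lock until the process is killed */
        return 0;
}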

Comment 9 errata-xmlrpc 2010-03-30 06:58:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

