Description of problem:
When I try a plock operation on an NFS-mounted GFS file system, I get the following messages in /var/log/messages. The plock operation does work.

Version-Release number of selected component (if applicable):
cman-2.0.84-2.el5
kernel-2.6.18-92.el5
kmod-gfs-0.1.23-5.el5

How reproducible:
100%

Steps to Reproduce:
A. On server:
1. mkfs -t gfs -p lock_dlm -t ... /dev/foo
2. mount -t gfs /dev/foo /mnt/foo
3. exportfs -o rw client:/mnt/foo
4. tail -f /var/log/messages

B. On client:
1. mount server:/mnt/foo /mnt/foo
2. cd /mnt/foo
3. xiogen -i 1 -F 10k:testfile | xdoio -k

Actual results:
May 12 15:37:59 newport gfs_controld[4835]: plock result write err 0 errno 2
May 12 15:38:16 newport gfs_controld[4835]: plock result write err 0 errno 2
May 12 16:13:29 tank-04 gfs_controld[26444]: plock result write err 0 errno 9
May 12 16:14:07 tank-04 last message repeated 3 times
May 12 16:21:31 tank-04 gfs_controld[26444]: plock result write err 0 errno 9

Expected results:
No "error" messages.

Additional info:
I'm assuming there are no errors reported for plocks requested on gfs directly? I'm pretty sure this has to do with the way the source node of a request is identified and the fact that the node/process identifiers change for plocks arriving through nfs.
Correct, I ran the tests w/o NFS in the mix and I did not see any extra messages.
The error is harmless apart from the annoying messages. The kernel is returning the wrong value from write(2) on the plock device (0 instead of the number of bytes written). Until the kernel is fixed, this fix just removes the error message from gfs_controld.

commit on RHEL5 branch: 685498d154acfff23e4af7bfe874a7b0ed2eb9c5
commit on STABLE2 branch: a6b6a30358fd5e247a37e2fe493ef6a683174b66
I face the same problem. We have RHEL 5.1, using gfs2, with:

kernel-2.6.18-92.1.1.el5
kmod-gfs-0.1.19-7.el5_1.3
cman-2.0.84-2.el5

I get the error message roughly every 15 minutes. I am not sure whether it is related to this error message, but as usage increases, the clients of this NFS-exported filesystem just hang. I also see gfs_controld using 100% CPU on the node.
I have the same problem serving NFS from gfs(1) with 5.2 versions:

cman-2.0.84-2.el5
kernel-2.6.18-92.1.6.el5
kmod-gfs-0.1.23-5.el5

While the "plock result write err 0 errno 2" message is the most common, I also see these messages:

gfs_controld[5348]: plock result write err -1 errno 2
gfs_controld[5348]: plock result write err 0 errno 9
gfs_controld[5237]: plock result write err 0 errno 11
kernel: lock_dlm: gdlm_plock: vfs lock error file ffff81011e2748c0 fl \
ffff8100c2aa6ce0
kernel: lockd: grant for unknown block
kernel: gfs2 lock granted after lock request failed; dangling lock!
(In reply to comment #5)
> I have the same problem serving nfs from gfs(1) with 5.2 versions:

I have the same problem as Andrew, also with 5.2. Adding

<gfs_controld plock_rate_limit="0" plock_ownership="1"/>

to /etc/cluster/cluster.conf seemed to help the possibly unrelated problem of NFS clients appearing to hang when accessing the NFS-mounted GFS filesystem, but access is still much slower than with 5.1.
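For anyone trying the workaround above: the gfs_controld element is a direct child of the top-level cluster element in cluster.conf. A minimal sketch follows (cluster name, config_version, and the elided node list are illustrative, not from this report); note that the plock_ownership setting must be consistent across all cluster nodes.

```xml
<?xml version="1.0"?>
<cluster name="example" config_version="2">
  <!-- Disable plock rate limiting and enable plock ownership caching;
       gfs_controld reads these options at startup. -->
  <gfs_controld plock_rate_limit="0" plock_ownership="1"/>
  <clusternodes>
    ...
  </clusternodes>
</cluster>
```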
Does this mean that if the host exports the GFS file system via NFS, the NFS client will receive a lock request failure, even though the lock is eventually granted? I'm seeing very bad Firefox 3.0 performance on NFS clients when ~/.mozilla is on a GFS file system, because of sqlite failures on files like places.sqlite. I am wondering if this is the cause.
Plocks work fine; this bz is just about a bad log message, which has been removed (see comment 3).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0189.html