Bug 468274
| Summary: | nfs posix locking broken on gfs | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | David Teigland <teigland> |
| Component: | kernel | Assignee: | David Teigland <teigland> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.2 | CC: | adas, cluster-maint, edamato, konradr, staubach, steved, swhiteho |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-10-24 21:31:38 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Teigland
2008-10-23 20:22:46 UTC
Simple test to observe the problem owner values. On the nfs client, repeat lock/unlock on a single file several times:

```
# ./plockuli /mnt/a
1849 fcntl F_WRLCK ... done
press return to continue ...
1849 fcntl F_UNLCK ... done
press return to continue ...
1849 fcntl F_WRLCK ... done
press return to continue ...
1849 fcntl F_UNLCK ... done
press return to continue ...
1849 fcntl F_WRLCK ... done
press return to continue ...
1849 fcntl F_UNLCK ... done
press return to continue ...
1849 fcntl F_WRLCK ... done
press return to continue ...
1849 fcntl F_UNLCK ... done
press return to continue ...
```

On the gfs node exporting to the client above (x/y/z in the following messages are nodeid/pid/owner):

```
# gfs_controld -DP
1224792871 x read plock 208004 LK WR 0-4 1/0/0 w 0
1224792871 x receive plock 208004 LK WR 0-4 1/0/0 w 0
1224792871 plock result write err 0 errno 9
1224792881 x read plock 208004 UN - 0-4 1/0/0 w 0
1224792881 x receive plock 208004 UN - 0-4 1/0/0 w 0
1224792882 x read plock 208004 LK WR 0-4 1/1/1 w 0
1224792882 x receive plock 208004 LK WR 0-4 1/1/1 w 0
1224792882 plock result write err 0 errno 9
1224792884 x read plock 208004 UN - 0-4 1/1/1 w 0
1224792884 x receive plock 208004 UN - 0-4 1/1/1 w 0
1224792884 x read plock 208004 LK WR 0-4 1/2/2 w 0
1224792884 x receive plock 208004 LK WR 0-4 1/2/2 w 0
1224792884 plock result write err 0 errno 9
1224792886 x read plock 208004 UN - 0-4 1/2/2 w 0
1224792886 x receive plock 208004 UN - 0-4 1/2/2 w 0
1224792886 x read plock 208004 LK WR 0-4 1/3/3 w 0
1224792886 x receive plock 208004 LK WR 0-4 1/3/3 w 0
1224792886 plock result write err 0 errno 9
1224792888 x read plock 208004 UN - 0-4 1/3/3 w 0
1224792888 x receive plock 208004 UN - 0-4 1/3/3 w 0
```

We see that the owner is 0, 1, 2, 3, ... for each lock request. (The error messages here are harmless and have been fixed.)

Copying an email I've sent to upstream nfs developers.
Without nfs in the picture, gfs posix locks work like this:

```
on node 1
  process 1234 does fcntl(SETLK, WR) on inode 5
  gfs calls dlm_posix_lock(5, file, fl)
  dlm sets up the op->info structure to define the operation
    info.pid = fl->fl_pid (1234)
    info.owner = fl->fl_owner (123ABC)
  dlm sends the op to dlm_controld in userspace
  dlm_controld distributes to all nodes via openais

on nodes 1 & 2
  dlm_controld adds to lock state table:
    held locks: inode 5, lock WR, pid 1234, owner 123ABC

on node 2
  process 5678 does fcntl(SETLK, WR) on inode 5
  gfs calls dlm_posix_lock(5, file, fl)
  dlm sets up the op->info structure to define the operation
    info.pid = fl->fl_pid (5678)
    info.owner = fl->fl_owner (567DEF)
  dlm sends the op to dlm_controld in userspace
  dlm_controld distributes to all nodes via openais

on nodes 1 & 2
  dlm_controld compares this operation against held locks, finds
  that this lock op conflicts with the existing held lock based on
  comparing the nodeids (1 vs 2) and the info.owner values
  (123ABC vs 567DEF), and adds this lock op to the waiting list;
  updated lock state table:
    held locks:    inode 5, lock WR, pid 1234, owner 123ABC
    waiting locks: inode 5, lock WR, pid 5678, owner 567DEF
```

Now, bringing nfs into the picture. The dlm checks whether it's an nfs lock request, if (fl->fl_lmops && fl->fl_lmops->fl_grant), and if so, it does

```
info.pid = fl->fl_pid;
info.owner = fl->fl_pid;
```

presumably because fl->fl_owner is always lockd, which doesn't help us distinguish the real lock owners. The assumption is that fl->fl_pid does distinguish lock owners.

However, when I start process 9876 on an nfs client mounting gfs and do the first SETLK, dlm_controld sees info.pid and info.owner of 0, i.e. fl->fl_pid is 0. Process 9876 unlocks it and does the same thing again; this time dlm_controld sees info.pid and info.owner of 1, i.e. fl->fl_pid is 1; each client request results in an incremented fl_pid. This, of course, doesn't work as a lock owner at all.
I assume that fl->fl_pid would somehow need to be a combination of a client identifier and the pid on the client to successfully distinguish locks from two processes on two clients.

On irc, David pointed me to nlmclnt_locks_init_private() and __nlm_alloc_pid(), and mentioned that there's a client pid that lockd could be using instead of the counter. NLM clearly needs to be able to distinguish locks between different client processes itself, so the dlm ought to be able to extend that among nodes.

Sorry for the alarm; it looks like I wasn't careful enough. I've done some more testing with various combinations of overlapping locks/unlocks, and it appears to be working after all. NFS does use an incrementing counter for these pids, but it's incremented "intelligently" so that it works as an owner value. So it appears to be working correctly, and working as intended. (Note that the test case above is a bit special in the sense that no locks from the client continue to exist between iterations, which causes the "owner" for the same program/pid to change each time.)