Bug 244343
Summary:          GFS2 - NFS posix lock issues
Product:          Red Hat Enterprise Linux 5
Component:        kernel
Version:          5.1
Hardware:         All
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         medium
Priority:         medium
Reporter:         Wendy Cheng <nobody+wcheng>
Assignee:         David Teigland <teigland>
QA Contact:       GFS Bugs <gfs-bugs>
CC:               cluster-maint, konradr, potluri, rkenna, staubach, steved, swhiteho
Keywords:         Reopened
Doc Type:         Bug Fix
Target Milestone: ---
Target Release:   ---
Last Closed:      2007-12-12 17:29:36 UTC
Bug Blocks:       204760
Description (Wendy Cheng, 2007-06-15 07:42:28 UTC)

Created attachment 157068: draft patch for lock_nolock hang
For the dlm_lock data corruption problem, the faulty path can be demonstrated by the following execution sequences:

1) Good run (mostly from RHEL NFS clients) - the parent releases the lock before the child gets it:

    Test 7 - Test parent/child mutual exclusion.
    Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED.
    Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
    Parent: Now free child to run, should block on lock.
    Parent: Check data in file to insure child blocked.
    Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
    Parent: 7.1 - COMPARE [ ffc, 7] PASSED.
    Parent: Now unlock region so child will unblock.
    Parent: 7.2 - F_ULOCK [ ffc, 9] PASSED.
    Child:  7.3 - F_LOCK [ ffc, 9] PASSED.
    Child:  Write child's version of the data and release lock.
    Parent: Now try to regain lock, parent should block.
    Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
    Child:  7.4 - F_ULOCK [ ffc, 9] PASSED.
    Parent: 7.5 - F_LOCK [ ffc, 9] PASSED.
    Parent: Check data in file to insure child unblocked.
    Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
    Parent: 7.6 - COMPARE [ ffc, 9] PASSED.
    Parent: 7.7 - F_ULOCK [ ffc, 9] PASSED.

2) Bad run (mostly from Solaris clients) - the child is dispatched before the parent releases the lock:

    Test 7 - Test parent/child mutual exclusion.
    Parent: 7.0 - F_TLOCK [ 1ffc, 9] PASSED.
    Parent: Wrote 'aaaa eh' to testfile [ 8188, 7 ].
    Parent: Now free child to run, should block on lock.
    Child:  7.3 - F_LOCK [ 1ffc, 9] PASSED.
    Child:  Write child's version of the data and release lock.
    Child:  Wrote 'bebebebeb' to testfile [ 8188, 9 ].
    Child:  7.4 - F_ULOCK [ 1ffc, 9] PASSED.
    Parent: Check data in file to insure child blocked.
    Parent: Read 'aaaa eh' from testfile [ 8188, 7 ].
    Parent: **** Test expected 'aaaa eh', read 'bebebeb'.
    Parent: 7.1 - COMPARE [ 1ffc, 7] FAILED!
    Parent: **** Expected equal, returned unequal...
    Parent: **** Probably implementation error.
For this to happen, my guess is that the checking logic inside dev_write() is not adequate:

    spin_lock(&ops_lock);
    list_for_each_entry(op, &recv_list, list) {
        if (op->info.fsid == info.fsid &&
            op->info.number == info.number &&
            op->info.owner == info.owner) {
            list_del_init(&op->list);
            found = 1;
            op->done = 1;
            memcpy(&op->info, &info, sizeof(info));
            break;
        }
    }
    spin_unlock(&ops_lock);

(will continue tomorrow)

We used to have lock_kernel/unlock_kernel around posix_lock_file_wait():

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/nolock/Attic/main.c?rev=1.4.6.2.2.2&content-type=text/x-cvsweb-markup&cvsroot=cluster

It seems this was lost when we moved nolock into the upstream kernel. I don't know whether that would help the current problem with nolock or not.

The lock_nolock issue is (I think) mostly due to lockd being single-threaded. The draft patch from comment #2 works fine in an NFS environment (it passes the cthon lock test) but hasn't been tried under the local (direct) gfs lock access scenario. Alternatively, gfs2_lock under lock_nolock could follow what other local filesystems (e.g. ext3) do by calling posix_lock_file() (implying gfs2_lock becomes a wrapper around posix_lock_file()). The problem with this is that it would make gfs2 lock_nolock break under NFS V4, since the VFS layer doesn't pass the "conflock" to gfs2_lock. (Maybe this should be considered a defect in the upstream code - it should have passed the "conflock" pointer?) Another option is simply to remove gfs2_lock from the file operations method when the lock protocol is lock_nolock at mount time (probably the easiest, quickest, and most fool-proof way, since we simply hand off the issue and let the VFS take total control of the locking). We can discuss this later after we fully understand what goes wrong in the dlm_lock case. I guess it is better to have an overall view, instead of ad-hoc fixes here and there.

Looking into the dlm_lock problem now. This is one of the issues ..
not sure how to fix it though (since it is in user space). I think current RHEL 5 will have this issue too (?). In cluster/group/gfs_controld/plock.c, do_get() checks conflicts through is_conflict() as:

    list_for_each_entry(po, &r->locks, list) {
        if (po->nodeid == in->nodeid &&
            po->owner == in->owner)
            continue;

This won't work in the NFS lockd case, since the owner field is actually the "host" structure (representing the very same NFS client). So all lock inquiries from NFS lockd will pass this check. Within the kernel, lockd passes a "nlmsvc_same_owner" function pointer to the VFS's posix_test_lock() call to avoid this issue. But gfs_controld is in user space!!! I guess we can hand-copy the nlmsvc_same_owner kernel code into the user mode plock.c? This is really not nice.

OK, really looking at the two bugs just dug out at this moment: they exist in current RHEL 5, regardless of whether the nfs cluster locking patch is in or out. In short, this is now a RHEL 5 bugzilla.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Have to do bz #243136 first - without correct nfs-gfs2 filehandle code, nfs locking over gfs2 is meaningless.

Dependent on NFS fixes for posix locking; moving to 5.2.

Need to start work on this issue. Last time we knew test #7 would fail. IBM just sent an email over saying #2 now fails too with the 2.6.18-51.el5 kernel. I will give it a try tonight ...

On Mon, Oct 15, 2007 at 01:26:33PM -0700, Marc Eshel wrote:
> Konrad, can you please send Wendy the information she needs to create the
> problem.

Sure. With the 2.6.18-51.el5 kernel we did tests where a GFS and GFS2 two-node cluster filesystem was exported via NFS.
When connecting a RHEL3 client to it and running the NFS Connectathon test suite on the mounted share (using NFS v3 udp and tcp), the tests passed successfully. However, if the same kernel is used with the locking patches, the RHEL3 client NFS Connectathon tests fail on TEST#2. They pass if you use a RHEL4 or RHEL5 box. The lock patches are available at http://darnok.org/locks

I've also tested this with the upstream gfs2 git tree - exporting a GFS2 filesystem via NFS and running the NFS Connectathon would blow it up at TEST#2.

(In reply to comment #11)
> Need to start to work on this issue. Last time we knew test #7 would fail.
> IBM just sent an email over saying #2 now fails too with 2.6.18-51.el5
> kernel. I will give it a try tonight ...

What I said is that with 2.6.18-51.el5 and the locking patches it would fail. If I use the stock 2.6.18-51.el5 kernel it works fine. If I use the 2.6.23-rc3-gfs tree it would fail too.

Borrowed a RHEL3 node from Josef. The first run hung on test #7 ... it looks more like an NFS client issue. Then I realized the machine has a customized kernel: 2.4.21-52.EL.bz249237hugemem. Will re-run. Test #2 passes though...

    Test #7 - Test parent/child mutual exclusion.
    Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED.
    Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
    Parent: Now free child to run, should block on lock.
    Parent: Check data in file to insure child blocked.
    Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
    Parent: 7.1 - COMPARE [ ffc, 7] PASSED.
    Parent: Now unlock region so child will unblock.

Yep.. it fails as follows on 53.el5 with Konrad's NFS locking patches. Without the NFS locking patches, it works fine. Starting to debug now ...

    Test #2 - Try to lock the whole file.
    Parent: 2.0 - F_TLOCK [ 0, ENDING] FAILED!
    Parent: **** Expected success, returned errno=37...
    Parent: **** Probably implementation error.
    ** PARENT pass 1 results: 9/9 pass, 0/0 warn, 1/1 fail (pass/total).
    ** CHILD pass 1 results: 0/0 pass, 0/0 warn, 0/0 fail (pass/total).
    lock tests failed
    Tests failed, leaving /mnt/dhcp136.perf.redhat.com mounted

Comment #14 is an operator error (me) - I dispatched the command too soon (before the NFSD grace period ended). On the other hand, now we don't need Solaris clients to find problems. A RHEL3 client seems able to generate similar issues:

1) With the lock_nolock protocol, test #7 hangs as before (see this bz's problem statement and comment #1). Will retest the patch and ask for internal review tomorrow.

2) With the lock_dlm protocol, didn't get a chance to run test #7 since the test suite dies with #2:

    Test #2 - Try to lock the whole file.
    Parent: 2.0 - F_TLOCK [ 0, ENDING] PASSED.
    Child: 2.1 - F_TEST [ 0, 1] FAILED!
    Child: **** Expected EACCES, returned success...
    Child: **** Probably implementation error.

The server's /var/log/messages file shows:

    Oct 18 17:01:09 dhcp136 mountd[3886]: authenticated mount request from vm8.gsslab.rdu.redhat.com:917 for /server (/server)
    Oct 18 17:02:24 dhcp136 gfs_controld[3646]: plock result write err 0 errno 9

Will discuss this with dct first thing tomorrow morning.

*** This bug has been marked as a duplicate of 232649 ***

Breaking this out of 232649, since we need (at least) one separate bugzilla for GFS1 and the user mode daemon build, which is normally one cycle behind the base kernel build.

* For GFS1 changes - will check into GFS1 CVS using this bugzilla.
* For gfs_controld changes - will do a summary of (tentative) gfs_controld changes on Monday after this weekend's planned test runs - then transfer the issue to Dave. All changes are subject to Dave's review.

* GFS1-RHEL5 has been checked into CVS using this bugzilla.
* GFS(1)-kernel CVS head is broken now (I'm using 2.6.24-rc2). Need to review Fabio's patches and fix the issues accordingly.
* Starting lock_dlm testing at this moment.

Weekend lock_dlm test runs didn't go well. I was able to pass test #2 (that failed previously) by adding pid checking into a few calls in gfs_controld but couldn't pass test #7.
There are 15 test cases in total in the NFS cthon04 lock test suite. Looking into the failures now. This is the failure scenario:

0. On the NFS client machine ...
1. Create a parent and child pair.
2. Parent obtains a file lock and writes something to it.
3. Child tries to get the same file lock; if obtained, writes something.
4. Parent reads the file (to make sure the child didn't write to it).
5. Parent releases the lock.
6. Child reads the file (to make sure it contains the parent's data, followed by the child's data).

The call fails at step (5), where the unlock should succeed but, instead, in the GFS2 lock_dlm case, fails with errno=37 (ENOLCK). The trace shows gfs_controld does successfully release the lock. The issue seems to be in the kernel portion of the dlm_lock call, gdlm_punlock() - it looks like it gets the child's old entry, instead of its own entry, from op->list:

    Dec 3 04:34:17 dhcp146 kernel: rv=0
    Dec 3 04:34:17 dhcp146 last message repeated 5 times
    Dec 3 04:34:18 dhcp146 gfs_controld[6633]: plock result write err 0 errno 9
    Dec 3 04:34:18 dhcp146 last message repeated 99 times
    Dec 3 04:34:19 dhcp146 gfs_controld[6633]: unlock_internal in->pid=4011
    Dec 3 04:34:19 dhcp146 gfs_controld[6633]: unlock_internal po->pid=4011
    Dec 3 04:34:19 dhcp146 gfs_controld[6633]: unlock_internal return rv=0
    Dec 3 04:34:19 dhcp146 gfs_controld[6633]: plock result write err 0 errno 9
    Dec 3 04:34:19 dhcp146 kernel: rv=-11
    Dec 3 04:34:19 dhcp146 kernel: gfs2_lm_punlock return -11
    Dec 3 04:34:19 dhcp146 kernel: rv=-2

In dev_write() we have this test to find the matching op:

    op->info.fsid == info.fsid &&
    op->info.number == info.number &&
    op->info.owner == info.owner

Do we need to add something like this to distinguish between multiple processes on the same nfs client?:

    op->info.pid == info.pid

That *was* my first attempt (adding a pid check into dev_write()) last night, and it subsequently hung test #2. Let me get more trace data out to see what goes wrong. The network driver in my RDU server couldn't come up.
After a full day of trying, I'm giving up now. In the middle of moving the tests to salem (smoke cluster) - building the new kernel could take a while though. The following is the issue I saw yesterday ... after the pid checking is added into dev_write():

0. On the nfs client ...
1. Parent places a lock request; gdlm_plock returns -EINPROGRESS.
2. Child tries to get a lock (via test lock); lockd comes in as:

    dhcp146 kernel: [<ffffffff8851ec7a>] :gfs2:gfs2_lock+0x94/0x110
    dhcp146 kernel: [<ffffffff88484fde>] :lockd:nlmsvc_testlock+0x22d/0x319
    dhcp146 kernel: [<ffffffff88488bf0>] :lockd:nlm4svc_proc_test+0x84/0xce
    dhcp146 kernel: [<ffffffff883ec741>] :sunrpc:svc_process+0x3da/0x712
    dhcp146 kernel: [<ffffffff8848411c>] :lockd:lockd+0x0/0x274
    dhcp146 kernel: [<ffffffff884842a6>] :lockd:lockd+0x18a/0x274
    ... [snip]

dev_write() shows the following at this moment:

    Dec 4 09:10:04 dhcp146 kernel: dev_write info.pid = 2349
    Dec 4 09:10:04 dhcp146 kernel: dev_write in list
    Dec 4 09:10:04 dhcp146 kernel: pid not match op=2350 info=2349
    Dec 4 09:10:04 dhcp146 kernel: gdlm dev_write no op 30002 102d7

The parent pid is 2349, the child pid is 2350. So the child tries to lock with op->pid = 2350 (itself) but whoever writes to dev_write uses info.pid = 2349 .. This is the place where I get lost. Dave?

As a result of comment #23, the test program hangs ....

Worked with Dave today and got the cthon04 lock test *completed*. Dave has nicely revised the patch into a single lock_dlm kernel patch. Will hand this issue over to him. Now we just have to make sure the new patch also works well with SAMBA and other direct GFS1/2 plock access.

Here's my upstream patch that makes the nfs tests work on gfs:
https://www.redhat.com/archives/cluster-devel/2007-December/msg00027.html

Konrad, I think it would be easiest if you folded that patch into yours for bz 196318.
(In reply to comment #26)
> Here's my upstream patch that makes the nfs tests work on gfs:
> https://www.redhat.com/archives/cluster-devel/2007-December/msg00027.html
>
> Konrad, I think it would be easiest if you folded that patch into yours
> for bz 196318.

I concur. I will work on that tomorrow (and do the testing tomorrow as well). So far I've tested GFS2 (lock_dlm) exported via NFS with RHEL3U9, RHEL4U6, RHEL5U1, Sol9, and Sol10 running the NFS Connectathon with success. The next stage will be GFS2 (nolock), GFS, and ext3.

Cross post to bugzilla 232649: To support Konrad's bugzilla 196318, I believe the GFS team has completed the work from our end. There are two patches that are not posted to rhkernel-list yet:

1. Patch in comment #19 from this bugzilla for the gfs2 lock_nolock protocol.
2. Patch in https://bugzilla.redhat.com/show_bug.cgi?id=244343#c27 for the gfs2 lock_dlm protocol.

Sorry.. 1) should be "Patch in comment #19 of bugzilla 232649".

Sorry, what does "GFS2 (nolock)" and/or "gfs2 lock_nolock protocol" mean?

When using GFS1/2 as a single-node filesystem (non-clustered), it uses the lock_nolock protocol. That is, gfs1/2 can be used in a non-clustered environment. Since it doesn't have cluster locking overhead, it tends to run faster.

I've tested with an NFS exported share which underneath has GFS2 (lock_dlm), GFS2 (nolock), GFS (nolock), GFS (dlm_lock), and ext3 filesystems. That NFS share was then mounted by RHEL3, RHEL4, RHEL5, Sol9, and Sol10 NFS client machines and the NFS Connectathon test suite was run. The tests worked without regression. I've also tested where the NFS client is the patched server, with NFS targets exported by RHEL3, RHEL4, RHEL5, Sol9, Sol10, and NetApp, with success. I forgot to mention that the tests were carried out with all of the patches that Wendy listed in comment #29 and comment #30 of this BZ.

*** This bug has been marked as a duplicate of 196318 ***