Bug 239690 - connectathon lock/test10 fails on CIFS w/o unix extensions
Summary: connectathon lock/test10 fails on CIFS w/o unix extensions
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-05-10 15:25 UTC by Jeff Layton
Modified: 2007-11-30 22:07 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-24 18:39:30 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Jeff Layton 2007-05-10 15:25:50 UTC
Lock test10 in the connectathon suite fails if unix extensions are disabled.
I've not completely investigated the cause as of yet, but here are the results:


Test #10 - Make sure a locked region is split properly.
        Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
        Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
        Child:  10.2  - F_TEST  [               0,               1] PASSED.
        Child:  10.3  - F_TEST  [               2,               1] PASSED.
        Child:  10.4  - F_TEST  [               3,          ENDING] PASSED.
        Child:  10.5  - F_TEST  [               1,               1] FAILED!
        Child:  **** Expected success, returned EACCES...
        Child:  **** Probably implementation error.

**  CHILD pass 1 results: 3/3 pass, 0/0 warn, 1/1 fail (pass/total).
        Parent: Child died

** PARENT pass 1 results: 2/2 pass, 0/0 warn, 0/0 fail (pass/total).

Comment 1 Jeff Layton 2007-07-19 16:48:13 UTC
strace from test:

[pid  1923] open("./lockfile1922", O_RDWR|O_CREAT|O_SYNC, 0666) = 9
[pid  1923] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1,
pid=1923}) = 0
[pid  1923] write(1, "\tChild:  10.2  - F_TEST  [      "..., 69 Child:  10.2  -
F_TEST  [               0,               1] PASSED.
) = 69
[pid  1923] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=2, len=1,
pid=1923}) = 0
[pid  1923] write(1, "\tChild:  10.3  - F_TEST  [      "..., 69 Child:  10.3  -
F_TEST  [               2,               1] PASSED.
) = 69
[pid  1923] fcntl(9, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=3, len=0,
pid=1923}) = 0
[pid  1923] write(1, "\tChild:  10.4  - F_TEST  [      "..., 69 Child:  10.4  -
F_TEST  [               3,          ENDING] PASSED.
) = 69
[pid  1923] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=1, len=1,
pid=1923}) = 0
[pid  1923] write(1, "\tChild:  10.5  - F_TEST  [      "..., 69 Child:  10.5  -
F_TEST  [               1,               1] FAILED!
) = 69


Comment 2 Jeff Layton 2007-07-19 17:05:19 UTC
...easier to read version:

Test #10 - Make sure a locked region is split properly.
[pid  1934] open("./lockfile1934", O_RDWR|O_CREAT|O_SYNC, 0666) = 9
[pid  1934] fcntl(9, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=3}) = 0
        Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
[pid  1934] fcntl(9, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=1, len=1}) = 0
        Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
[pid  1935] open("./lockfile1934", O_RDWR|O_CREAT|O_SYNC, 0666) = 9
[pid  1935] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1,
pid=1935}) = 0
        Child:  10.2  - F_TEST  [               0,               1] PASSED.
[pid  1935] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=2, len=1,
pid=1935}) = 0
        Child:  10.3  - F_TEST  [               2,               1] PASSED.
[pid  1935] fcntl(9, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=3, len=0,
pid=1935}) = 0
        Child:  10.4  - F_TEST  [               3,          ENDING] PASSED.
[pid  1935] fcntl(9, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=1, len=1,
pid=1935}) = 0
        Child:  10.5  - F_TEST  [               1,               1] FAILED!
        Child:  **** Expected success, returned EACCES...
        Child:  **** Probably implementation error.
[pid  1935] unlink("./lockfile1934")    = 0


It looks like the error message is incorrect. fcntl returned 0 on that last call
from what I can tell. Maybe glibc is overriding that?


Comment 3 Jeff Layton 2007-07-19 17:17:20 UTC
When I run this program on a cifs filesystem with unix extensions enabled, I see
this:

[pid  2211] fcntl(9, F_GETLK, {type=F_UNLCK, whence=SEEK_SET, start=1, len=1,
pid=2211}) = 0
        Child:  10.5  - F_TEST  [               1,               1] PASSED.


...so it's setting F_UNLCK here rather than F_WRLCK. So this seems to be a
userspace problem rather than a kernel one. The program uses lockf(), so this
would seem to be a bug in the cthon04 test program or in glibc.


Comment 4 Jeff Layton 2007-07-19 18:00:58 UTC
My mistake -- this does seem to be a kernel problem:

       F_GETLK
              On  input  to  this call, lock describes a lock we would like to
              place on the file.  If the lock could be  placed,  fcntl()  does
              not  actually  place it, but returns F_UNLCK in the l_type field
              of lock and leaves the other fields of the structure  unchanged.
              If  one or more incompatible locks would prevent this lock being
              placed, then fcntl() returns details about one of these locks in
              the l_type, l_whence, l_start, and l_len fields of lock and sets
              l_pid to be the PID of the process holding that lock.

...so when unix extensions are disabled, it doesn't seem to l_type to F_UNLCK.
The question is, is this because the protocol forbids this for some reason or
because it's an oversight?



Comment 5 Jeff Layton 2007-07-19 19:05:48 UTC
Packet captures seem to show the call and respose from this going out on the wire:

        Parent: 10.0  - F_TLOCK [               0,               3] PASSED.

...but then I never see a call for this go out:

        Parent: 10.1  - F_ULOCK [               1,               1] PASSED.


Comment 6 Jeff Layton 2007-07-19 20:28:45 UTC
Wrote a simpler program to test this:

fcntl(3, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=3}) = 0
fcntl(3, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=1, len=1}) = 0

The second fcntl call does not seem to generate any traffic. In cifs_lock, it
looks like we fall into a codepath with this comment:

  } else if (numUnlock) {
      /* For each stored lock that this unlock overlaps
         completely, unlock it. */

...since this unlock doesn't overlap completely, I think this ends up being a
noop. I'm guessing the SMB protocol doesn't handle non-aligned lock/unlock
requests correctly, but am looking for confirmation.



Comment 7 Jeff Layton 2007-07-19 20:53:47 UTC
Jeremy Allison responded to my query about this:

-----[snip]-----
Limitation of CIFS locking without POSIX extensions. Non-POSIX
CIFS locks must exactly match the regions locked/unlocked - if
the regions don't lock the server has to return an error.

You have to have POSIX extensions turned on for CIFS locking
to work correctly from a POSIX client.
-----[snip]-----

Closing this as WONTFIX since it's a protocol limitation.


Comment 8 Jeff Layton 2007-07-20 15:24:38 UTC
Reopening bug for now. This may be fixable. Mail from Steve F:

----[snip]----

Jeremy has done some good talks on posix byte range locking vs. Windows
locking but what you are seeing is that overlapping posix locks are merged 
and a lock can be split by an unlock of part of the locked range, while 
Windows locks stack on top of each other, and the unlock is expected to
match the lock.  Perhaps a way to handle

lock bytes 1 through 3
unlock byte 2
would be to do

cifs lock bytes 1 through 3
cifs lock 1, cifs lock 3
unlock bytes 1 through 3

It would be fairly easy to emulate in lines 768 through 781 (current
mainline cifs) of fs/cifs/file.c   The current algorith unlocks the
stored (windows/cifs locks) that are within the lock request.

Have you tried a test such as the following to windows?

posix lock bytes 1 through 4 (which causes windows/cifs lock of bytes 1 through 4)
posix unlock bytes 1 and 2
posix unlock bytes 3 and 4  (which logically should cause the cifs lock to be
removed)
then Try to write from another client to any of the presumably now unlocked bytes.


Comment 9 Jeff Layton 2007-08-24 18:39:30 UTC
This isn't fixable after all. Jeremy Allison explained...

You can only stack read locks on top of write locks with windows. So if it's a
write lock then you can only delete the lock and race to get the pieces. There's
no way to delete part of the lock atomically.



Note You need to log in before you can comment on or make changes to this bug.