Bug 239685 - connectathon special/telldir test fails on CIFS against win2k3
Summary: connectathon special/telldir test fails on CIFS against win2k3
Keywords:
Status: CLOSED DUPLICATE of bug 465143
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
low
low
Target Milestone: ---
: ---
Assignee: Jeff Layton
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-05-10 15:12 UTC by Jeff Layton
Modified: 2014-06-18 07:36 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-17 18:06:46 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
strace of telldir test (53.03 KB, text/plain)
2007-05-10 15:12 UTC, Jeff Layton
no flags Details
binary capture of telldir test against win2k8 (27.18 KB, application/octet-stream)
2008-09-26 17:47 UTC, Jeff Layton
no flags Details
patch -- make sure we have resume info before calling CIFSFindNext (7.89 KB, patch)
2008-09-30 19:13 UTC, Jeff Layton
no flags Details | Diff
patch -- make sure we have resume info before calling CIFSFindNext (7.89 KB, patch)
2008-11-16 12:42 UTC, Jeff Layton
no flags Details | Diff

Description Jeff Layton 2007-05-10 15:12:23 UTC
The connectathon telldir test fails when we mount a windows 2k3 server via CIFS.
When we run lseek to move to a new position in the directory, we suddenly get no
more entries. The directory offsets coming from windows seem to be somewhat bogus.

Strace attached...

Comment 1 Jeff Layton 2007-05-10 15:12:23 UTC
Created attachment 154477 [details]
strace of telldir test

Comment 2 Jeff Layton 2008-09-26 17:47:31 UTC
Created attachment 317817 [details]
binary capture of telldir test against win2k8

I don't seem to be able to reproduce this against win2k3 easily anymore, but have started seeing it against other servers (win2k8 and vista, in particular). This is a capture taken against win2k8. What it looks like is that we're occasionally doing a FindNext with a bogus starting value (\002).

Most likely that means that the previous FindFirst/FindNext wasn't parsed correctly. When I look at the trace, it looks like most of the Find* responses from the server have a trailing zero word. The one before the bad call didn't have this, so perhaps the code is expecting this for some reason and is falling down when it isn't present...

Comment 3 Jeff Layton 2008-09-30 13:53:10 UTC
I've been looking at this on rawhide and have been seeing this pop occasionally during the memcpy in CIFSFindNext:

BUG: unable to handle kernel paging request at ffff88000e8e89c8

...what's happening is that we're occasionally passing CIFSFindNext a search_info struct with a filename that points to a bogus area. I've also seen some captures from test failures that show filenames that are just repeating 0x6b patterns. That indicates a use after free problem, though this might just be that we're reaching outside of our buffer into other memory that's since been freed.

Comment 4 Jeff Layton 2008-09-30 14:33:22 UTC
The difference seems to be that win2k3 returns entries 150 at a time, and win2k8 returns them 56 at a time. This seems to somehow work around the test somehow, but there are clearly problems with lseek on directories in cifs. This directory has 200 entries. When I seek to position 116 on win2k3, I get this:

lseek(3, 116, SEEK_SET)                 = 116
getdents(3, /* 36 entries */, 16384)    = 864

...116+36 == 152, which I suppose is the 150 entries + the . and .. entries.

On win2k8:

lseek(3, 116, SEEK_SET)                 = 116
getdents(3, /* 0 entries */, 16384)     = 0

...we end up with no entries (and this seems to be a factor in making the telldir test fail on occasion).

Comment 5 Jeff Layton 2008-09-30 14:41:21 UTC
Steve F,
  Any thoughts on this problem? I'm working my way through the readdir code, but haven't quite wrapped my brain around it yet...

Comment 6 Jeff Layton 2008-09-30 15:31:06 UTC
I think I sort of see the problem. The seekdir and telldir offset args don't seem to match up correctly. When I do a readdir on every entry in the directory and then do a telldir they increase by 1 each time:

. 0
.. 1
0 4
1 5
10 6
100 7
101 8
102 9
103 10
104 11
105 12
106 13
107 14
108 15
109 16

...but when we seek into the directory to a particular position, it doesn't seem to move the position correctly.

Comment 7 Jeff Layton 2008-09-30 19:13:50 UTC
Created attachment 318109 [details]
patch -- make sure we have resume info before calling CIFSFindNext

This patch seems to fix the telldir test and also makes the calls on the wire look correct. I've done some testing with it this afternoon and I haven't detected any regressions from it. It just makes sure that we save the resume info from the last search (mostly the last filename from the previous search) so that CIFSFindNext can be called with the correct info.

I'll plan to push this to my upstream git tree and send a pull request to Steve French for it.

Comment 8 Jeff Layton 2008-11-16 12:42:43 UTC
Created attachment 323717 [details]
patch -- make sure we have resume info before calling CIFSFindNext

Fixed patch. Adds a bit of bounds checking when we get the LastNameOffset as well.

I'm going to see if I can get this into 5.3. It turns out that this bug can cause oopses in the right situation when we try to access an uninitialized pointer.

Comment 9 Jeff Layton 2009-02-17 18:06:46 UTC
Patchset went upstream. Closing as duplicate of cifs update BZ.

*** This bug has been marked as a duplicate of bug 465143 ***


Note You need to log in before you can comment on or make changes to this bug.