Bug 239685

Summary: connectathon special/telldir test fails on CIFS against win2k3
Product: Red Hat Enterprise Linux 5 Reporter: Jeff Layton <jlayton>
Component: kernelAssignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low Docs Contact:
Priority: low    
Version: 5.0CC: smfrench, ssorce, staubach, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-02-17 18:06:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
strace of telldir test
none
binary capture of telldir test against win2k8
none
patch -- make sure we have resume info before calling CIFSFindNext
none
patch -- make sure we have resume info before calling CIFSFindNext none

Description Jeff Layton 2007-05-10 15:12:23 UTC
The connectathon telldir test fails when we mount a windows 2k3 server via CIFS.
When we run lseek to move to a new position in the directory, we suddenly get no
more entries. The directory offsets coming from windows seem to be somewhat bogus.

Strace attached...

Comment 1 Jeff Layton 2007-05-10 15:12:23 UTC
Created attachment 154477 [details]
strace of telldir test

Comment 2 Jeff Layton 2008-09-26 17:47:31 UTC
Created attachment 317817 [details]
binary capture of telldir test against win2k8

I don't seem to be able to reproduce this against win2k3 easily anymore, but have started seeing it against other servers (win2k8 and vista, in particular). This is a capture taken against win2k8. What it looks like is that we're occasionally doing a FindNext with a bogus starting value (\002).

Most likely that means that the previous FindFirst/FindNext wasn't parsed correctly. When I look at the trace, it looks like most of the Find* responses from the server have a trailing zero word. The one before the bad call didn't have this, so perhaps the code is expecting this for some reason and is falling down when it isn't present...

Comment 3 Jeff Layton 2008-09-30 13:53:10 UTC
I've been looking at this on rawhide and have been seeing this pop occasionally during the memcpy in CIFSFindNext:

BUG: unable to handle kernel paging request at ffff88000e8e89c8

...what's happening is that we're occasionally passing CIFSFindNext a search_info struct with a filename that points to a bogus area. I've also seen some captures from test failures that show filenames that are just repeating 0x6b patterns. That indicates a use after free problem, though this might just be that we're reaching outside of our buffer into other memory that's since been freed.

Comment 4 Jeff Layton 2008-09-30 14:33:22 UTC
The difference seems to be that win2k3 returns entries 150 at a time, and win2k8 returns them 56 at a time. This seems to somehow work around the test somehow, but there are clearly problems with lseek on directories in cifs. This directory has 200 entries. When I seek to position 116 on win2k3, I get this:

lseek(3, 116, SEEK_SET)                 = 116
getdents(3, /* 36 entries */, 16384)    = 864

...116+36 == 152, which I suppose is the 150 entries + the . and .. entries.

On win2k8:

lseek(3, 116, SEEK_SET)                 = 116
getdents(3, /* 0 entries */, 16384)     = 0

...we end up with no entries (and this seems to be a factor in making the telldir test fail on occasion).

Comment 5 Jeff Layton 2008-09-30 14:41:21 UTC
Steve F,
  Any thoughts on this problem? I'm working my way through the readdir code, but haven't quite wrapped my brain around it yet...

Comment 6 Jeff Layton 2008-09-30 15:31:06 UTC
I think I sort of see the problem. The seekdir and telldir offset args don't seem to match up correctly. When I do a readdir on every entry in the directory and then do a telldir they increase by 1 each time:

. 0
.. 1
0 4
1 5
10 6
100 7
101 8
102 9
103 10
104 11
105 12
106 13
107 14
108 15
109 16

...but when we seek into the directory to a particular position, it doesn't seem to move the position correctly.

Comment 7 Jeff Layton 2008-09-30 19:13:50 UTC
Created attachment 318109 [details]
patch -- make sure we have resume info before calling CIFSFindNext

This patch seems to fix the telldir test and also makes the calls on the wire look correct. I've done some testing with it this afternoon and I haven't detected any regressions from it. It just makes sure that we save the resume info from the last search (mostly the last filename from the previous search) so that CIFSFindNext can be called with the correct info.

I'll plan to push this to my upstream git tree and send a pull request to Steve French for it.

Comment 8 Jeff Layton 2008-11-16 12:42:43 UTC
Created attachment 323717 [details]
patch -- make sure we have resume info before calling CIFSFindNext

Fixed patch. Adds a bit of bounds checking when we get the LastNameOffset as well.

I'm going to see if I can get this into 5.3. It turns out that this bug can cause oopses in the right situation when we try to access an uninitialized pointer.

Comment 9 Jeff Layton 2009-02-17 18:06:46 UTC
Patchset went upstream. Closing as duplicate of cifs update BZ.

*** This bug has been marked as a duplicate of bug 465143 ***