RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1028899 - [NFSV3] read randomly return EOF when using O_EXCL against Windows NFS Server
Summary: [NFSV3] read randomly return EOF when using O_EXCL against Windows NFS Server
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.5
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Jeff Layton
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-11-11 07:33 UTC by Chao Ye
Modified: 2013-11-20 15:26 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-20 15:26:26 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
tshark cap (260.00 KB, application/vnd.tcpdump.pcap)
2013-11-12 05:14 UTC, Chao Ye
no flags Details

Description Chao Ye 2013-11-11 07:33:54 UTC
Description of problem:
Only observe this on Windows Server with RHEL NFSV3 client when doing IO using O_EXCL, NFS4.1 don't have such issue:
=================================================================
# grep FAIL /var/tmp/beakerlib-6lNWAG9/journal.txt | sort | uniq 
:: [   FAIL   ] :: RESULT: ClientNFSV3/Clientcombo
:: [   FAIL   ] :: RESULT: /kernel/filesystems/nfs/multihost/combo
:: [   FAIL   ] :: Running: dt bs=1024 flags=excl oflags=append (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=1536 flags=excl oflags=append (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=1536 flags=excl oflags=dsync (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=1536 flags=excl oflags=trunc (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3072 flags=excl oflags=append (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3072 flags=excl oflags=dsync (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3072 flags=excl oflags=trunc (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3584 flags=excl oflags=append (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3584 flags=excl oflags=dsync (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=3584 flags=excl oflags=trunc (Expected 0, got 254)
:: [   FAIL   ] :: Running: dt bs=512 flags=excl oflags=append (Expected 0, got 254)

254 is End of file code.

I tested different bs, 512/1024/1536/2048/2560/3072/3584/4096, but only 512/1024/1536/3072/3584 have such issue.

Version-Release number of selected component (if applicable):
2.6.32-428.el6.x86_64
nfs-utils-1.2.3-39.el6.x86_64

How reproducible:
About 30%

Steps to Reproduce:
1. mount -o vers=3 10.66.86.144:/NFS /mnt/testarea/client
2. wget http://porkchop.devel.redhat.com/qa/rhts/lookaside/dt.d.tar.gz
3. tar xxf dt.d.tar.gz
4. cd dt.d
5. make 
6. strace -f ./dt bs=3072 limit=64k of=/mnt/testarea/client/test.img flags=excl oflags=append dispose=delete enable=fsync,debug

Actual results:
read() return 0

Expected results:
No such issue.

Additional info:
Strace output:
=================================================================
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
write(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1536) = 1536
......
write(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1536) = 1536
write(2, "dt: Record #43, Writing a partia"..., 58) = 58
write(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1024) = 1024
write(2, "dt: Flushing data to file '/mnt/"..., 76) = 76
fsync(3)                                = 0
<===============================================fsync pass.

Failed strace output:
-----------------------------------------------------------------------------------------------
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1536) = 1536
......
read(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1536) = 1536
write(2, "dt: Record #43, Reading a partia"..., 58) = 58
read(3, "", 1024)                       = 0
<===============================================get 0 here.
write(2, "dt: End of file detected, count "..., 69) = 69

Pass strace output:
-----------------------------------------------------------------------------------------------
read(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1536) = 1536
write(2, "dt: Record #43, Reading a partia"..., 58) = 58
read(3, "9\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\30399\234\3039"..., 1024) = 1024
<===============================================get 1024 here.


Both pass/fail log are using bs=1536.

Comment 2 Chao Ye 2013-11-12 02:37:06 UTC
(In reply to Jeff Layton from comment #1)
> Do you have any captures from this? Does this test fail vs. other servers or
> just windows?

No captures yet, will do. Only fail on windows, test on RHEL/NetApp server pass, and also only windows nfsv3, v4.1 is cool.

Chao

Comment 3 Chao Ye 2013-11-12 05:14:28 UTC
Created attachment 822760 [details]
tshark cap

Comment 4 Jeff Layton 2013-11-12 14:22:18 UTC
It looks like this is a server bug.

There's this WRITE in frames 450 and 453:

450	2.650309000	10.66.86.24	10.66.86.144	NFS	V3 WRITE Call (Reply In 453), FH: 0x6f616b6d Offset: 64512 Len: 1024 FILE_SYNC

453	2.656404000	10.66.86.144	10.66.86.24	NFS	V3 WRITE Reply (Call In 450) Len: 1024 FILE_SYNC

...the wcc data in the reply says that the file increased in size from 64512 to 65536 at this point.

However just after that, in frames 456/457 there is a GETATTR call and reply, that report the size of the file to be 64512. All of the later attributes that I can see from the server report the size as 64512.

So either something raced in and truncated that file back to 64512 bytes after the write, or there's a server bug of some sort. I'd assume that you're running this on files that only this client would have access to, so my suspicion would be a server bug.

Any chance we could make this bug public? The Microsoft folks would probably like to see this...

Comment 8 Jeff Layton 2013-11-15 13:32:52 UTC
I directed this to Tom Talpey at MS and he asked:

"What's the version of the Windows server in this? I see 4.1 mentioned, so I am guessing Windows 8 or later? You can run "winver.exe" to get the full build string."

Chao, can you run that and let us know?

Comment 9 Chao Ye 2013-11-18 02:21:23 UTC
(In reply to Jeff Layton from comment #8)
> I directed this to Tom Talpey at MS and he asked:
> 
> "What's the version of the Windows server in this? I see 4.1 mentioned, so I
> am guessing Windows 8 or later? You can run "winver.exe" to get the full
> build string."
> 
> Chao, can you run that and let us know?

It's WindowsServer 2012 R2.

Chao

Comment 10 Jeff Layton 2013-11-20 15:26:26 UTC
We're now in communication with the MS devs who are responsible for this code. Closing this as NOTABUG since it's a server side bug.


Note You need to log in before you can comment on or make changes to this bug.