Bug 1219399 - NFS interoperability problem: Gluster Striped-Replicated can't read on vmware esxi 5.x NFS client
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.7.0
Hardware: x86_64
OS: Other
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On: 1209298
Blocks: glusterfs-3.7.6
Reported: 2015-05-07 08:54 UTC by Niels de Vos
Modified: 2016-08-01 22:12 UTC (History)

Fixed In Version: glusterfs-3.7.6
Doc Type: Bug Fix
Doc Text:
Clone Of: 1209298
Environment:
Last Closed: 2015-11-17 05:57:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Niels de Vos 2015-05-07 08:54:52 UTC
+++ This bug was initially created as a clone of Bug #1209298 +++

+++ This bug was initially created as a clone of Bug #1208384 +++

Description of problem:
When a VMware ESXi NFS client is connected to any Gluster striped-replicated volume, reading any file produces an Input/output error. This problem DOES NOT occur with Gluster distributed and/or replicated volumes, which indicates a problem with the stripe xlator violating the NFS RFC.

Files can be written, but reading them fails with "Input/output error": dd can read the beginning of a file but not through to its end.

gluster nfs logs show no error
esxi shows:
WARNING: NFS: 4031: Short read for object b00f 44 b08baeda 8a56598a 4c474f3a 581cb822 d9428cf3 6f95a4be 2f41ebeb 48d06ac4 3cd88f84334d09b6 c94c76e8 0 0 offset: 0x0 requested: 0x200 read: 0xd
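
The hexadecimal fields at the end of this warning can be decoded with a few lines of Python. This is only an illustrative sketch: the regular expression and the function name parse_short_read are assumptions made for this example, not anything provided by ESXi.

```python
import re

# Tail of the ESXi warning quoted above (object identifier omitted for brevity).
LOG_TAIL = "offset: 0x0 requested: 0x200 read: 0xd"


def parse_short_read(text):
    """Extract the offset, requested and read byte counts (hex) as integers."""
    match = re.search(
        r"offset: (0x[0-9a-f]+) requested: (0x[0-9a-f]+) read: (0x[0-9a-f]+)",
        text,
    )
    if match is None:
        raise ValueError("not a short-read warning")
    offset, requested, read = (int(value, 16) for value in match.groups())
    return {"offset": offset, "requested": requested, "read": read}


fields = parse_short_read(LOG_TAIL)
print(fields)  # {'offset': 0, 'requested': 512, 'read': 13}
```

In other words, ESXi asked for 512 bytes at offset 0 and received 13, which is the full size of the 13-byte test.file used in the captures.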


Version-Release number of selected component (if applicable):
ESXI 5.x
Gluster 3.6.2

How reproducible:
every time, like the sun rising each day 

Steps to Reproduce:
1. create a striped-replicated volume
2. mount that volume in Esxi 5.x
3. read any file on that volume

Actual results:
"Input/output error"

Expected results:
contents or download of file

Additional info:

tcpdump on ESXi while reading test.file from the striped-replicated volume over NFS; "cat test.file"
returns "cat: read error: Input/output error"

-rw-r--r--    1 root     root            13 Apr  2 01:18 test.file


~ # tcpdump-uw -i vmk0 -s 0 -vv tcp port 2049
tcpdump-uw: listening on vmk0, link-type EN10MB (Ethernet), capture size 65535 bytes
06:36:56.748208 IP (tos 0x0, ttl 64, id 13610, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe1d (incorrect -> 0x7c69), seq 3632025388, ack 670242925, win 512, length 0
06:36:56.756857 IP (tos 0x0, ttl 251, id 24385, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.862: Flags [.], cksum 0x3494 (correct), seq 1, ack 1, win 407, options [nop,nop,TS val 474610556 ecr 93647307], length 0
06:36:59.597853 IP (tos 0x0, ttl 64, id 13616, offset 0, flags [DF], proto TCP (6), length 192)
    ESXIHOST.1465963643 > GLUSTER.nfs: 136 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412F000000000000000000000000 "test.file"
06:36:59.609477 IP (tos 0x0, ttl 251, id 24386, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.1465963643: reply ok 244 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 post dattr: DIR 755 ids 0/0 sz 109 nlink 6 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 1 a/m/ctime 1427888541.862000000 1427937503.629000000 1427937503.629000000
06:36:59.609544 IP (tos 0x0, ttl 64, id 13618, offset 0, flags [DF], proto TCP (6), length 192)
    ESXIHOST.1465963644 > GLUSTER.nfs: 136 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412F000000000000000000000000 "test.file"
06:36:59.621880 IP (tos 0x0, ttl 251, id 24387, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.1465963644: reply ok 244 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 post dattr: DIR 755 ids 0/0 sz 109 nlink 6 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 1 a/m/ctime 1427888541.862000000 1427937503.629000000 1427937503.629000000
06:36:59.621946 IP (tos 0x0, ttl 64, id 13622, offset 0, flags [DF], proto TCP (6), length 180)
    ESXIHOST.1465963645 > GLUSTER.nfs: 124 access fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C 0001
06:36:59.631867 IP (tos 0x0, ttl 251, id 24388, offset 0, flags [DF], proto TCP (6), length 92)
    GLUSTER.nfs > ESXIHOST.1465963645: reply ok 36 access attr: c 0001
06:36:59.632012 IP (tos 0x0, ttl 64, id 13629, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.1465963646 > GLUSTER.nfs: 132 read fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C 512 bytes @ 0
06:36:59.644359 IP (tos 0x0, ttl 251, id 24389, offset 0, flags [DF], proto TCP (6), length 200)
    GLUSTER.nfs > ESXIHOST.1465963646: reply ok 144 read REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 13 bytes
06:36:59.748347 IP (tos 0x0, ttl 64, id 13636, offset 0, flags [DF], proto TCP (6), length 52)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe29 (incorrect -> 0x1f0c), seq 545, ack 685, win 512, options [nop,nop,TS val 93648599 ecr 474613443], length 0
06:37:04.650044 IP (tos 0x0, ttl 64, id 13674, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe1d (incorrect -> 0x779d), seq 544, ack 685, win 512, length 0
06:37:04.659316 IP (tos 0x0, ttl 251, id 24390, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.862: Flags [.], cksum 0x0bde (correct), seq 685, ack 545, win 407, options [nop,nop,TS val 474618458 ecr 93648599], length 0

tcpdump on ESXi while reading test.file from the distributed volume over NFS; "cat test.file" returns the file contents.

 tcpdump-uw -i vmk0 -s 0 -vv tcp port 2049
tcpdump-uw: listening on vmk0, link-type EN10MB (Ethernet), capture size 65535 bytes
06:45:54.236546 IP (tos 0x0, ttl 64, id 32631, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c1c (incorrect -> 0xa42e), seq 2841430516, ack 3093939941, win 512, length 0
06:45:54.239278 IP (tos 0x0, ttl 64, id 12249, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.771: Flags [.], cksum 0xa3f8 (correct), seq 1, ack 1, win 323, options [nop,nop,TS val 445417858 ecr 213325589], length 0
06:45:56.666570 IP (tos 0x0, ttl 64, id 32640, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.952537526 > GLUSTER.nfs: 132 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9E000000000000000000000000 "test.file"
06:45:56.668774 IP (tos 0x0, ttl 64, id 12250, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.952537526: reply ok 244 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 post dattr: DIR 755 ids 0/0 sz 39 nlink 3 rdev 0/0 fsid 9effe30ab3b784a4 fileid 1 a/m/ctime 1427911388.674000000 1427911311.168000000 1427911312.634000000
06:45:56.668845 IP (tos 0x0, ttl 64, id 32643, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.952537527 > GLUSTER.nfs: 132 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9E000000000000000000000000 "test.file"
06:45:56.670064 IP (tos 0x0, ttl 64, id 12251, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.952537527: reply ok 244 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 post dattr: DIR 755 ids 0/0 sz 39 nlink 3 rdev 0/0 fsid 9effe30ab3b784a4 fileid 1 a/m/ctime 1427911388.674000000 1427911311.168000000 1427911312.634000000
06:45:56.670162 IP (tos 0x0, ttl 64, id 32647, offset 0, flags [DF], proto TCP (6), length 176)
    ESXIHOST.952537528 > GLUSTER.nfs: 120 access fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E 0001
06:45:56.670905 IP (tos 0x0, ttl 64, id 12252, offset 0, flags [DF], proto TCP (6), length 92)
    GLUSTER.nfs > ESXIHOST.952537528: reply ok 36 access attr: c 0001
06:45:56.670971 IP (tos 0x0, ttl 64, id 32648, offset 0, flags [DF], proto TCP (6), length 184)
    ESXIHOST.952537529 > GLUSTER.nfs: 128 read fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E 512 bytes @ 0
06:45:56.673450 IP (tos 0x0, ttl 64, id 12253, offset 0, flags [DF], proto TCP (6), length 200)
    GLUSTER.nfs > ESXIHOST.952537529: reply ok 144 read REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 14 bytes EOF
06:45:56.776511 IP (tos 0x0, ttl 64, id 32652, offset 0, flags [DF], proto TCP (6), length 52)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c28 (incorrect -> 0x9213), seq 529, ack 685, win 512, options [nop,nop,TS val 213326333 ecr 445420294], length 0
06:46:01.676392 IP (tos 0x0, ttl 64, id 32658, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c1c (incorrect -> 0x9f72), seq 528, ack 685, win 512, length 0
06:46:01.676819 IP (tos 0x0, ttl 64, id 12254, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.771: Flags [.], cksum 0x7f44 (correct), seq 685, ack 529, win 323, options [nop,nop,TS val 445425298 ecr 213326333], length 0

--- Additional comment from RHEL Product and Program Management on 2015-04-02 09:12:45 CEST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from  on 2015-04-03 10:38:22 CEST ---



--- Additional comment from  on 2015-04-03 10:39:34 CEST ---



--- Additional comment from Niels de Vos on 2015-04-06 21:00:47 CEST ---

Summary, more details in the attachment.

Difference in the return of the read:
- non-working: return all data (13 bytes of 512 requested), EOF=No
- working: return all data (11 bytes of 512 requested), EOF=yes

This suggests that Striped-Replicated does not set the EOF flag when appropriate. A short read seems to be marked as an error on ESXi.
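
The difference between the two replies can be modelled as below. This is a rough sketch of how a strict client might classify a READ reply; how ESXi actually decides is not known from this report, so the ReadReply type and is_short_read_error function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadReply:
    count: int  # number of bytes returned by the server
    eof: bool   # EOF flag from the READ reply


def is_short_read_error(requested, reply):
    """Assumed client policy: fewer bytes than requested is only fine at EOF."""
    return reply.count < requested and not reply.eof


# The two replies summarised above, both for a 512-byte request.
non_working = ReadReply(count=13, eof=False)
working = ReadReply(count=11, eof=True)

print(is_short_read_error(512, non_working))  # True
print(is_short_read_error(512, working))      # False
```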

The Linux NFS-client requests the exact number of bytes it wants to read. ESXi requests 512 bytes, even if the file is smaller.
-> need another NFS-client for testing... nfsshell?

When a short read is done, the Stripe xlator should return op_ret=<size> and op_errno=ENOENT so that the NFS-server sets EOF.
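
The intended convention can be sketched as a small model. This is not the GlusterFS code: the functions stripe_readv_result and nfs_read_reply are hypothetical stand-ins for the xlator READ callback and the NFS-server reply logic, and the EOF condition (offset + requested reaching the file size) is an assumption about when the flag should be raised.

```python
import errno


def stripe_readv_result(file_size, offset, requested):
    """Hypothetical model of an xlator READ result: returns (op_ret, op_errno)."""
    available = max(file_size - offset, 0)
    op_ret = min(requested, available)  # bytes actually returned
    # Signal EOF through op_errno when the request reaches the end of the file.
    op_errno = errno.ENOENT if offset + requested >= file_size else 0
    return op_ret, op_errno


def nfs_read_reply(op_ret, op_errno):
    """Hypothetical model of the NFS-server: EOF is derived from op_errno."""
    return {"count": op_ret, "eof": op_ret >= 0 and op_errno == errno.ENOENT}


# 512 bytes requested at offset 0 from a 13-byte file, as in the ESXi capture.
print(nfs_read_reply(*stripe_readv_result(13, 0, 512)))
# A read that stays inside the file must not signal EOF.
print(nfs_read_reply(*stripe_readv_result(13, 0, 4)))
```

With this convention, the short read in the capture would carry 13 bytes together with the EOF flag, rather than 13 bytes and no EOF.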

--- Additional comment from Anand Avati on 2015-04-06 21:02:39 CEST ---

REVIEW: http://review.gluster.org/10142 (stripe: set ENOENT when a READ hits EOF) posted (#1) for review on master by Niels de Vos (ndevos)

--- Additional comment from Niels de Vos on 2015-04-06 21:10:30 CEST ---

Tested the behaviour with nfsshell (which also requests more bytes than the file is long). The missing EOF is seen in tshark as well; with the patch applied, the EOF flag is set.
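
Why the test client has to request more than the file length can be illustrated with a simplified model of the two request styles described in this report. The function names are hypothetical, and the clamping behaviour attributed to Linux is taken from the description above rather than from the client source.

```python
def linux_like_request(file_size, offset, buffer_size):
    """Simplified model: request only the bytes known to remain in the file."""
    return min(buffer_size, max(file_size - offset, 0))


def fixed_size_request(file_size, offset, buffer_size):
    """Simplified model of ESXi and nfsshell: always request the full buffer."""
    return buffer_size


def exposes_missing_eof(requested, returned, eof):
    """A reply looks like a bad short read only if it is short and lacks EOF."""
    return returned < requested and not eof


FILE_SIZE = 13  # size of test.file in the capture
for make_request in (linux_like_request, fixed_size_request):
    requested = make_request(FILE_SIZE, 0, 512)
    # The unpatched server returns all 13 bytes without setting EOF.
    print(make_request.__name__, requested,
          exposes_missing_eof(requested, FILE_SIZE, eof=False))
```

Under these assumptions only the fixed-size request makes the missing EOF visible, which matches the observation that the problem does not show up with the Linux NFS-client.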

I wonder if ESXi can use striped volumes too. Please let me know if you can test the patch and build packages yourself. I can build RPMs (tell me the distro and version) if you need assistance with that.

--- Additional comment from Anand Avati on 2015-04-07 03:52:59 CEST ---

REVIEW: http://review.gluster.org/10142 (stripe: set ENOENT when a READ hits EOF) posted (#2) for review on master by Niels de Vos (ndevos)

--- Additional comment from Anand Avati on 2015-05-03 23:12:10 CEST ---

REVIEW: http://review.gluster.org/10142 (stripe: set ENOENT when a READ hits EOF) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 1 Niels de Vos 2015-05-07 08:55:51 UTC
This needs a backport of http://review.gluster.org/10142, see the backporting guidelines for the process:

    http://www.gluster.org/community/documentation/index.php/Backport_Guidelines

Comment 2 Niels de Vos 2015-06-02 08:20:17 UTC
The required changes to fix this bug have not made it into glusterfs-3.7.1. This bug is now getting tracked for glusterfs-3.7.2.

Comment 3 Niels de Vos 2015-06-20 10:08:10 UTC
Unfortunately glusterfs-3.7.2 did not contain a code change that was associated with this bug report. This bug is now proposed to be a blocker for glusterfs-3.7.3.

Comment 4 Kaushal 2015-07-30 13:17:54 UTC
This bug could not be fixed in time for glusterfs-3.7.3. This is now being tracked for being fixed in glusterfs-3.7.4.

Comment 5 Kaushal 2015-10-28 12:28:45 UTC
This bug could not be fixed in time for glusterfs-3.7.4 or glusterfs-3.7.5. This is now being tracked for being fixed in glusterfs-3.7.6.

Comment 6 Vijay Bellur 2015-10-30 08:52:02 UTC
REVIEW: http://review.gluster.org/12470 (stripe: set ENOENT when a READ hits EOF) posted (#1) for review on release-3.7 by Niels de Vos (ndevos)

Comment 7 Vijay Bellur 2015-11-01 14:45:53 UTC
COMMIT: http://review.gluster.org/12470 committed in release-3.7 by Vijay Bellur (vbellur) 
------
commit d8060a4cbf8eda08687c165662704d7f27aa8e08
Author: Niels de Vos <ndevos>
Date:   Fri Oct 30 09:50:36 2015 +0100

    stripe: set ENOENT when a READ hits EOF
    
    The NFS-server sets EOF only in the READ reply when op_errno is set to
    ENOENT.  Xlators are expected to set op_errno to ENOENT when EOF is
    reached, op_ret will contain the number of bytes returned by the READ.
    
    When an NFS-client (like VMware ESXi) do a READ that exceeds the size of
    the file, errno should be set to EOF and the return value contains the
    number of bytes that are read (from the requested offset, until the end
    of the file). Not setting EOF on a correct short READ, can result in
    errors on the NFS-client.
    
    This is not an issue with the Linux NFS-client (or VFS). Linux is smart
    enough to not try to read more bytes than the file contains.
    
    Cherry picked from commit 2bd2ccf0fdd5390c1c07cb228048f93e5e516512:
    > BUG: 1209298
    > Change-Id: Ib15538744908a6001d729288d3e18a432d19050b
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: http://review.gluster.org/10142
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Kaleb KEITHLEY <kkeithle>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    
    BUG: 1219399
    Change-Id: Ib15538744908a6001d729288d3e18a432d19050b
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/12470
    Reviewed-by: jiffin tony Thottan <jthottan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 8 Raghavendra Talur 2015-11-17 05:57:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

