Bug 1208384 - NFS interoperability problem: Gluster Striped-Replicated can't read on vmware esxi 5.x NFS client
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.6.2
Hardware: x86_64
OS: Other
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Duplicates: 1347657
Depends On: 1209298
Blocks:
Reported: 2015-04-02 06:54 UTC by darwin
Modified: 2020-01-02 09:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1209298
Environment:
Last Closed: 2016-07-31 21:20:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
tcpdump of esxi read with Input/output error (2.37 KB, application/octet-stream)
2015-04-03 08:38 UTC, darwin
tcpdump of esxi read with no error (2.07 KB, application/octet-stream)
2015-04-03 08:39 UTC, darwin
tcpdump/tshark analysis (4.51 KB, text/plain)
2015-04-06 19:00 UTC, Niels de Vos

Description darwin 2015-04-02 06:54:50 UTC
Description of problem:
When a VMware ESXi NFS client is connected to any Gluster striped-replicated volume, it produces an Input/output error when reading any file. This problem DOES NOT occur with Gluster distributed and/or replicated volumes, which suggests that the stripe xlator is violating the NFS RFC.

Files can be written, but reading them fails with "Input/output error". dd can read the beginning of a file but cannot read through to its end.

The Gluster NFS logs show no errors.
ESXi shows:
WARNING: NFS: 4031: Short read for object b00f 44 b08baeda 8a56598a 4c474f3a 581cb822 d9428cf3 6f95a4be 2f41ebeb 48d06ac4 3cd88f84334d09b6 c94c76e8 0 0 offset: 0x0 requested: 0x200 read: 0xd
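
The hex fields at the end of that warning can be decoded with a few lines of Python. This is only a sketch: the regular expression and the name parse_short_read are assumptions derived from the single log line above, not from a documented ESXi log format, and the object ID in the sample is shortened.

```python
import re

# Trailing fields of the ESXi "Short read" warning shown above.
# Assumption: the three values are always hex and appear in this order.
SHORT_READ_RE = re.compile(
    r"offset: (0x[0-9a-fA-F]+) requested: (0x[0-9a-fA-F]+) read: (0x[0-9a-fA-F]+)"
)


def parse_short_read(line):
    """Return (offset, requested, read) as integers, or None if no match."""
    match = SHORT_READ_RE.search(line)
    if match is None:
        return None
    return tuple(int(value, 16) for value in match.groups())


# Sample line with the object ID shortened for readability.
line = ("WARNING: NFS: 4031: Short read for object b00f 44 b08baeda "
        "offset: 0x0 requested: 0x200 read: 0xd")
print(parse_short_read(line))  # (0, 512, 13)
```

So ESXi asked for 512 bytes at offset 0 and got 13 back, which matches the 13-byte test.file listed further down.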


Version-Release number of selected component (if applicable):
ESXI 5.x
Gluster 3.6.2

How reproducible:
every time, like the sun rising each day 

Steps to Reproduce:
1. create a striped-replicated volume
2. mount that volume in Esxi 5.x
3. read any file on that volume

Actual results:
"Input/output error"

Expected results:
contents or download of file

Additional info:

tcpdump on ESXi while reading test.file from the striped-replicated volume over NFS; cat test.file
returns "cat: read error: Input/output error"

-rw-r--r--    1 root     root            13 Apr  2 01:18 test.file


~ # tcpdump-uw -i vmk0 -s 0 -vv tcp port 2049
tcpdump-uw: listening on vmk0, link-type EN10MB (Ethernet), capture size 65535 bytes
06:36:56.748208 IP (tos 0x0, ttl 64, id 13610, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe1d (incorrect -> 0x7c69), seq 3632025388, ack 670242925, win 512, length 0
06:36:56.756857 IP (tos 0x0, ttl 251, id 24385, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.862: Flags [.], cksum 0x3494 (correct), seq 1, ack 1, win 407, options [nop,nop,TS val 474610556 ecr 93647307], length 0
06:36:59.597853 IP (tos 0x0, ttl 64, id 13616, offset 0, flags [DF], proto TCP (6), length 192)
    ESXIHOST.1465963643 > GLUSTER.nfs: 136 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412F000000000000000000000000 "test.file"
06:36:59.609477 IP (tos 0x0, ttl 251, id 24386, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.1465963643: reply ok 244 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 post dattr: DIR 755 ids 0/0 sz 109 nlink 6 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 1 a/m/ctime 1427888541.862000000 1427937503.629000000 1427937503.629000000
06:36:59.609544 IP (tos 0x0, ttl 64, id 13618, offset 0, flags [DF], proto TCP (6), length 192)
    ESXIHOST.1465963644 > GLUSTER.nfs: 136 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412F000000000000000000000000 "test.file"
06:36:59.621880 IP (tos 0x0, ttl 251, id 24387, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.1465963644: reply ok 244 lookup fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 post dattr: DIR 755 ids 0/0 sz 109 nlink 6 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 1 a/m/ctime 1427888541.862000000 1427937503.629000000 1427937503.629000000
06:36:59.621946 IP (tos 0x0, ttl 64, id 13622, offset 0, flags [DF], proto TCP (6), length 180)
    ESXIHOST.1465963645 > GLUSTER.nfs: 124 access fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C 0001
06:36:59.631867 IP (tos 0x0, ttl 251, id 24388, offset 0, flags [DF], proto TCP (6), length 92)
    GLUSTER.nfs > ESXIHOST.1465963645: reply ok 36 access attr: c 0001
06:36:59.632012 IP (tos 0x0, ttl 64, id 13629, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.1465963646 > GLUSTER.nfs: 132 read fh Unknown/3A4F474C22B81C58F38C42D9BEA4956FEBEB412FC46AD048B6094D33848FD83C 512 bytes @ 0
06:36:59.644359 IP (tos 0x0, ttl 251, id 24389, offset 0, flags [DF], proto TCP (6), length 200)
    GLUSTER.nfs > ESXIHOST.1465963646: reply ok 144 read REG 644 ids 0/0 sz 13 nlink 1 rdev 0/0 fsid 2f41ebeb6f95a4be fileid 848fd83ce8764cc9 a/m/ctime 1427937609.815000000 1427937503.69000000 1427937504.320000000 13 bytes
06:36:59.748347 IP (tos 0x0, ttl 64, id 13636, offset 0, flags [DF], proto TCP (6), length 52)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe29 (incorrect -> 0x1f0c), seq 545, ack 685, win 512, options [nop,nop,TS val 93648599 ecr 474613443], length 0
06:37:04.650044 IP (tos 0x0, ttl 64, id 13674, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.862 > GLUSTER.nfs: Flags [.], cksum 0xbe1d (incorrect -> 0x779d), seq 544, ack 685, win 512, length 0
06:37:04.659316 IP (tos 0x0, ttl 251, id 24390, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.862: Flags [.], cksum 0x0bde (correct), seq 685, ack 545, win 407, options [nop,nop,TS val 474618458 ecr 93648599], length 0

tcpdump on ESXi while reading test.file from the distributed volume over NFS; cat test.file returns the file contents.

 tcpdump-uw -i vmk0 -s 0 -vv tcp port 2049
tcpdump-uw: listening on vmk0, link-type EN10MB (Ethernet), capture size 65535 bytes
06:45:54.236546 IP (tos 0x0, ttl 64, id 32631, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c1c (incorrect -> 0xa42e), seq 2841430516, ack 3093939941, win 512, length 0
06:45:54.239278 IP (tos 0x0, ttl 64, id 12249, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.771: Flags [.], cksum 0xa3f8 (correct), seq 1, ack 1, win 323, options [nop,nop,TS val 445417858 ecr 213325589], length 0
06:45:56.666570 IP (tos 0x0, ttl 64, id 32640, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.952537526 > GLUSTER.nfs: 132 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9E000000000000000000000000 "test.file"
06:45:56.668774 IP (tos 0x0, ttl 64, id 12250, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.952537526: reply ok 244 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 post dattr: DIR 755 ids 0/0 sz 39 nlink 3 rdev 0/0 fsid 9effe30ab3b784a4 fileid 1 a/m/ctime 1427911388.674000000 1427911311.168000000 1427911312.634000000
06:45:56.668845 IP (tos 0x0, ttl 64, id 32643, offset 0, flags [DF], proto TCP (6), length 188)
    ESXIHOST.952537527 > GLUSTER.nfs: 132 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9E000000000000000000000000 "test.file"
06:45:56.670064 IP (tos 0x0, ttl 64, id 12251, offset 0, flags [DF], proto TCP (6), length 300)
    GLUSTER.nfs > ESXIHOST.952537527: reply ok 244 lookup fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 post dattr: DIR 755 ids 0/0 sz 39 nlink 3 rdev 0/0 fsid 9effe30ab3b784a4 fileid 1 a/m/ctime 1427911388.674000000 1427911311.168000000 1427911312.634000000
06:45:56.670162 IP (tos 0x0, ttl 64, id 32647, offset 0, flags [DF], proto TCP (6), length 176)
    ESXIHOST.952537528 > GLUSTER.nfs: 120 access fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E 0001
06:45:56.670905 IP (tos 0x0, ttl 64, id 12252, offset 0, flags [DF], proto TCP (6), length 92)
    GLUSTER.nfs > ESXIHOST.952537528: reply ok 36 access attr: c 0001
06:45:56.670971 IP (tos 0x0, ttl 64, id 32648, offset 0, flags [DF], proto TCP (6), length 184)
    ESXIHOST.952537529 > GLUSTER.nfs: 128 read fh Unknown/3A4F474CDBCF439F46594E15A484B7B30AE3FF9EFAAD08E175154181AC99246E 512 bytes @ 0
06:45:56.673450 IP (tos 0x0, ttl 64, id 12253, offset 0, flags [DF], proto TCP (6), length 200)
    GLUSTER.nfs > ESXIHOST.952537529: reply ok 144 read REG 644 ids 0/0 sz 14 nlink 1 rdev 0/0 fsid 9effe30ab3b784a4 fileid ac99246edab18160 a/m/ctime 1427957145.630000000 1427911311.172000000 1427911311.174000000 14 bytes EOF
06:45:56.776511 IP (tos 0x0, ttl 64, id 32652, offset 0, flags [DF], proto TCP (6), length 52)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c28 (incorrect -> 0x9213), seq 529, ack 685, win 512, options [nop,nop,TS val 213326333 ecr 445420294], length 0
06:46:01.676392 IP (tos 0x0, ttl 64, id 32658, offset 0, flags [DF], proto TCP (6), length 40)
    ESXIHOST.771 > GLUSTER.nfs: Flags [.], cksum 0x0c1c (incorrect -> 0x9f72), seq 528, ack 685, win 512, length 0
06:46:01.676819 IP (tos 0x0, ttl 64, id 12254, offset 0, flags [DF], proto TCP (6), length 52)
    GLUSTER.nfs > ESXIHOST.771: Flags [.], cksum 0x7f44 (correct), seq 685, ack 529, win 323, options [nop,nop,TS val 445425298 ecr 213326333], length 0
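
The only visible difference between the two captures is the trailing "EOF" on the READ reply. A rough Python sketch for pulling that out of saved tcpdump-uw text follows. The regular expression is an assumption based on the lines printed above (other tcpdump versions may format NFS replies differently), read_replies is a made-up helper name, and the sample lines are abbreviated with "...".

```python
import re

# Tail of a tcpdump-uw NFS READ reply line as printed above, for example
# "... reply ok 144 read REG 644 ... 13 bytes" or "... 14 bytes EOF".
READ_REPLY_RE = re.compile(r"reply ok \d+ read .* (\d+) bytes( EOF)?\s*$")


def read_replies(capture_text):
    """Yield (byte_count, eof_flag) for each READ reply line in the text."""
    for line in capture_text.splitlines():
        match = READ_REPLY_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2) is not None


# Abbreviated copies of the request and the two READ replies from the captures.
sample = """\
    ESXIHOST.1465963646 > GLUSTER.nfs: 132 read fh Unknown/... 512 bytes @ 0
    GLUSTER.nfs > ESXIHOST.1465963646: reply ok 144 read REG 644 ids 0/0 sz 13 ... 13 bytes
    GLUSTER.nfs > ESXIHOST.952537529: reply ok 144 read REG 644 ids 0/0 sz 14 ... 14 bytes EOF
"""
print(list(read_replies(sample)))  # [(13, False), (14, True)]
```

The striped-replicated reply carries 13 bytes without EOF, while the distributed reply carries 14 bytes with EOF.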

Comment 2 darwin 2015-04-03 08:38:22 UTC
Created attachment 1010539 [details]
tcpdump of esxi read with Input/output error

Comment 3 darwin 2015-04-03 08:39:34 UTC
Created attachment 1010540 [details]
tcpdump of esxi read with no error

Comment 4 Niels de Vos 2015-04-06 19:00:47 UTC
Created attachment 1011510 [details]
tcpdump/tshark analysis

Summary, more details in the attachment.

Difference in the return of the read:
- non-working: return all data (13 bytes of 512 requested), EOF=No
- working: return all data (11 bytes of 512 requested), EOF=yes

This suggests that Striped-Replicated does not set the EOF flag when appropriate. A short read seems to be marked as an error on ESXi.

The Linux NFS-client requests the exact number of bytes it wants to read. ESXi requests 512 bytes, even if the file is smaller.
-> need another NFS client for testing... nfsshell?

When a short read is done, the Stripe xlator should return ret=<size> with errno=ENOENT so that the NFS server sets EOF.
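
The idea can be modelled in a few lines of Python. This is a simplified illustration, not the actual GlusterFS code: the function names, the "patched" switch and the client-side rule are all assumptions made for the sketch. The only parts taken from the analysis above are that the NFS server sets EOF when the read returns errno=ENOENT, and that ESXi seems to treat a short read without EOF as an error.

```python
import errno


def stripe_read(file_size, offset, count, patched):
    """Hypothetical model of a stripe read: returns (bytes_read, op_errno)."""
    bytes_read = max(0, min(count, file_size - offset))
    reached_end = offset + bytes_read >= file_size
    # With the proposed change, reaching the end of the file is signalled
    # through errno=ENOENT; without it, errno stays 0.
    op_errno = errno.ENOENT if (patched and reached_end) else 0
    return bytes_read, op_errno


def nfs_read_eof(op_errno):
    """Model of the NFS server: EOF is set only when errno is ENOENT."""
    return op_errno == errno.ENOENT


def client_accepts(requested, bytes_read, eof):
    """Assumed ESXi rule: a short read is only acceptable together with EOF."""
    return bytes_read == requested or eof


# 13-byte file, 512 bytes requested at offset 0, as in the failing capture.
for patched in (False, True):
    got, err = stripe_read(13, 0, 512, patched)
    eof = nfs_read_eof(err)
    print(patched, got, eof, client_accepts(512, got, eof))
```

In this model the unpatched read returns 13 bytes with EOF unset and the client rejects it; the patched read returns the same 13 bytes with EOF set and the client accepts it.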

Comment 5 Anand Avati 2015-04-06 19:02:39 UTC
REVIEW: http://review.gluster.org/10142 (stripe: set ENOENT when a READ hits EOF) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 6 Niels de Vos 2015-04-06 19:10:30 UTC
Tested the behaviour with nfsshell (which also requests more bytes than the file is long). The missing EOF is seen in tshark as well; with the patch, the EOF is set.

I wonder if ESXi can use striped volumes too. Please let me know if you can test the patch and build packages yourself. I can build RPMs (tell me the distro and version) if you need assistance with that.

Comment 7 darwin 2015-04-07 04:23:26 UTC
Thanks, I'd be happy to test. Can you build for CentOS Core 7.0.1406?

Comment 8 Niels de Vos 2015-04-14 09:14:40 UTC
Could you test the packages from here?

   https://kojipkgs.fedoraproject.org/scratch/devos/task_9474357/

These are based on the 3.6.2 version, with the patch from comment #5 added.

Comment 9 Niels de Vos 2015-05-07 08:58:49 UTC
The patch has been included in the master branch and will be available for testing in the next 3.8 nightly builds:

    http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/

Bug 1219399 has been opened to get a backport of http://review.gluster.org/10142 in the 3.7 version. Once this is fixed/merged in 3.7, a backport can also get included in 3.6.

Resetting the status of this bug, so that other developers can send the backports.

Comment 10 Niels de Vos 2016-06-21 12:26:12 UTC
*** Bug 1347657 has been marked as a duplicate of this bug. ***

Comment 11 Vijay Bellur 2016-06-21 12:28:21 UTC
REVIEW: http://review.gluster.org/14771 (stripe: set ENOENT when a READ hits EOF) posted (#1) for review on release-3.6 by Niels de Vos (ndevos)

Comment 12 Niels de Vos 2016-07-31 21:20:57 UTC
This is not a security bug, so it is not going to be fixed in 3.6.x because of http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html

Note that current 3.7.x (and 3.8) versions have been fixed already.

