Created attachment 1184658 [details]
logs from directio test

Beginning with 3.7.12 and 3.7.13, opening sharded files with direct I/O fails when using ZFS-backed bricks.

How reproducible: Always

Steps to Reproduce:
1. ZFS-backed bricks, default settings except xattr=sa
2. GlusterFS 3.7.12+ with sharding enabled
3. dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/192.168.71.11\:_glustershard/81e19cd3-ae45-449c-b716-ec3e4ad4c2f0/images/test oflag=direct count=100 bs=1M

Actual results:
dd: error writing ‘/rhev/data-center/mnt/glusterSD/192.168.71.11:_glustershard/81e19cd3-ae45-449c-b716-ec3e4ad4c2f0/images/test’: Operation not permitted

The file "test" is created with its size defined by the shard size; the sharded files created in .shard are 0 bytes.

Expected results:
100+0 records in
100+0 records out
104857600 bytes
etc.

Additional info:
On Proxmox, users have been able to work around the problem by changing disk caching from none to writethrough/writeback. I am not sure this would help with oVirt, as the Python script that checks storage with dd and oflag=direct also fails.

Attaching client and brick logs from the test.
On the oVirt mailing list I was asked to test these settings:

i. Set network.remote-dio to off
   # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on
   # gluster volume set <VOL> performance.strict-o-direct on

Results:

dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test oflag=direct count=100 bs=1M
dd: error writing ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument
dd: closing output file ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument

Brick log:

[2016-07-25 18:20:19.393121] E [MSGID: 113039] [posix.c:2939:posix_open] 0-glustershard-posix: open on /gluster2/brick1/1/.glusterfs/02/f4/02f4783b-2799-46d9-b787-53e4ccd9a052, flags: 16385 [Invalid argument]
[2016-07-25 18:20:19.393204] E [MSGID: 115070] [server-rpc-fops.c:1568:server_open_cbk] 0-glustershard-server: 120: OPEN /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) ==> (Invalid argument) [Invalid argument]

and /var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.71.10\:_glustershard.log:

[2016-07-25 18:20:19.393275] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-0: remote operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393270] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-1: remote operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393317] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-2: remote operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393357] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 117: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393389] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 118: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393611] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 119: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393708] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 120: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393771] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 121: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393840] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 122: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393914] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 123: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393982] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 124: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.394045] W [fuse-bridge.c:709:fuse_truncate_cbk] 0-glusterfs-fuse: 125: FTRUNCATE() ERR => -1 (Invalid argument)
[2016-07-25 18:20:19.394338] W [fuse-bridge.c:1290:fuse_err_cbk] 0-glusterfs-fuse: 126: FLUSH() ERR => -1 (Invalid argument)
I have also heard from others with this issue that the problem exists in 3.8.x as well. I have not tested that myself, as my environment is still on 3.7.x.
These are the full settings I usually apply and run with:

features.shard-block-size: 64MB
features.shard: on
performance.readdir-ahead: on
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
performance.strict-write-ordering: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
Hi,

Open() on these affected files seems to be returning ENOENT. However, as per the find command output you gave on the ovirt-users ML, both the file and its gfid handle exist on the backend, so the failure cannot actually be due to ENOENT. I looked at the code in posix again, and there is evidence to suggest that the actual error code (the real reason for open() failing) is getting masked by the subsequent open() in the hidden "unlink" directory:

30         if (fd->inode->ia_type == IA_IFREG) {
29                 _fd = open (real_path, fd->flags);
28                 if (_fd == -1) {
27                         POSIX_GET_FILE_UNLINK_PATH (priv->base_path,
26                                                     fd->inode->gfid,
25                                                     unlink_path);
24                         _fd = open (unlink_path, fd->flags);
23                 }
22                 if (_fd == -1) {
21                         op_errno = errno;
20                         gf_msg (this->name, GF_LOG_ERROR, op_errno,
19                                 P_MSG_READ_FAILED,
18                                 "Failed to get anonymous "
17                                 "real_path: %s _fd = %d", real_path, _fd);
16                         GF_FREE (pfd);
15                         pfd = NULL;
14                         goto out;
13                 }
12         }

In your case, the open on line 29, on .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d, failed for a reason other than ENOENT (it can't be ENOENT, because we already saw from the find output that the file exists). Then line 27 is executed. If the file exists at its real path, it must be absent from the "unlink" directory (the gfid handle can't be present in both places). So it is the open() on line 24 that is failing with ENOENT, not the open on line 29.

I'll be sending a patch to fix this problem. Meanwhile, in order to understand why the open on line 29 failed, could you attach all of your bricks to strace, run the test again, wait for it to fail, and then attach both the strace output files and the resultant glusterfs client and brick logs here?

# strace -ff -p <pid-of-the-brick> -o <path-where-you-want-to-capture-the-output>
REVIEW: http://review.gluster.org/15041 (storage/posix: Look for file in "unlink" dir IFF open on real-path fails with ENOENT) posted (#1) for review on release-3.7 by Krutika Dhananjay (kdhananj)
REVIEW: http://review.gluster.org/15041 (storage/posix: Look for file in "unlink" dir IFF open on real-path fails with ENOENT) posted (#2) for review on release-3.7 by Krutika Dhananjay (kdhananj)
Until later this weekend I can't shut down the cluster (the one some of the earlier logs provided on the mailing list came from) to update gluster. I did run strace earlier while running the dd commands mentioned in this report that fail when attempting to create sharded files. Maybe they will be useful in some way until I can re-attempt the update on my running oVirt setup.
Created attachment 1185606 [details]
strace run during failed dd

strace logs during failed file creation
Created attachment 1185607 [details]
logs from running dd command
Thanks. That was very helpful.

<strace-output>
...
open("/gluster2/brick2/1/.glusterfs/13/fd/13fde185-8bcf-4747-bec9-a67f3495d65e", O_RDWR) = 17
...
open("/gluster2/brick2/1/.glusterfs/13/fd/13fde185-8bcf-4747-bec9-a67f3495d65e", O_RDWR|O_DIRECT) = -1 EINVAL (Invalid argument)
open("/gluster2/brick2/1/.glusterfs/unlink/13fde185-8bcf-4747-bec9-a67f3495d65e", O_RDWR|O_DIRECT) = -1 ENOENT (No such file or directory)
...
</strace-output>

From the above, it is clear that the open() is failing with EINVAL. Notice, though, that the open() on the file with plain O_RDWR succeeded; it is only when the same file was open()'d with the O_DIRECT flag included that it failed with EINVAL.

I checked `man 2 open` to find out when the syscall returns EINVAL:

<man-page-excerpt>
...
EINVAL The filesystem does not support the O_DIRECT flag. See NOTES for more information.

EINVAL Invalid value in flags.

EINVAL O_TMPFILE was specified in flags, but neither O_WRONLY nor O_RDWR was specified.
...
</man-page-excerpt>

So it seems very likely that the EINVAL was due to O_DIRECT.

At this point I wanted to ask you this: does ZFS (or the version of it you're using) support O_DIRECT?

-Krutika
(In reply to Krutika Dhananjay from comment #10)
> So it seems very likely that the EINVAL was due to O_DIRECT.
>
> At this point I wanted to ask you this - does zfs (or the version of it
> you're using) support O_DIRECT?

I think the mistake is mine: I didn't backport http://review.gluster.org/14215 to the 3.7 branch.
REVIEW: http://review.gluster.org/15050 (protocol/client: Filter o-direct in readv/writev) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)
(In reply to Pranith Kumar K from comment #11)
> I think the mistake is done by me. I didn't backport
> http://review.gluster.org/14215 to 3.7 branch.

Oops, sorry. I think your question is still valid, i.e. the open with O_DIRECT shouldn't have failed!
So basically, on ZFS we shouldn't have the option remote-dio set to off. For remote-dio to filter O_DIRECT on reads/writes, we need to get http://review.gluster.org/14215 into the next 3.7.x release.
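The effect of that filtering can be sketched like this (an illustrative Python model of what network.remote-dio does conceptually, not the actual protocol/client code; `filter_flags` is a name I made up):

```python
import os

# os.O_DIRECT is Linux-specific; fall back to the common Linux/x86
# value (0o40000 == 16384) for this sketch if the platform lacks it.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def filter_flags(flags, remote_dio):
    """With remote-dio enabled, strip O_DIRECT from the flags, so a
    backend filesystem without O_DIRECT support (ZFS here) never sees
    the flag and cannot fail the open/write with EINVAL."""
    if remote_dio:
        return flags & ~O_DIRECT
    return flags
```

Note that the brick-side open logged earlier used flags 16385, which is exactly O_WRONLY|O_DIRECT (1 + 16384), matching this picture.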
With remote-dio on, the initial 64M file is written, but the files in .shard fail.
(In reply to David from comment #15)
> With remote-dio on, the initial 64M file is written, but the files in
> .shard fail.

Yes, that is because 3.7.13 is not filtering O_DIRECT in the reads/writes of shards. Once the patch I mentioned above is merged, it will all work fine. But you must set remote-dio on.
Ah, I see what you mean now; apologies. Usually I have it on. I may have had it off in the logs of one of the tests I submitted at the request of one of the mailing lists.
COMMIT: http://review.gluster.org/15050 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 3492f539a21223798dcadbb92e24cb7eb6cbf154
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 5 07:59:03 2016 +0530

    protocol/client: Filter o-direct in readv/writev

    >Change-Id: I519c666b3a7c0db46d47e08a6a7e2dbecc05edf2
    >BUG: 1322214
    >Signed-off-by: Pranith Kumar K <pkarampu>
    >Reviewed-on: http://review.gluster.org/14215
    >Smoke: Gluster Build System <jenkins.com>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.com>
    >Reviewed-by: Krutika Dhananjay <kdhananj>
    >(cherry picked from commit 74837896c38bafdd862f164d147b75fcbb619e8f)

    BUG: 1360785
    Signed-off-by: Pranith Kumar K <pkarampu>
    Change-Id: Ib4013b10598b0b988b9f9f163296b6afa425f8fd
    Reviewed-on: http://review.gluster.org/15050
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
I guess one way to check whether ZFS supports O_DIRECT is to run the same test you ran on glusterfs again, only this time using ZFS directly to store the VM (keep the cache=none setting as it is). If open() fails with EINVAL, then very likely the issue is ZFS's support for O_DIRECT (or rather, the lack of it).
COMMIT: http://review.gluster.org/15041 committed in release-3.7 by Atin Mukherjee (amukherj)
------
commit 72db4ac5701185fc3115f115f18fb2250f3050f4
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Jul 28 22:37:38 2016 +0530

    storage/posix: Look for file in "unlink" dir IFF open on real-path fails with ENOENT

    Backport of: http://review.gluster.org/#/c/15039/

    PROBLEM:
    In some of our users' setups, open() on the anon fd failed for a
    reason other than ENOENT. But this error code is getting masked by
    a subsequent open() under posix's hidden "unlink" directory, which
    will fail with ENOENT because the gfid handle still exists under
    .glusterfs. And the log message following the two open()s ends up
    logging ENOENT, causing much confusion.

    FIX:
    Look for the presence of the file under "unlink" ONLY if the open()
    on the real_path failed with ENOENT.

    Change-Id: Id68bbe98740eea9889b17f8ea3126ed45970d26f
    BUG: 1360785
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15041
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.14, please open a new bug report.

glusterfs-3.7.14 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-August/050319.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user