I have a replicated Gluster setup: 2 servers (fs-1 and fs-2) x 1 brick. Two clients (web-1 and web-2) are connected and execute tasks simultaneously. Both clients mount the Gluster volume at /mnt/gfs.

One task they execute looks like this (note this is pseudocode; the actual task is PHP):

1. @symlink(/mnt/gfs/slow265, /mnt/gfs/slow265.prod);
2. if (!is_link(/mnt/gfs/slow265.prod)) {
3.     throw Exception;
4. }
5. symlink(/mnt/gfs/slow265.prod, /home/user/slow265.prod)

Note that line 1 may fail on either client because the other client may already have created the link. That failure is suppressed; the link is then checked, and an exception is thrown if it does not exist.

These two tasks, when executed at the same time, usually succeed. However, in a recent run we saw an error on web-1 in line 5: the local filesystem symlink creation failed, despite line 2 confirming that the target Gluster symlink existed.

I've created a PHP script which can be run simultaneously on two clients to recreate the error: https://gist.github.com/pdrakeweb/7347198

Running the same test script on a Gluster 3.0.8 setup does not cause the error to occur. Running the same test on a local-only filesystem also does not cause the error to occur.

I'd appreciate any insight people might have into what is going on here and whether this is a bug in 3.4.1. Below are the related log entries from my Gluster servers and clients.

Entries from web-1's mnt-gfs.log file:

[2013-11-05 05:25:24.686506] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-test-fs-cluster-1-client-1: remote operation failed: File exists. Path: /slow265.prod (00000000-0000-0000-0000-000000000000)
[2013-11-05 05:25:24.686584] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 0-test-fs-cluster-1-client-0: remote operation failed: File exists. Path: /slow265.prod (00000000-0000-0000-0000-000000000000)
[2013-11-05 05:25:24.686649] E [dht-helper.c:1052:dht_inode_ctx_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f5e03dd4ff5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7f5e03dc1c89] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f5e03dc3b34]))) 0-test-fs-cluster-1-dht: invalid argument: inode
[2013-11-05 05:25:24.686687] E [dht-helper.c:1071:dht_inode_ctx_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75) [0x7f5e03dd4ff5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7f5e03dc1c89] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f5e03dc3b52]))) 0-test-fs-cluster-1-dht: invalid argument: inode
[2013-11-05 05:25:24.689670] W [fuse-bridge.c:1311:fuse_readlink_cbk] 0-glusterfs-fuse: 1736: /slow265.prod => -1 (Invalid argument)

Entries from web-2's mnt-gfs.log file:

[2013-11-05 05:25:26.164593] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-test-fs-cluster-1-client-1: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> /slow265.prod)
[2013-11-05 05:25:26.210652] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-test-fs-cluster-1-client-0: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> /slow265.prod)

Entries from fs-1's brick.log:

[2013-11-05 05:25:24.832262] I [server-rpc-fops.c:575:server_mknod_cbk] 0-test-fs-cluster-1-server: 3337: MKNOD (null) (00000000-0000-0000-0000-000000000001/slow265.prod) ==> (File exists)
[2013-11-05 05:25:26.391611] I [server-rpc-fops.c:1211:server_link_cbk] 0-test-fs-cluster-1-server: 3301: LINK /slow265.prod (3658314e-7730-4771-8ac3-2d6fb20b1b13) -> 00000000-0000-0000-0000-000000000001/slow265.prod ==> (File exists)

Entries from fs-2's brick.log:

[2013-11-05 05:25:24.554824] I [server-rpc-fops.c:575:server_mknod_cbk] 0-test-fs-cluster-1-server: 3290: MKNOD (null) (00000000-0000-0000-0000-000000000001/slow265.prod) ==> (File exists)
[2013-11-05 05:25:26.160204] I [server-rpc-fops.c:1211:server_link_cbk] 0-test-fs-cluster-1-server: 3341: LINK /slow265.prod (3658314e-7730-4771-8ac3-2d6fb20b1b13) -> 00000000-0000-0000-0000-000000000001/slow265.prod ==> (File exists)
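
For readers who want to step through the five-line task without PHP, here is a minimal Python sketch of the same sequence. It is not the script from the gist. The function name run_task and the directory layout are illustrative assumptions, and unlike PHP's @ operator it suppresses only the "File exists" error rather than every error.

```python
import os
import tempfile


def run_task(shared_dir, local_dir, name, target):
    """Run the five pseudocode steps once, as a single client would."""
    shared_link = os.path.join(shared_dir, name)
    local_link = os.path.join(local_dir, name)

    # Step 1: create the shared link. If the other client got there
    # first, the call fails with EEXIST, which is ignored here.
    try:
        os.symlink(target, shared_link)
    except FileExistsError:
        pass

    # Steps 2-4: stop if the shared link is still missing.
    if not os.path.islink(shared_link):
        raise RuntimeError(f"shared link missing: {shared_link}")

    # Step 5: create a local link that points at the shared link.
    os.symlink(shared_link, local_link)
    return local_link


if __name__ == "__main__":
    # Two temporary "home" directories stand in for the two clients;
    # a third temporary directory stands in for the /mnt/gfs mount.
    with tempfile.TemporaryDirectory() as shared, \
         tempfile.TemporaryDirectory() as home_1, \
         tempfile.TemporaryDirectory() as home_2:
        target = os.path.join(shared, "slow265")
        os.mkdir(target)
        for home in (home_1, home_2):
            link = run_task(shared, home, "slow265.prod", target)
            print(link, "->", os.readlink(link))
```

Run sequentially on a local filesystem, the second call takes the EEXIST path in step 1 and still completes step 5, which matches the report that a local-only filesystem does not show the failure.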
I've included straces from both successful and unsuccessful executions, as well as the PHP error information, below. Let me know if there is anything else I can provide which would be helpful.

PHP Error (as provided by error_get_last()):

Array
(
    [type] => 2
    [message] => symlink(): No such file or directory
    [file] => /tmp/symlink-test.php
    [line] => 78
)

Straces on both clients for symlink creation which was unsuccessful on one client:

Strace on unsuccessful client web-1:

lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
lstat("/tmp/test1385000751", 0x7fff2b5eb2e0) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
readlink("/mnt/gfs/test1385000751", 0x7fff2b5eb3f0, 4096) = -1 EINVAL (Invalid argument)
lstat("/tmp/test1385000751", 0x7fff2b5ef2d0) = -1 ENOENT (No such file or directory)
write(1, "Failed to create local link: /tm"..., 50) = 50

Strace on successful client web-2:

lstat("/mnt/gfs/test1385000751", 0x7fff3171b720) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000751", 0x7fff31717730) = -1 ENOENT (No such file or directory)
symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000751") = 0
lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
lstat("/tmp/test1385000751", 0x7fff31717730) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
readlink("/mnt/gfs/test1385000751", "/mnt/gfs/test-target"..., 4096) = 20
symlink("/mnt/gfs/test1385000751", "/tmp/test1385000751") = 0
lstat("/tmp/test1385000751", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0

Straces on both clients for symlink creation which was successful on both clients:

Strace on successful client web-1:

lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
lstat("/tmp/test1385000727", 0x7fff31717730) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target"..., 4096) = 20
symlink("/mnt/gfs/test1385000727", "/tmp/test1385000727") = 0
lstat("/tmp/test1385000727", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0

Strace on successful client web-2:

lstat("/mnt/gfs/test1385000727", 0x7fff2b5ef2d0) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such file or directory)
symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000727") = 0
lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
lstat("/tmp/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such file or directory)
lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...}) = 0
readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target"..., 4096) = 20
symlink("/mnt/gfs/test1385000727", "/tmp/test1385000727") = 0
lstat("/tmp/test1385000727", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
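
The failing strace differs from the successful ones in one respect: lstat() reports a symlink, but the following readlink() returns EINVAL. A small probe that makes the same two calls and reports both results may help when checking a mount by hand. This is only a sketch; probe_link is a hypothetical helper name, not part of the test script.

```python
import errno
import os
import tempfile


def probe_link(path):
    """Return (is_link, target, errno_name) for path.

    is_link reflects lstat(); target comes from readlink(). On a healthy
    filesystem a symlink yields (True, <target>, None).
    """
    is_link = os.path.islink(path)  # uses lstat() underneath
    try:
        return is_link, os.readlink(path), None
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return is_link, None, name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "test-link")
        os.symlink(os.path.join(tmp, "test-target"), link)
        print(probe_link(link))

        # readlink() on a regular file is the ordinary way to get EINVAL.
        plain = os.path.join(tmp, "plain-file")
        open(plain, "w").close()
        print(probe_link(plain))
```

On a path in the state captured by the web-1 strace, these two calls would be expected to disagree: is_link true, with EINVAL reported for readlink().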
REVIEW: http://review.gluster.org/6358 (cluster/dht: set layout in inode ctx even if linkfile fails) posted (#1) for review on release-3.4 by Anand Avati (avati)
COMMIT: http://review.gluster.org/6319 committed in master by Vijay Bellur (vbellur)
------
commit 9f793d70bab528e96daf3478261aeb32b2ae5523
Author: Anand Avati <avati>
Date: Wed Nov 20 12:46:58 2013 -0800

    cluster/dht: set layout in inode ctx even if linkfile fails

    Creating linkfile could have failed, but we dont care about linkfile for
    setting layout in the inode ctx (could be EEXIST etc.)

    So ignore @inode in cbk and pick it up from local->loc.inode

    Change-Id: I2952799d7ae0d3441b84b2ca2981afd75d7576e2
    BUG: 1032859
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6319
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
COMMIT: http://review.gluster.org/6358 committed in release-3.4 by Anand Avati (avati)
------
commit 168bb24a60643e2aedb90170c3d8d3c447c5c5c6
Author: Anand Avati <avati>
Date: Wed Nov 20 12:46:58 2013 -0800

    cluster/dht: set layout in inode ctx even if linkfile fails

    Creating linkfile could have failed, but we dont care about linkfile for
    setting layout in the inode ctx (could be EEXIST etc.)

    So ignore @inode in cbk and pick it up from local->loc.inode

    Change-Id: I2952799d7ae0d3441b84b2ca2981afd75d7576e2
    BUG: 1032859
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6358
    Tested-by: Gluster Build System <jenkins.com>
REVIEW: http://review.gluster.org/6470 (cluster/dht: set layout in inode ctx even if linkfile fails) posted (#1) for review on release-3.4 by Shishir Gowda (gowda.shishir)
COMMIT: http://review.gluster.org/6470 committed in release-3.4 by Anand Avati (avati)
------
commit bfb7f0806b0abd05e232f7c7e7260969ba330ec1
Author: shishir gowda <gowda.shishir>
Date: Tue Dec 10 14:59:39 2013 +0530

    cluster/dht: set layout in inode ctx even if linkfile fails

    Creating linkfile could have failed, but we dont care about linkfile for
    setting layout in the inode ctx (could be EEXIST etc.)

    So ignore @inode in cbk and pick it up from local->loc.inode

    Backporting http://review.gluster.org/6319

    BUG: 1032859
    Change-Id: Ic95e303a4c060900d041820d4faa68d1c4685b6a
    Original-author: Anand Avati <avati>
    Signed-off-by: shishir gowda <gowda.shishir>
    Reviewed-on: http://review.gluster.org/6470
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or will become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases > 3.4.3. Likewise, the recent release glusterfs-3.5.0 [3] is likely to have the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137