Hide Forgot
Created attachment 188 [details] Fixed S10network script
Setup: 4 node distribute. set XCRYSDEN_SCRATCH=<gluster mount point> xcrysden version: 1.4.1 Statically linked. Can be downloaded from http://www.xcrysden.org/download/xc-1.4.1bin-linuxPC-static.tar.gz (GPLed) When xcrysden is run by setting XCRYSDEN_SCRATCH to a local file system it runs without errors. This happens in both 2.0.x and 3.0.x versions.
A hint is in the log. Right before the log ends, it says: [2010-05-05 16:37:44] N [fuse-bridge.c:3140:fuse_thread_proc] glusterfs-fuse: terminating upon getting ENODEV when reading /dev/fuse Looking further.
Version: 1.4.1 XCRYSDEN reference: A. Kokalj: J. Mol. Graphics Modelling, 1999, Vol. 17, 176--179. A. Kokalj: Comput. Mater. Science, 2003, Vol. 28, 155--168. A. Kokalj and M. Causa: Proceedings of High Performance Graphics Systems and Applications European Workshop, Bologna, Italy, 2000, 51--54. Please report bugs to: Tone.Kokalj TERMS OF USE: ------------- XCRYSDEN is released under the GNU General Public License. Whenever graphics generated by XCRYSDEN are used in scientific publications, it shall be greatly appreciated to include an explicit reference. The preferred form is the following: [ref] A. Kokalj: J. Mol. Graphics Modelling, 1999, Vol. 17, 176--179. Code available from http://www.xcrysden.org/. Running on platform: unix Executing: /appl/XCrySDen-1.4.1bin-static/bin/ftnunit Error in startup script: error deleting "fort_unit": file already exists while executing "file delete -force fort_unit" (procedure "FtnName" line 22) invoked from within "FtnName" invoked from within "set system(ftn_name) [FtnName]" ("eval" body line 65) invoked from within "eval [info body xcInit]" (file "/appl/XCrySDen-1.4.1bin-static/Tcl/xcInit.tcl" line 518) --------------------------------
I am able to reproduce the problem on mainline: Executing: /tmp/XCrySDen-1.4.1bin-static/bin/ftnunit Error in startup script: error deleting "fort_unit": file already exists while executing "file delete -force fort_unit" (procedure "FtnName" line 22) invoked from within "FtnName" invoked from within "set system(ftn_name) [FtnName]" ("eval" body line 65) invoked from within "eval [info body xcInit]" (file "/tmp/XCrySDen-1.4.1bin-static/Tcl/xcInit.tcl" line 518) The problem is cropping up because xc is trying to remove a directory that is not empty: [2010-05-10 12:20:39] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 48: RMDIR() /xc_27403/fort_unit => -1 (Directory not empty) [2010-05-10 12:20:39] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 54: RMDIR() /xc_27403/fort_unit => -1 (Directory not empty) hence the following message in strace: [pid 27450] rmdir("fort_unit") = -1 ENOTEMPTY (Directory not empty)
If the scratch directory is configured to a local disk filesystem, then it works perfectly fine.
When the scratch dir is set to /tmp/xctmp, xc is starting up correctly. Once the start-up completes, the directory contains: core dir1 In the previous run over GlusterFS, the fort_unit directory contained a file called fort.99. It is possible that the start-up process tried to remove fort.99 but that failed over GlusterFS mount point.
The GlusterFS client log has some more info that hints at the problem: ############FIRST THE fort.99 FILE IS CREATED################## [2010-05-10 12:33:04] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 33: LOOKUP() /xc_27590/fort_unit/fort.99 => -1 (No such file or directory) [2010-05-10 12:33:04] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 34: LOOKUP /xc_27590/fort_unit/fort.99 [2010-05-10 12:33:04] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 34: LOOKUP() /xc_27590/fort_unit/fort.99 => -1 (No such file or directory) [2010-05-10 12:33:04] T [fuse-bridge.c:1766:fuse_create] glusterfs-fuse: 35: CREATE /xc_27590/fort_unit/fort.99 [2010-05-10 12:33:04] T [dht-common.c:3050:dht_create] dist: creating /xc_27590/fort_unit/fort.99 on localdiskclient4 [2010-05-10 12:33:04] T [fuse-bridge.c:1658:fuse_create_cbk] glusterfs-fuse: 35: CREATE() /xc_27590/fort_unit/fort.99 => 0x1f9b5a0 (ino=458853) [2010-05-10 12:33:04] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 36: GETATTR 33146320 (/xc_27590/fort_unit) [2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 36: STAT() /xc_27590/fort_unit => 458840 [2010-05-10 12:33:04] T [fuse-bridge.c:2662:fuse_getxattr] glusterfs-fuse: 37: GETXATTR /xc_27590/fort_unit/fort.99/33137440 (security.capability) [2010-05-10 12:33:04] T [fuse-bridge.c:2523:fuse_xattr_cbk] glusterfs-fuse: 37: GETXATTR() /xc_27590/fort_unit/fort.99 => 0 ######################DATA GETS WRITTEN TO fort.99######################### [2010-05-10 12:33:04] T [fuse-bridge.c:1958:fuse_write] glusterfs-fuse: 38: WRITE (0x1f9b5a0, size=6, offset=0) [2010-05-10 12:33:04] T [fuse-bridge.c:1906:fuse_writev_cbk] glusterfs-fuse: 38: WRITE => 6/6,0/0 [2010-05-10 12:33:04] T [fuse-bridge.c:812:fuse_getattr] glusterfs-fuse: 39: FGETATTR 33137440 (/xc_27590/fort_unit/fort.99/0x1f9b5a0) [2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 39: FSTAT() /xc_27590/fort_unit/fort.99 => 458853 [2010-05-10 12:33:04] T [fuse-bridge.c:1997:fuse_flush] glusterfs-fuse: 40: FLUSH 0x1f9b5a0 [2010-05-10 12:33:04] T [fuse-bridge.c:1148:fuse_err_cbk] glusterfs-fuse: 40: FLUSH() ERR => 0 [2010-05-10 12:33:04] T [fuse-bridge.c:2056:fuse_release] glusterfs-fuse: 41: RELEASE 0x1f9b5a0 ######################fort_unit DIRECTORY IS OPENED######################### [2010-05-10 12:33:04] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 42: OPENDIR /xc_27590/fort_unit [2010-05-10 12:33:04] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 42: OPENDIR() /xc_27590/fort_unit => 0x1f9b5a0 [2010-05-10 12:33:04] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 43: READDIR (0x1f9b5a0, size=4096, offset=0) [2010-05-10 12:33:04] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 43: READDIR => 2/4096,0 ########HINT OF A PROBLEM ABOVE######################## We know fort_unit by now contains the fort.99 file, then why does fuse_readdir_cbk show that op_ret = 2, which means there are just 2 entries in the directory, i.e. the "." and ".." and no fort.99. ###################################################### [2010-05-10 12:33:04] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 44: RELEASEDIR 0x1f9b5a0 [2010-05-10 12:33:04] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 45: GETATTR 33146320 (/xc_27590/fort_unit) [2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 45: STAT() /xc_27590/fort_unit => 458840 [2010-05-10 12:33:04] T [fuse-bridge.c:1036:fuse_setattr] glusterfs-fuse: 46: SETATTR (33146320)/xc_27590/fort_unit [2010-05-10 12:33:04] T [fuse-bridge.c:906:fuse_setattr_cbk] glusterfs-fuse: 46: SETATTR() /xc_27590/fort_unit => 458840 ############# BUT ON RMDIR ################################ [2010-05-10 12:33:04] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 48: RMDIR /xc_27590/fort_unit ############## DHT DOES SAY THAT THERE ARE 3 ENTRIES IN fort_unit########## [2010-05-10 12:33:04] T [dht-common.c:3627:dht_rmdir_readdirp_cbk] dist: readdir on localdiskclient4 for /xc_27590/fort_unit returned 3 entries [2010-05-10 12:33:04] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 48: RMDIR() /xc_27590/fort_unit => -1 (Directory not empty)
Running with debug/trace loaded before distribute and over both subvolumes of distribute, this log below shows: [2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] dist: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient2trace: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient4trace: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient4trace: 42: (op_ret=0, op_errno=22, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient2trace: 42: (op_ret=0, op_errno=22, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] dist: 42: (op_ret=0, op_errno=117, fd=0x1317ee0) [2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] dist: 43: (fd=0x1317ee0, size=4096, offset=0) [2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient2trace: 43: (fd=0x1317ee0, size=4096, offset=0) [2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient2trace: 43 :(op_ret=2, op_errno=2) [2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] dist: 43 :(op_ret=2, op_errno=0) That distribute is sending a readdirp fop only to a first subvolume and not to the second one. Since the first one does not contain the file created earlier, i.e. fort.99, the op_ret=2, hence the rmdir follows and fails with ENOTEMPTY.
The lines that follow the previous log contents: [2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] dist: 52 :(op_ret=2, op_errno=0) [2010-05-10 13:56:03] N [trace.c:1392:trace_rmdir] dist: 54: (loc {path=/xc_28556/fort_unit, ino=229416}) [2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient2trace: 54:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1318560) [2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient4trace: 54:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1318560) [2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient2trace: 54: (op_ret=0, op_errno=22, fd=0x1318560) [2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient2trace: 54: (fd=0x1318560, size=4096, offset=0) [2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient2trace: 54 :(op_ret=2, op_errno=2) [2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient4trace: 54: (op_ret=0, op_errno=22, fd=0x1318560) [2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient4trace: 54: (fd=0x1318560, size=4096, offset=0) [2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient4trace: 54 :(op_ret=3, op_errno=2) [2010-05-10 13:56:03] N [trace.c:851:trace_rmdir_cbk] dist: 54: (op_ret=-1, op_errno=39) [2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] dist: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755) [2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] localdiskclient4trace: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755) [2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] localdiskclient2trace: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755) [2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] localdiskclient2trace: 55: (op_ret=0, *statpre {ia_ino=114708, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=114708, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}) [2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] localdiskclient4trace: 55: (op_ret=0, *statpre {ia_ino=114709, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=114709, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}) [2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] dist: 55: (op_ret=0, *statpre {ia_ino=229416, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=229416, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]})
The gdb step into shows why the STACK_WIND does not happen for the second subvolume which contains 3 entries, including the one for fort.99 file: (gdb) 2216 op_ret = count; (gdb) p count $2 = 2 (gdb) n 2224 if (prev->this != dht_last_up_subvol (this)) (gdb) 2225 op_errno = 0; (gdb) p prev $3 = (call_frame_t *) 0x12aeb80 (gdb) p *prev->this $4 = {name = 0x129f3f0 "localdiskclient2trace", type = 0x129fd70 "debug/trace", next = 0x129e700, prev = 0x12a0520, parents = 0x12a1fa0, children = 0x12a04e0, options = 0x129f080, fops = 0x7ff52da557c0, mops = 0x7ff52da55a80, cbks = 0x7ff52da55aa0, dumpops = 0x0, volume_options = {next = 0x129fdd0, prev = 0x129fdd0}, fini = 0x7ff52d85239b <fini>, init = 0x7ff52d851fef <init>, notify = 0x7ff52ec399c3 <default_notify>, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, ctx = 0x1296010, itable = 0x0, ready = 1 '\001', init_succeeded = 1 '\001', private = 0x0} (gdb) n 2228 if (count == 0) { (gdb) n 2249 if (op_ret < 0) (gdb) n 2252 DHT_STACK_UNWIND (readdirp, frame, op_ret, op_errno, &entries); (gdb) n 2254 gf_dirent_free (&entries);
Syscalls leading up to the rmdir show that getdents for /mnt/fort_unit is called but only once after the directory has been opened. 14567 open("/mnt/xc_14567/fort_unit/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4 14567 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14567 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 14567 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff76b8000 14567 getdents64(4, /* 2 entries */, 131072) = 48 ################2 entries returned ################## 14567 munmap(0xf76b8000, 135168) = 0 14567 close(4) = 0 14567 getcwd("/mnt/xc_14567/fort_unit", 4097) = 24 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 lstat64("/mnt/xc_14567", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 chdir("/mnt/xc_14567") = 0 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 getcwd("/mnt/xc_14567", 4097) = 14 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 lstat64("/mnt/xc_14567", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14567 lstat64("/mnt/xc_14567/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14567 getcwd("/mnt/xc_14567", 4097) = 14 14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14567 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14567 chmod("fort_unit", 040755) = 0 14567 rmdir("fort_unit") = -1 ENOTEMPTY (Directory not empty)
xc app does not fail always. It does run to completion in some runs. For eg, the strace log of a full run: 14843 getcwd("/mnt/xc_14843/fort_unit", 4097) = 24 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 stat64("/mnt/xc_14843/fort_unit/", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 open("/mnt/xc_14843/fort_unit/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4 14843 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 14843 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7611000 14843 getdents64(4, /* 3 entries */, 131072) = 80 ################# The difference this time is that the 3 entries are returned the first time. And 2 entries in the second getdents call below. ######################################### 14843 _llseek(4, 1403963200, [1403963200], SEEK_SET) = 0 14843 getdents64(4, /* 2 entries */, 131072) = 48 14843 munmap(0xf7611000, 135168) = 0 14843 close(4) = 0 14843 getcwd("/mnt/xc_14843/fort_unit", 4097) = 24 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 chdir("/mnt/xc_14843") = 0 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 getcwd("/mnt/xc_14843", 4097) = 14 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 lstat64("/mnt/xc_14843/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 getcwd("/mnt/xc_14843", 4097) = 14 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 lstat64("/mnt/xc_14843/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 getcwd("/mnt/xc_14843", 4097) = 14 14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 14843 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 chmod("fort_unit", 040755) = 0 14843 rmdir("fort_unit") = -1 ENOTEMPTY (Directory not empty) 14843 lstat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 open("fort_unit", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4 14843 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0 14843 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 14843 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7611000 14843 getdents64(4, /* 3 entries */, 131072) = 80 14843 _llseek(4, 1403963200, [1403963200], SEEK_SET) = 0 14843 lstat64("fort_unit/fort.99", {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 14843 unlink("fort_unit/fort.99") = 0 14843 getdents64(4, /* 2 entries */, 131072) = 48 14843 munmap(0xf7611000, 135168) = 0 14843 close(4) = 0 14843 rmdir("fort_unit") = 0
GlusterFS client log of a successful run: [2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 273: MKDIR() /xc_18125/fort_unit => 229496 (0) [2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 274: GETATTR 8570640 (/xc_18125) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 274: STAT() /xc_18125 => 229486 [2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 275: LOOKUP /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 275: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory) [2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 276: LOOKUP /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 276: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory) [2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 277: LOOKUP /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 277: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory) [2010-05-10 17:11:23] T [fuse-bridge.c:1766:fuse_create] glusterfs-fuse: 278: CREATE /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [dht-common.c:3051:dht_create] dist: creating /xc_18125/fort_unit/fort.99 on localdiskclient2 [2010-05-10 17:11:23] T [fuse-bridge.c:1658:fuse_create_cbk] glusterfs-fuse: 278: CREATE() /xc_18125/fort_unit/fort.99 => 0x82dcd0 (ino=229500) [2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 279: GETATTR 8572896 (/xc_18125/fort_unit) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 279: STAT() /xc_18125/fort_unit => 229496 [2010-05-10 17:11:23] T [fuse-bridge.c:2662:fuse_getxattr] glusterfs-fuse: 280: GETXATTR /xc_18125/fort_unit/fort.99/8573040 (security.capability) [2010-05-10 17:11:23] T [fuse-bridge.c:2523:fuse_xattr_cbk] glusterfs-fuse: 280: GETXATTR() /xc_18125/fort_unit/fort.99 => 0 [2010-05-10 17:11:23] T [fuse-bridge.c:1958:fuse_write] glusterfs-fuse: 281: WRITE (0x82dcd0, size=6, offset=0) [2010-05-10 17:11:23] T [fuse-bridge.c:1906:fuse_writev_cbk] glusterfs-fuse: 281: WRITE => 6/6,0/0 [2010-05-10 17:11:23] T [fuse-bridge.c:812:fuse_getattr] glusterfs-fuse: 282: FGETATTR 8573040 (/xc_18125/fort_unit/fort.99/0x82dcd0) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 282: FSTAT() /xc_18125/fort_unit/fort.99 => 229500 [2010-05-10 17:11:23] T [fuse-bridge.c:1997:fuse_flush] glusterfs-fuse: 283: FLUSH 0x82dcd0 [2010-05-10 17:11:23] T [fuse-bridge.c:1148:fuse_err_cbk] glusterfs-fuse: 283: FLUSH() ERR => 0 [2010-05-10 17:11:23] T [fuse-bridge.c:2056:fuse_release] glusterfs-fuse: 284: RELEASE 0x82dcd0 [2010-05-10 17:11:23] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 285: OPENDIR /xc_18125/fort_unit [2010-05-10 17:11:23] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 285: OPENDIR() /xc_18125/fort_unit => 0x82dcd0 [2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 286: READDIR (0x82dcd0, size=4096, offset=0) [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: fort.99 [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: . [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .. [2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 286: READDIR => 3/4096,0 [2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 287: READDIR (0x82dcd0, size=4096, offset=1403963200) [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: . [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .. [2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 287: READDIR => 2/4096,1403963200 [2010-05-10 17:11:23] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 288: RELEASEDIR 0x82dcd0 [2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 289: GETATTR 8572896 (/xc_18125/fort_unit) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 289: STAT() /xc_18125/fort_unit => 229496 [2010-05-10 17:11:23] T [fuse-bridge.c:1036:fuse_setattr] glusterfs-fuse: 290: SETATTR (8572896)/xc_18125/fort_unit [2010-05-10 17:11:23] T [fuse-bridge.c:906:fuse_setattr_cbk] glusterfs-fuse: 290: SETATTR() /xc_18125/fort_unit => 229496 [2010-05-10 17:11:23] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 292: RMDIR /xc_18125/fort_unit [2010-05-10 17:11:23] T [dht-common.c:3628:dht_rmdir_readdirp_cbk] dist: readdir on localdiskclient2 for /xc_18125/fort_unit returned 3 entries [2010-05-10 17:11:23] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 292: RMDIR() /xc_18125/fort_unit => -1 (Directory not empty) [2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 294: LOOKUP /xc_18125/fort_unit [2010-05-10 17:11:23] T [dht-layout.c:309:dht_disk_layout_merge] dist: merged to layout: 0 - 2147483646 (type 0) from localdiskclient2 [2010-05-10 17:11:23] T [dht-layout.c:309:dht_disk_layout_merge] dist: merged to layout: 2147483647 - 4294967295 (type 0) from localdiskclient4 [2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 294: LOOKUP() /xc_18125/fort_unit => 229496 (0) [2010-05-10 17:11:23] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 295: OPENDIR /xc_18125/fort_unit [2010-05-10 17:11:23] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 295: OPENDIR() /xc_18125/fort_unit => 0x82e360 [2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 296: READDIR (0x82e360, size=4096, offset=0) [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: fort.99 [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: . [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .. [2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 296: READDIR => 3/4096,0 [2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 297: GETATTR 8572896 (/xc_18125/fort_unit) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 297: STAT() /xc_18125/fort_unit => 229496 [2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 298: LOOKUP /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 298: LOOKUP() /xc_18125/fort_unit/fort.99 => 229500 (0) [2010-05-10 17:11:23] T [fuse-bridge.c:1419:fuse_unlink] glusterfs-fuse: 299: UNLINK /xc_18125/fort_unit/fort.99 [2010-05-10 17:11:23] T [fuse-bridge.c:1207:fuse_unlink_cbk] glusterfs-fuse: 299: UNLINK() /xc_18125/fort_unit/fort.99 => 0 [2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 301: READDIR (0x82e360, size=4096, offset=1403963200) [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: . [2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .. [2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 301: READDIR => 2/4096,1403963200 [2010-05-10 17:11:23] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 302: RELEASEDIR 0x82e360 [2010-05-10 17:11:23] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 303: RMDIR /xc_18125/fort_unit [2010-05-10 17:11:23] T [fuse-bridge.c:1207:fuse_unlink_cbk] glusterfs-fuse: 303: RMDIR() /xc_18125/fort_unit => 0 [2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 305: GETATTR 8570640 (/xc_18125) [2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 305: STAT() /xc_18125 => 229486
I've tested this with 2.0.6, 2.0.9, 3.0.1 and all have this problem. I am also increasingly feeling that the bug might be in the tcl implementation that is packaged along with this application's binary.
Sac We need to confirm if this application is being run for the first time on GlusterFS? Did it ever run for him on earlier versions?
> We need to confirm if this application is being run for the first time on > GlusterFS? Did it ever run for him on earlier versions? This is the first time they are running this application.
Either the application or the TCL interpreter is calling getdents or readdir while assuming that the fort.99 entry will be available from the first call to this function. This is not a correct use of getdents, since the app or the TCL interpreter needs to continue calling getdents till it returns 0, i.e. EOF. The problem is in the interaction of distribute and such non-conformant behaviour of TCL/xc because on each getdents call, distribute returns entries from one subvolume, i.e. it does not aggregate entries from multiple subvolumes, knowing that the app must call getdents again till it receives EOF. distribute returns EOF when it has returned the last directory entry of the last subvolume. The non-conformant behaviour of TCL or xc is confirmed in runs where distribute does return the fort.99 entry in the first call to getdents(see the logs of successful xc runs over glusterfs). This is when the xc runs to completion over GlusterFS. There are a couple of things we can do here: 1. Run the app with upgraded TCL interpreter and associated libs. If the bug is in TCL, xc will run just fine, assuming the bug has been fixed. Know that the TCL that comes packaged with xc is a 2005 release. 2. If this does not fix it, file a bug with xc developers. They might be using getdents in a non-conformant manner. 3. If xc is not using getdents, then file a bug with TCL developers.
Sac, i can confirm that xc runs fine over a storage/posix only config.
> Sac, i can confirm that xc runs fine over a storage/posix only config. Thank you, I will confirm this to the customer.
Sac, I am closing this unless there is anything new to add that points to a GlusterFS problem. Thanks for providing whatever info I needed here. Do re-open if you think otherwise.