Bug 762618 (GLUSTER-886) - Application xcrysden does not start when glusterfs mount point is given as scratch directory
Summary: Application xcrysden does not start when glusterfs mount point is given as sc...
Keywords:
Status: CLOSED NOTABUG
Alias: GLUSTER-886
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-05 10:46 UTC by Sachidananda Urs
Modified: 2010-05-17 06:20 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)
Logs with trace (86.17 KB, text/x-log)
2010-05-05 08:12 UTC, Sachidananda Urs
no flags Details

Description Sachidananda Urs 2010-05-05 08:12:58 UTC
Created attachment 188 [details]
Fixed S10network script

Comment 1 Sachidananda Urs 2010-05-05 10:46:45 UTC
Setup: 4 node distribute.

set XCRYSDEN_SCRATCH=<gluster mount point>

xcrysden version: 1.4.1 Statically linked. Can be downloaded from http://www.xcrysden.org/download/xc-1.4.1bin-linuxPC-static.tar.gz  (GPLed)

When xcrysden is run by setting XCRYSDEN_SCRATCH to a local file system it runs  without errors.

This happens in both 2.0.x and 3.0.x versions.

Comment 2 Shehjar Tikoo 2010-05-06 06:57:46 UTC
A hint is in the log. Right before the log ends, it says:

[2010-05-05 16:37:44] N [fuse-bridge.c:3140:fuse_thread_proc] glusterfs-fuse: terminating upon getting ENODEV when reading /dev/fuse

Looking further.

Comment 3 Sachidananda Urs 2010-05-10 03:26:35 UTC
Version: 1.4.1

XCRYSDEN reference: 
A. Kokalj: J. Mol. Graphics Modelling, 1999, Vol. 17, 176--179.
A. Kokalj: Comput. Mater. Science, 2003, Vol. 28, 155--168.
A. Kokalj and M. Causa: Proceedings of High Performance Graphics 
Systems and Applications European Workshop, 
Bologna, Italy, 2000, 51--54.

Please report bugs to: Tone.Kokalj


TERMS OF USE:
-------------
XCRYSDEN is released under the GNU General Public License.

Whenever graphics generated by XCRYSDEN are used in scientific
publications, it shall be greatly appreciated to include an explicit
reference. The preferred form is the following:

[ref] A. Kokalj: J. Mol. Graphics Modelling, 1999, Vol. 17,
176--179. Code available from http://www.xcrysden.org/.

Running on platform: unix
Executing: /appl/XCrySDen-1.4.1bin-static/bin/ftnunit
Error in startup script: error deleting "fort_unit": file already exists
while executing
"file delete -force fort_unit"
(procedure "FtnName" line 22)
invoked from within
"FtnName"
invoked from within
"set system(ftn_name) [FtnName]"
("eval" body line 65)
invoked from within
"eval [info body xcInit]"
(file "/appl/XCrySDen-1.4.1bin-static/Tcl/xcInit.tcl" line 518)
--------------------------------

Comment 4 Shehjar Tikoo 2010-05-10 03:54:07 UTC
I am able to reproduce the problem on mainline:
Executing: /tmp/XCrySDen-1.4.1bin-static/bin/ftnunit
Error in startup script: error deleting "fort_unit": file already exists
    while executing
"file delete -force fort_unit"
    (procedure "FtnName" line 22)
    invoked from within
"FtnName"
    invoked from within
"set system(ftn_name) [FtnName]"
    ("eval" body line 65)
    invoked from within
"eval [info body xcInit]"
    (file "/tmp/XCrySDen-1.4.1bin-static/Tcl/xcInit.tcl" line 518)



The problem is cropping up because xc is trying to remove a directory that is not empty:
[2010-05-10 12:20:39] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 48: RMDIR() /xc_27403/fort_unit => -1 (Directory not empty)
[2010-05-10 12:20:39] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 54: RMDIR() /xc_27403/fort_unit => -1 (Directory not empty)



hence the following message in strace:
[pid 27450] rmdir("fort_unit")          = -1 ENOTEMPTY (Directory not empty)

Comment 5 Sachidananda Urs 2010-05-10 04:11:16 UTC
If the scratch directory is configured to a local disk filesystem, then it works perfectly fine.

Comment 6 Shehjar Tikoo 2010-05-10 04:12:25 UTC
When the scratch dir is set to /tmp/xctmp, xc is starting up correctly. Once
the start-up completes, the directory contains:

core  dir1

In the previous run over GlusterFS, the fort_unit directory contained a file
called fort.99. It is possible that the start-up process tried to remove
fort.99 but that failed over GlusterFS mount point.

Comment 7 Shehjar Tikoo 2010-05-10 04:21:39 UTC
The GlusterFS client log has some more info that hints at the problem:


############FIRST THE fort.99 FILE IS CREATED##################
[2010-05-10 12:33:04] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 33: LOOKUP() /xc_27590/fort_unit/fort.99 => -1 (No such file or directory)
[2010-05-10 12:33:04] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 34: LOOKUP /xc_27590/fort_unit/fort.99
[2010-05-10 12:33:04] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 34: LOOKUP() /xc_27590/fort_unit/fort.99 => -1 (No such file or directory)
[2010-05-10 12:33:04] T [fuse-bridge.c:1766:fuse_create] glusterfs-fuse: 35: CREATE /xc_27590/fort_unit/fort.99
[2010-05-10 12:33:04] T [dht-common.c:3050:dht_create] dist: creating /xc_27590/fort_unit/fort.99 on localdiskclient4
[2010-05-10 12:33:04] T [fuse-bridge.c:1658:fuse_create_cbk] glusterfs-fuse: 35: CREATE() /xc_27590/fort_unit/fort.99 => 0x1f9b5a0 (ino=458853)
[2010-05-10 12:33:04] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 36: GETATTR 33146320 (/xc_27590/fort_unit)
[2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 36: STAT() /xc_27590/fort_unit => 458840
[2010-05-10 12:33:04] T [fuse-bridge.c:2662:fuse_getxattr] glusterfs-fuse: 37: GETXATTR /xc_27590/fort_unit/fort.99/33137440 (security.capability)
[2010-05-10 12:33:04] T [fuse-bridge.c:2523:fuse_xattr_cbk] glusterfs-fuse: 37: GETXATTR() /xc_27590/fort_unit/fort.99 => 0


######################DATA GETS WRITTEN TO fort.99#########################
[2010-05-10 12:33:04] T [fuse-bridge.c:1958:fuse_write] glusterfs-fuse: 38: WRITE (0x1f9b5a0, size=6, offset=0)
[2010-05-10 12:33:04] T [fuse-bridge.c:1906:fuse_writev_cbk] glusterfs-fuse: 38: WRITE => 6/6,0/0
[2010-05-10 12:33:04] T [fuse-bridge.c:812:fuse_getattr] glusterfs-fuse: 39: FGETATTR 33137440 (/xc_27590/fort_unit/fort.99/0x1f9b5a0)
[2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 39: FSTAT() /xc_27590/fort_unit/fort.99 => 458853
[2010-05-10 12:33:04] T [fuse-bridge.c:1997:fuse_flush] glusterfs-fuse: 40: FLUSH 0x1f9b5a0
[2010-05-10 12:33:04] T [fuse-bridge.c:1148:fuse_err_cbk] glusterfs-fuse: 40: FLUSH() ERR => 0
[2010-05-10 12:33:04] T [fuse-bridge.c:2056:fuse_release] glusterfs-fuse: 41: RELEASE 0x1f9b5a0



######################fort_unit DIRECTORY IS OPENED#########################
[2010-05-10 12:33:04] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 42: OPENDIR /xc_27590/fort_unit
[2010-05-10 12:33:04] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 42: OPENDIR() /xc_27590/fort_unit => 0x1f9b5a0
[2010-05-10 12:33:04] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 43: READDIR (0x1f9b5a0, size=4096, offset=0)
[2010-05-10 12:33:04] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 43: READDIR => 2/4096,0

########HINT OF A PROBLEM ABOVE########################
We know fort_unit by now contains the fort.99 file, then why does fuse_readdir_cbk show that op_ret = 2, which means there are just 2 entries in the directory, i.e. the "." and  ".." and no fort.99.
######################################################


[2010-05-10 12:33:04] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 44: RELEASEDIR 0x1f9b5a0
[2010-05-10 12:33:04] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 45: GETATTR 33146320 (/xc_27590/fort_unit)
[2010-05-10 12:33:04] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 45: STAT() /xc_27590/fort_unit => 458840
[2010-05-10 12:33:04] T [fuse-bridge.c:1036:fuse_setattr] glusterfs-fuse: 46: SETATTR (33146320)/xc_27590/fort_unit
[2010-05-10 12:33:04] T [fuse-bridge.c:906:fuse_setattr_cbk] glusterfs-fuse: 46: SETATTR() /xc_27590/fort_unit => 458840

############# BUT ON RMDIR ################################
[2010-05-10 12:33:04] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 48: RMDIR /xc_27590/fort_unit

##############  DHT DOES SAY THAT THERE ARE 3 ENTRIES IN fort_unit##########

[2010-05-10 12:33:04] T [dht-common.c:3627:dht_rmdir_readdirp_cbk] dist: readdir on localdiskclient4 for /xc_27590/fort_unit returned 3 entries
[2010-05-10 12:33:04] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 48: RMDIR() /xc_27590/fort_unit => -1 (Directory not empty)

Comment 8 Shehjar Tikoo 2010-05-10 05:32:18 UTC
Running with debug/trace loaded before distribute and over both subvolumes of distribute, this log below shows:


[2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] dist: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient2trace: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient4trace: 42:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient4trace: 42: (op_ret=0, op_errno=22, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient2trace: 42: (op_ret=0, op_errno=22, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] dist: 42: (op_ret=0, op_errno=117, fd=0x1317ee0)
[2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] dist: 43: (fd=0x1317ee0, size=4096, offset=0)
[2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient2trace: 43: (fd=0x1317ee0, size=4096, offset=0)
[2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient2trace: 43 :(op_ret=2, op_errno=2)
[2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] dist: 43 :(op_ret=2, op_errno=0)

That distribute is sending a readdirp fop only to a first subvolume and not to the second one. Since the first one does not contain the file created earlier, i.e. fort.99, the op_ret=2, hence the rmdir follows and fails with ENOTEMPTY.

Comment 9 Shehjar Tikoo 2010-05-10 05:49:47 UTC
The lines that follow the previous log contents:

[2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] dist: 52 :(op_ret=2, op_errno=0)
[2010-05-10 13:56:03] N [trace.c:1392:trace_rmdir] dist: 54: (loc {path=/xc_28556/fort_unit, ino=229416})
[2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient2trace: 54:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1318560)
[2010-05-10 13:56:03] N [trace.c:1765:trace_opendir] localdiskclient4trace: 54:( loc {path=/xc_28556/fort_unit, ino=229416}, fd=0x1318560)
[2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient2trace: 54: (op_ret=0, op_errno=22, fd=0x1318560)
[2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient2trace: 54: (fd=0x1318560, size=4096, offset=0)
[2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient2trace: 54 :(op_ret=2, op_errno=2)
[2010-05-10 13:56:03] N [trace.c:816:trace_opendir_cbk] localdiskclient4trace: 54: (op_ret=0, op_errno=22, fd=0x1318560)
[2010-05-10 13:56:03] N [trace.c:1782:trace_readdirp] localdiskclient4trace: 54: (fd=0x1318560, size=4096, offset=0)
[2010-05-10 13:56:03] N [trace.c:298:trace_readdirp_cbk] localdiskclient4trace: 54 :(op_ret=3, op_errno=2)
[2010-05-10 13:56:03] N [trace.c:851:trace_rmdir_cbk] dist: 54: (op_ret=-1, op_errno=39)
[2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] dist: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755)
[2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] localdiskclient4trace: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755)
[2010-05-10 13:56:03] N [trace.c:1476:trace_setattr] localdiskclient2trace: 55: (loc {path=/xc_28556/fort_unit, ino=229416}, mode=755)
[2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] localdiskclient2trace: 55: (op_ret=0, *statpre {ia_ino=114708, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=114708, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]})
[2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] localdiskclient4trace: 55: (op_ret=0, *statpre {ia_ino=114709, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=114709, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]})
[2010-05-10 13:56:03] N [trace.c:386:trace_setattr_cbk] dist: 55: (op_ret=0, *statpre {ia_ino=229416, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]}, *statpost {ia_ino=229416, st_mode=40755, ia_uid=1000, ia_gid=1000, ia_atime=[May 10 13:56:03], ia_mtime=[May 10 13:56:03], ia_ctime=[May 10 13:56:03]})

Comment 10 Shehjar Tikoo 2010-05-10 07:51:37 UTC
The gdb step into shows why the STACK_WIND does not happen for the second subvolume which contains 3 entries, including the one for fort.99 file:

(gdb) 
2216		op_ret = count;
(gdb) p count
$2 = 2
(gdb) n
2224	        if (prev->this != dht_last_up_subvol (this))
(gdb) 
2225	                op_errno = 0;
(gdb) p prev
$3 = (call_frame_t *) 0x12aeb80
(gdb) p *prev->this
$4 = {name = 0x129f3f0 "localdiskclient2trace", type = 0x129fd70 "debug/trace", next = 0x129e700, prev = 0x12a0520, parents = 0x12a1fa0, children = 0x12a04e0, options = 0x129f080, 
  fops = 0x7ff52da557c0, mops = 0x7ff52da55a80, cbks = 0x7ff52da55aa0, dumpops = 0x0, volume_options = {next = 0x129fdd0, prev = 0x129fdd0}, fini = 0x7ff52d85239b <fini>, 
  init = 0x7ff52d851fef <init>, notify = 0x7ff52ec399c3 <default_notify>, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, 
  ctx = 0x1296010, itable = 0x0, ready = 1 '\001', init_succeeded = 1 '\001', private = 0x0}
(gdb) n
2228		if (count == 0) {
(gdb) n
2249		if (op_ret < 0)
(gdb) n
2252		DHT_STACK_UNWIND (readdirp, frame, op_ret, op_errno, &entries);
(gdb) n
2254		gf_dirent_free (&entries);

Comment 11 Shehjar Tikoo 2010-05-10 08:23:05 UTC
Syscalls leading up to the rmdir show that getdents for /mnt/fort_unit is called but only once after the directory has been opened.


14567 open("/mnt/xc_14567/fort_unit/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
14567 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14567 fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0
14567 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff76b8000
14567 getdents64(4, /* 2 entries */, 131072) = 48
################2 entries returned ##################




14567 munmap(0xf76b8000, 135168)        = 0
14567 close(4)                          = 0
14567 getcwd("/mnt/xc_14567/fort_unit", 4097) = 24
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 lstat64("/mnt/xc_14567", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 chdir("/mnt/xc_14567")            = 0
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 getcwd("/mnt/xc_14567", 4097)     = 14
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 lstat64("/mnt/xc_14567", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14567 lstat64("/mnt/xc_14567/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14567 getcwd("/mnt/xc_14567", 4097)     = 14
14567 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14567 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14567 chmod("fort_unit", 040755)        = 0
14567 rmdir("fort_unit")                = -1 ENOTEMPTY (Directory not empty)

Comment 12 Shehjar Tikoo 2010-05-10 08:39:18 UTC
xc app does not fail always. It does run to completion in some runs. For eg, the strace log of a full run:

14843 getcwd("/mnt/xc_14843/fort_unit", 4097) = 24
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 stat64("/mnt/xc_14843/fort_unit/", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 open("/mnt/xc_14843/fort_unit/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
14843 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0
14843 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7611000
14843 getdents64(4, /* 3 entries */, 131072) = 80
#################
The difference this time is that the 3 entries are returned the first time.
And 2 entries in the second getdents call below.
#########################################


14843 _llseek(4, 1403963200, [1403963200], SEEK_SET) = 0
14843 getdents64(4, /* 2 entries */, 131072) = 48
14843 munmap(0xf7611000, 135168)        = 0
14843 close(4)                          = 0
14843 getcwd("/mnt/xc_14843/fort_unit", 4097) = 24
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 chdir("/mnt/xc_14843")            = 0
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 getcwd("/mnt/xc_14843", 4097)     = 14
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 lstat64("/mnt/xc_14843", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 lstat64("/mnt/xc_14843/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 getcwd("/mnt/xc_14843", 4097)     = 14
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 lstat64("/mnt/xc_14843/fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 getcwd("/mnt/xc_14843", 4097)     = 14
14843 lstat64("/mnt", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
14843 stat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 chmod("fort_unit", 040755)        = 0
14843 rmdir("fort_unit")                = -1 ENOTEMPTY (Directory not empty)
14843 lstat64("fort_unit", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 open("fort_unit", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
14843 fstat64(4, {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
14843 fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0
14843 old_mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7611000
14843 getdents64(4, /* 3 entries */, 131072) = 80
14843 _llseek(4, 1403963200, [1403963200], SEEK_SET) = 0
14843 lstat64("fort_unit/fort.99", {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
14843 unlink("fort_unit/fort.99")       = 0
14843 getdents64(4, /* 2 entries */, 131072) = 48
14843 munmap(0xf7611000, 135168)        = 0
14843 close(4)                          = 0
14843 rmdir("fort_unit")                = 0

Comment 13 Shehjar Tikoo 2010-05-10 08:46:33 UTC
GlusterFS client log of a successful run:

[2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 273: MKDIR() /xc_18125/fort_unit => 229496 (0)
[2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 274: GETATTR 8570640 (/xc_18125)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 274: STAT() /xc_18125 => 229486
[2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 275: LOOKUP /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 275: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory)
[2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 276: LOOKUP /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 276: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory)
[2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 277: LOOKUP /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [fuse-bridge.c:523:fuse_entry_cbk] glusterfs-fuse: 277: LOOKUP() /xc_18125/fort_unit/fort.99 => -1 (No such file or directory)
[2010-05-10 17:11:23] T [fuse-bridge.c:1766:fuse_create] glusterfs-fuse: 278: CREATE /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [dht-common.c:3051:dht_create] dist: creating /xc_18125/fort_unit/fort.99 on localdiskclient2
[2010-05-10 17:11:23] T [fuse-bridge.c:1658:fuse_create_cbk] glusterfs-fuse: 278: CREATE() /xc_18125/fort_unit/fort.99 => 0x82dcd0 (ino=229500)
[2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 279: GETATTR 8572896 (/xc_18125/fort_unit)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 279: STAT() /xc_18125/fort_unit => 229496
[2010-05-10 17:11:23] T [fuse-bridge.c:2662:fuse_getxattr] glusterfs-fuse: 280: GETXATTR /xc_18125/fort_unit/fort.99/8573040 (security.capability)
[2010-05-10 17:11:23] T [fuse-bridge.c:2523:fuse_xattr_cbk] glusterfs-fuse: 280: GETXATTR() /xc_18125/fort_unit/fort.99 => 0
[2010-05-10 17:11:23] T [fuse-bridge.c:1958:fuse_write] glusterfs-fuse: 281: WRITE (0x82dcd0, size=6, offset=0)
[2010-05-10 17:11:23] T [fuse-bridge.c:1906:fuse_writev_cbk] glusterfs-fuse: 281: WRITE => 6/6,0/0
[2010-05-10 17:11:23] T [fuse-bridge.c:812:fuse_getattr] glusterfs-fuse: 282: FGETATTR 8573040 (/xc_18125/fort_unit/fort.99/0x82dcd0)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 282: FSTAT() /xc_18125/fort_unit/fort.99 => 229500
[2010-05-10 17:11:23] T [fuse-bridge.c:1997:fuse_flush] glusterfs-fuse: 283: FLUSH 0x82dcd0
[2010-05-10 17:11:23] T [fuse-bridge.c:1148:fuse_err_cbk] glusterfs-fuse: 283: FLUSH() ERR => 0
[2010-05-10 17:11:23] T [fuse-bridge.c:2056:fuse_release] glusterfs-fuse: 284: RELEASE 0x82dcd0
[2010-05-10 17:11:23] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 285: OPENDIR /xc_18125/fort_unit
[2010-05-10 17:11:23] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 285: OPENDIR() /xc_18125/fort_unit => 0x82dcd0
[2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 286: READDIR (0x82dcd0, size=4096, offset=0)
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: fort.99
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: ..
[2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 286: READDIR => 3/4096,0
[2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 287: READDIR (0x82dcd0, size=4096, offset=1403963200)
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: ..
[2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 287: READDIR => 2/4096,1403963200
[2010-05-10 17:11:23] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 288: RELEASEDIR 0x82dcd0
[2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 289: GETATTR 8572896 (/xc_18125/fort_unit)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 289: STAT() /xc_18125/fort_unit => 229496
[2010-05-10 17:11:23] T [fuse-bridge.c:1036:fuse_setattr] glusterfs-fuse: 290: SETATTR (8572896)/xc_18125/fort_unit
[2010-05-10 17:11:23] T [fuse-bridge.c:906:fuse_setattr_cbk] glusterfs-fuse: 290: SETATTR() /xc_18125/fort_unit => 229496
[2010-05-10 17:11:23] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 292: RMDIR /xc_18125/fort_unit
[2010-05-10 17:11:23] T [dht-common.c:3628:dht_rmdir_readdirp_cbk] dist: readdir on localdiskclient2 for /xc_18125/fort_unit returned 3 entries
[2010-05-10 17:11:23] D [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 292: RMDIR() /xc_18125/fort_unit => -1 (Directory not empty)
[2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 294: LOOKUP /xc_18125/fort_unit
[2010-05-10 17:11:23] T [dht-layout.c:309:dht_disk_layout_merge] dist: merged to layout: 0 - 2147483646 (type 0) from localdiskclient2
[2010-05-10 17:11:23] T [dht-layout.c:309:dht_disk_layout_merge] dist: merged to layout: 2147483647 - 4294967295 (type 0) from localdiskclient4
[2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 294: LOOKUP() /xc_18125/fort_unit => 229496 (0)
[2010-05-10 17:11:23] T [fuse-bridge.c:2126:fuse_opendir] glusterfs-fuse: 295: OPENDIR /xc_18125/fort_unit
[2010-05-10 17:11:23] T [fuse-bridge.c:844:fuse_fd_cbk] glusterfs-fuse: 295: OPENDIR() /xc_18125/fort_unit => 0x82e360
[2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 296: READDIR (0x82e360, size=4096, offset=0)
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: fort.99
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: ..
[2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 296: READDIR => 3/4096,0
[2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 297: GETATTR 8572896 (/xc_18125/fort_unit)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 297: STAT() /xc_18125/fort_unit => 229496
[2010-05-10 17:11:23] T [fuse-bridge.c:596:fuse_lookup] glusterfs-fuse: 298: LOOKUP /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [fuse-bridge.c:470:fuse_entry_cbk] glusterfs-fuse: 298: LOOKUP() /xc_18125/fort_unit/fort.99 => 229500 (0)
[2010-05-10 17:11:23] T [fuse-bridge.c:1419:fuse_unlink] glusterfs-fuse: 299: UNLINK /xc_18125/fort_unit/fort.99
[2010-05-10 17:11:23] T [fuse-bridge.c:1207:fuse_unlink_cbk] glusterfs-fuse: 299: UNLINK() /xc_18125/fort_unit/fort.99 => 0
[2010-05-10 17:11:23] T [fuse-bridge.c:2247:fuse_readdir] glusterfs-fuse: 301: READDIR (0x82e360, size=4096, offset=1403963200)
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: .
[2010-05-10 17:11:23] E [dht-common.c:2185:dht_readdirp_cbk] DHT: ENTRY: ..
[2010-05-10 17:11:23] T [fuse-bridge.c:2192:fuse_readdir_cbk] glusterfs-fuse: 301: READDIR => 2/4096,1403963200
[2010-05-10 17:11:23] T [fuse-bridge.c:2265:fuse_releasedir] glusterfs-fuse: 302: RELEASEDIR 0x82e360
[2010-05-10 17:11:23] T [fuse-bridge.c:1450:fuse_rmdir] glusterfs-fuse: 303: RMDIR /xc_18125/fort_unit
[2010-05-10 17:11:23] T [fuse-bridge.c:1207:fuse_unlink_cbk] glusterfs-fuse: 303: RMDIR() /xc_18125/fort_unit => 0
[2010-05-10 17:11:23] T [fuse-bridge.c:803:fuse_getattr] glusterfs-fuse: 305: GETATTR 8570640 (/xc_18125)
[2010-05-10 17:11:23] T [fuse-bridge.c:704:fuse_attr_cbk] glusterfs-fuse: 305: STAT() /xc_18125 => 229486

Comment 14 Shehjar Tikoo 2010-05-10 10:27:43 UTC
I've tested this with 2.0.6, 2.0.9, 3.0.1 and all have this problem.

I am also increasingly feeling that the bug might be in the tcl implementation that is packaged along with this application's binary.

Comment 15 Shehjar Tikoo 2010-05-11 01:16:37 UTC
Sac

We need to confirm if this application is being run for the first time on GlusterFS? Did it ever run for him on earlier versions?

Comment 16 Sachidananda Urs 2010-05-12 06:13:25 UTC
> We need to confirm if this application is being run for the first time on
> GlusterFS? Did it ever run for him on earlier versions?

This is the first time they are running this application.

Comment 17 Shehjar Tikoo 2010-05-14 04:35:12 UTC
Either the application or the TCL interpreter is calling getdents or readdir while assuming that the fort.99 entry will be available from the first call to this function. This is not a correct use of getdents, since the app or the TCL interpreter needs to continue calling getdents till it returns 0, i.e. EOF.

The problem is in the interaction of distribute and such non-conformant behaviour of TCL/xc because on each getdents call, distribute returns entries from one subvolume, i.e. it does not aggregate entries from multiple subvolumes, knowing that the app must call getdents again till it receives EOF.

distribute returns EOF when it has returned the last directory entry of the last subvolume.

The non-conformant behaviour of TCL or xc is confirmed in runs where distribute does return the fort.99 entry in the first call to getdents(see the logs of successful xc runs over glusterfs). This is when the xc runs to completion over GlusterFS.

There are a couple of things we can do here:

1. Run the app with upgraded TCL interpreter and associated libs. If the bug is in TCL, xc will run just fine, assuming the bug has been fixed. Know that the TCL that comes packaged with xc is a 2005 release.

2. If this does not fix it, file a bug with xc developers. They might be using getdents in a non-conformant manner.

3. If xc is not using getdents, then file a bug with TCL developers.

Comment 18 Shehjar Tikoo 2010-05-14 04:50:15 UTC
Sac, i can confirm that xc runs fine over a storage/posix only config.

Comment 19 Sachidananda Urs 2010-05-14 04:52:12 UTC
> Sac, i can confirm that xc runs fine over a storage/posix only config.

Thank you, I will confirm this to the customer.

Comment 20 Shehjar Tikoo 2010-05-17 03:20:29 UTC
Sac, I am closing this unless there is anything new to add that points to a GlusterFS problem. Thanks for providing whatever info I needed here. Do re-open if you think otherwise.


Note You need to log in before you can comment on or make changes to this bug.