Bug 1041109 - structure needs cleaning
Summary: structure needs cleaning
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: 3.4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-12-12 13:39 UTC by Johan Huysmans
Modified: 2015-10-07 13:50 UTC
CC: 11 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:49:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfs logfile with DEBUG (16.21 KB, text/x-log), 2013-12-12 13:39 UTC, Johan Huysmans

Description Johan Huysmans 2013-12-12 13:39:12 UTC
Created attachment 835798
glusterfs logfile with DEBUG

Description of problem:
When reading a file from the fuse.glusterfs mountpoint I receive a "Structure needs cleaning" error.

Version-Release number of selected component (if applicable):
3.4.1-3 (from gluster.org)

My setup:
I have 2 servers (64-bit) running glusterd with the following configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.quick-read: off
performance.open-behind: off
performance.lazy-open: off
performance.flush-behind: off
network.ping-timeout: 5
performance.stat-prefetch: off
performance.force-readdirp: on
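
For reference, options like these are normally applied with "gluster volume set"; a sketch using the volume name above (the remaining options follow the same pattern):

gluster volume set testvolume performance.quick-read off
gluster volume set testvolume network.ping-timeout 5
gluster volume set testvolume performance.force-readdirp on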

I have 2 clients (32-bit) running the glusterfs FUSE client:
from '/proc/mounts':
SRV-0:/testvolume /mnt/sharedfs fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
ps output:
/usr/sbin/glusterfs --enable-ino32 --volfile-id=/testvolume --volfile-server=SRV-0 /mnt/sharedfs
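
For reference, the same mount could also be expressed through the mount helper; a sketch, assuming mount.glusterfs passes the enable-ino32 option through as the --enable-ino32 flag shown above:

mount -t glusterfs -o enable-ino32 SRV-0:/testvolume /mnt/sharedfs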

How reproducible:
After running without problems for some days, the problem occurs.
Once it occurs, it is reproducible every time.

Steps to Reproduce, use-case 1: create file + rename file
[root@CL-1 sharedfs]# echo "hello" > test
[root@CL-1 sharedfs]# md5sum test
b1946ac92492d2347c6235b4d2611184  test

[root@CL-2 sharedfs]# md5sum test
b1946ac92492d2347c6235b4d2611184  test

[root@CL-1 sharedfs]# mv test test2
[root@CL-1 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
b1946ac92492d2347c6235b4d2611184  test2

[root@CL-2 amp1]# md5sum test test2
b1946ac92492d2347c6235b4d2611184  test
b1946ac92492d2347c6235b4d2611184  test2

[root@CL-1 amp1]# rm test2
rm: remove regular file `test2'? y
[root@CL-1 amp1]# md5sum test test2
md5sum: test: No such file or directory
md5sum: test2: No such file or directory

[root@CL-2 amp1]# md5sum test test2
md5sum: test: Structure needs cleaning
md5sum: test2: No such file or directory


Steps to Reproduce, use-case 2: mv new file over old file
[root@CL-1 amp1]# echo "hello" > world
[root@CL-1 amp1]# cat world 
hello

[root@CL-2 amp1]# cat world 
hello

[root@CL-1 amp1]# echo "world" > hello
[root@CL-1 amp1]# mv hello world 
mv: overwrite `world'? y

[root@CL-2 amp1]# cat world 
cat: world: Structure needs cleaning

Problems:
* When a file is moved (renamed), the old name should no longer be accessible.
* "Structure needs cleaning" errors should not occur.

Comment 1 Khoi Mai 2013-12-19 20:30:52 UTC
I've upgraded the gluster bricks and clients from glusterfs 3.4.1-3
to glusterfs 3.4.2qa4.

glusterfs --version
glusterfs 3.4.2qa4 built on Dec 17 2013 17:15:16

[root@brick1 ~]# gluster volume info khoi

Volume Name: khoi
Type: Distributed-Replicate
Volume ID: dc075c4f-df76-482a-a750-efccb346527e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: omhq1826:/static/khoi
Brick2: omdx1448:/static/khoi
Brick3: omhq1832:/static/khoi
Brick4: omdx14f0:/static/khoi
Options Reconfigured:
features.quota: on
geo-replication.indexing: off
features.limit-usage: /:10GB
performance.write-behind: on
performance.lazy-open: off


Steps to reproduce: client1: create a file, then edit and save it. client2: read the file changed by client1.

[root@client1 khoi]# date
Thu Dec 19 14:06:55 CST 2013
[root@client1 khoi]# echo "hello" > world
[root@client1 khoi]# cat world
hello
[root@client1 khoi]# ls -lt world
-rw-r--r-- 1 root root 6 Dec 19 14:07 world
[root@client1 khoi]# vi world
[root@client1 khoi]# ls -lrt
total 1
-rw-r--r-- 1 root root 62 Dec 19 14:08 world
[root@client1 khoi]# cat world
hello
this is a test and the date/time is 12/19/2013 14:08PM.
[root@client1 khoi]#

[root@client2 khoi]# date
Thu Dec 19 14:06:57 CST 2013
[root@client2 khoi]# ls -lt
total 1
-rw-r--r-- 1 root root 6 Dec 19 14:07 world
[root@client2 khoi]# cat world
hello
[root@client2 khoi]# cat world
cat: world: Stale file handle
[root@client2 khoi]# strace -o /tmp/stale_world.txt cat world
cat: world: Stale file handle
[root@client2 khoi]# strace -o /tmp/ls_stale_path.txt ls -l
total 1
-rw-r--r-- 1 root root 62 Dec 19 14:08 world
[root@client2 khoi]# strace -o /tmp/cleaned_file_world.txt cat world
hello
this is a test and the date/time is 12/19/2013 14:08PM.
[root@client2 khoi]#

client2 strace log of stale cat of file http://fpaste.org/63343/48439613/
client2 strace log long listing of directory http://fpaste.org/63344/74844421/
client2 strace log of cat-able file http://fpaste.org/63345/84506138/

client1 volume log http://fpaste.org/63347/38748469/
client2 volume log http://fpaste.org/63346/87484654/

brick1 log http://fpaste.org/63348/13874847/
brick2 log http://fpaste.org/63349/84806138/
brick3 log http://fpaste.org/63350/13874848/
brick4 log http://fpaste.org/63351/13874848/

Comment 2 Johan Huysmans 2014-01-16 10:08:20 UTC
I retested this with glusterfs 3.4.2 built on Jan  3 2014 12:38:26.

The error "Structure needs cleaning" is not show. Instead it just displays "No such file or directory".
However that file is available on the other node.

This seems like a critical bug to me, as it only takes mv operations to trigger this problem!

This is how I can reproduce it:
[root@SRV-1 sharedfs]# echo "world" > test
[root@SRV-1 sharedfs]# md5sum test
591785b794601e212b260e25925636fd  test

[root@SRV-2 sharedfs]# md5sum test
591785b794601e212b260e25925636fd  test

[root@SRV-1 sharedfs]# mv test test2
[root@SRV-1 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
591785b794601e212b260e25925636fd  test2

[root@SRV-2 sharedfs]# md5sum test test2
591785b794601e212b260e25925636fd  test
591785b794601e212b260e25925636fd  test2

[root@SRV-1 sharedfs]# echo "hello" > test
[root@SRV-1 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184  test
591785b794601e212b260e25925636fd  test2

[root@SRV-2 sharedfs]# md5sum test test2
591785b794601e212b260e25925636fd  test
591785b794601e212b260e25925636fd  test2

[root@SRV-1 sharedfs]# rm test2 
rm: remove regular file `test2'? y
[dia3-blade_1-root@SRV-1 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184  test
md5sum: test2: No such file or directory


[root@SRV-2 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
md5sum: test2: No such file or directory

[root@SRV-2 sharedfs]# ls -al test*
-rw-r--r-- 1 root root 6 Jan 16 09:47 test

[root@SRV-2 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184  test
md5sum: test2: No such file or directory

Comment 3 Khoi Mai 2014-01-20 19:22:45 UTC
Johan,

I have found in this document https://forge.gluster.org/hadoop/pages/InstallingAndConfiguringGlusterFS
that the behavior does not happen when testing on their provided 6.2 kernel. I tried my reproduction steps and they did not return "file not found" at all.

I have opened a new case to get a better understanding of the patch that was in the 6.2 kernel vs. what we are currently using in 6.4.

Comment 4 Khoi Mai 2014-01-31 22:41:33 UTC
I have tested this on RHEL 6.5 and it appears to have been patched.

Comment 5 Khoi Mai 2014-02-03 18:44:12 UTC
While the result is acceptable when 2 clients are reading/writing to the same file on RHEL 6.5 (2.6.32-431.3.1.el6.x86_64), the gluster client logs show otherwise:

[root@omhq1cbf glusterfs]# cat mnt-khoi.log
[2014-02-03 18:24:29.379996] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 (/usr/sbin/glusterfs --volfile-id=/khoi --volfile-server=omhq1826 /mnt/khoi)
[2014-02-03 18:24:29.383834] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-03 18:24:29.383876] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2014-02-03 18:24:29.392828] I [quota.c:3051:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:24:29.392851] I [quota.c:3083:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:24:29.396296] I [socket.c:3480:socket_init] 0-khoi-client-3: SSL support is NOT enabled
[2014-02-03 18:24:29.396331] I [socket.c:3495:socket_init] 0-khoi-client-3: using system polling thread
[2014-02-03 18:24:29.397079] I [socket.c:3480:socket_init] 0-khoi-client-2: SSL support is NOT enabled
[2014-02-03 18:24:29.397095] I [socket.c:3495:socket_init] 0-khoi-client-2: using system polling thread
[2014-02-03 18:24:29.397867] I [socket.c:3480:socket_init] 0-khoi-client-1: SSL support is NOT enabled
[2014-02-03 18:24:29.397884] I [socket.c:3495:socket_init] 0-khoi-client-1: using system polling thread
[2014-02-03 18:24:29.398640] I [socket.c:3480:socket_init] 0-khoi-client-0: SSL support is NOT enabled
[2014-02-03 18:24:29.398657] I [socket.c:3495:socket_init] 0-khoi-client-0: using system polling thread
[2014-02-03 18:24:29.398697] I [client.c:2154:notify] 0-khoi-client-0: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.402303] I [client.c:2154:notify] 0-khoi-client-1: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.406581] I [client.c:2154:notify] 0-khoi-client-2: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.410741] I [client.c:2154:notify] 0-khoi-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume khoi-client-0
  2:     type protocol/client
  3:     option transport-type tcp
  4:     option remote-subvolume /static/khoi
  5:     option remote-host omhq1826
  6: end-volume
  7:
  8: volume khoi-client-1
  9:     type protocol/client
 10:     option transport-type tcp
 11:     option remote-subvolume /static/khoi
 12:     option remote-host omdx1448
 13: end-volume
 14:
 15: volume khoi-client-2
 16:     type protocol/client
 17:     option transport-type tcp
 18:     option remote-subvolume /static/khoi
 19:     option remote-host omhq1832
 20: end-volume
 21:
 22: volume khoi-client-3
 23:     type protocol/client
 24:     option transport-type tcp
 25:     option remote-subvolume /static/khoi
 26:     option remote-host omdx14f0
 27: end-volume
 28:
 29: volume khoi-replicate-0
 30:     type cluster/replicate
 31:     option eager-lock on
 32:     subvolumes khoi-client-0 khoi-client-1
 33: end-volume
 34:
 35: volume khoi-replicate-1
 36:     type cluster/replicate
 37:     option eager-lock on
 38:     subvolumes khoi-client-2 khoi-client-3
 39: end-volume
 40:
 41: volume khoi-dht
 42:     type cluster/distribute
 43:     subvolumes khoi-replicate-0 khoi-replicate-1
 44: end-volume
 45:
 46: volume khoi-quota
 47:     type features/quota
 48:     option timeout 0
 49:     option limit-set /:10GB
 50:     subvolumes khoi-dht
 51: end-volume
 52:
 53: volume khoi-write-behind
 54:     type performance/write-behind
 55:     subvolumes khoi-quota
 56: end-volume
 57:
 58: volume khoi-read-ahead
 59:     type performance/read-ahead
 60:     subvolumes khoi-write-behind
 61: end-volume
 62:
 63: volume khoi-io-cache
 64:     type performance/io-cache
 65:     subvolumes khoi-read-ahead
 66: end-volume
 67:
 68: volume khoi-open-behind
 69:     type performance/open-behind
 70:     option lazy-open off
 71:     subvolumes khoi-io-cache
 72: end-volume
 73:
 74: volume khoi
 75:     type debug/io-stats
 76:     option count-fop-hits off
 77:     option latency-measurement off
 78:     subvolumes khoi-open-behind
 79: end-volume

+------------------------------------------------------------------------------+
[2014-02-03 18:24:29.415664] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-0: changing port to 49157 (from 0)
[2014-02-03 18:24:29.415703] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-2: changing port to 49157 (from 0)
[2014-02-03 18:24:29.415723] W [socket.c:514:__socket_rwv] 0-khoi-client-0: readv failed (No data available)
[2014-02-03 18:24:29.419345] W [socket.c:514:__socket_rwv] 0-khoi-client-2: readv failed (No data available)
[2014-02-03 18:24:29.422881] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-1: changing port to 49157 (from 0)
[2014-02-03 18:24:29.422946] W [socket.c:514:__socket_rwv] 0-khoi-client-1: readv failed (No data available)
[2014-02-03 18:24:29.426523] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.426584] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-3: changing port to 49158 (from 0)
[2014-02-03 18:24:29.426603] W [socket.c:514:__socket_rwv] 0-khoi-client-3: readv failed (No data available)
[2014-02-03 18:24:29.430129] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.430205] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-0: Connected to 72.37.14.110:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.430217] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.430255] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-0: Subvolume 'khoi-client-0' came back up; going online.
[2014-02-03 18:24:29.430565] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-0: Server lk version = 1
[2014-02-03 18:24:29.430637] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-2: Connected to 72.37.14.111:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.430648] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.430689] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-1: Subvolume 'khoi-client-2' came back up; going online.
[2014-02-03 18:24:29.430879] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.430965] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-2: Server lk version = 1
[2014-02-03 18:24:29.431307] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.431634] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-1: Connected to 72.37.1.80:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.431657] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.432015] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-3: Connected to 72.37.1.88:49158, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.432046] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.438924] I [fuse-bridge.c:4769:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-02-03 18:24:29.439060] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-3: Server lk version = 1
[2014-02-03 18:24:29.439105] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-1: Server lk version = 1
[2014-02-03 18:24:29.439170] I [fuse-bridge.c:3724:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2014-02-03 18:24:29.439952] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-0: added root inode
[2014-02-03 18:24:29.440854] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-1: added root inode
[2014-02-03 18:25:51.919220] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:25:51.919777] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:25:51.922142] W [fuse-bridge.c:705:fuse_attr_cbk] 0-glusterfs-fuse: 63: STAT() /world => -1 (Structure needs cleaning)
[2014-02-03 18:26:04.223082] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:26:04.223625] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:26:05.166949] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:26:05.167483] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:27:36.245658] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3f10ce8b6d] (-->/lib64/libpthread.so.0() [0x3f114079d1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down
[2014-02-03 18:27:36.245682] I [fuse-bridge.c:5260:fini] 0-fuse: Unmounting '/mnt/khoi'.
[2014-02-03 18:27:36.251209] I [fuse-bridge.c:4628:fuse_thread_proc] 0-fuse: unmounting /mnt/khoi
[2014-02-03 18:31:35.521296] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 (/usr/sbin/glusterfs --volfile-id=/khoi --volfile-server=omhq1826 /mnt/khoi)
[2014-02-03 18:31:35.528457] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-03 18:31:35.528505] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2014-02-03 18:31:35.537061] I [quota.c:3051:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:31:35.537083] I [quota.c:3083:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:31:35.540759] I [socket.c:3480:socket_init] 0-khoi-client-3: SSL support is NOT enabled
[2014-02-03 18:31:35.540795] I [socket.c:3495:socket_init] 0-khoi-client-3: using system polling thread
[2014-02-03 18:31:35.541531] I [socket.c:3480:socket_init] 0-khoi-client-2: SSL support is NOT enabled
[2014-02-03 18:31:35.541548] I [socket.c:3495:socket_init] 0-khoi-client-2: using system polling thread
[2014-02-03 18:31:35.542284] I [socket.c:3480:socket_init] 0-khoi-client-1: SSL support is NOT enabled
[2014-02-03 18:31:35.542304] I [socket.c:3495:socket_init] 0-khoi-client-1: using system polling thread
[2014-02-03 18:31:35.543017] I [socket.c:3480:socket_init] 0-khoi-client-0: SSL support is NOT enabled
[2014-02-03 18:31:35.543036] I [socket.c:3495:socket_init] 0-khoi-client-0: using system polling thread
[2014-02-03 18:31:35.543073] I [client.c:2154:notify] 0-khoi-client-0: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.546654] I [client.c:2154:notify] 0-khoi-client-1: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.550506] I [client.c:2154:notify] 0-khoi-client-2: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.554341] I [client.c:2154:notify] 0-khoi-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume khoi-client-0
  2:     type protocol/client
  3:     option transport-type tcp
  4:     option remote-subvolume /static/khoi
  5:     option remote-host omhq1826
  6: end-volume
  7:
  8: volume khoi-client-1
  9:     type protocol/client
 10:     option transport-type tcp
 11:     option remote-subvolume /static/khoi
 12:     option remote-host omdx1448
 13: end-volume
 14:
 15: volume khoi-client-2
 16:     type protocol/client
 17:     option transport-type tcp
 18:     option remote-subvolume /static/khoi
 19:     option remote-host omhq1832
 20: end-volume
 21:
 22: volume khoi-client-3
 23:     type protocol/client
 24:     option transport-type tcp
 25:     option remote-subvolume /static/khoi
 26:     option remote-host omdx14f0
 27: end-volume
 28:
 29: volume khoi-replicate-0
 30:     type cluster/replicate
 31:     option eager-lock on
 32:     subvolumes khoi-client-0 khoi-client-1
 33: end-volume
 34:
 35: volume khoi-replicate-1
 36:     type cluster/replicate
 37:     option eager-lock on
 38:     subvolumes khoi-client-2 khoi-client-3
 39: end-volume
 40:
 41: volume khoi-dht
 42:     type cluster/distribute
 43:     subvolumes khoi-replicate-0 khoi-replicate-1
 44: end-volume
 45:
 46: volume khoi-quota
 47:     type features/quota
 48:     option timeout 0
 49:     option limit-set /:10GB
 50:     subvolumes khoi-dht
 51: end-volume
 52:
 53: volume khoi-write-behind
 54:     type performance/write-behind
 55:     subvolumes khoi-quota
 56: end-volume
 57:
 58: volume khoi-read-ahead
 59:     type performance/read-ahead
 60:     subvolumes khoi-write-behind
 61: end-volume
 62:
 63: volume khoi-io-cache
 64:     type performance/io-cache
 65:     subvolumes khoi-read-ahead
 66: end-volume
 67:
 68: volume khoi-open-behind
 69:     type performance/open-behind
 70:     option lazy-open off
 71:     subvolumes khoi-io-cache
 72: end-volume
 73:
 74: volume khoi
 75:     type debug/io-stats
 76:     option count-fop-hits off
 77:     option latency-measurement off
 78:     subvolumes khoi-open-behind
 79: end-volume

+------------------------------------------------------------------------------+
[2014-02-03 18:31:35.558946] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-0: changing port to 49157 (from 0)
[2014-02-03 18:31:35.558985] W [socket.c:514:__socket_rwv] 0-khoi-client-0: readv failed (No data available)
[2014-02-03 18:31:35.562438] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-2: changing port to 49157 (from 0)
[2014-02-03 18:31:35.562511] W [socket.c:514:__socket_rwv] 0-khoi-client-2: readv failed (No data available)
[2014-02-03 18:31:35.565931] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-3: changing port to 49158 (from 0)
[2014-02-03 18:31:35.565959] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-1: changing port to 49157 (from 0)
[2014-02-03 18:31:35.565976] W [socket.c:514:__socket_rwv] 0-khoi-client-3: readv failed (No data available)
[2014-02-03 18:31:35.569773] W [socket.c:514:__socket_rwv] 0-khoi-client-1: readv failed (No data available)
[2014-02-03 18:31:35.573627] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.573894] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.574188] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-0: Connected to 72.37.14.110:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.574228] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.574291] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-0: Subvolume 'khoi-client-0' came back up; going online.
[2014-02-03 18:31:35.574364] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.574423] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-2: Connected to 72.37.14.111:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.574434] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.574464] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-1: Subvolume 'khoi-client-2' came back up; going online.
[2014-02-03 18:31:35.574536] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-0: Server lk version = 1
[2014-02-03 18:31:35.574781] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-2: Server lk version = 1
[2014-02-03 18:31:35.574922] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.575082] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-3: Connected to 72.37.1.88:49158, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.575106] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.575661] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-1: Connected to 72.37.1.80:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.575686] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.581057] I [fuse-bridge.c:4769:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-02-03 18:31:35.581227] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-1: Server lk version = 1
[2014-02-03 18:31:35.581257] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-3: Server lk version = 1
[2014-02-03 18:31:35.581835] I [fuse-bridge.c:3724:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2014-02-03 18:31:35.582654] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-0: added root inode
[2014-02-03 18:31:35.583444] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-1: added root inode
[2014-02-03 18:32:57.711800] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-khoi-client-0: remote operation failed: Stale file handle. Path: /world (898c7eb6-4a1b-4de7-9485-bd4496a3b31b)
[2014-02-03 18:32:57.712087] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-khoi-client-1: remote operation failed: Stale file handle. Path: /world (898c7eb6-4a1b-4de7-9485-bd4496a3b31b)
[2014-02-03 18:40:16.546774] I [fuse-bridge.c:4628:fuse_thread_proc] 0-fuse: unmounting /mnt/khoi
[2014-02-03 18:40:16.547158] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3f10ce8b6d] (-->/lib64/libpthread.so.0() [0x3f114079d1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down

Comment 6 Susant Kumar Palai 2014-02-13 10:06:22 UTC
I was able to reproduce the "Stale file handle" issue once, as described by Khoi Mai in comment 1.

[root@vm1 home]# cat world 
cat: world: Stale file handle
[root@vm1 home]# cat world 
cat: world: Stale file handle
[root@vm1 home]# cat world 
cat: world: Stale file handle
[root@vm1 home]# cat world 
cat: world: Stale file handle
[root@vm1 home]# cat world 
cat: world: Stale file handle
[root@vm1 home]# cat world 
cat: world: Stale file handle

Then I cleaned up the cache with the following command and it worked.
[root@vm1 home]# echo 3 > /proc/sys/vm/drop_caches
[root@vm1 home]# cat world 
jasdfdsfkjlksadf
ljldsaf
lkdsajfsaf
sdalfasfd
hello

Comment 7 Johan Huysmans 2014-02-13 10:12:51 UTC
Hi Susant Kumar Palai, 

Can you indicate on which gluster version and which CentOS/Red Hat version you could reproduce it?

This could be useful, as we have seen different behaviour between different versions of gluster and CentOS.

thx!

Comment 8 Susant Kumar Palai 2014-02-13 11:06:43 UTC
(In reply to Johan Huysmans from comment #7)
> Hi Susant Kumar Palai, 
> 
> Can you indicate on which gluster version and which CentOS/Red Hat version
> you could reproduce it?
> 
> This could be useful, as we have seen different behaviour between different
> versions of gluster and CentOS.
> 
> thx!

Hey Johan,
    I reproduced this on the latest upstream glusterfs (master) and I am using RHS.
    Could you please try to reproduce the same bug with "--use-readdirp=no" while mounting and update here, just to narrow down the problem? (I tried the same and was not able to reproduce it.)
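
For illustration, readdirp can be disabled at mount time in either of these ways (hostnames and paths reused from the description above; treat this as a sketch, exact option handling may differ between versions):

mount -t glusterfs -o use-readdirp=no SRV-0:/testvolume /mnt/sharedfs
/usr/sbin/glusterfs --use-readdirp=no --volfile-id=/testvolume --volfile-server=SRV-0 /mnt/sharedfs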

Thx!

Comment 9 Khoi Mai 2014-02-13 17:10:45 UTC
RHEL 6.5 (2.6.32-431.3.1.el6.x86_64) has proven to allow concurrent access by the FUSE clients.

Comment 10 Johan Huysmans 2014-02-20 13:04:11 UTC
Hi Susant,

Sorry for the late response.
I have not yet found time to reproduce the issue with the --use-readdirp option, but I plan to run the tests in the next few days.

You indicated that you reproduced the issue.
Can you provide more information on how you triggered the problem and what exactly is causing it (which combination of actions is needed to trigger it)?

Thx.

Comment 11 Johan Huysmans 2014-02-24 14:17:17 UTC
Hi,

I'm running my test setup with the --use-readdirp=no option.

I have 1 node writing (and moving) files, mounted with --use-readdirp=no.
I have 2 nodes reading files, 1 mounted with and 1 mounted without --use-readdirp=no.

I haven't received a "Stale file handle" message, however I'm able to reproduce the actions and results described in comment 2.
The problem occurs on the node without the --use-readdirp=no option, and does not occur on the node with the --use-readdirp=no option.

I hope this helps you in finding the root cause.

Comment 12 Khoi Mai 2014-02-24 14:59:47 UTC
when you say "nodes" are you referring to the clients?  I might try that.  I have an entry server that does changes to a volume, while i have another apache servers mounting up the same volume as read-only.  Is the goal only to reduce the logged error "stale file handle" even though functionally the file is accessible?

Comment 14 Susant Kumar Palai 2014-02-26 12:22:33 UTC
The problem was in the readdirp kernel module. The fix has gone into RHEL 6.5.

Comment 15 Khoi Mai 2014-02-26 13:10:49 UTC
Is this feature documented anywhere, describing when it is best to use it? Is it only for RHEL 6?

Comment 16 Ivan Ilves 2014-03-11 09:56:52 UTC
I fear I may sound stupid, but I got this "structure needs cleaning" error after
setting the volume option cluster.metadata-change-log to off. The error went away after
resetting cluster.metadata-change-log to its default value.
I was able to reproduce the error on both RHEL 6.4 running GlusterFS 3.4.2 compiled
from source and Ubuntu Server 12.04 running GlusterFS 3.4.2 from the PPA. Both 64-bit.
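
For reference, the toggle described above would look roughly like this (a sketch; <volname> is a placeholder for the affected volume):

gluster volume set <volname> cluster.metadata-change-log off    # after this, reads started failing
gluster volume reset <volname> cluster.metadata-change-log      # back to the default, error went away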

Comment 17 Khoi Mai 2014-04-28 21:26:11 UTC
Regarding comment 14: are you saying that when I mount gluster volumes as FUSE mounts I should use the --use-readdirp=no option? And if so, what is the desired outcome?

Comment 18 Susant Kumar Palai 2014-04-29 05:15:57 UTC
Hi Khoi Mai,
    apologies for the late response. Yes, you should use the --use-readdirp=no option to avoid the "Stale file handle" error.

Comment 19 Khoi Mai 2014-06-02 19:58:13 UTC
Susant,

When I mount with

mount -t glusterfs -o use-readdirp=no omhq1826:/khoi /mnt

do you suggest using this mount method only on the clients doing reads, such as the Apache servers? Or should it be mounted this way everywhere for that volume?

Comment 20 Susant Kumar Palai 2014-06-03 06:07:39 UTC
Khoi,
  the use-readdirp=no option is specific to the client process (mount). To avoid the "Stale file handle" error you should use the option for all mount processes.
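
To make that persistent on every client, an /etc/fstab entry along these lines could be used (a sketch reusing the volume from this report; option names are assumed to be passed through by mount.glusterfs):

omhq1826:/khoi  /mnt/khoi  glusterfs  defaults,_netdev,use-readdirp=no  0 0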

Comment 21 Khoi Mai 2014-06-09 18:28:39 UTC
Susant, do you know if this will be addressed in a future version of glusterfs, like 3.4.4? If so, I'll have to undo my changes.

Comment 22 Joe Julian 2014-06-09 18:40:13 UTC
Susant: Which kernel release has this fixed?

Comment 23 Niels de Vos 2015-05-17 21:59:14 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 24 Kaleb KEITHLEY 2015-10-07 13:49:43 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.

Comment 25 Kaleb KEITHLEY 2015-10-07 13:50:53 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.

