Created attachment 835798 [details]
glusterfs logfile with DEBUG

Description of problem:
When reading a file from the fuse.glusterfs mountpoint I receive a "Structure needs cleaning" error.

Version-Release number of selected component (if applicable):
3.4.1-3 (from gluster.org)

My setup:
I have 2 servers with a 64-bit system running glusterd with the following configuration:

Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.quick-read: off
performance.open-behind: off
performance.lazy-open: off
performance.flush-behind: off
network.ping-timeout: 5
performance.stat-prefetch: off
performance.force-readdirp: on

I have 2 clients with a 32-bit system running the glusterfs FUSE client.

From '/proc/mounts':
SRV-0:/testvolume /mnt/sharedfs fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0

ps output:
/usr/sbin/glusterfs --enable-ino32 --volfile-id=/testvolume --volfile-server=SRV-0 /mnt/sharedfs

How reproducible:
After running without problems for some days, the problem occurs. Once it occurs it is reproducible every time.

Steps to Reproduce, use-case 1: create file + rename file

[root@CL-1 sharedfs]# echo "hello" > test
[root@CL-1 sharedfs]# md5sum test
b1946ac92492d2347c6235b4d2611184 test

[root@CL-2 sharedfs]# md5sum test
b1946ac92492d2347c6235b4d2611184 test

[root@CL-1 sharedfs]# mv test test2
[root@CL-1 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
b1946ac92492d2347c6235b4d2611184 test2

[root@CL-2 amp1]# md5sum test test2
b1946ac92492d2347c6235b4d2611184 test
b1946ac92492d2347c6235b4d2611184 test2

[root@CL-1 amp1]# rm test2
rm: remove regular file `test2'? y
[root@CL-1 amp1]# md5sum test test2
md5sum: test: No such file or directory
md5sum: test2: No such file or directory

[root@CL-2 amp1]# md5sum test test2
md5sum: test: Structure needs cleaning
md5sum: test2: No such file or directory

Steps to Reproduce, use-case 2: mv new file over old file

[root@CL-1 amp1]# echo "hello" > world
[root@CL-1 amp1]# cat world
hello

[root@CL-2 amp1]# cat world
hello

[root@CL-1 amp1]# echo "world" > hello
[root@CL-1 amp1]# mv hello world
mv: overwrite `world'? y

[root@CL-2 amp1]# cat world
cat: world: Structure needs cleaning

Problems:
* When a file is moved (renamed), the old name should no longer be available on any client.
* "Structure needs cleaning" should never be reported for the file.
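(Not part of the original report -- a minimal diagnostic sketch, assuming the brick paths from the volume info above.) When a client reports "Structure needs cleaning" (EUCLEAN) or "Stale file handle" (ESTALE) for a path, it can help to compare the trusted.gfid xattr of that path on both replica bricks; a mismatch, or a leftover entry for the old name after the rename, would point at a stale gfid-to-inode mapping on the client rather than on-disk corruption:

# on SRV-1
getfattr -n trusted.gfid -e hex /gluster/brick1/test
# on SRV-2
getfattr -n trusted.gfid -e hex /gluster/brick2/test

If the gfids differ between the bricks, or one brick still lists the pre-rename name, that is the state the FUSE client is tripping over.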
I've upgraded the gluster bricks and clients from glusterfs 3.4.1-3 to glusterfs 3.4.2qa4.

glusterfs --version
glusterfs 3.4.2qa4 built on Dec 17 2013 17:15:16

[root@brick1 ~]# gluster volume info khoi
Volume Name: khoi
Type: Distributed-Replicate
Volume ID: dc075c4f-df76-482a-a750-efccb346527e
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: omhq1826:/static/khoi
Brick2: omdx1448:/static/khoi
Brick3: omhq1832:/static/khoi
Brick4: omdx14f0:/static/khoi
Options Reconfigured:
features.quota: on
geo-replication.indexing: off
features.limit-usage: /:10GB
performance.write-behind: on
performance.lazy-open: off

Steps to reproduce:
client1: create a file, edit and save the file.
client2: read the file changed by client1.

[root@client1 khoi]# date
Thu Dec 19 14:06:55 CST 2013
[root@client1 khoi]# echo "hello" > world
[root@client1 khoi]# cat world
hello
[root@client1 khoi]# ls -lt world
-rw-r--r-- 1 root root 6 Dec 19 14:07 world
[root@client1 khoi]# vi world
[root@client1 khoi]# ls -lrt
total 1
-rw-r--r-- 1 root root 62 Dec 19 14:08 world
[root@client1 khoi]# cat world
hello
this is a test and the date/time is 12/19/2013 14:08PM.
[root@client1 khoi]#

[root@client2 khoi]# date
Thu Dec 19 14:06:57 CST 2013
[root@client2 khoi]# ls -lt
total 1
-rw-r--r-- 1 root root 6 Dec 19 14:07 world
[root@client2 khoi]# cat world
hello
[root@client2 khoi]# cat world
cat: world: Stale file handle
[root@client2 khoi]# strace -o /tmp/stale_world.txt cat world
cat: world: Stale file handle
[root@client2 khoi]# strace -o /tmp/ls_stale_path.txt ls -l
total 1
-rw-r--r-- 1 root root 62 Dec 19 14:08 world
[root@client2 khoi]# strace -o /tmp/cleaned_file_world.txt cat world
hello
this is a test and the date/time is 12/19/2013 14:08PM.
[root@client2 khoi]#

client2 strace log of stale cat of file: http://fpaste.org/63343/48439613/
client2 strace log of long listing of directory: http://fpaste.org/63344/74844421/
client2 strace log of cat-able file: http://fpaste.org/63345/84506138/
client1 volume log: http://fpaste.org/63347/38748469/
client2 volume log: http://fpaste.org/63346/87484654/
brick1 log: http://fpaste.org/63348/13874847/
brick2 log: http://fpaste.org/63349/84806138/
brick3 log: http://fpaste.org/63350/13874848/
brick4 log: http://fpaste.org/63351/13874848/
I retested this with glusterfs 3.4.2 built on Jan 3 2014 12:38:26.

The error "Structure needs cleaning" is no longer shown. Instead it just displays "No such file or directory". However, that file is still available on the other node.

This seems like a critical bug to me, as only mv operations are needed to trigger this problem!

This is how I can reproduce it:

[root@SRV-1 sharedfs]# echo "world" > test
[root@SRV-1 sharedfs]# md5sum test
591785b794601e212b260e25925636fd test

[root@SRV-2 sharedfs]# md5sum test
591785b794601e212b260e25925636fd test

[root@SRV-1 sharedfs]# mv test test2
[root@SRV-1 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
591785b794601e212b260e25925636fd test2

[root@SRV-2 sharedfs]# md5sum test test2
591785b794601e212b260e25925636fd test
591785b794601e212b260e25925636fd test2

[root@SRV-1 sharedfs]# echo "hello" > test
[root@SRV-1 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184 test
591785b794601e212b260e25925636fd test2

[root@SRV-2 sharedfs]# md5sum test test2
591785b794601e212b260e25925636fd test
591785b794601e212b260e25925636fd test2

[root@SRV-1 sharedfs]# rm test2
rm: remove regular file `test2'? y
[dia3-blade_1-root@SRV-1 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184 test
md5sum: test2: No such file or directory

[root@SRV-2 sharedfs]# md5sum test test2
md5sum: test: No such file or directory
md5sum: test2: No such file or directory
[root@SRV-2 sharedfs]# ls -al test*
-rw-r--r-- 1 root root 6 Jan 16 09:47 test
[root@SRV-2 sharedfs]# md5sum test test2
b1946ac92492d2347c6235b4d2611184 test
md5sum: test2: No such file or directory
Johan,

I found in this document https://forge.gluster.org/hadoop/pages/InstallingAndConfiguringGlusterFS that the behavior does not happen when testing on their provided 6.2 kernel. I tried my reproduction steps there and 100% of the time it did not return "file not found". I have opened a new case for a better understanding of the patch that was in the 6.2 kernel vs. what we are currently using in 6.4.
I have tested this on rhel6.5 and it appears to have been patched.
While the result is acceptable when 2 clients are reading/writing to the same file on RHEL 6.5 (2.6.32-431.3.1.el6.x86_64), the gluster client log shows otherwise:

[root@omhq1cbf glusterfs]# cat mnt-khoi.log
[2014-02-03 18:24:29.379996] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 (/usr/sbin/glusterfs --volfile-id=/khoi --volfile-server=omhq1826 /mnt/khoi)
[2014-02-03 18:24:29.383834] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-03 18:24:29.383876] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2014-02-03 18:24:29.392828] I [quota.c:3051:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:24:29.392851] I [quota.c:3083:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:24:29.396296] I [socket.c:3480:socket_init] 0-khoi-client-3: SSL support is NOT enabled
[2014-02-03 18:24:29.396331] I [socket.c:3495:socket_init] 0-khoi-client-3: using system polling thread
[2014-02-03 18:24:29.397079] I [socket.c:3480:socket_init] 0-khoi-client-2: SSL support is NOT enabled
[2014-02-03 18:24:29.397095] I [socket.c:3495:socket_init] 0-khoi-client-2: using system polling thread
[2014-02-03 18:24:29.397867] I [socket.c:3480:socket_init] 0-khoi-client-1: SSL support is NOT enabled
[2014-02-03 18:24:29.397884] I [socket.c:3495:socket_init] 0-khoi-client-1: using system polling thread
[2014-02-03 18:24:29.398640] I [socket.c:3480:socket_init] 0-khoi-client-0: SSL support is NOT enabled
[2014-02-03 18:24:29.398657] I [socket.c:3495:socket_init] 0-khoi-client-0: using system polling thread
[2014-02-03 18:24:29.398697] I [client.c:2154:notify] 0-khoi-client-0: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.402303] I [client.c:2154:notify] 0-khoi-client-1: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.406581] I [client.c:2154:notify] 0-khoi-client-2: parent translators are ready, attempting connect on transport
[2014-02-03 18:24:29.410741] I [client.c:2154:notify] 0-khoi-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume khoi-client-0
2: type protocol/client
3: option transport-type tcp
4: option remote-subvolume /static/khoi
5: option remote-host omhq1826
6: end-volume
7:
8: volume khoi-client-1
9: type protocol/client
10: option transport-type tcp
11: option remote-subvolume /static/khoi
12: option remote-host omdx1448
13: end-volume
14:
15: volume khoi-client-2
16: type protocol/client
17: option transport-type tcp
18: option remote-subvolume /static/khoi
19: option remote-host omhq1832
20: end-volume
21:
22: volume khoi-client-3
23: type protocol/client
24: option transport-type tcp
25: option remote-subvolume /static/khoi
26: option remote-host omdx14f0
27: end-volume
28:
29: volume khoi-replicate-0
30: type cluster/replicate
31: option eager-lock on
32: subvolumes khoi-client-0 khoi-client-1
33: end-volume
34:
35: volume khoi-replicate-1
36: type cluster/replicate
37: option eager-lock on
38: subvolumes khoi-client-2 khoi-client-3
39: end-volume
40:
41: volume khoi-dht
42: type cluster/distribute
43: subvolumes khoi-replicate-0 khoi-replicate-1
44: end-volume
45:
46: volume khoi-quota
47: type features/quota
48: option timeout 0
49: option limit-set /:10GB
50: subvolumes khoi-dht
51: end-volume
52:
53: volume khoi-write-behind
54: type performance/write-behind
55: subvolumes khoi-quota
56: end-volume
57:
58: volume khoi-read-ahead
59: type performance/read-ahead
60: subvolumes khoi-write-behind
61: end-volume
62:
63: volume khoi-io-cache
64: type performance/io-cache
65: subvolumes khoi-read-ahead
66: end-volume
67:
68: volume khoi-open-behind
69: type performance/open-behind
70: option lazy-open off
71: subvolumes khoi-io-cache
72: end-volume
73:
74: volume khoi
75: type debug/io-stats
76: option count-fop-hits off
77: option latency-measurement off
78: subvolumes khoi-open-behind
79: end-volume
+------------------------------------------------------------------------------+
[2014-02-03 18:24:29.415664] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-0: changing port to 49157 (from 0)
[2014-02-03 18:24:29.415703] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-2: changing port to 49157 (from 0)
[2014-02-03 18:24:29.415723] W [socket.c:514:__socket_rwv] 0-khoi-client-0: readv failed (No data available)
[2014-02-03 18:24:29.419345] W [socket.c:514:__socket_rwv] 0-khoi-client-2: readv failed (No data available)
[2014-02-03 18:24:29.422881] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-1: changing port to 49157 (from 0)
[2014-02-03 18:24:29.422946] W [socket.c:514:__socket_rwv] 0-khoi-client-1: readv failed (No data available)
[2014-02-03 18:24:29.426523] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.426584] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-3: changing port to 49158 (from 0)
[2014-02-03 18:24:29.426603] W [socket.c:514:__socket_rwv] 0-khoi-client-3: readv failed (No data available)
[2014-02-03 18:24:29.430129] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.430205] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-0: Connected to 72.37.14.110:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.430217] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.430255] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-0: Subvolume 'khoi-client-0' came back up; going online.
[2014-02-03 18:24:29.430565] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-0: Server lk version = 1
[2014-02-03 18:24:29.430637] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-2: Connected to 72.37.14.111:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.430648] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.430689] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-1: Subvolume 'khoi-client-2' came back up; going online.
[2014-02-03 18:24:29.430879] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.430965] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-2: Server lk version = 1
[2014-02-03 18:24:29.431307] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:24:29.431634] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-1: Connected to 72.37.1.80:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.431657] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.432015] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-3: Connected to 72.37.1.88:49158, attached to remote volume '/static/khoi'.
[2014-02-03 18:24:29.432046] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:24:29.438924] I [fuse-bridge.c:4769:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-02-03 18:24:29.439060] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-3: Server lk version = 1
[2014-02-03 18:24:29.439105] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-1: Server lk version = 1
[2014-02-03 18:24:29.439170] I [fuse-bridge.c:3724:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2014-02-03 18:24:29.439952] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-0: added root inode
[2014-02-03 18:24:29.440854] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-1: added root inode
[2014-02-03 18:25:51.919220] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:25:51.919777] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:25:51.922142] W [fuse-bridge.c:705:fuse_attr_cbk] 0-glusterfs-fuse: 63: STAT() /world => -1 (Structure needs cleaning)
[2014-02-03 18:26:04.223082] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:26:04.223625] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:26:05.166949] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-1: remote operation failed: No such file or directory
[2014-02-03 18:26:05.167483] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-khoi-client-0: remote operation failed: No such file or directory
[2014-02-03 18:27:36.245658] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3f10ce8b6d] (-->/lib64/libpthread.so.0() [0x3f114079d1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down
[2014-02-03 18:27:36.245682] I [fuse-bridge.c:5260:fini] 0-fuse: Unmounting '/mnt/khoi'.
[2014-02-03 18:27:36.251209] I [fuse-bridge.c:4628:fuse_thread_proc] 0-fuse: unmounting /mnt/khoi
[2014-02-03 18:31:35.521296] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 (/usr/sbin/glusterfs --volfile-id=/khoi --volfile-server=omhq1826 /mnt/khoi)
[2014-02-03 18:31:35.528457] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-03 18:31:35.528505] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2014-02-03 18:31:35.537061] I [quota.c:3051:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:31:35.537083] I [quota.c:3083:quota_parse_limits] 0-khoi-quota: /:10737418240
[2014-02-03 18:31:35.540759] I [socket.c:3480:socket_init] 0-khoi-client-3: SSL support is NOT enabled
[2014-02-03 18:31:35.540795] I [socket.c:3495:socket_init] 0-khoi-client-3: using system polling thread
[2014-02-03 18:31:35.541531] I [socket.c:3480:socket_init] 0-khoi-client-2: SSL support is NOT enabled
[2014-02-03 18:31:35.541548] I [socket.c:3495:socket_init] 0-khoi-client-2: using system polling thread
[2014-02-03 18:31:35.542284] I [socket.c:3480:socket_init] 0-khoi-client-1: SSL support is NOT enabled
[2014-02-03 18:31:35.542304] I [socket.c:3495:socket_init] 0-khoi-client-1: using system polling thread
[2014-02-03 18:31:35.543017] I [socket.c:3480:socket_init] 0-khoi-client-0: SSL support is NOT enabled
[2014-02-03 18:31:35.543036] I [socket.c:3495:socket_init] 0-khoi-client-0: using system polling thread
[2014-02-03 18:31:35.543073] I [client.c:2154:notify] 0-khoi-client-0: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.546654] I [client.c:2154:notify] 0-khoi-client-1: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.550506] I [client.c:2154:notify] 0-khoi-client-2: parent translators are ready, attempting connect on transport
[2014-02-03 18:31:35.554341] I [client.c:2154:notify] 0-khoi-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume khoi-client-0
2: type protocol/client
3: option transport-type tcp
4: option remote-subvolume /static/khoi
5: option remote-host omhq1826
6: end-volume
7:
8: volume khoi-client-1
9: type protocol/client
10: option transport-type tcp
11: option remote-subvolume /static/khoi
12: option remote-host omdx1448
13: end-volume
14:
15: volume khoi-client-2
16: type protocol/client
17: option transport-type tcp
18: option remote-subvolume /static/khoi
19: option remote-host omhq1832
20: end-volume
21:
22: volume khoi-client-3
23: type protocol/client
24: option transport-type tcp
25: option remote-subvolume /static/khoi
26: option remote-host omdx14f0
27: end-volume
28:
29: volume khoi-replicate-0
30: type cluster/replicate
31: option eager-lock on
32: subvolumes khoi-client-0 khoi-client-1
33: end-volume
34:
35: volume khoi-replicate-1
36: type cluster/replicate
37: option eager-lock on
38: subvolumes khoi-client-2 khoi-client-3
39: end-volume
40:
41: volume khoi-dht
42: type cluster/distribute
43: subvolumes khoi-replicate-0 khoi-replicate-1
44: end-volume
45:
46: volume khoi-quota
47: type features/quota
48: option timeout 0
49: option limit-set /:10GB
50: subvolumes khoi-dht
51: end-volume
52:
53: volume khoi-write-behind
54: type performance/write-behind
55: subvolumes khoi-quota
56: end-volume
57:
58: volume khoi-read-ahead
59: type performance/read-ahead
60: subvolumes khoi-write-behind
61: end-volume
62:
63: volume khoi-io-cache
64: type performance/io-cache
65: subvolumes khoi-read-ahead
66: end-volume
67:
68: volume khoi-open-behind
69: type performance/open-behind
70: option lazy-open off
71: subvolumes khoi-io-cache
72: end-volume
73:
74: volume khoi
75: type debug/io-stats
76: option count-fop-hits off
77: option latency-measurement off
78: subvolumes khoi-open-behind
79: end-volume
+------------------------------------------------------------------------------+
[2014-02-03 18:31:35.558946] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-0: changing port to 49157 (from 0)
[2014-02-03 18:31:35.558985] W [socket.c:514:__socket_rwv] 0-khoi-client-0: readv failed (No data available)
[2014-02-03 18:31:35.562438] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-2: changing port to 49157 (from 0)
[2014-02-03 18:31:35.562511] W [socket.c:514:__socket_rwv] 0-khoi-client-2: readv failed (No data available)
[2014-02-03 18:31:35.565931] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-3: changing port to 49158 (from 0)
[2014-02-03 18:31:35.565959] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-khoi-client-1: changing port to 49157 (from 0)
[2014-02-03 18:31:35.565976] W [socket.c:514:__socket_rwv] 0-khoi-client-3: readv failed (No data available)
[2014-02-03 18:31:35.569773] W [socket.c:514:__socket_rwv] 0-khoi-client-1: readv failed (No data available)
[2014-02-03 18:31:35.573627] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.573894] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.574188] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-0: Connected to 72.37.14.110:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.574228] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.574291] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-0: Subvolume 'khoi-client-0' came back up; going online.
[2014-02-03 18:31:35.574364] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.574423] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-2: Connected to 72.37.14.111:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.574434] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.574464] I [afr-common.c:3698:afr_notify] 0-khoi-replicate-1: Subvolume 'khoi-client-2' came back up; going online.
[2014-02-03 18:31:35.574536] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-0: Server lk version = 1
[2014-02-03 18:31:35.574781] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-2: Server lk version = 1
[2014-02-03 18:31:35.574922] I [client-handshake.c:1658:select_server_supported_programs] 0-khoi-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-02-03 18:31:35.575082] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-3: Connected to 72.37.1.88:49158, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.575106] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.575661] I [client-handshake.c:1456:client_setvolume_cbk] 0-khoi-client-1: Connected to 72.37.1.80:49157, attached to remote volume '/static/khoi'.
[2014-02-03 18:31:35.575686] I [client-handshake.c:1468:client_setvolume_cbk] 0-khoi-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-02-03 18:31:35.581057] I [fuse-bridge.c:4769:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-02-03 18:31:35.581227] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-1: Server lk version = 1
[2014-02-03 18:31:35.581257] I [client-handshake.c:450:client_set_lk_version_cbk] 0-khoi-client-3: Server lk version = 1
[2014-02-03 18:31:35.581835] I [fuse-bridge.c:3724:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2014-02-03 18:31:35.582654] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-0: added root inode
[2014-02-03 18:31:35.583444] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-khoi-replicate-1: added root inode
[2014-02-03 18:32:57.711800] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-khoi-client-0: remote operation failed: Stale file handle. Path: /world (898c7eb6-4a1b-4de7-9485-bd4496a3b31b)
[2014-02-03 18:32:57.712087] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-khoi-client-1: remote operation failed: Stale file handle. Path: /world (898c7eb6-4a1b-4de7-9485-bd4496a3b31b)
[2014-02-03 18:40:16.546774] I [fuse-bridge.c:4628:fuse_thread_proc] 0-fuse: unmounting /mnt/khoi
[2014-02-03 18:40:16.547158] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3f10ce8b6d] (-->/lib64/libpthread.so.0() [0x3f114079d1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down
I was able to reproduce the "Stale file handle" issue once, as described by Khoi Mai in comment 1.

[root@vm1 home]# cat world
cat: world: Stale file handle
[root@vm1 home]# cat world
cat: world: Stale file handle
[root@vm1 home]# cat world
cat: world: Stale file handle
[root@vm1 home]# cat world
cat: world: Stale file handle
[root@vm1 home]# cat world
cat: world: Stale file handle
[root@vm1 home]# cat world
cat: world: Stale file handle

Then I cleaned up the cache with the following command and it worked:

[root@vm1 home]# echo 3 > /proc/sys/vm/drop_caches
[root@vm1 home]# cat world
jasdfdsfkjlksadf ljldsaf lkdsajfsaf sdalfasfd hello
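(Side note, not from the original comment -- a minimal variation of the same workaround, based on the documented /proc/sys/vm/drop_caches interface.) Since the stale entries live in the kernel's dentry/inode caches rather than in the page cache, dropping only those caches on the affected client should be enough, and is slightly less disruptive than "echo 3":

# flush dirty data first, then drop reclaimable dentries and inodes only
sync
echo 2 > /proc/sys/vm/drop_caches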
Hi Susant Kumar Palai,

Can you indicate with which gluster version and which CentOS/Red Hat version you could reproduce it?

This could be useful, as we have seen different behaviour between different versions of gluster and CentOS.

thx!
(In reply to Johan Huysmans from comment #7)
> Hi Susant Kumar Palai,
>
> Can you indicate with which gluster version and which CentOS/Red Hat version
> you could reproduce it?
>
> This could be useful, as we have seen different behaviour between different
> versions of gluster and CentOS.
>
> thx!

Hey Johan,

I reproduced this on the latest upstream glusterfs (master) and I am using RHS. Would you please try to reproduce the same bug with "--use-readdirp=no" while mounting and update here, just to narrow down the problem? (I tried the same and was not able to reproduce.)

Thx!
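(Illustration only, not part of the original comment -- both forms below are the ones referenced elsewhere in this report, reusing the volume, server and mount point from the original description.) The flag can be passed either directly to the glusterfs client process or as a mount option:

# direct invocation, matching the ps output in the description
/usr/sbin/glusterfs --enable-ino32 --use-readdirp=no --volfile-id=/testvolume --volfile-server=SRV-0 /mnt/sharedfs

# equivalent mount(8) form
mount -t glusterfs -o use-readdirp=no SRV-0:/testvolume /mnt/sharedfs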
RHEL 6.5 (2.6.32-431.3.1.el6.x86_64) has proven to allow concurrent access by the FUSE clients.
Hi Susant,

Sorry for this late response. I was not able to find time to reproduce the issue with the --use-readdirp option. However, I plan on doing the tests in the next few days.

You indicated that you reproduced the issue. Can you provide more information on how you triggered the problem and what exactly is causing this issue (which combination of actions is needed to trigger the problem)?

Thx.
Hi,

I'm running my test setup with the --use-readdirp option.
I have 1 node writing (and moving) files, mounted with --use-readdirp.
I have 2 nodes reading files, 1 mounted with and 1 mounted without --use-readdirp.

I haven't received a "Stale file handle" message, however I'm able to reproduce the actions and results as described in comment 2.
The problem occurs on the node without the --use-readdirp=no option, and the problem does not occur on the node with the --use-readdirp=no option.

I hope this helps you in finding the root cause.
When you say "nodes", are you referring to the clients? I might try that. I have an entry server that makes changes to a volume, while I have other apache servers mounting the same volume as read-only.

Is the goal only to reduce the logged error "stale file handle", even though functionally the file is accessible?
The problem was in the kernel's readdirp code. The fix has gone into RHEL 6.5.
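(Not from the original comment -- a quick check, reusing the kernel version quoted in the earlier comments.) On a client you can verify whether it is already running the RHEL 6.5 kernel that earlier comments report as behaving correctly:

uname -r
# earlier comments in this report saw correct behaviour on 2.6.32-431.3.1.el6.x86_64 (RHEL 6.5)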
Is this feature documented anywhere, describing when it is best to use it? Is it only for RHEL 6?
I fear this may sound stupid, but I got this "structure needs cleaning" error after setting the volume option cluster.metadata-change-log to off. The error went away after resetting cluster.metadata-change-log to its default value.

I was able to reproduce the error both on RHEL 6.4 running GlusterFS 3.4.2 compiled from source and on Ubuntu Server 12.04 running GlusterFS 3.4.2 from the PPA. Both 64-bit.
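(Not part of the original comment -- the exact commands are spelled out here only for reference; "testvolume" is a placeholder volume name.) The toggle described above corresponds to:

# reproduce: turn the option off
gluster volume set testvolume cluster.metadata-change-log off
# recover: put the option back to its default
gluster volume reset testvolume cluster.metadata-change-log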
Comment 14: are you saying that when I mount gluster volumes as FUSE mounts I should use the --use-readdirp=no option? And what is the desired outcome then?
Hi Khoi Mai,

Apologies for the late response. Yes, you should use the --use-readdirp=no option to avoid the "Stale file handle" error.
Susant,

When I mount with

mount -t glusterfs -o use-readdirp=no omhq1826:/khoi /mnt

do you suggest using this mount method only on the clients where the reads happen, such as apache? Or should I mount it this way everywhere for that volume?
Khoi,

The use-readdirp=no option is specific to the client process (mount). To avoid the "Stale file handle" error you should use the option for all mount processes.
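(Illustration only, not from the original comment -- a minimal /etc/fstab sketch using the volume and server names from this report; it assumes mount.glusterfs accepts use-readdirp=no from the options field the same way it does on the command line above.) To make the option persistent across reboots on every client:

# /etc/fstab entry for the khoi volume
omhq1826:/khoi  /mnt/khoi  glusterfs  defaults,_netdev,use-readdirp=no  0 0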
Susant, do you know if this will be addressed in a future version of glusterfs, like 3.4.4? Then I'll have to undo my changes.
Susant: Which kernel release has this fixed?
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify if newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.
GlusterFS 3.4.x has reached end-of-life. If this bug still exists in a later release please reopen this and change the version or open a new bug.