Hide Forgot
setup is, [root@centos-qa-client-2 ~]# /opt/glusterfs/git_inst/sbin/gluster volume info dr3 Volume Name: dr3 Type: Distributed-Replicate Status: Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: 10.1.12.134:/mnt/d3 Brick2: 10.1.12.135:/mnt/r3 Brick3: 10.1.12.134:/mnt/d-d3 Brick4: 10.1.12.135:/mnt/r-r3 Brick5: 10.1.12.134:/mnt/dd-d3 Brick6: 10.1.12.135:/mnt/rr-r3 [root@centos-qa-client-2 ~]# in the above setup I added bricks, Brick5: 10.1.12.134:/mnt/dd-d3 Brick6: 10.1.12.135:/mnt/rr-r3 after creating data over fuse mount, data included just creating few filed of size 1024. after addition the rebalance fails, this issue is seen twice already. log glimpse from the glusterd log file, [2011-07-22 01:24:20.769730] I [glusterd-rebalance.c:280:glusterd_defrag_start] 0-rebalance: rebalance on /etc/glusterd/mount/dr3 complete [2011-07-22 01:24:20.770901] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:962) [2011-07-22 01:24:20.840145] I [glusterd-rebalance.c:422:glusterd_rebalance_cmd_attempted_log] 0-glusterd: Received rebalance volume 3 on dr3 [2011-07-22 01:24:20.840777] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:946) logs glimpse from the bricks logs,(10.1.12.135:/mnt/rr-r3) from the newly added replicate brick, [2011-07-22 01:24:20.669310] I [server-handshake.c:534:server_setvolume] 0-dr3-server: accepted client from 10.1.12.135:964 [2011-07-22 01:24:20.770924] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.dr3-server: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.12.135:964) [2011-07-22 01:24:20.770980] I [server-helpers.c:485:do_fd_cleanup] 0-dr3-server: fd cleanup on / [2011-07-22 01:24:20.771004] I [server.c:438:server_rpc_notify] 0-dr3-server: disconnected connection from 10.1.12.135:964 [2011-07-22 01:24:20.771023] I [server-helpers.c:783:server_connection_destroy] 0-dr3-server: destroyed connection of centos-qa-client-3-29997-2011/07/22-01:24:16:640972-dr3-client-5 [2011-07-22 01:24:55.432259] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.dr3-server: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.12.134:989) [2011-07-22 01:24:55.432349] I [server.c:438:server_rpc_notify] 0-dr3-server: disconnected connection from 10.1.12.134:989 [2011-07-22 01:24:55.432370] I [server-helpers.c:783:server_connection_destroy] 0-dr3-server: destroyed connection of centos-qa-client-2-23514-2011/07/22-01:17:07:634652-dr3-client-5 from the newly added distribute brick,(10.1.12.134:/mnt/dd-d3) [2011-07-22 01:17:41.698905] W [socket.c:204:__socket_rwv] 0-tcp.dr3-server: readv failed (Connection reset by peer) [2011-07-22 01:17:41.698919] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.dr3-server: reading from socket failed. Error (Connection reset by peer), peer (10.1.12.135:946) [2011-07-22 01:17:41.698943] I [server-helpers.c:485:do_fd_cleanup] 0-dr3-server: fd cleanup on / [2011-07-22 01:17:41.698960] I [server.c:438:server_rpc_notify] 0-dr3-server: disconnected connection from 10.1.12.135:946 [2011-07-22 01:17:41.698976] I [server-helpers.c:783:server_connection_destroy] 0-dr3-server: destroyed connection of centos-qa-client-3-29997-2011/07/22-01:24:16:640972-dr3-client-4 [2011-07-22 01:18:16.358620] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.dr3-server: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.12.134:1022) [2011-07-22 01:18:16.358699] I [server.c:438:server_rpc_notify] 0-dr3-server: disconnected connection from 10.1.12.134:1022 [2011-07-22 01:18:16.358737] I [server-helpers.c:783:server_connection_destroy] 0-dr3-server: destroyed connection of centos-qa-client-2-23514-2011/07/22-01:17:07:634652-dr3-client-4 but the peer status shows that it is still connected, [root@centos-qa-client-3 glusterfs]# /opt/glusterfs/git_inst/sbin/gluster peer status Number of Peers: 1 Hostname: 10.1.12.134 Uuid: 017e0724-58ae-427a-b7f8-928850b3dba6 State: Peer in Cluster (Connected) [root@centos-qa-client-3 glusterfs]#
the git repo is synced to this id, commit 37b4b7ccad5a479ed2ce96598e902a12fd457bd5 Author: Raghavendra Bhat <raghavendrabhat> Date: Wed Jul 20 12:21:12 2011 +0530 debug/io-stats: allow log-level to be set to INFO Change-Id: Ieba6591b0641dcb7dad724c1d8199dc5e91b6bfd
Hi Saurabh, Can you check if rebalance works fine now? because after http://review.gluster.com/90 most of the things should now work fine. Regards, Amar
Yeah Amar, the issue is still happening, latest sync points to this change number, Change-Id: I85cf4afc1f534f5f51018449d5d84baef18fce23
rebalance fails after quota disable and adding new bricks, [root@Centos1 ~]# /opt/glusterfs/3.3.0//sbin/gluster volume quota drep disable Disabling quota will delete all the quota configuration. Do you want to continue? (y/n) y Disabling quota has been successful [root@Centos1 ~]# /opt/glusterfs/3.3.0//sbin/gluster volume add-brick drep 10.1.12.134:/mnt/d-d3 10.1.12.135:/mnt/r-r3 Add Brick successful [root@Centos1 ~]# /opt/glusterfs/3.3.0//sbin/gluster volume rebalance drep start starting rebalance on volume drep has been successful [root@Centos1 ~]# /opt/glusterfs/3.3.0//sbin/gluster volume rebalance drep status rebalance failed [2011-08-19 08:42:27.649555] I [dht-rebalance.c:317:dht_migrate_file] 0-drep-dht: /d0/f.26: attempting to move from drep-replicate-0 to d rep-replicate-4 [2011-08-19 08:42:27.650134] W [dict.c:356:dict_del] (-->/opt/glusterfs/3.3.0//lib/glusterfs/3.3.0qa3/xlator/cluster/distribute.so(dht_mi grate_file+0x54e) [0x2aaaada2e7be] (-->/opt/glusterfs/3.3.0//lib/libglusterfs.so.0(syncop_lookup+0x148) [0x2b540f63f508] (-->/opt/gluster fs/3.3.0//lib/glusterfs/3.3.0qa3/xlator/cluster/replicate.so(afr_lookup+0x403) [0x2aaaad8105f3]))) 0-dict: !this || key=gfid-req [2011-08-19 08:42:27.652039] W [dht-rebalance.c:244:__dht_check_free_space] 0-drep-dht: data movement attempted from node (drep-replicate -0) with higher disk space to a node (drep-replicate-4) with lesser disk space (/d0/f.26) [2011-08-19 08:42:27.652098] C [fuse-bridge.c:876:fuse_setxattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage [2011-08-19 08:42:27.652233] W [fuse-bridge.c:847:fuse_err_cbk] 0-glusterfs-fuse: 142: SETXATTR() /d0/f.26 => -1 (Operation not supported) pending frames: frame : type(1) op(SETXATTR) frame : type(1) op(SETXATTR) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2011-08-19 08:42:27 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 [2011-08-19 08:42:27.649555] I [dht-rebalance.c:317:dht_migrate_file] 0-drep-dht: /d0/f.26: attempting to move from drep-replicate-0 to d rep-replicate-4 [2011-08-19 08:42:27.650134] W [dict.c:356:dict_del] (-->/opt/glusterfs/3.3.0//lib/glusterfs/3.3.0qa3/xlator/cluster/distribute.so(dht_mi grate_file+0x54e) [0x2aaaada2e7be] (-->/opt/glusterfs/3.3.0//lib/libglusterfs.so.0(syncop_lookup+0x148) [0x2b540f63f508] (-->/opt/gluster fs/3.3.0//lib/glusterfs/3.3.0qa3/xlator/cluster/replicate.so(afr_lookup+0x403) [0x2aaaad8105f3]))) 0-dict: !this || key=gfid-req [2011-08-19 08:42:27.652039] W [dht-rebalance.c:244:__dht_check_free_space] 0-drep-dht: data movement attempted from node (drep-replicate -0) with higher disk space to a node (drep-replicate-4) with lesser disk space (/d0/f.26) [2011-08-19 08:42:27.652098] C [fuse-bridge.c:876:fuse_setxattr_cbk] 0-glusterfs-fuse: extended attribute not supported by the backend storage [2011-08-19 08:42:27.652233] W [fuse-bridge.c:847:fuse_err_cbk] 0-glusterfs-fuse: 142: SETXATTR() /d0/f.26 => -1 (Operation not supported) pending frames: frame : type(1) op(SETXATTR) frame : type(1) op(SETXATTR) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2011-08-19 08:42:27 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.3.0qa3 /lib64/libc.so.6[0x3470c302d0] /opt/glusterfs/3.3.0//lib/libglusterfs.so.0(synctask_wrap+0x4)[0x2b540f63c834] /lib64/libc.so.6[0x3470c419c0] --------- : :
I tried to reproduce this in my setup.. i get the same log [2011-08-19 22:41:41.595461] W [dict.c:356:dict_del] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/distribute.so(+0xbab2) [0x7f06d70abab2] (-->/usr/local/lib/libglusterfs.so.0(syncop_lookup+0x350) [0x7f06dce6fbc9] (-->/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_lookup+0x524) [0x7f06d73444a5]))) 0-dict: !this || key=gfid-req [2011-08-19 22:41:41.596551] W [dht-rebalance.c:244:__dht_check_free_space] 0-test-dht: data movement attempted from node (test-replicate-2) with higher disk space to a node (test-replicate-1) with lesser disk space (/testdir/16) [2011-08-19 22:41:41.596588] W [fuse-bridge.c:847:fuse_err_cbk] 0-glusterfs-fuse: 135: SETXATTR() /testdir/16 => -1 (Operation not supported) but, the crash doesn't happen.
Saurabh, can you share details of the machine login on which the rebalance failed? Want to have a look at the core.
With bug 765368 getting fixed this should be fixed. Please reopen if seen again.