Description of problem:
In a distributed-replicate volume, if one of the nodes is rebooted while rebalance is in progress, the rebalance process will run forever.

Version-Release number of selected component (if applicable):
[root@gqac023 ~]# rpm -qa | grep gluster
glusterfs-fuse-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.17rhs-1.el6rhs.x86_64
gluster-swift-container-1.8.0-6.3.el6rhs.noarch
glusterfs-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.17rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.3.el6rhs.noarch
gluster-swift-proxy-1.8.0-6.3.el6rhs.noarch
gluster-swift-account-1.8.0-6.3.el6rhs.noarch
vdsm-gluster-4.10.2-23.0.1.el6rhs.noarch
glusterfs-libs-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.17rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.17rhs-1.el6rhs.x86_64
gluster-swift-object-1.8.0-6.3.el6rhs.noarch
gluster-swift-plugin-1.8.0-2.el6rhs.noarch

How reproducible:

Steps to Reproduce:
1. Had an 11x2 distributed-replicate volume on a 3-node cluster
2. Added a pair of bricks and started fix-layout
3. Rebalance status showed "in progress" on all the nodes
4. Rebooted one of the nodes
5. When the node came back up, rebalance was not started automatically on that node
6. Status on the remaining nodes now shows "in progress" forever
(A consolidated CLI sketch of these steps appears after the Additional info below.)

Actual results:
Rebalance runs forever on the remaining nodes of the cluster.

Expected results:

Additional info:
=================
Volume Name: di-re
Type: Distributed-Replicate
Volume ID: 93edf2ff-9b12-48eb-8714-e2a2807cca43
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire1
Brick2: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire1
Brick3: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire2
Brick4: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire2
Brick5: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire3
Brick6: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire3
Brick7: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire4
Brick8: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire4
Brick9: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire5
Brick10: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire5
Brick11: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire6
Brick12: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire6
Brick13: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire7
Brick14: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire7
Brick15: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire9
Brick16: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire9
Brick17: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire10
Brick18: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire10
Brick19: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire11
Brick20: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire11
Brick21: gqac022.sbu.lab.eng.bos.redhat.com:/home/dire12
Brick22: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire12
Brick23: gqac024.sbu.lab.eng.bos.redhat.com:/home/dire13
Brick24: gqac023.sbu.lab.eng.bos.redhat.com:/home/dire13

RHS nodes
----------
gqac022.sbu.lab.eng.bos.redhat.com
gqac023.sbu.lab.eng.bos.redhat.com
gqac024.sbu.lab.eng.bos.redhat.com

Node which was brought down
===========================
gqac023.sbu.lab.eng.bos.redhat.com

Command run from
================
gqac022.sbu.lab.eng.bos.redhat.com

Mounted on
==========
gqac024.sbu.lab.eng.bos.redhat.com

Mount point
===========
/mnt

Attached the sosreports.
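For quick reference, a minimal CLI sketch of the reproduction steps above. The /home/dire14 paths for the added replica pair are hypothetical placeholders; the volume and host names are taken from this report:

# Add one replica pair to the existing replica-2 volume (placeholder paths)
gluster volume add-brick di-re \
    gqac022.sbu.lab.eng.bos.redhat.com:/home/dire14 \
    gqac023.sbu.lab.eng.bos.redhat.com:/home/dire14

# Start a fix-layout-only rebalance and confirm it is running everywhere
gluster volume rebalance di-re fix-layout start
gluster volume rebalance di-re status    # "in progress" on all nodes

# Reboot one node (gqac023 in this run) while fix-layout is in progress.
# After it comes back, rebalance does not restart there, and the remaining
# nodes keep reporting "in progress" indefinitely:
gluster volume rebalance di-re status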
Targeting for 3.0.0 (Denali) release.
Shylesh,

Was not able to reproduce the issue with the above steps.

Created a 2-brick distribute volume:

[root@localhost upstream]# gluster v i
Volume Name: test1
Type: Distribute
Volume ID: 84a3db85-9328-4d4d-adc2-045bddb55414
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.42.211:/brick/1
Brick2: 10.70.43.49:/brick/1

Created 10000 directories on the mount point:

[root@localhost mnt]# df -h
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/VolGroup-lv_root   9.6G  6.7G   2.5G   74%  /
tmpfs                         1004M     0  1004M    0%  /dev/shm
/dev/vda1                      485M   32M   428M    7%  /boot
/dev/mapper/VolGroup-lv_brick  9.8G   33M   9.8G    1%  /brick
10.70.42.211:test1              20G   88M    20G    1%  /mnt
[root@localhost mnt]# ls
[root@localhost mnt]# mkdir dir{1..10000}

From 10.70.42.211 started a fix-layout rebalance:

[root@localhost mnt]# gluster volume rebalance test1 fix-layout start
volume rebalance: test1: success: Starting rebalance on volume test1 has been successful.
ID: 074a2a5f-8b60-4b33-8908-569cec40215e

[root@localhost upstream]# gluster volume rebalance test1 status
        Node  Rebalanced-files    size  scanned  failures  skipped                  status  run time in secs
   ---------       -----------  ------  -------  --------  -------            ------------    --------------
   localhost                 0  0Bytes        0         0        0  fix-layout in progress              2.00
10.70.42.211                 0  0Bytes        0         0        0  fix-layout in progress              1.00

In the middle of the operation, rebooted 10.70.43.49:

[root@localhost upstream]# reboot
[root@localhost upstream]#
Broadcast message from root (/dev/pts/0) at 7:31 ...
The system is going down for reboot NOW!
Connection to 10.70.43.49 closed by remote host.
Connection to 10.70.43.49 closed.

After the server came up, checked rebalance status:

[root@localhost mnt]# gluster v rebalance test1 status
        Node  Rebalanced-files    size  scanned  failures  skipped             status  run time in secs
   ---------       -----------  ------  -------  --------  -------       ------------    --------------
   localhost                 0  0Bytes        0         1        0  fix-layout failed             64.00
 10.70.43.49                 0  0Bytes        0         1        0  fix-layout failed              0.00

Found the following in the log:

[2014-03-04 07:37:28.828568] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir2123
[2014-03-04 07:37:28.838269] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir2124
[2014-03-04 07:37:28.852623] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir2125
[2014-03-04 07:37:28.856090] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir2126
[2014-03-04 07:37:28.865942] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir2127
[2014-03-04 07:38:12.083731] I [dht-rebalance.c:1826:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 56.00 secs
[2014-03-04 07:38:12.083771] I [dht-rebalance.c:1829:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
[2014-03-04 07:38:20.382415] W [socket.c:522:__socket_rwv] 0-test1-client-1: readv on 10.70.43.49:49156 failed (Connection reset by peer)
[2014-03-04 07:38:20.382698] E [rpc-clnt.c:369:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147) [0x7f3d853525ae] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x139) [0x7f3d85351ba6] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f3d853516a3]))) 0-test1-client-1: forced unwinding frame type(GlusterFS 3.3) op(OPENDIR(20)) called at 2014-03-04 07:37:28.872648 (xid=0x361a)
[2014-03-04 07:38:20.382723] W [client-rpc-fops.c:2689:client3_3_opendir_cbk] 0-test1-client-1: remote operation failed: Transport endpoint is not connected. Path: /dir2127 (b520279b-781b-4f3b-90b9-3018557b0360)
[2014-03-04 07:38:20.383338] E [rpc-clnt.c:369:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147) [0x7f3d853525ae] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x139) [0x7f3d85351ba6] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f3d853516a3]))) 0-test1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-03-04 07:37:58.574734 (xid=0x361b)
[2014-03-04 07:38:20.383348] W [client-handshake.c:276:client_ping_cbk] 0-test1-client-1: timer must have expired
[2014-03-04 07:38:20.383365] I [client.c:2208:client_rpc_notify] 0-test1-client-1: disconnected from 10.70.43.49:49156. Client process will keep trying to connect to glusterd until brick's port is available
[2014-03-04 07:38:20.383376] W [dht-common.c:5172:dht_notify] 0-test1-dht: Received CHILD_DOWN. Exiting
[2014-03-04 07:38:20.383383] I [dht-rebalance.c:1849:gf_defrag_stop] 0-: Received stop command on rebalance
[2014-03-04 07:38:20.383443] E [dht-rebalance.c:1541:gf_defrag_fix_layout] 0-test1-dht: Fix layout failed for /dir2127
[2014-03-04 07:38:20.383583] I [dht-rebalance.c:1826:gf_defrag_status_get] 0-glusterfs: Rebalance is failed. Time taken is 64.00 secs
[2014-03-04 07:38:20.383595] I [dht-rebalance.c:1829:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 1, skipped: 0
[2014-03-04 07:38:20.383809] W [glusterfsd.c:1128:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3ea50e890d] (-->/lib64/libpthread.so.0() [0x3ea5807851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe4) [0x407e55]))) 0-: received signum (15), shutting down

Will confirm the behaviour with code.
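For anyone retracing this run, the CHILD_DOWN and the resulting stop can be pulled out of the rebalance log directly. A minimal sketch, assuming the default per-volume rebalance log location; the grep pattern is just illustrative:

grep -E "Received CHILD_DOWN|Received stop command|Rebalance is failed" \
    /var/log/glusterfs/test1-rebalance.log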
Shylesh,

Here is the code snippet on how a child-down event is handled.

From dht_notify():

        case GF_EVENT_CHILD_DOWN:
                subvol = data;
                if (conf->assert_no_child_down) {
                        gf_log (this->name, GF_LOG_WARNING,
                                "Received CHILD_DOWN. Exiting");
                        if (conf->defrag) {
                                /* status is set to GF_DEFRAG_STATUS_FAILED
                                 * in gf_defrag_stop */
                                gf_defrag_stop (conf->defrag,
                                                GF_DEFRAG_STATUS_FAILED, NULL);
                        } else {
                                kill (getpid(), SIGTERM);
                        }
                }

From gf_defrag_status_get (gf_defrag_info_t *defrag, dict_t *dict):

        switch (defrag->defrag_status) {
        case GF_DEFRAG_STATUS_NOT_STARTED:
                status = "not started";
                break;
        case GF_DEFRAG_STATUS_STARTED:
                status = "in progress";
                break;
        case GF_DEFRAG_STATUS_STOPPED:
                status = "stopped";
                break;
        case GF_DEFRAG_STATUS_COMPLETE:
                status = "completed";
                break;
        case GF_DEFRAG_STATUS_FAILED:
                status = "failed";   /* status is set to failed and logged */
                break;
        default:
                break;
        }

        gf_log (THIS->name, GF_LOG_INFO, "Rebalance is %s. Time taken is %.2f "
                "secs", status, elapsed);
        gf_log (THIS->name, GF_LOG_INFO, "Files migrated: %"PRIu64", size: %"
                PRIu64", lookups: %"PRIu64", failures: %"PRIu64", skipped: "
                "%"PRIu64, files, size, lookup, failures, skipped);

Can you confirm the behaviour with the latest build?
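As a side note, the same CHILD_DOWN path can be exercised without a full reboot. A minimal sketch, under the assumption that SIGTERM-ing one brick process delivers the same CHILD_DOWN to the rebalance client as an abrupt reboot; <brick-pid> is a placeholder:

gluster volume status test1              # note the PID of a brick on the peer
kill -TERM <brick-pid>                   # placeholder for that brick's PID
gluster volume rebalance test1 status    # that node should now report "failed"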
Susant,

The description says the volume type was distributed-replicate, whereas you tried with a distribute volume. Please try with a distributed-replicate volume and let me know.
Shylesh,

Able to see one issue with reboot.

Reproducible: Always

Volume:
Volume Name: test1
Type: Distributed-Replicate
Volume ID: 1dcd8dc0-3d7e-4e7f-b561-c3fadd053bf3
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.211:/brick/1
Brick2: 10.70.43.49:/brick/1
Brick3: 10.70.42.211:/brick/2
Brick4: 10.70.43.49:/brick/2

* Created some data:

[root@localhost mnt]# ls
dir1   dir11  dir13  dir15  dir17  dir19  dir20  dir4  dir6  dir8  zinkile1   zinkile2  zinkile4  zinkile6  zinkile8
dir10  dir12  dir14  dir16  dir18  dir2   dir3   dir5  dir7  dir9  zinkile10  zinkile3  zinkile5  zinkile7  zinkile9

* Added a pair of bricks and started rebalance:

[root@localhost mnt]# gluster v add-brick test1 10.70.42.211:/brick/5 10.70.43.49:/brick/5
volume add-brick: success

[root@localhost mnt]# gluster v rebalance test1 status
        Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in secs
   ---------       -----------  ------  -------  --------  -------  ---------    --------------
   localhost                 0  0Bytes       10         0        3  completed              0.00
 10.70.43.49                 0  0Bytes       10         0        0  completed              0.00
volume rebalance: test1: success:

Now rebooted 10.70.43.49:

[root@localhost ~]# reboot
Broadcast message from root (/dev/pts/0) at 18:18 ...
The system is going down for reboot NOW!
[root@localhost ~]# Connection to 10.70.43.49 closed by remote host.
Connection to 10.70.43.49 closed.

Now logged in to 10.70.43.49 and started glusterd:

[susant@admin ~]$ vm3
root.43.49's password:
Last login: Tue Mar  4 13:49:40 2014 from dhcp-0-200.blr.redhat.com
[root@localhost ~]# pgrep glusterd
[root@localhost ~]# glusterd

After glusterd started, saw that rebalance was started again on 10.70.43.49, even though the earlier run had already completed:

[2014-03-04 18:20:08.269956] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir13 took 0.00 secs
[2014-03-04 18:20:08.273536] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir14
[2014-03-04 18:20:08.276153] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir14
[2014-03-04 18:20:08.278087] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir14 took 0.00 secs
[2014-03-04 18:20:08.281345] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir15
[2014-03-04 18:20:08.284134] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir15
[2014-03-04 18:20:08.286188] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir15 took 0.00 secs
[2014-03-04 18:20:08.289099] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir16
[2014-03-04 18:20:08.291804] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir16
[2014-03-04 18:20:08.293951] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir16 took 0.00 secs
[2014-03-04 18:20:08.297058] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir17
[2014-03-04 18:20:08.299245] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir17
[2014-03-04 18:20:08.300983] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir17 took 0.00 secs
[2014-03-04 18:20:08.303776] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir18
[2014-03-04 18:20:08.306262] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir18
[2014-03-04 18:20:08.308070] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir18 took 0.00 secs
[2014-03-04 18:20:08.311072] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir19
[2014-03-04 18:20:08.313643] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir19
[2014-03-04 18:20:08.315300] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir19 took 0.00 secs
[2014-03-04 18:20:08.318545] I [dht-common.c:2601:dht_setxattr] 0-test1-dht: fixing the layout of /dir20
[2014-03-04 18:20:08.321079] I [dht-rebalance.c:1170:gf_defrag_migrate_data] 0-test1-dht: migrate data called on /dir20
[2014-03-04 18:20:08.323018] I [dht-rebalance.c:1393:gf_defrag_migrate_data] 0-test1-dht: Migration operation on dir /dir20 took 0.00 secs
[2014-03-04 18:20:08.327121] I [dht-rebalance.c:1826:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 0.00 secs
[2014-03-04 18:20:08.327163] I [dht-rebalance.c:1829:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 10, failures: 0, skipped: 0

It's a glusterd bug. Will check with Kaushal.
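A quick check that shows the unexpected restart, under the assumption that the rebalance daemon runs as a glusterfs process whose command line carries a volfile-id of the form rebalance/<volname>:

ps aux | grep "rebalance/test1" | grep -v grep   # should be empty -- the run had completed
gluster v rebalance test1 status                 # yet a fresh run shows up after glusterd starts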
Dev ack to 3.0 RHS BZs
Hi Shylesh,

Here are my findings for this bug.

[root@localhost mnt]# gvi
Volume Name: test1
Type: Distribute
Volume ID: 7552e742-d2e8-4282-82b7-c210c8f5ff2c
Status: Started
Snap Volume: no
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 192.168.122.11:/brick/a
Brick2: 192.168.122.100:/brick/b
Brick3: 192.168.122.100:/brick/c

Added two bricks:

[root@localhost mnt]# gvi
Volume Name: test1
Type: Distribute
Volume ID: 7552e742-d2e8-4282-82b7-c210c8f5ff2c
Status: Started
Snap Volume: no
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 192.168.122.11:/brick/a
Brick2: 192.168.122.100:/brick/b
Brick3: 192.168.122.100:/brick/c
Brick4: 192.168.122.11:/brick/@
Brick5: 192.168.122.100:/brick/@

Started fix-layout, then rebooted 192.168.122.100.

After the server came up, rebalance restarted only on 192.168.122.100.

On 192.168.122.100, checked status after reboot:

[root@localhost ~]# gluster v rebalance test1 status
          Node  Rebalanced-files    size  scanned  failures  skipped                  status  run time in secs
     ---------       -----------  ------  -------  --------  -------            ------------    --------------
     localhost                 0  0Bytes        0         0        0  fix-layout in progress             13.00
192.168.122.11                 0  0Bytes        0         1        0  fix-layout failed                   3.00
volume rebalance: test1: success:

[root@localhost ~]# gluster v rebalance test1 status
          Node  Rebalanced-files    size  scanned  failures  skipped                  status  run time in secs
     ---------       -----------  ------  -------  --------  -------            ------------    --------------
     localhost                 0  0Bytes        0         0        0  fix-layout completed               20.00
192.168.122.11                 0  0Bytes        0         1        0  fix-layout failed                   3.00
volume rebalance: test1: success:
[root@localhost ~]#

Hence, moving it to ON_QA, as the behaviour has changed over time: if any server goes down in the middle of the rebalance process, after reboot the rebalance process should restart only on that server.

The issue I mentioned in comment 8 is handled in bug https://bugzilla.redhat.com/show_bug.cgi?id=1046908.
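For QA verification, a minimal sketch of the check implied above, using the volume name and addresses from this comment:

gluster volume rebalance test1 fix-layout start
# Reboot 192.168.122.100 while the above is in progress; once it is back:
gluster v rebalance test1 status   # only 192.168.122.100 should return to "in progress"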
*** This bug has been marked as a duplicate of bug 1046908 ***