Bug 1286171

Summary: | Rebalance : Status lists failures on stopping rebalance while it is in progress | | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Susant Kumar Palai <spalai>
Component: | distribute | Assignee: | Barak Sason Rofman <bsasonro>
Status: | CLOSED ERRATA | QA Contact: | Kshithij Iyer <kiyer>
Severity: | low | Priority: | low
Version: | rhgs-3.1 | CC: | bsasonro, kiyer, pprakash, puebele, rhs-bugs, rkothiya, saraut, senaik, sheggodu, storage-qa-internal
Target Release: | RHGS 3.5.z Batch Update 3 | Keywords: | Triaged, ZStream
Hardware: | Unspecified | OS: | Unspecified
Whiteboard: | dht-rebalance-usability, dht-rca-unknown | Type: | Bug
Fixed In Version: | glusterfs-6.0-49 | Doc Type: | No Doc Update
Clone Of: | 1034173 | Last Closed: | 2020-12-17 04:50:16 UTC
Bug Depends On: | 1034173 | Bug Blocks: | 1800956
Comment 2
Nithya Balachandran
2016-01-19 14:51:46 UTC
Still exists in the latest code:

[2017-08-11 09:08:58.109896] I [MSGID: 109029] [dht-rebalance.c:5186:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-08-11 09:08:58.110144] I [MSGID: 109028] [dht-rebalance.c:5000:gf_defrag_status_get] 0-glusterfs: Rebalance is stopped. Time taken is 45.00 secs
[2017-08-11 09:08:58.110193] I [MSGID: 109028] [dht-rebalance.c:5004:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 1181, failures: 0, skipped: 106
[2017-08-11 09:08:58.118572] I [dht-rebalance.c:1513:dht_migrate_file] 0-vol1-dht: /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/file-17: attempting to move from vol1-client-2 to vol1-client-0
[2017-08-11 09:08:58.120142] I [dht-rebalance.c:3123:gf_defrag_process_dir] 0-vol1-dht: migrate data called on /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
[2017-08-11 09:08:58.128104] W [dht-rebalance.c:3297:gf_defrag_process_dir] 0-vol1-dht: Found error from gf_defrag_get_entry
[2017-08-11 09:12:44.777354] E [MSGID: 109111] [dht-rebalance.c:3600:gf_defrag_fix_layout] 0-vol1-dht: gf_defrag_process_dir failed for directory: /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
...
[2017-08-11 09:12:44.809059] E [MSGID: 109016] [dht-rebalance.c:3811:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
[2017-08-11 09:12:44.809158] E [MSGID: 109016] [dht-rebalance.c:3811:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26

Tested with the latest upstream code; could not reproduce the bug.

Test steps:

1) Created a 3x3 volume:

[root@Node1 ~]# gluster volume status
Status of volume: distrep
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick Node1:/root/bricks/11 49152 0 Y 2088
Brick Node1:/root/bricks/12 49153 0 Y 2096
Brick Node1:/root/bricks/13 49154 0 Y 2105
Brick Node1:/root/bricks/21 49155 0 Y 2114
Brick Node1:/root/bricks/22 49156 0 Y 2134
Brick Node1:/root/bricks/23 49157 0 Y 2127
Brick Node1:/root/bricks/31 49158 0 Y 2151
Brick Node1:/root/bricks/32 49159 0 Y 2162
Brick Node1:/root/bricks/33 49160 0 Y 2169
Self-heal Daemon on localhost N/A N/A Y 2217

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

2) Mounted the volume using FUSE.

3) Using a script, created a large number of small files through the mount point.
4) Added 3 more bricks to the volume:

[root@Node1 ~]# gluster volume add-brick distrep Node1:/root/bricks/41 Node1:/root/bricks/42 Node1:/root/bricks/43 force
volume add-brick: success
[root@Node1 ~]# gluster volume status
Status of volume: distrep
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick Node1:/root/bricks/11 49161 0 Y 2754
Brick Node1:/root/bricks/12 49162 0 Y 2766
Brick Node1:/root/bricks/13 49163 0 Y 2777
Brick Node1:/root/bricks/21 49164 0 Y 2786
Brick Node1:/root/bricks/22 49165 0 Y 2793
Brick Node1:/root/bricks/23 49166 0 Y 2800
Brick Node1:/root/bricks/31 49167 0 Y 2811
Brick Node1:/root/bricks/32 49168 0 Y 2818
Brick Node1:/root/bricks/33 49169 0 Y 2831
Brick Node1:/root/bricks/41 49170 0 Y 3026
Brick Node1:/root/bricks/42 49171 0 Y 3046
Brick Node1:/root/bricks/43 49172 0 Y 3066
Self-heal Daemon on localhost N/A N/A Y 2854

Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks

5) Initiated rebalance:

[root@Node1 ~]# gluster volume rebalance distrep start
volume rebalance: distrep: success: Rebalance on distrep has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 4cd2946f-94c0-4c41-af80-da402c471243

6) Allowed rebalance to run for ~40 seconds:

[root@Node1 ~]# gluster volume rebalance distrep status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 749 49.6KB 7253 0 0 in progress 0:00:35
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: distrep: success

7) Stopped rebalance:

[root@Node1 ~]# gluster volume rebalance distrep stop
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 791 50.0KB 7253 0 0 completed 0:00:42
volume rebalance: distrep: success: rebalance process may be in the middle of a file migration. The process will be fully stopped once the migration of the file is complete. Please check rebalance process for completion before doing any further brick related tasks on the volume.

8) Checked rebalance status:

[root@Node1 ~]# gluster volume rebalance distrep status
volume rebalance: distrep: failed: Rebalance not started for volume distrep.

Rebalance log ending:

[2020-01-28 08:52:29.943902] I [dht-rebalance.c:1596:dht_migrate_file] 0-distrep-dht: /4720: attempting to move from distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:29.951851] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /2354 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.104331] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /1444 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.187361] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /2201 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.195028] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /4720 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.197315] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-distrep-dht: Rebalance is completed. Time taken is 42.00 secs
[2020-01-28 08:52:30.197332] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-distrep-dht: Files migrated: 791, size: 51212, lookups: 7253, failures: 0, skipped: 0
[2020-01-28 08:52:30.197603] W [glusterfsd.c:1441:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x94e2) [0x7f486ad4b4e2] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0x95) [0x406b45] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x4069fb] ) 0-: received signum (15), shutting down

Result: no failures appear.

Managed to reproduce the issue with upstream code by creating a large number of nested directories:

[2020-01-28 14:31:42.411679] I [MSGID: 109029] [dht-rebalance.c:5241:gf_defrag_stop] 0-: Received stop command on rebalance
[2020-01-28 14:31:42.411725] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-glusterfs: Rebalance is stopped. Time taken is 176.00 secs
[2020-01-28 14:31:42.411733] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-glusterfs: Files migrated: 2833, size: 28330, lookups: 9218, failures: 0, skipped: 0
[2020-01-28 14:31:42.448100] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/313.txt from subvolume distrep-replicate-2 to distrep-replicate-0
[2020-01-28 14:31:42.452070] W [dht-rebalance.c:3447:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2020-01-28 14:31:42.452764] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31
[2020-01-28 14:31:42.453498] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30
[2020-01-28 14:31:42.454547] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29
[2020-01-28 14:31:42.455027] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28
[2020-01-28 14:31:42.455449] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27
[2020-01-28 14:31:42.456444] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26
[2020-01-28 14:31:42.457232] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25
[2020-01-28 14:31:42.457986] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24
[2020-01-28 14:31:42.459146] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23
[2020-01-28 14:31:42.460915] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22
[2020-01-28 14:31:42.461968] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21
[2020-01-28 14:31:42.463126] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20
[2020-01-28 14:31:42.464036] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
[2020-01-28 14:31:42.464749] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18
[2020-01-28 14:31:42.466331] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17
[2020-01-28 14:31:42.467066] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16
[2020-01-28 14:31:42.467972] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15
[2020-01-28 14:31:42.468470] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14
[2020-01-28 14:31:42.469363] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13
[2020-01-28 14:31:42.469960] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12
[2020-01-28 14:31:42.471430] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11
[2020-01-28 14:31:42.472479] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10
[2020-01-28 14:31:42.473932] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/240.txt from subvolume distrep-replicate-1 to distrep-replicate-3
[2020-01-28 14:31:42.474395] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9
[2020-01-28 14:31:42.475655] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8
[2020-01-28 14:31:42.476661] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7
[2020-01-28 14:31:42.477752] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6
[2020-01-28 14:31:42.478367] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5
[2020-01-28 14:31:42.479049] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4
[2020-01-28 14:31:42.479645] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3
[2020-01-28 14:31:42.480148] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2
[2020-01-28 14:31:42.480736] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1
[2020-01-28 14:31:42.481243] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0
[2020-01-28 14:31:42.482449] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-distrep-dht: Rebalance is failed. Time taken is 176.00 secs
[2020-01-28 14:31:42.482463] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-distrep-dht: Files migrated: 2835, size: 28350, lookups: 9218, failures: 33, skipped: 0
[2020-01-28 14:31:42.482749] W [glusterfsd.c:1441:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x94e2) [0x7fbe90bd64e2] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0x95) [0x406b45] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x4069fb] ) 0-: received signum (15), shutting down

Will proceed with debugging the issue.
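The script used for the nested-directory reproduction is not included in the comment. The following is a minimal Python sketch of how a comparable tree could be created through the FUSE mount, under stated assumptions: the layout (directories named 0, 1, ..., 32, each nested inside the previous one, with small `N.txt` files in the deepest directory) is inferred from the paths in the log, and the 10-byte file size is inferred from the size/files ratio in the status summaries (28350 bytes for 2835 files). The helper name `make_nested_tree`, the mount path, and the file count are hypothetical.

```python
import os
import tempfile


def make_nested_tree(root: str, depth: int, files_in_leaf: int, file_size: int = 10) -> str:
    """Create root/0/1/.../<depth-1> and fill the deepest directory with small files.

    Hypothetical helper: the layout mimics the paths seen in the rebalance log,
    it is not the script from the bug report. Returns the deepest directory path.
    """
    # Directory names are the integers 0 .. depth-1, each nested in the previous one.
    leaf = os.path.join(root, *(str(level) for level in range(depth)))
    os.makedirs(leaf, exist_ok=True)

    # Fixed-size payload per file (10 bytes by default, an assumption from the log ratio).
    payload = b"x" * file_size
    for n in range(files_in_leaf):
        with open(os.path.join(leaf, f"{n}.txt"), "wb") as fh:
            fh.write(payload)
    return leaf


if __name__ == "__main__":
    # Dry run in a throwaway local directory; to reproduce, root would instead be
    # the volume's FUSE mount point (for example a hypothetical /mnt/distrep).
    with tempfile.TemporaryDirectory() as tmp:
        leaf = make_nested_tree(tmp, depth=33, files_in_leaf=10)
        print(len(os.listdir(leaf)), "files created under", leaf)
```

After populating the volume this way, the remaining steps would presumably match the earlier test: add bricks, start rebalance, and stop it while it is still in progress, then compare the `failures:` count in the final `gf_defrag_status_get` summary with the one logged right after the stop command.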
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5603