Bug 1286171
| Summary: | Rebalance : Status lists failures on stopping rebalance while it is in progress | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Susant Kumar Palai <spalai> | |
| Component: | distribute | Assignee: | Barak Sason Rofman <bsasonro> | |
| Status: | CLOSED ERRATA | QA Contact: | Kshithij Iyer <kiyer> | |
| Severity: | low | Docs Contact: | ||
| Priority: | low | |||
| Version: | rhgs-3.1 | CC: | bsasonro, kiyer, pprakash, puebele, rhs-bugs, rkothiya, saraut, senaik, sheggodu, storage-qa-internal | |
| Target Milestone: | --- | Keywords: | Triaged, ZStream | |
| Target Release: | RHGS 3.5.z Batch Update 3 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | dht-rebalance-usability, dht-rca-unknown | |||
| Fixed In Version: | glusterfs-6.0-49 | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1034173 | |||
| : | 1800956 | Environment: | ||
| Last Closed: | 2020-12-17 04:50:16 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1034173 | |||
| Bug Blocks: | 1800956 | |||
Comment 2
Nithya Balachandran
2016-01-19 14:51:46 UTC
Still exists in the latest code:
[2017-08-11 09:08:58.109896] I [MSGID: 109029] [dht-rebalance.c:5186:gf_defrag_stop] 0-: Received stop command on rebalance
[2017-08-11 09:08:58.110144] I [MSGID: 109028] [dht-rebalance.c:5000:gf_defrag_status_get] 0-glusterfs: Rebalance is stopped. Time taken is 45.00 secs
[2017-08-11 09:08:58.110193] I [MSGID: 109028] [dht-rebalance.c:5004:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 1181, failures: 0, skipped: 106
[2017-08-11 09:08:58.118572] I [dht-rebalance.c:1513:dht_migrate_file] 0-vol1-dht: /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/file-17: attempting to move from vol1-client-2 to vol1-client-0
[2017-08-11 09:08:58.120142] I [dht-rebalance.c:3123:gf_defrag_process_dir] 0-vol1-dht: migrate data called on /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
[2017-08-11 09:08:58.128104] W [dht-rebalance.c:3297:gf_defrag_process_dir] 0-vol1-dht: Found error from gf_defrag_get_entry
[2017-08-11 09:12:44.777354] E [MSGID: 109111] [dht-rebalance.c:3600:gf_defrag_fix_layout] 0-vol1-dht: gf_defrag_process_dir failed for directory: /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
...
[2017-08-11 09:12:44.809059] E [MSGID: 109016] [dht-rebalance.c:3811:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26/dir-27
[2017-08-11 09:12:44.809158] E [MSGID: 109016] [dht-rebalance.c:3811:gf_defrag_fix_layout] 0-vol1-dht: Fix layout failed for /dir-1/dir-2/dir-3/dir-4/dir-5/dir-6/dir-7/dir-8/dir-9/dir-10/dir-11/dir-12/dir-13/dir-14/dir-15/dir-16/dir-17/dir-18/dir-19/dir-20/dir-21/dir-22/dir-23/dir-24/dir-25/dir-26

Tested with latest upstream code - could not reproduce bug.
Test steps:
1) Created 3x3 vol:
[root@Node1 ~]# gluster volume status
Status of volume: distrep
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick Node1:/root/bricks/11 49152 0 Y 2088
Brick Node1:/root/bricks/12 49153 0 Y 2096
Brick Node1:/root/bricks/13 49154 0 Y 2105
Brick Node1:/root/bricks/21 49155 0 Y 2114
Brick Node1:/root/bricks/22 49156 0 Y 2134
Brick Node1:/root/bricks/23 49157 0 Y 2127
Brick Node1:/root/bricks/31 49158 0 Y 2151
Brick Node1:/root/bricks/32 49159 0 Y 2162
Brick Node1:/root/bricks/33 49160 0 Y 2169
Self-heal Daemon on localhost N/A N/A Y 2217
Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks
2) Mounted the volume using FUSE
3) Using a script, created a large number of small files through the mount point.
4) Added 3 more bricks to the vol:
[root@Node1 ~]# gluster volume add-brick distrep Node1:/root/bricks/41 Node1:/root/bricks/42 Node1:/root/bricks/43 force
volume add-brick: success
[root@Node1 ~]# gluster volume status
Status of volume: distrep
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick Node1:/root/bricks/11 49161 0 Y 2754
Brick Node1:/root/bricks/12 49162 0 Y 2766
Brick Node1:/root/bricks/13 49163 0 Y 2777
Brick Node1:/root/bricks/21 49164 0 Y 2786
Brick Node1:/root/bricks/22 49165 0 Y 2793
Brick Node1:/root/bricks/23 49166 0 Y 2800
Brick Node1:/root/bricks/31 49167 0 Y 2811
Brick Node1:/root/bricks/32 49168 0 Y 2818
Brick Node1:/root/bricks/33 49169 0 Y 2831
Brick Node1:/root/bricks/41 49170 0 Y 3026
Brick Node1:/root/bricks/42 49171 0 Y 3046
Brick Node1:/root/bricks/43 49172 0 Y 3066
Self-heal Daemon on localhost N/A N/A Y 2854
Task Status of Volume distrep
------------------------------------------------------------------------------
There are no active volume tasks
5) Initiated rebalance:
[root@Node1 ~]# gluster volume rebalance distrep start
volume rebalance: distrep: success: Rebalance on distrep has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 4cd2946f-94c0-4c41-af80-da402c471243
6) Allowed rebalance to run for ~40 seconds:
[root@Node1 ~]# gluster volume rebalance distrep status
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 749 49.6KB 7253 0 0 in progress 0:00:35
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: distrep: success
7) Stopped rebalance:
[root@Node1 ~]# gluster volume rebalance distrep stop
Node Rebalanced-files size scanned failures skipped status run time in h:m:s
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 791 50.0KB 7253 0 0 completed 0:00:42
volume rebalance: distrep: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.
8) Checked rebalance status:
[root@Node1 ~]# gluster volume rebalance distrep status
volume rebalance: distrep: failed: Rebalance not started for volume distrep.
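Step 3 above mentions a script that creates many small files but does not include it. The following is a minimal sketch of what such a script could look like, not the script actually used in this test. The function name `create_small_files`, the file count and the 64-byte payload are all assumptions; numeric file names were chosen only because the rebalance log shows names such as /4720.

```python
import os
import tempfile


def create_small_files(root: str, count: int, size: int = 64) -> int:
    """Create `count` files named 0..count-1 directly under `root`.

    Each file holds `size` bytes. Both values are illustrative assumptions;
    the bug report does not state how many files were created or how large
    they were.
    """
    os.makedirs(root, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(root, str(i)), "wb") as handle:
            handle.write(payload)
    return count


if __name__ == "__main__":
    # Demo against a temporary directory. For a real run, pass the FUSE
    # mount point of the volume instead (path not given in the report).
    with tempfile.TemporaryDirectory() as tmp:
        created = create_small_files(tmp, 100)
        print(f"created {created} files under {tmp}")
```

Writing through the mount point rather than directly to the bricks matters here, since only files created through the client are placed by the distribute layer.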
Rebalance log ending:
[2020-01-28 08:52:29.943902] I [dht-rebalance.c:1596:dht_migrate_file] 0-distrep-dht: /4720: attempting to move from distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:29.951851] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /2354 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.104331] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /1444 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.187361] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /2201 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.195028] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /4720 from subvolume distrep-replicate-1 to distrep-replicate-0
[2020-01-28 08:52:30.197315] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-distrep-dht: Rebalance is completed. Time taken is 42.00 secs
[2020-01-28 08:52:30.197332] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-distrep-dht: Files migrated: 791, size: 51212, lookups: 7253, failures: 0, skipped: 0
[2020-01-28 08:52:30.197603] W [glusterfsd.c:1441:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x94e2) [0x7f486ad4b4e2] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0x95) [0x406b45] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x4069fb] ) 0-: received signum (15), shutting down
Result - no failures appear.
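To compare runs, the counters in the MSGID 109028 summary line can be pulled out of the rebalance log. This is a minimal sketch that assumes the line format shown in the log above; `parse_summaries` is a hypothetical helper and not part of GlusterFS.

```python
import re

# Matches the counter summary that gf_defrag_status_get logs, as seen in
# the rebalance log quoted in this report.
SUMMARY_RE = re.compile(
    r"Files migrated: (?P<migrated>\d+), size: (?P<size>\d+), "
    r"lookups: (?P<lookups>\d+), failures: (?P<failures>\d+), "
    r"skipped: (?P<skipped>\d+)"
)


def parse_summaries(log_text: str) -> list:
    """Return one dict of integer counters per summary line found."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in SUMMARY_RE.finditer(log_text)
    ]


# Summary line copied from the rebalance log above.
SAMPLE = (
    "[2020-01-28 08:52:30.197332] I [MSGID: 109028] "
    "[dht-rebalance.c:5064:gf_defrag_status_get] 0-distrep-dht: "
    "Files migrated: 791, size: 51212, lookups: 7253, failures: 0, skipped: 0"
)

if __name__ == "__main__":
    for summary in parse_summaries(SAMPLE):
        print(summary)
```

When a log contains more than one summary, for example one written when the stop command is received and one at shutdown, the returned list keeps them in log order so the `failures` values can be compared directly.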
Managed to reproduce with upstream code by creating a large number of nested directories:
[2020-01-28 14:31:42.411679] I [MSGID: 109029] [dht-rebalance.c:5241:gf_defrag_stop] 0-: Received stop command on rebalance
[2020-01-28 14:31:42.411725] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-glusterfs: Rebalance is stopped. Time taken is 176.00 secs
[2020-01-28 14:31:42.411733] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-glusterfs: Files migrated: 2833, size: 28330, lookups: 9218, failures: 0, skipped: 0
[2020-01-28 14:31:42.448100] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/313.txt from subvolume distrep-replicate-2 to distrep-replicate-0
[2020-01-28 14:31:42.452070] W [dht-rebalance.c:3447:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2020-01-28 14:31:42.452764] E [MSGID: 109111] [dht-rebalance.c:3971:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31
[2020-01-28 14:31:42.453498] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30
[2020-01-28 14:31:42.454547] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29
[2020-01-28 14:31:42.455027] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28
[2020-01-28 14:31:42.455449] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27
[2020-01-28 14:31:42.456444] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26
[2020-01-28 14:31:42.457232] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25
[2020-01-28 14:31:42.457986] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24
[2020-01-28 14:31:42.459146] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23
[2020-01-28 14:31:42.460915] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22
[2020-01-28 14:31:42.461968] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21
[2020-01-28 14:31:42.463126] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20
[2020-01-28 14:31:42.464036] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
[2020-01-28 14:31:42.464749] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18
[2020-01-28 14:31:42.466331] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17
[2020-01-28 14:31:42.467066] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16
[2020-01-28 14:31:42.467972] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15
[2020-01-28 14:31:42.468470] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14
[2020-01-28 14:31:42.469363] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12/13
[2020-01-28 14:31:42.469960] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11/12
[2020-01-28 14:31:42.471430] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10/11
[2020-01-28 14:31:42.472479] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9/10
[2020-01-28 14:31:42.473932] I [MSGID: 109022] [dht-rebalance.c:2231:dht_migrate_file] 0-distrep-dht: completed migration of /0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/240.txt from subvolume distrep-replicate-1 to distrep-replicate-3
[2020-01-28 14:31:42.474395] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8/9
[2020-01-28 14:31:42.475655] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7/8
[2020-01-28 14:31:42.476661] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6/7
[2020-01-28 14:31:42.477752] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5/6
[2020-01-28 14:31:42.478367] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4/5
[2020-01-28 14:31:42.479049] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3/4
[2020-01-28 14:31:42.479645] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2/3
[2020-01-28 14:31:42.480148] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1/2
[2020-01-28 14:31:42.480736] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0/1
[2020-01-28 14:31:42.481243] E [MSGID: 109016] [dht-rebalance.c:3906:gf_defrag_fix_layout] 0-distrep-dht: Fix layout failed for /0
[2020-01-28 14:31:42.482449] I [MSGID: 109028] [dht-rebalance.c:5062:gf_defrag_status_get] 0-distrep-dht: Rebalance is failed. Time taken is 176.00 secs
[2020-01-28 14:31:42.482463] I [MSGID: 109028] [dht-rebalance.c:5064:gf_defrag_status_get] 0-distrep-dht: Files migrated: 2835, size: 28350, lookups: 9218, failures: 33, skipped: 0
[2020-01-28 14:31:42.482749] W [glusterfsd.c:1441:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x94e2) [0x7fbe90bd64e2] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0x95) [0x406b45] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x4069fb] ) 0-: received signum (15), shutting down

Will proceed to debugging the issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:5603
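The reproduction above relied on a deep chain of nested directories, but the script that built it is not part of this report. The following is a rough sketch of how a comparable tree could be generated. The function name `create_nested_tree` and the number of files per directory are assumptions. The depth of 33 mirrors the /0/.../32 paths in the log, and the 10-byte file size is only inferred from the logged size-to-file ratio (28330 bytes for 2833 files). Whether the original tree held files at every level is not known.

```python
import os
import tempfile


def create_nested_tree(root: str, depth: int = 33,
                       files_per_dir: int = 10, size: int = 10) -> int:
    """Build root/0/1/2/.../(depth-1) and put small files in every level.

    Files are named 0.txt, 1.txt, ... to resemble names such as 313.txt in
    the log. `files_per_dir` and placing files at every level are
    assumptions made for illustration.
    """
    payload = b"x" * size
    current = root
    created = 0
    for level in range(depth):
        current = os.path.join(current, str(level))
        os.makedirs(current, exist_ok=True)
        for i in range(files_per_dir):
            with open(os.path.join(current, f"{i}.txt"), "wb") as handle:
                handle.write(payload)
            created += 1
    return created


if __name__ == "__main__":
    # Demo against a temporary directory. For a real run, pass the FUSE
    # mount point of the volume instead (path not given in the report).
    with tempfile.TemporaryDirectory() as tmp:
        total = create_nested_tree(tmp, depth=5, files_per_dir=3)
        print(f"created {total} files in a 5-level tree under {tmp}")
```

A deep tree keeps the rebalance process inside nested directory processing for longer, which makes it more likely that a stop command arrives while a directory is still being crawled.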