Bug 1762438 - DHT- gluster rebalance status shows wrong data size after rebalance is completed successfully
Summary: DHT- gluster rebalance status shows wrong data size after rebalance is comple...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On: 1761486
Blocks:
Reported: 2019-10-16 18:04 UTC by Sanju
Modified: 2019-10-18 05:22 UTC (History)

Fixed In Version:
Clone Of: 1761486
Environment:
Last Closed: 2019-10-18 05:22:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 23560 0 None Merged glusterd: display correct rebalance data size after glusterd restart 2019-10-18 05:22:12 UTC

Comment 1 Worker Ant 2019-10-16 18:15:33 UTC
REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) posted (#1) for review on master by Sanju Rakonde

Comment 2 Nithya Balachandran 2019-10-17 02:50:30 UTC
Description of problem:
=======================
While rebalancing a volume expanded from 1*(4+2) to 2*(4+2) and then to 3*(4+2), the rebalance status shows an inconsistent (wrong) data size after the rebalance has completed. The status shows the correct data for about 4 to 6 hours after completion, and then it no longer matches. This was seen in both the 2*(4+2) and 3*(4+2) configurations. Although the actual size was in GB, the status reports it in PB after the rebalance has completed.

(Made the initial description private as it contained internal data on server hostnames).

Comment 3 Nithya Balachandran 2019-10-17 02:51:20 UTC
RCA:
--- Additional comment from Sanju on 2019-10-16 15:28:22 IST ---

Looks like this is a day-one issue.

From gdb:
glusterd_store_retrieve_node_state (volinfo=volinfo@entry=0x555555e67fc0) at glusterd-store.c:2980
2980	            volinfo->rebal.rebalance_data = atoi(value);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 0
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) n
3048	        GF_FREE(key);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 18446744072560312320
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) p atoi(value)
$20 = -1149239296
(gdb) p/u atoi(value)
$21 = 3145728000

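The issue here is that atoi() returns a negative value because of overflow: "3145728000" is larger than INT_MAX (2147483647 for a 32-bit int). That negative int is then converted to a huge unsigned value when it is stored in volinfo->rebal.rebalance_data, which gdb prints as an unsigned 64-bit quantity. The gdb statements below demonstrate this.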

(gdb) set volinfo->rebal.rebalance_data=atoi("314572800")
(gdb) p volinfo->rebal.rebalance_data 
$41 = 314572800
(gdb) set volinfo->rebal.rebalance_data=atoi("3145728000")
(gdb) p volinfo->rebal.rebalance_data 
$42 = 18446744072560312320            <-- wrong value
(gdb)
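
The conversion seen in gdb can be sketched in standalone C. This is only an illustration, not the code from the merged patch: widen_int_to_u64() and parse_u64() are hypothetical helper names, and the use of strtoull() is an assumption about one reasonable way to parse the stored "size" value without going through int. The first helper takes the negative int that gdb reported for atoi(value) rather than calling atoi() on an out-of-range string, whose behavior is not defined by the C standard.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Reproduces the widening seen in gdb: a negative int assigned to an
 * unsigned 64-bit field is converted modulo 2^64.
 * (Hypothetical helper, for illustration only.) */
static uint64_t widen_int_to_u64(int parsed)
{
    return (uint64_t)parsed;
}

/* Hypothetical helper: parse a non-negative decimal string into a
 * uint64_t without passing through int.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_u64(const char *value, uint64_t *out)
{
    char *end = NULL;
    unsigned long long parsed;

    /* Reject NULL, empty strings, signs and leading whitespace. */
    if (value == NULL || out == NULL || !isdigit((unsigned char)*value))
        return -1;

    errno = 0;
    parsed = strtoull(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;

    *out = (uint64_t)parsed;
    return 0;
}
```

With these helpers, widen_int_to_u64(-1149239296) yields 18446744072560312320, the wrong value shown in the gdb session, while parse_u64("3145728000", &size) stores 3145728000 in size.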

Comment 4 Worker Ant 2019-10-18 05:22:13 UTC
REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) merged (#4) on master by Atin Mukherjee

