Bug 1762438

Summary: DHT- gluster rebalance status shows wrong data size after rebalance is completed successfully
Product: [Community] GlusterFS
Reporter: Sanju <srakonde>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: mainline
CC: amukherj, bmekala, bshetty, bugs, nbalacha, nchilaka, rhs-bugs, saraut, spalai, storage-qa-internal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1761486 Environment:
Last Closed: 2019-10-18 05:22:13 UTC Type: ---
Bug Depends On: 1761486    
Bug Blocks:    

Comment 1 Worker Ant 2019-10-16 18:15:33 UTC
REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) posted (#1) for review on master by Sanju Rakonde

Comment 2 Nithya Balachandran 2019-10-17 02:50:30 UTC
Description of problem:
=======================
While doing a gluster rebalance from 1*(4+2) to 2*(4+2) and then to 3*(4+2), the rebalance status shows an inconsistent (wrong) data size after the rebalance has completed. The status shows the correct size for about 4 to 6 hours after completion, and then it no longer matches. This was seen in both the 2*(4+2) and 3*(4+2) configurations. Although the actual size was in GB, the status reports it in PB after the rebalance completes.

(Made the initial description private as it contained internal data on server hostnames).

Comment 3 Nithya Balachandran 2019-10-17 02:51:20 UTC
RCA:
--- Additional comment from Sanju on 2019-10-16 15:28:22 IST ---

Looks like this is a day-one issue.

From gdb:
glusterd_store_retrieve_node_state (volinfo=volinfo@entry=0x555555e67fc0) at glusterd-store.c:2980
2980	            volinfo->rebal.rebalance_data = atoi(value);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 0
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) n
3048	        GF_FREE(key);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 18446744072560312320
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) p atoi(value)
$20 = -1149239296
(gdb) p/u atoi(value)
$21 = 3145728000

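The issue here is that atoi() returns a negative value because "3145728000" overflows a 32-bit int. When that negative int is assigned to the unsigned 64-bit volinfo->rebal.rebalance_data, it is sign-extended to 18446744072560312320. The gdb statements below confirm this.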

(gdb) set volinfo->rebal.rebalance_data=atoi("314572800")
(gdb) p volinfo->rebal.rebalance_data 
$41 = 314572800
(gdb) set volinfo->rebal.rebalance_data=atoi("3145728000")
(gdb) p volinfo->rebal.rebalance_data 
$42 = 18446744072560312320            <-- wrong value
(gdb)

Comment 4 Worker Ant 2019-10-18 05:22:13 UTC
REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) merged (#4) on master by Atin Mukherjee