Bug 1762438

Summary:	DHT- gluster rebalance status shows wrong data size after rebalance is completed successfully
Product:	[Community] GlusterFS	Reporter:	Sanju <srakonde>
Component:	glusterd	Assignee:	Sanju <srakonde>
Status:	CLOSED NEXTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	amukherj, bmekala, bshetty, bugs, nbalacha, nchilaka, rhs-bugs, saraut, spalai, storage-qa-internal, vbellur
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1761486	Environment:
Last Closed:	2019-10-18 05:22:13 UTC	Type:	---
Bug Depends On:	1761486
Bug Blocks:

Comment 1 Worker Ant 2019-10-16 18:15:33 UTC

REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) posted (#1) for review on master by Sanju Rakonde

Comment 2 Nithya Balachandran 2019-10-17 02:50:30 UTC

Description of problem:
=======================
While rebalancing a volume expanded from 1*(4+2) to 2*(4+2) and then to 3*(4+2), the rebalance status shows an inconsistent (wrong) data size after the rebalance has completed. The status shows the correct data for roughly 4 to 6 hours after completion, and then it no longer matches. This was seen in both the 2*(4+2) and 3*(4+2) configurations. The size was reported in GB, but some time after the rebalance completed it is reported in PB.

(Made the initial description private as it contained internal data on server hostnames).

Comment 3 Nithya Balachandran 2019-10-17 02:51:20 UTC

RCA:
--- Additional comment from Sanju on 2019-10-16 15:28:22 IST ---

Looks like this is a day1 issue.

From gdb:
glusterd_store_retrieve_node_state (volinfo=volinfo@entry=0x555555e67fc0) at glusterd-store.c:2980
2980	            volinfo->rebal.rebalance_data = atoi(value);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 0
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) n
3048	        GF_FREE(key);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 18446744072560312320
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) p atoi(value)
$20 = -1149239296
(gdb) p/u atoi(value)
$21 = 3145728000

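The issue is that atoi() returns a negative value because of overflow: "3145728000" is larger than INT_MAX (2147483647), so the result wraps to -1149239296. Assigning that negative int to volinfo->rebal.rebalance_data, which gdb displays as an unsigned 64-bit value, turns it into 18446744072560312320. The gdb statements below demonstrate this.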

(gdb) set volinfo->rebal.rebalance_data=atoi("314572800")
(gdb) p volinfo->rebal.rebalance_data 
$41 = 314572800
(gdb) set volinfo->rebal.rebalance_data=atoi("3145728000")
(gdb) p volinfo->rebal.rebalance_data 
$42 = 18446744072560312320            <-- wrong value
(gdb)
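
The arithmetic behind the wrong value can be reproduced outside glusterd. The following is only a minimal sketch, not the code from the review above: parse_u64() and simulate_atoi_wrap() are hypothetical helper names introduced here for illustration. The first shows one way to read the stored "size" string into a 64-bit unsigned value with strtoull(); the second recomputes the wraparound seen in gdb without calling atoi() on an out-of-range string, which is undefined behaviour in C.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a glusterd function): parse a decimal string
 * into a uint64_t. Returns 0 on success, -1 if the string is NULL, does
 * not start with a digit, has trailing characters, or is out of range. */
int
parse_u64(const char *str, uint64_t *out)
{
    char *end = NULL;
    unsigned long long v;

    if (str == NULL || *str < '0' || *str > '9')
        return -1;

    errno = 0;
    v = strtoull(str, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;

    *out = (uint64_t)v;
    return 0;
}

/* Hypothetical helper: model what was observed in gdb. Keep the low
 * 32 bits of v, interpret them as a two's-complement signed 32-bit
 * number, then convert that to uint64_t as the assignment to
 * rebalance_data does. */
uint64_t
simulate_atoi_wrap(uint64_t v)
{
    uint32_t low = (uint32_t)v;
    int64_t as_signed = (low & 0x80000000u) ? (int64_t)low - 4294967296LL
                                            : (int64_t)low;

    return (uint64_t)as_signed;
}
```

With these helpers, parse_u64("3145728000", &size) stores 3145728000 in size, while simulate_atoi_wrap(3145728000) returns 18446744072560312320, the value gdb printed. For 314572800, which fits in an int, simulate_atoi_wrap() returns the input unchanged, matching the first gdb experiment.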

Comment 4 Worker Ant 2019-10-18 05:22:13 UTC

REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) merged (#4) on master by Atin Mukherjee