REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) posted (#1) for review on master by Sanju Rakonde
Description of problem:
=======================
While rebalancing a gluster volume from 1*(4+2) to 2*(4+2) and then to 3*(4+2), the rebalance status shows an inconsistent (wrong) data size after the rebalance completes. The status reports the correct data size for roughly 4 to 6 hours after completion and then diverges. This was seen in both the 2*(4+2) and 3*(4+2) configurations: although the actual size was in GB, the status shows it in PB after the rebalance completes.

(The initial description was made private because it contained internal data on server hostnames.)
RCA:
--- Additional comment from Sanju on 2019-10-16 15:28:22 IST ---
This looks like a day-1 issue (present since the code was first written).

From gdb:

glusterd_store_retrieve_node_state (volinfo=volinfo@entry=0x555555e67fc0) at glusterd-store.c:2980
2980            volinfo->rebal.rebalance_data = atoi(value);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 0
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) n
3048            GF_FREE(key);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 18446744072560312320
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) p atoi(value)
$20 = -1149239296
(gdb) p/u atoi(value)
$21 = 3145728000

The issue is that atoi() returns a negative value because of integer overflow. The stored size 3145728000 is larger than INT_MAX (2147483647), and in this session the int returned by atoi() came back as -1149239296 (that is, 3145728000 - 2^32). Assigning that negative int to the unsigned 64-bit rebalance_data field converts it to 2^64 - 1149239296 = 18446744072560312320, which is the bogus value gdb shows above. The following gdb statements confirm it: a value below INT_MAX is stored correctly, while one above it is not.

(gdb) set volinfo->rebal.rebalance_data=atoi("314572800")
(gdb) p volinfo->rebal.rebalance_data
$41 = 314572800
(gdb) set volinfo->rebal.rebalance_data=atoi("3145728000")
(gdb) p volinfo->rebal.rebalance_data
$42 = 18446744072560312320 <-- wrong value
(gdb)
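To make the failure mode concrete, here is a minimal C sketch (not the merged patch) contrasting the conversion seen in the gdb session with a checked parse. The function names buggy_store_size() and parse_size_u64() are hypothetical. The sketch assumes, as the printed value suggests, that rebalance_data is an unsigned 64-bit field. It models the 32-bit wraparound that atoi() produced above explicitly, because the C standard leaves atoi()'s behavior on out-of-range input undefined. The checked version uses strtoull() with errno and end-pointer checks; this is one reasonable approach, and the source does not say which function the actual fix uses.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper modelling the buggy path seen in gdb: the decimal
 * string ends up in a 32-bit signed int (as atoi() returned it), and that
 * int is then stored in an unsigned 64-bit field. The wraparound is spelled
 * out explicitly because atoi() itself has undefined behaviour on overflow. */
uint64_t buggy_store_size(const char *value)
{
    /* Keep only the low 32 bits of the parsed number. */
    uint32_t low = (uint32_t)strtoull(value, NULL, 10);

    /* Reinterpret those bits as a two's-complement int32 value. */
    int64_t as_int = (low > INT32_MAX) ? (int64_t)low - 4294967296LL
                                       : (int64_t)low;

    /* A negative value converts to 2^64 + as_int when made unsigned. */
    return (uint64_t)as_int;
}

/* Hypothetical checked parser: accepts only a non-empty run of decimal
 * digits that fits in uint64_t. Returns 0 on success, -1 on bad input. */
int parse_size_u64(const char *value, uint64_t *out)
{
    char *end = NULL;
    unsigned long long v;

    /* Reject NULL, empty strings, signs and leading whitespace, all of
     * which strtoull() would otherwise accept. */
    if (value == NULL || !isdigit((unsigned char)*value))
        return -1;

    errno = 0;
    v = strtoull(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1; /* overflow or trailing garbage */

#if ULLONG_MAX > UINT64_MAX
    if (v > UINT64_MAX)
        return -1; /* only possible where unsigned long long is wider */
#endif

    *out = (uint64_t)v;
    return 0;
}
```

With these definitions, buggy_store_size("3145728000") returns 18446744072560312320, matching the gdb output, while parse_size_u64("3145728000", &v) succeeds with v == 3145728000 and rejects inputs such as "-1", "12abc", or values that do not fit in 64 bits.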
REVIEW: https://review.gluster.org/23560 (glusterd: display correct rebalance data size after glusterd restart) merged (#4) on master by Atin Mukherjee