+++ This bug was initially created as a clone of Bug #1638333 +++

Description of problem:
While a rebalance was in progress, truncate commands were run on the mount point. The rebalance completed successfully and the truncate commands did not report any error, but "# ll ." showed that many files had not been truncated to zero as requested.

Version-Release number of selected component (if applicable):
3.12.2-22

How reproducible:
2/2

Steps to Reproduce:
1. Create a distributed-replicated volume (e.g. 3*3).
2. Start and mount the volume on the client node.
3. Add bricks to the volume using
   # gluster v add-brick volname replica 3 brick10 brick11 brick12
4. From the client node, create files on the mount point, e.g.
   # for i in {1..8000}; do dd if=/dev/urandom of=file_$i bs=1M count=1; done
5. Trigger rebalance.
6. While rebalance is still in progress, start truncating the files from the mount point, e.g.
   # for i in {1..8000}; do truncate -s 0 file_$i; done
7. Wait for the migration to complete.
8. From the mount point, check the size of all the files.

Actual results:
Many files were not truncated to zero.

Expected results:
All the files should have size zero.

Server rebalance log snippet:
============================
[2018-10-11 10:05:39.996406] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7860)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file.
[2018-10-11 10:05:40.003945] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7865)} would result in dst node (cloud6-replicate-3:37983752) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file.
[2018-10-11 10:05:40.009722] I [dht-rebalance.c:1516:dht_migrate_file] 0-cloud6-dht: /file_7923: attempting to move from cloud6-replicate-2 to cloud6-replicate-3
[2018-10-11 10:05:40.015101] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7860.
[2018-10-11 10:05:40.021613] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7865.
[2018-10-11 10:05:40.026830] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7905)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:38001728).Skipping file.
[2018-10-11 10:05:40.039200] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7905.

Mount point "# ll ." command output snippet:
===========================================
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2266
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2267
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2268
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2269
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_227
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2270
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2271
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2272
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2273
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2274
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2276
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_228
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2280
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2281

--- Additional comment from Nithya Balachandran on 2018-11-19 04:34:20 UTC ---

This needs to be fixed. No analysis done yet so no RCA.
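For step 8 of the reproduction steps, scanning 8000 entries of "# ll ." output by eye is error-prone. A small helper along these lines could list the files that still have a non-zero size. This is only a sketch: the function name `non_empty_files` and the `file_` prefix default are assumptions taken from the naming used in the steps above, and the directory argument would be the mount point.

```python
import os


def non_empty_files(directory, prefix="file_"):
    """Return sorted (name, size) pairs for regular files that are not empty.

    `directory` is assumed to be the client mount point; `prefix` matches
    the file_<n> naming used in the reproduction steps.
    """
    leftovers = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix) or not entry.is_file():
                continue
            size = entry.stat().st_size
            if size != 0:
                leftovers.append((entry.name, size))
    return sorted(leftovers)
```

An empty list would correspond to the expected result (all files have size zero); any entries returned are files for which the truncate did not take effect.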
RCA: If a file was truncated during a migration, __dht_rebalance_migrate_data returned no error. With the fix, if the ia_size of the src file is less than the number of bytes written to the destination, the data migration is aborted and an error is returned.
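The check described in the RCA can be illustrated with a simplified model. This is not the GlusterFS patch (which is C code in dht-rebalance.c); it is a Python sketch under stated assumptions: `migrate_data`, `MigrationAborted` and the `after_chunk` hook are hypothetical names, and `os.fstat(...).st_size` stands in for the source file's ia_size.

```python
import os


class MigrationAborted(Exception):
    """Raised when the source is found smaller than the bytes already copied."""


def migrate_data(src_path, dst_path, chunk_size=128 * 1024, after_chunk=None):
    """Copy src to dst in chunks and abort if src shrinks during the copy.

    `after_chunk` is a hypothetical hook called after each chunk is written;
    it exists only so a concurrent truncate can be simulated deterministically.
    Returns the number of bytes written to the destination.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            dst.write(data)
            written += len(data)
            if after_chunk is not None:
                after_chunk(written)
            # Re-check the source size (the stand-in for ia_size). Without
            # this check, a truncate makes the next read return no data and
            # the copy ends "successfully" with stale data in the destination.
            src_size = os.fstat(src.fileno()).st_size
            if src_size < written:
                raise MigrationAborted(
                    f"source size {src_size} < bytes written {written}"
                )
    return written
```

In this model, passing an `after_chunk` hook that calls `os.truncate(src_path, 0)` makes the copy raise `MigrationAborted` instead of finishing quietly, which mirrors the abort-and-error behaviour the RCA describes.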
REVIEW: https://review.gluster.org/23308 (cluster/dht: Handle file truncates during migration) posted (#1) for review on master by N Balachandran
REVIEW: https://review.gluster.org/23308 (cluster/dht: Handle file truncates during migration) merged (#3) on master by Susant Palai