Bug 1745967

Summary: File size was not truncated for all files when truncate was run while rebalance was in progress.
Product: [Community] GlusterFS
Reporter: Nithya Balachandran <nbalacha>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, nbalacha, nchilaka, rhs-bugs, saraut, spalai, storage-qa-internal, tdesala
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1638333
Environment:
Last Closed: 2019-09-17 11:00:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1638333    
Bug Blocks:    

Description Nithya Balachandran 2019-08-27 10:59:27 UTC
+++ This bug was initially created as a clone of Bug #1638333 +++

Description of problem:
After the rebalance process was triggered, truncate was run on the files from the mount point while migration was still in progress. Rebalance completed successfully and truncate did not report any errors, but "# ll ." on the mount point showed that many files had not been truncated to zero as requested.

Version-Release number of selected component (if applicable):
3.12.2-22

How reproducible:
2/2

Steps to Reproduce:
1. Create a distributed-replicated volume (e.g. 3x3). A consolidated command sketch for steps 1-7 follows step 8.

2. Start the volume and mount it on the client node.

3. Add bricks to the volume using
# gluster v add-brick volname replica 3 brick10 brick11 brick12

4. From the client node create files on the mount point
e.g.
# for i in {1..8000}; do dd if=/dev/urandom of=file_$i bs=1M count=1; done

5. Trigger rebalance.

6. While rebalance is still in progress, start truncating the files from the mount point
e.g.
# for i in {1..8000}; do truncate -s 0 file_$i; done

7. Wait for the migration to complete.
	
8. Now from the mount point check the size of all the files.
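
A consolidated sketch of steps 1-7 is given below. The volume name, hostnames, brick paths and mount point are placeholders, not the values used in the original test.

# 1. Create a 3x3 distributed-replicated volume and start it.
gluster volume create testvol replica 3 \
    server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3 \
    server1:/bricks/b4 server2:/bricks/b5 server3:/bricks/b6 \
    server1:/bricks/b7 server2:/bricks/b8 server3:/bricks/b9
gluster volume start testvol

# 2. Mount the volume on the client node.
mount -t glusterfs server1:/testvol /mnt/testvol

# 3. Add one more replica set of bricks.
gluster volume add-brick testvol replica 3 \
    server1:/bricks/b10 server2:/bricks/b11 server3:/bricks/b12

# 4. Create files on the mount point.
cd /mnt/testvol
for i in {1..8000}; do dd if=/dev/urandom of=file_$i bs=1M count=1; done

# 5. Trigger rebalance.
gluster volume rebalance testvol start

# 6. While rebalance is still in progress, truncate the files.
for i in {1..8000}; do truncate -s 0 file_$i; done

# 7. Check rebalance status until the migration completes.
gluster volume rebalance testvol status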


Actual results:
Many files were not truncated to zero size.

Expected results:
All the files should have size zero.
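
One way to verify this from the mount point (the mount path below is a placeholder) is to count the files that still have a non-zero size; the expected output is 0:

# Count files on the mount point that were not truncated to zero.
find /mnt/testvol -maxdepth 1 -type f ! -size 0 | wc -l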



Server rebalance log snippet:
============================

[2018-10-11 10:05:39.996406] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7860)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file.
[2018-10-11 10:05:40.003945] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7865)} would result in dst node (cloud6-replicate-3:37983752) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file.
[2018-10-11 10:05:40.009722] I [dht-rebalance.c:1516:dht_migrate_file] 0-cloud6-dht: /file_7923: attempting to move from cloud6-replicate-2 to cloud6-replicate-3
[2018-10-11 10:05:40.015101] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7860.
[2018-10-11 10:05:40.021613] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7865.
[2018-10-11 10:05:40.026830] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7905)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:38001728).Skipping file.
[2018-10-11 10:05:40.039200] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7905.


Mount point "# ll ." command output snippet:
===========================================

-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2266
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2267
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2268
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2269
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_227
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2270
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2271
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2272
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2273
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2274
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2276
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279
-rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_228
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2280
-rw-r--r--. 1 root root       0 Oct 11 15:34 file_2281

--- Additional comment from Nithya Balachandran on 2018-11-19 04:34:20 UTC ---

This needs to be fixed. No analysis has been done yet, so there is no RCA.

Comment 1 Nithya Balachandran 2019-08-27 11:08:43 UTC
RCA:

If a file is truncated during a migration, __dht_rebalance_migrate_data does not return an error. With the fix, if the ia_size of the source file is less than the number of bytes already written to the destination, the data migration is aborted and an error is returned.

Comment 2 Worker Ant 2019-08-27 11:26:47 UTC
REVIEW: https://review.gluster.org/23308 (cluster/dht: Handle file truncates during migration) posted (#1) for review on master by N Balachandran

Comment 3 Worker Ant 2019-09-17 11:00:14 UTC
REVIEW: https://review.gluster.org/23308 (cluster/dht: Handle file truncates during migration) merged (#3) on master by Susant Palai