Description of problem:
=======================
While a large file was being copied to the mount, the file was renamed (after the rename, the hashed and cached sub-volumes were different) and a rebalance was started. The file went missing after the rebalance finished.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
intermittent

Steps to Reproduce:
===================
1. Create, start and FUSE mount a Distributed volume having 2 bricks.

2. Start copying a 3+GB file to that mount:

cp data /mnt/test1

3. While the copy is in progress, rename the file twice:

[root@OVM1 test1]# du -sh data
683M    data
[root@OVM1 test1]# mv data rename
[root@OVM1 test1]# ls
rename
[root@OVM1 test1]# du -sh rename
869M    rename
[root@OVM1 test1]# mv rename new
[root@OVM1 test1]# du -sh rename
du: cannot access `rename': No such file or directory
[root@OVM1 test1]# du -sh new
2.5G    new

4. Start the rebalance before the file copy operation has completed:

[root@OVM3 brick0]# gluster volume rebalance test1 start force

5. Keep checking the file on the mount and the rebalance status. Once the rebalance completed, the file went missing:

[root@OVM1 test1]# du -sh new
2.5G    new
[root@OVM1 test1]# du -sh new
2.7G    new
[root@OVM1 test1]# du -sh new
3.1G    new
[root@OVM1 test1]# du -sh new
du: cannot access `new': No such file or directory
[root@OVM1 test1]# ls -l
total 0

[root@OVM3 brick0]# gluster volume rebalance test1 status
        Node   Rebalanced-files        size     scanned    failures     skipped        status   run time in secs
   ---------        -----------   -----------   -----------   -----------   -----------   ------------   --------------
10.70.35.240                  1       2.1GB           2           0           0     completed            95.00
10.70.35.172                  0      0Bytes           2           0           0     completed             0.00
volume rebalance: test1: success:

Actual results:
================
The file is missing.

Expected results:
================
File creation + rename + rebalance should not end in data loss.

Additional info:
================
The log does not have any entry for an unlink of the file.
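Step 5 (checking the rebalance status repeatedly) could be automated. The following is only a minimal sketch, assuming the status output keeps the column layout shown in the report above; the helper names `parse_rebalance_status` and `all_completed` are hypothetical and not part of any gluster tooling.

```python
from typing import Dict, List

# Sample copied from the "gluster volume rebalance test1 status" output above.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
10.70.35.240 1 2.1GB 2 0 0 completed 95.00
10.70.35.172 0 0Bytes 2 0 0 completed 0.00
volume rebalance: test1: success:
"""


def parse_rebalance_status(text: str) -> List[Dict[str, str]]:
    """Extract one dict per node row from the status text.

    Assumption: a data row has the node first, a numeric file count
    second, the run time last, and the status (possibly more than one
    word) between the 'skipped' column and the run time.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header, separator, or summary line
        rows.append({
            "node": fields[0],
            "rebalanced_files": fields[1],
            "size": fields[2],
            "scanned": fields[3],
            "failures": fields[4],
            "skipped": fields[5],
            "status": " ".join(fields[6:-1]),
            "run_time_secs": fields[-1],
        })
    return rows


def all_completed(rows: List[Dict[str, str]]) -> bool:
    """True only if at least one node row exists and all report 'completed'."""
    return bool(rows) and all(r["status"] == "completed" for r in rows)


if __name__ == "__main__":
    for row in parse_rebalance_status(SAMPLE):
        print(row["node"], row["status"], row["size"])
```

A polling loop would feed the captured command output into `parse_rebalance_status` and stop once `all_completed` returns true, then check whether the file is still visible on the mount.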
[root@unused 1127784]# gluster volume info dist

Volume Name: dist
Type: Distribute
Volume ID: 33ffc81f-299e-4251-91e3-3fcd07a08cb4
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export/dist1
Brick2: booradley:/home/export/dist2

On terminal 1:
[root@unused gfs]# cp -fv ../2 1
cp: overwrite `1'? y
`../2' -> `1'

On terminal 2:
[root@unused 1127784]# gluster volume rebalance dist start force
volume rebalance: dist: success: Initiated rebalance on volume dist.
Execute "gluster volume rebalance <volume-name> status" to check status.
ID: 718488d1-a37f-444c-8911-5448ef4beba5

On terminal 3:
[root@unused gfs]# mv 1 2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# du -hs 2
2.4G    2
[root@unused gfs]# du -hs 2
2.5G    2
[root@unused gfs]# du -hs 2
2.7G    2
[root@unused gfs]# du -hs 2
718M    2

##### Note that the size of 2 suddenly came down to 718M, though when last sampled it was 2.7G and no operations other than cp and rebalance were running on that file.

[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:

/home/export/dist2:
2

After the size of the file came down to 718M, cp on terminal 1 failed with:

cp: writing `1': Operation not permitted
cp: failed to extend `1': Operation not permitted
cp: closing `1': Operation not permitted

At around the same time, I saw the migration complete.
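The manual `du` sampling above can be turned into a small check that flags the two symptoms seen in this bug: a file whose size drops between samples, and a file that disappears. This is a rough sketch, not part of any gluster tooling; `sample_size` and `find_anomalies` are hypothetical names, and `st_size` (apparent size) is used as an assumption in place of the disk usage that `du` reports.

```python
import os
from typing import List, Optional


def sample_size(path: str) -> Optional[int]:
    """Return the apparent size of path in bytes, or None if it is missing."""
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None


def find_anomalies(samples: List[Optional[int]]) -> List[str]:
    """Compare consecutive samples and describe shrinkage or disappearance.

    A file that is still being copied should only grow, so a smaller
    size or a missing file after a successful sample is reported.
    """
    anomalies = []
    previous = None
    for index, current in enumerate(samples):
        if previous is not None:
            if current is None:
                anomalies.append(f"sample {index}: file disappeared")
            elif current < previous:
                anomalies.append(
                    f"sample {index}: size shrank from {previous} to {current}"
                )
        previous = current
    return anomalies


if __name__ == "__main__":
    # Sizes in MB, loosely following the du samples in this comment.
    print(find_anomalies([2400, 2500, 2700, 718]))
```

In a reproduction run, `sample_size` would be called on the file on the FUSE mount every few seconds while cp and the rebalance are running, and the collected list passed to `find_anomalies` afterwards.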
Verified on glusterfs-3.6.0.28-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html