Bug 1127748

Summary: DHT + Rebalance + rename :- file is missing after rebalance is completed
Product: Red Hat Gluster Storage [Red Hat Storage]
Reporter: Rachana Patel <racpatel>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA
QA Contact: amainkar
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: bbandari, nbalacha, nsathyan, rgowdapp, ssamanta, vagarwal
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.6.0.28-1
Doc Type: Bug Fix
Last Closed: 2014-09-22 19:45:23 UTC
Type: Bug
Bug Depends On: 969298, 1146895    
Bug Blocks:    

Description Rachana Patel 2014-08-07 13:05:52 UTC
Description of problem:
=======================
While a large file was being copied to the mount, the file was renamed (after the rename, its hashed and cached sub-volumes were different) and a rebalance was started.

The file went missing after the rebalance finished.
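
For context, one way to confirm which brick actually holds a file's data (its cached sub-volume) after such a rename is GlusterFS's virtual pathinfo xattr, queried from the client. A minimal sketch, assuming the test1 volume and the /mnt/test1 mount point used in the steps below:

# Reports the back-end brick path(s) where the file's data lives; if
# this differs from the brick the new name hashes to, the hashed and
# cached sub-volumes have diverged.
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/test1/new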


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Create, start, and FUSE-mount a distributed volume with 2 bricks.
2. Start copying a 3+ GB file to that mount:
-->
cp data /mnt/test1

3. While the copy is in progress, rename that file twice:
-->
[root@OVM1 test1]# du -sh data
683M	data
[root@OVM1 test1]# mv data rename
[root@OVM1 test1]# ls
rename
[root@OVM1 test1]# du -sh rename
869M	rename
[root@OVM1 test1]# mv rename new
[root@OVM1 test1]# du -sh rename
du: cannot access `rename': No such file or directory
[root@OVM1 test1]# du -sh new
2.5G	new

4. Now start a rebalance before the file copy operation completes:
[root@OVM3 brick0]# gluster volume rebalance test1 start force

5. Keep checking the file on the mount and the rebalance status.
Once the rebalance completed, the file went missing. (A consolidated reproduction sketch follows the rebalance status output below.)

[root@OVM1 test1]# du -sh new
2.5G	new
[root@OVM1 test1]# du -sh new
2.7G	new
[root@OVM1 test1]# du -sh new
3.1G	new
[root@OVM1 test1]# du -sh new
du: cannot access `new': No such file or directory

[root@OVM1 test1]# ls -l
total 0


[root@OVM3 brick0]# gluster volume rebalance test1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                            10.70.35.240                1         2.1GB             2             0             0            completed              95.00
                            10.70.35.172                0        0Bytes             2             0             0            completed               0.00
volume rebalance: test1: success: 
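
A consolidated sketch of the reproduction as a shell session, assuming two brick servers named server1 and server2 with bricks under /bricks and a ~4 GB source file created with dd; the host names, brick paths, and file size are illustrative and not taken from this report:

# Create, start, and FUSE-mount a 2-brick distributed volume.
gluster volume create test1 server1:/bricks/brick0 server2:/bricks/brick1
gluster volume start test1
mount -t glusterfs server1:/test1 /mnt/test1

# Generate a large source file and start copying it to the mount in
# the background so it is still in flight during the later steps.
dd if=/dev/urandom of=./data bs=1M count=4096
cp ./data /mnt/test1/data &

# While the copy is still running, rename the in-flight file twice so
# that its hashed and cached sub-volumes end up on different bricks.
sleep 30
mv /mnt/test1/data /mnt/test1/rename
mv /mnt/test1/rename /mnt/test1/new

# Force a rebalance before the copy completes, then watch the file and
# the rebalance status; the bug shows up as the file disappearing once
# the rebalance reports "completed".
gluster volume rebalance test1 start force
watch -n 5 'du -sh /mnt/test1/new; gluster volume rebalance test1 status'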



Actual results:
================
The file is missing after the rebalance completes.


Expected results:
================
File creation followed by rename and rebalance should not result in data loss.


Additional info:
================
The logs do not contain any entry for an unlink of the file.

Comment 4 Raghavendra G 2014-08-12 04:41:20 UTC
[root@unused 1127784]# gluster volume info dist
 
Volume Name: dist
Type: Distribute
Volume ID: 33ffc81f-299e-4251-91e3-3fcd07a08cb4
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export/dist1
Brick2: booradley:/home/export/dist2

On terminal 1:
[root@unused gfs]# cp -fv ../2 1
cp: overwrite `1'? y
`../2' -> `1'

On terminal 2:
[root@unused 1127784]# gluster volume  rebalance dist start force
volume rebalance: dist: success: Initiated rebalance on volume dist.
Execute "gluster volume rebalance <volume-name> status" to check status.
ID: 718488d1-a37f-444c-8911-5448ef4beba5

On terminal 3:
[root@unused gfs]# mv 1 2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# du -hs 2
2.4G	2
[root@unused gfs]# du -hs 2
2.5G	2
[root@unused gfs]# du -hs 2
2.7G	2
[root@unused gfs]# du -hs 2
718M	2

##### Note that the size of 2 suddenly came down to 718M, though it was 2.7G when last sampled, and no operations other than the cp and the rebalance were running against that file.

[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:

/home/export/dist2:
2

After the size of the file came down to 718M, the cp on terminal 1 failed with:
cp: writing `1': Operation not permitted
cp: failed to extend `1': Operation not permitted
cp: closing `1': Operation not permitted

At around the same time, I saw that the migration had completed.
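
That the file showed up under both dist1 and dist2 above suggests one of the two back-end copies was a DHT link-to file rather than the data file. A small diagnostic sketch to tell the two apart on the bricks, using the brick paths from the volume info above (commands are illustrative, not output captured for this report):

# A DHT link-to file is zero bytes, has only the sticky bit set in its
# mode (shown by ls as ---------T), and carries the dht.linkto xattr
# naming the sub-volume that holds the actual data.
ls -l /home/export/dist1/2 /home/export/dist2/2
getfattr -n trusted.glusterfs.dht.linkto -e text /home/export/dist1/2
getfattr -n trusted.glusterfs.dht.linkto -e text /home/export/dist2/2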

Comment 6 shylesh 2014-09-19 12:51:25 UTC
Verified on glusterfs-3.6.0.28-1.

Comment 8 errata-xmlrpc 2014-09-22 19:45:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html