1127748 – DHT + Rebalance + rename :- file is missing after rebalance is completed

Summary: DHT + Rebalance + rename :- file is missing after rebalance is completed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	distribute
Sub Component:
Version:	rhgs-3.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.0.0
Assignee:	Nithya Balachandran
QA Contact:	amainkar
Docs Contact:
URL:
Whiteboard:
Depends On:	969298 1146895
Blocks:

Reported:	2014-08-07 13:05 UTC by Rachana Patel
Modified:	2015-05-13 16:53 UTC
CC List:	6 users
Fixed In Version:	glusterfs-3.6.0.28-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-09-22 19:45:23 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2014:1278	0	normal	SHIPPED_LIVE	Red Hat Storage Server 3.0 bug fix and enhancement update	2014-09-22 23:26:55 UTC

Description Rachana Patel 2014-08-07 13:05:52 UTC

Description of problem:
=======================
While a large file was being copied to the mount, the file was renamed (after the rename, the hashed and cached sub-volumes were different) and a rebalance was started.

The file went missing after the rebalance finished.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-1.el6rhs.x86_64

How reproducible:
=================
intermittent


Steps to Reproduce:
===================
1. Create, start and FUSE-mount a Distributed volume with 2 bricks.
2. Start copying a 3+ GB file to that mount
-->
cp data /mnt/test1

3. While the copy is in progress, rename the file twice
-->
[root@OVM1 test1]# du -sh data
683M	data
[root@OVM1 test1]# mv data rename
[root@OVM1 test1]# ls
rename
[root@OVM1 test1]# du -sh rename
869M	rename
[root@OVM1 test1]# mv rename new
[root@OVM1 test1]# du -sh rename
du: cannot access `rename': No such file or directory
[root@OVM1 test1]# du -sh new
2.5G	new

4. Now start rebalance before the file copy operation is completed.
[root@OVM3 brick0]# gluster volume rebalance test1 start force

5. Keep checking the file on the mount and the rebalance status.
Once the rebalance completed, the file went missing

[root@OVM1 test1]# du -sh new
2.5G	new
[root@OVM1 test1]# du -sh new
2.7G	new
[root@OVM1 test1]# du -sh new
3.1G	new
[root@OVM1 test1]# du -sh new
du: cannot access `new': No such file or directory

[root@OVM1 test1]# ls -l
total 0


[root@OVM3 brick0]# gluster volume rebalance test1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                            10.70.35.240                1         2.1GB             2             0             0            completed              95.00
                            10.70.35.172                0        0Bytes             2             0             0            completed               0.00
volume rebalance: test1: success: 
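
The steps above can be sketched as a script. This is only a rough outline, not the exact procedure used for this report: VOL, MNT, SRC, SERVER, BRICKS and DELAY are hypothetical placeholders, and the `run` and `reproduce` helpers are made up for illustration. Since the issue is intermittent and depends on the renamed file hashing to a different sub-volume, a single run may not hit it.

```shell
#!/bin/sh
# Sketch of steps 1-5. VOL, MNT, SRC, SERVER and BRICKS are hypothetical
# placeholders; they are not taken from the setup in this report.
VOL=${VOL:-test1}
MNT=${MNT:-/mnt/test1}
SRC=${SRC:-./data}            # a 3+ GB source file
SERVER=${SERVER:-server1}
BRICKS=${BRICKS:-"server1:/bricks/brick0 server2:/bricks/brick0"}

# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    delay=${DELAY:-5}         # seconds between steps, tune to the copy speed

    # Step 1: create, start and FUSE-mount a 2-brick distributed volume.
    # $BRICKS is intentionally unquoted so it splits into two brick arguments.
    run gluster volume create "$VOL" $BRICKS
    run gluster volume start "$VOL"
    run mkdir -p "$MNT"
    run mount -t glusterfs "$SERVER:/$VOL" "$MNT"

    # Step 2: start the copy in the background.
    run cp "$SRC" "$MNT/data" &
    cp_pid=$!

    # Step 3: rename the file twice while the copy is still running.
    sleep "$delay"; run mv "$MNT/data" "$MNT/rename"
    sleep "$delay"; run mv "$MNT/rename" "$MNT/new"

    # Step 4: start rebalance before the copy completes.
    run gluster volume rebalance "$VOL" start force

    # Step 5: wait for the copy, then look at the rebalance status and the mount.
    wait "$cp_pid" || echo "cp exited with an error"
    run gluster volume rebalance "$VOL" status
    run ls -l "$MNT"
}
```

Calling `reproduce` with DRY_RUN=1 set only prints the commands, which is a safe way to review them first. On a real setup the status command in step 5 would have to be repeated until every node reports completed.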



Actual results:
================
file is missing
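
The failure condition (rebalance reports completed with 0 failures, yet the file is gone) could be checked automatically along these lines. This is a sketch that assumes the status table keeps the column layout shown in step 5; `rebalance_clean` and `check_lost` are hypothetical helper names, not gluster commands.

```shell
# Reads "gluster volume rebalance <vol> status" output on stdin.
# Returns 0 only if there is at least one node row and every row
# shows status "completed" with 0 failures.
rebalance_clean() {
    awk '
        # Node rows have a numeric Rebalanced-files column; the header,
        # the dashed separator and the trailing summary line do not.
        NF >= 8 && $2 ~ /^[0-9]+$/ {
            rows++
            # The status may be more than one word (e.g. "in progress"),
            # so join everything between the skipped and run-time columns.
            status = $7
            for (i = 8; i < NF; i++) status = status " " $i
            if (status != "completed" || $5 != 0) bad++
        }
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}

# check_lost VOLUME PATH: flags the case seen in this report.
check_lost() {
    if gluster volume rebalance "$1" status | rebalance_clean && [ ! -e "$2" ]; then
        echo "rebalance reports success but $2 is missing"
        return 1
    fi
}
```

For the setup in this report the call would be `check_lost test1 /mnt/test1/new`.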


Expected results:
================
file creation + rename + rebalance should not end in data loss


Additional info:
================
The log doesn't have any entry for an unlink of the file.

Comment 4 Raghavendra G 2014-08-12 04:41:20 UTC

[root@unused 1127784]# gluster volume info dist
 
Volume Name: dist
Type: Distribute
Volume ID: 33ffc81f-299e-4251-91e3-3fcd07a08cb4
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export/dist1
Brick2: booradley:/home/export/dist2

On terminal 1:
[root@unused gfs]# cp -fv ../2 1
cp: overwrite `1'? y
`../2' -> `1'

On terminal 2:
[root@unused 1127784]# gluster volume  rebalance dist start force
volume rebalance: dist: success: Initiated rebalance on volume dist.
Execute "gluster volume rebalance <volume-name> status" to check status.
ID: 718488d1-a37f-444c-8911-5448ef4beba5

On terminal 3:
[root@unused gfs]# mv 1 2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:
2

/home/export/dist2:
2
[root@unused gfs]# du -hs 2
2.4G	2
[root@unused gfs]# du -hs 2
2.5G	2
[root@unused gfs]# du -hs 2
2.7G	2
[root@unused gfs]# du -hs 2
718M	2

##### Note that the size of 2 suddenly dropped to 718M, though it was 2.7G when last sampled and no operations other than cp and rebalance were running on that file

[root@unused gfs]# ls /home/export/dist?
/home/export/dist1:

/home/export/dist2:
2

After the size of the file dropped to 718M, cp on terminal 1 failed with:
cp: writing `1': Operation not permitted
cp: failed to extend `1': Operation not permitted
cp: closing `1': Operation not permitted

At around the same time, I saw that the migration had completed.
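
The manual `du` sampling above can be replaced by a small polling loop that stops as soon as the file shrinks or disappears. This is a minimal sketch; `watch_size` and its arguments are hypothetical and not part of any gluster tooling.

```shell
# watch_size FILE [INTERVAL_SECS] [SAMPLES]
# Returns 0 if the size never decreased, 1 if it shrank between two
# samples, 2 if the file could not be found.
watch_size() {
    file=$1
    interval=${2:-1}
    samples=${3:-10}
    prev=0
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Disk usage in KiB; empty if the file is gone.
        cur=$(du -k "$file" 2>/dev/null | cut -f1)
        if [ -z "$cur" ]; then
            echo "MISSING $file"
            return 2
        fi
        if [ "$cur" -lt "$prev" ]; then
            echo "SHRANK $file: ${prev}K -> ${cur}K"
            return 1
        fi
        prev=$cur
        i=$((i + 1))
        sleep "$interval"
    done
    return 0
}
```

Run from the mount point while cp and rebalance are in progress, for example `watch_size 2 1 600` to sample the file named 2 once per second for up to ten minutes.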

Comment 6 shylesh 2014-09-19 12:51:25 UTC

verified on glusterfs-3.6.0.28-1

Comment 8 errata-xmlrpc 2014-09-22 19:45:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
