Bug 1136718 - DHT + AFR :- File is truncated to lower size, while rename and self heal is in progress and copied/renamed that file to another Directory
Summary: DHT + AFR :- File is truncated to lower size, while rename and self heal is in progress and copied/renamed that file to another Directory
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: Prasad Desala
URL:
Whiteboard: dht-data-loss, dht-3.2.0-proposed
Depends On:
Blocks: 1087818
 
Reported: 2014-09-03 07:09 UTC by Rachana Patel
Modified: 2018-04-16 18:05 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The AFR self-heal can leave behind a partially healed file if the brick containing the self-heal source file goes down in the middle of the heal operation. If this partially healed file is migrated before the brick that went down comes back online, the migrated file would have incorrect data and the original file would be deleted. (A heal-status check that narrows this window is sketched after the header fields below.)
Clone Of:
Environment:
Last Closed: 2018-04-16 18:05:58 UTC
Embargoed:


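The known issue above boils down to a race between self-heal and file migration: if a partially healed file is migrated while its heal source brick is still down, the partial copy wins. The exposure can be narrowed by confirming heal has finished before starting rebalance. A minimal sketch, assuming a volume named testvol (the volume name is an assumption, not taken from this bug):

# confirm no entries are pending heal before triggering rebalance/migration
gluster volume heal testvol info                 # should list zero entries per brick
gluster volume heal testvol info split-brain     # should report no split-brain entries
gluster volume status testvol                    # all bricks should be online
# only then start the migration
gluster volume rebalance testvol start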

Description Rachana Patel 2014-09-03 07:09:08 UTC
Description of problem:
=======================
A file ends up truncated to a smaller size when it is copied/renamed to another directory while rename operations and AFR self-heal are in progress.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.27-6.el6rhs.x86_64

How reproducible:
=================
intermittent

Steps to Reproduce:
==================
1. Created a 9x2 distributed-replicate volume using 4 servers (a rough command sketch covering steps 1-6 is included after the steps and commands below).
2. FUSE mounted the volume on 4 servers and NFS mounted it on 2 servers.
3. Killed one brick from each replica pair and created a few files from the FUSE mounts:
62 zero-size files and 62 files with sizes in MB,
and created a hard link for each file.
4. Started all bricks using the start force option.
5. Started renaming all files (from all 6 mounts) in a loop.
6. Added bricks and ran rebalance.
7. After rebalance completed, killed the brick processes on one of the servers using pkill glusterfsd.
8. Started all bricks using the start force option.
9. Terminated the rename loop, then started it again.
10. Repeated steps 6 to 8 two or three times, killing bricks on a different server each time
(usually I killed one brick from each replica pair; only once did I bring a whole replica pair down).
-- no data loss so far
11. Now brought down one brick from each replica pair (dht16 and dht19).
12. Kept renaming files from the mount points.
13. Copied all files from all four FUSE mounts to one directory and then moved the files to another directory:
[root@dht16 ks]#  cp * -f  test1/ ; for j in {1..62}; do mv -f test1/zero$j-* test2/ ; mv -f test1/mb$j-* test2/ ; mv -f test1/zeroln$j-* test2/ ;  mv -f  test1/mbln$j-* test2/ ; done
After a while, ran the same commands again from all FUSE mounts:
cp * -f  test1/ ;
for j in {1..62}; do mv -f test1/zero$j-* test2/ ; mv -f test1/mb$j-* test2/ ; mv -f test1/zeroln$j-* test2/ ;  mv -f  test1/mbln$j-* test2/ ; done
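
For reference, steps 1-6 above correspond roughly to commands like the following. This is only a sketch: the volume name testvol, the server names server1-server4, and the brick paths are assumptions, not the ones used in this run.

# step 1: 9x2 distributed-replicate volume on 4 servers (9 replica pairs, 18 bricks; paths assumed)
gluster volume create testvol replica 2 \
    server{1,2}:/rhs/brick1/b server{3,4}:/rhs/brick1/b \
    server{1,2}:/rhs/brick2/b server{3,4}:/rhs/brick2/b \
    server{1,2}:/rhs/brick3/b server{3,4}:/rhs/brick3/b \
    server{1,2}:/rhs/brick4/b server{3,4}:/rhs/brick4/b \
    server{1,2}:/rhs/brick5/b
gluster volume start testvol

# step 2: FUSE mount on 4 servers, NFS (gNFS, vers=3) mount on 2 servers
mount -t glusterfs server1:/testvol /mnt/ks
mount -t nfs -o vers=3 server1:/testvol /mnt/ks

# steps 3/7/11: kill selected brick processes (PIDs from volume status),
# or pkill glusterfsd to kill every brick process on that server
gluster volume status testvol                    # note the brick process PIDs
kill -9 <brick-pid>                              # <brick-pid> is a placeholder

# steps 4/8: bring the killed bricks back online
gluster volume start testvol force

# step 5: rename loop from each mount (sketch)
while true; do for f in /mnt/ks/*; do mv -f "$f" "$f.renamed"; done; done

# step 6: add a replica pair and rebalance
gluster volume add-brick testvol server{1,2}:/rhs/brick6/b
gluster volume rebalance testvol start
gluster volume rebalance testvol status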

One file was truncated from 41943040 bytes to 31981568 bytes:
[root@dht16 test2]# md5sum /mnt/ks/mb1-* /mnt/ks/test1/mb1-* /mnt/ks/test2/mb1-*
b76bd9df3e5998d4959b56dcb7602a6e  /mnt/ks/mb1-85
md5sum: /mnt/ks/test1/mb1-*: No such file or directory
ffb6ac6fb57543748103f77b4df93215  /mnt/ks/test2/mb1-30
b76bd9df3e5998d4959b56dcb7602a6e  /mnt/ks/test2/mb1-31
[root@dht16 test2]# ls -li  /mnt/ks/mb1-* /mnt/ks/test1/mb1-* /mnt/ks/test2/mb1-*
ls: cannot access /mnt/ks/test1/mb1-*: No such file or directory
11123811650566741328 -rw-r--r-- 2 root root 41943040 Aug 31 09:34 /mnt/ks/mb1-85
10308136675813588637 -rw-r--r-- 1 root root 31981568 Aug 31 10:20 /mnt/ks/test2/mb1-30   <-----------
11266794036804784411 -rw-r--r-- 1 root root 41943040 Aug 31 10:21 /mnt/ks/test2/mb1-31
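
To narrow down which replica the truncated copy was migrated from, the backend copies can be compared directly on the bricks of each server. A sketch, assuming brick paths under /rhs/brick*/b (the paths are assumptions):

# size and checksum of the backend copies of the truncated file
ls -l /rhs/brick*/b/test2/mb1-30
md5sum /rhs/brick*/b/test2/mb1-30
# AFR changelog and DHT xattrs on the backend file
getfattr -d -m . -e hex /rhs/brick*/b/test2/mb1-30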

Actual results:
===============
One file was truncated from 41943040 bytes to 31981568 bytes.

Comment 4 Nithya Balachandran 2016-07-21 13:16:09 UTC
Assigning this to Pranith as it is dependent on some AFR fixes.

