Bug 1260485 - DHT: Deletion of file while it is migrating from one brick to another leads to data inconsistency
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Susant Kumar Palai
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-07 05:13 UTC by RajeshReddy
Modified: 2016-07-13 22:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-30 16:27:14 UTC
Embargoed:


Description RajeshReddy 2015-09-07 05:13:02 UTC
Description of problem:
=========================
Deleting a file while it is migrating from one brick to another leads to data inconsistency.


Version-Release number of selected component (if applicable):
==================
glusterfs-api-3.7.1-12

Steps to Reproduce:
==================
1. Create a distributed volume, mount it on a client using FUSE, and create large files (around 3 GB).
2. Make sure the large files are eligible for migration as part of the rebalance.
3. While a large file is being migrated, create hard links to the file and then delete the file. Although the deletion from the mount point succeeded, after some time ls shows the deleted file again, but its content is not the same as the original.
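
The steps above can be sketched as a script. This is only an outline under assumptions: the two host names are hypothetical arguments, the brick paths are illustrative, and remove-brick is used as the migration trigger because the volume status in this report shows a Remove brick task. Nothing here guarantees that file297 hashes to the brick being drained, so the file may need to be picked after checking the bricks.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps. Host names and brick paths are
# assumptions for illustration; the function is defined but not called here.
repro_sketch() {
    vol=dht10
    mnt=/mnt/$vol
    host_a=$1        # hypothetical server names supplied by the caller
    host_b=$2

    # 1. Distributed volume, FUSE mount, one large file (about 3 GB).
    gluster volume create "$vol" "$host_a:/rhs/brick1/$vol" "$host_b:/rhs/brick1/$vol"
    gluster volume start "$vol"
    mkdir -p "$mnt"
    mount -t glusterfs "$host_a:/$vol" "$mnt"
    mkdir -p "$mnt/data"
    dd if=/dev/zero of="$mnt/data/file297" bs=1M count=3072

    # 2. Start draining one brick so that files stored on it are migrated.
    gluster volume remove-brick "$vol" "$host_a:/rhs/brick1/$vol" start

    # 3. While migration runs: add a hard link, then delete the original name.
    ln "$mnt/data/file297" "$mnt/data/link1"
    rm -f "$mnt/data/file297"
    gluster volume remove-brick "$vol" "$host_a:/rhs/brick1/$vol" status
    ls -l "$mnt/data"
}
```

It would be invoked as `repro_sketch server1 server2` on a test cluster only, since it creates and modifies a volume.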

Snippet of the log file:
=====================
[2015-09-04 09:58:02.367760] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 19
[2015-09-04 09:58:02.367797] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 20
[2015-09-04 10:00:45.418961] W [MSGID: 109023] [dht-rebalance.c:1265:dht_migrate_file] 0-dht10-dht: /data/file297: failed to perform unlink on dht10-client-0 (No such file or directory)
[2015-09-04 10:02:08.854872] I [MSGID: 109022] [dht-rebalance.c:1282:dht_migrate_file] 0-dht10-dht: completed migration of /data/file710 from subvolume dht10-client-0 to dht10-client-2
[2015-09-04 10:02:08.855612] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-dht10-dht: Rebalance is completed. Time taken is 246.00 secs
[2015-09-04 10:02:08.855654] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-dht10-dht: Files migrated: 1, size: 7958036640, lookups: 2, failures: 0, skipped: 1
[2015-09-04 10:02:08.855877] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7df5) [0x7f666e716df5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f666fd7f785] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f666fd7f609] ) 0-: received signum (15), shutting down

setup:
==============
[root@rhs-client9 glusterfs]# gluster vol status dht10
Status of volume: dht10
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-client9.lab.eng.blr.redhat.com:/r
hs/brick1/dht10                             49181     0          Y       8840 
Brick rhs-client39.lab.eng.blr.redhat.com:/
rhs/brick10/dht10                           49173     0          Y       9645 
Brick rhs-client39.lab.eng.blr.redhat.com:/
rhs/brick1/dht10                            49174     0          Y       9663 
NFS Server on localhost                     2049      0          Y       32610
NFS Server on rhs-client39.lab.eng.blr.redh
at.com                                      2049      0          Y       18996
 
Task Status of Volume dht10
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : 5908c287-47df-4662-8f56-a93ed7241f41
Removed bricks:     
rhs-client9.lab.eng.blr.redhat.com:/rhs/brick1/dht10
Status               : completed

Comment 2 Susant Kumar Palai 2015-09-07 07:38:17 UTC
Please describe the bug in as much detail as possible. There is no information about how many hard links were created, which file was deleted, the data checksum before and after the unlink, or the ls output from the mount point. Also, please upload the sos_report.
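
One way the requested details could be captured is sketched below. It runs against a scratch directory rather than a Gluster mount, and the file names are stand-ins; on a real setup the target variable would point at the file on the FUSE mount.

```shell
#!/usr/bin/env bash
# Sketch only: uses a scratch directory, not a Gluster mount.
set -eu

workdir=$(mktemp -d)
target="$workdir/file297"          # hypothetical stand-in for the file under test

printf 'sample data\n' > "$target"
ln "$target" "$workdir/link1"
ln "$target" "$workdir/link2"

# Record state before the unlink: checksum, hard-link count, inode listing.
sum_before=$(sha256sum "$target" | awk '{print $1}')
links_before=$(stat -c %h "$target")
ls -li "$workdir"

rm -f "$target"

# After the unlink, a surviving link should still carry the same data.
sum_after=$(sha256sum "$workdir/link1" | awk '{print $1}')
links_after=$(stat -c %h "$workdir/link1")
ls -li "$workdir"

echo "links: $links_before -> $links_after"
echo "checksum unchanged: $([ "$sum_before" = "$sum_after" ] && echo yes || echo no)"

rm -rf "$workdir"
```

On a local filesystem the link count drops by one and the checksum stays the same; a different result on the Gluster mount would be the evidence worth attaching to the bug.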

Susant

Comment 3 RajeshReddy 2015-09-07 08:14:03 UTC
The file name is file297, and I created 4 links. The output of ls from the mount is given below. As I mentioned earlier, although I created the file with a size of around 2 GB, after the rebalance its size became 400 bytes.

[root@dhcp37-55 data]# ls
create.sh  file296  file703  file709  file715  link2
file291    file297  file704  file710  file716  link3
file292    file298  file705  file711  file717  link7_1
file293    file299  file706  file712  file718  link7_2
file294    file701  file707  file713  file719
file295    file702  file708  file714  link1
[root@dhcp37-55 data]# cat file297
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
[root@dhcp37-55 data]#
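
The 400-byte size matches the ten lines shown in the cat output above. A quick check, assuming each line ends with a single newline and contains no other characters:

```shell
#!/usr/bin/env bash
# Size check for ten copies of the line seen in the cat output.
line='Modified while rebalance is in progress'
bytes_per_line=$(( ${#line} + 1 ))     # 39 characters plus the trailing newline
total=$(( $(for i in $(seq 1 10); do echo "$line"; done | wc -c) ))
echo "$bytes_per_line bytes per line, $total bytes for 10 lines"
# prints "40 bytes per line, 400 bytes for 10 lines"
```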

Comment 4 Susant Kumar Palai 2015-09-07 12:53:46 UTC
Rajesh,
  Here is my analysis after going through the command history on the client.

This is the script you ran:
<script>
[root@dhcp37-55 data]# pwd
/mnt/dht10/data
[root@dhcp37-55 data]# cat create.sh 
for i in {1..10}
do
echo "Modified while rebalance is in progress" >> /mnt/dht10/data/file297
echo "Modified while rebalance is in progress" >> /mnt/dht10/data/file710
sleep 1
done
</script>

Here is the important part of the command history.
  174  cd data
>>> We are in the directory data

  175  ls
  176  pwd
  177  vi create.sh 
  178  ./create.sh 
>>> file297 would have been created with 10 lines of "Modified while rebalance is in progress"

  179  tail -10 file710
  180  ls 
  181  ls -lrt 
  182  pwd
  183  ./create.sh 
  184  mv file710 file710_rename
  185  ls
  186  ls -lrt 
  187  cat file297
  188  mv file297 asdf
>>> The file in question is renamed to asdf

  189  echo "asfdsdaf" >> file297
>>> This creates a new file297 containing the data "asfdsdaf"

  190  echo "asfdsdaf" >> file710
  191  ln file297 link1
  192  ln file297 link2
  193  ln file297 link3
>>> Created hardlinks

  194  ls file710 link1
  195  ls file710 link7_1
  196  ln file710 link7_1
  197  ln file710 link7_2
  198  ls
  199  rm -rf file297
>>> File unlinked

  200  ls -lrt 
  201  mv file710 file710_rename
  202  ./create.sh 
>>> This creates the file again, with 10 lines of "Modified while rebalance is in progress"

  203  ls -lrt
  204  cat file297
  205  cat link1
  206  mount | grep dht10
  207  cd /mnt/dht10
  208  ls
  209  cd data
  210  ls
  211  rpm -qa | grep glusterfs
  212  mkdir /mnt/ECVOL3_one
  213  mount -t nfs rhs-client9.lab.eng.blr.redhat.com:/ECVOL3/one /mnt/ECVOL3_one/
  214  cd /mnt/ECVOL3_one/
  215  ls
  216  tar -xvf linux-3.19.tar.gz 
  217  history 


From my analysis, no data corruption happened. Am I missing anything?
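
The file297 part of the history can be replayed on a scratch directory to see the end state the analysis predicts. This is a sketch under assumptions: a local filesystem stands in for the Gluster mount, so it shows only the expected POSIX outcome and says nothing about DHT or rebalance behaviour, and the initial file content is a placeholder for whatever file297 held before the script first ran.

```shell
#!/usr/bin/env bash
# Replays the file297 commands from the history on a scratch directory.
set -eu

d=$(mktemp -d)

append_lines() {                     # the same ten appends as create.sh, minus the sleep
    for i in $(seq 1 10); do
        echo "Modified while rebalance is in progress" >> "$d/file297"
    done
}

echo "original large file contents" > "$d/file297"   # placeholder (assumption)
append_lines                         # history 178
append_lines                         # history 183
mv "$d/file297" "$d/asdf"            # history 188: original data now lives in asdf
echo "asfdsdaf" >> "$d/file297"      # history 189: creates a brand-new file297
ln "$d/file297" "$d/link1"           # history 191-193: links to the new file
ln "$d/file297" "$d/link2"
ln "$d/file297" "$d/link3"
rm -rf "$d/file297"                  # history 199: removes one name only
append_lines                         # history 202: creates file297 a third time

size_file297=$(( $(wc -c < "$d/file297") ))
lines_asdf=$(( $(wc -l < "$d/asdf") ))
link1_content=$(cat "$d/link1")
link1_count=$(stat -c %h "$d/link1")

echo "file297: $size_file297 bytes"
echo "asdf:    $lines_asdf lines (placeholder line plus two script runs)"
echo "link1:   '$link1_content', link count $link1_count"

rm -rf "$d"
```

In this replay file297 ends up at 400 bytes, the links hold "asfdsdaf", and the original data is reachable only as asdf, which is consistent with the ls and cat output reported in comment 3.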

Comment 5 RajeshReddy 2015-09-09 06:10:07 UTC
I repeated the same steps but could not reproduce the reported problem. However, after deleting the data file, the link files are also missing from the mount, and they are not available on the back end either.

