Description of problem:
=========================
Deletion of a file while it is migrating from one brick to another leads to data inconsistency.

Version-Release number of selected component (if applicable):
==================
glusterfs-api-3.7.1-12

Steps to Reproduce:
==================
1. Create a distributed volume, mount it on a client using FUSE, and create large files (around 3 GB).
2. Make sure the large files are eligible for migration as part of the rebalance.
3. While a large file is being migrated, create hard links to the file and then delete the file.

Although the deletion from the mount point succeeded, after some time ls shows the deleted file again, but its content is not the same as the original.

Snippet of the log file:
=====================
[2015-09-04 09:58:02.367760] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 19
[2015-09-04 09:58:02.367797] I [dht-rebalance.c:1764:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 20
[2015-09-04 10:00:45.418961] W [MSGID: 109023] [dht-rebalance.c:1265:dht_migrate_file] 0-dht10-dht: /data/file297: failed to perform unlink on dht10-client-0 (No such file or directory)
[2015-09-04 10:02:08.854872] I [MSGID: 109022] [dht-rebalance.c:1282:dht_migrate_file] 0-dht10-dht: completed migration of /data/file710 from subvolume dht10-client-0 to dht10-client-2
[2015-09-04 10:02:08.855612] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-dht10-dht: Rebalance is completed. Time taken is 246.00 secs
[2015-09-04 10:02:08.855654] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-dht10-dht: Files migrated: 1, size: 7958036640, lookups: 2, failures: 0, skipped: 1
[2015-09-04 10:02:08.855877] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7df5) [0x7f666e716df5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f666fd7f785] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f666fd7f609] ) 0-: received signum (15), shutting down
?unlink

setup:
==============
[root@rhs-client9 glusterfs]# gluster vol status dht10
Status of volume: dht10
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-client9.lab.eng.blr.redhat.com:/r
hs/brick1/dht10                             49181     0          Y       8840
Brick rhs-client39.lab.eng.blr.redhat.com:/
rhs/brick10/dht10                           49173     0          Y       9645
Brick rhs-client39.lab.eng.blr.redhat.com:/
rhs/brick1/dht10                            49174     0          Y       9663
NFS Server on localhost                     2049      0          Y       32610
NFS Server on rhs-client39.lab.eng.blr.redh
at.com                                      2049      0          Y       18996

Task Status of Volume dht10
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : 5908c287-47df-4662-8f56-a93ed7241f41
Removed bricks:
rhs-client9.lab.eng.blr.redhat.com:/rhs/brick1/dht10
Status               : completed
Please describe the bug in as much detail as possible. There is no information about how many hard links were created, which file was deleted, the data checksum before and after the unlink, or the ls output from the mount point. Please also upload the sos_report.

Susant
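The details requested above (hard-link count, checksum before and after the unlink, ls output) could be captured roughly as follows. This is only a sketch: the directory is a temporary stand-in for the mount point, the 1 MB file stands in for the large file, and the variable names are made up for illustration. On a real setup MNT would point at the FUSE mount (for example /mnt/dht10/data) and the sample-file creation would be skipped.

```shell
#!/bin/sh
# Hypothetical stand-in for the mount point; not the reporter's actual setup.
MNT=$(mktemp -d)
FILE="$MNT/file297"

# Sample data and one hard link, so the sketch runs on any local file system.
head -c 1048576 /dev/zero > "$FILE"
ln "$FILE" "$MNT/link1"

# Record checksum and hard-link count before the unlink.
before_sum=$(md5sum < "$FILE" | cut -d' ' -f1)
before_links=$(stat -c %h "$FILE")

rm -f "$FILE"

# Record the same information afterwards, through the surviving link.
after_sum=$(md5sum < "$MNT/link1" | cut -d' ' -f1)
after_links=$(stat -c %h "$MNT/link1")

echo "before: md5=$before_sum links=$before_links"
echo "after:  md5=$after_sum links=$after_links"
ls -l "$MNT"
```

On a healthy file system the two checksums match and the link count drops by one; a mismatch would be the kind of evidence the report needs.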
The file name is file297, and I created 4 links. The output of ls from the mount point is given below. As I mentioned earlier, although I created a file of around 2 GB, after the rebalance its size became 400 bytes.

[root@dhcp37-55 data]# ls
create.sh  file296  file703  file709  file715  link2
file291    file297  file704  file710  file716  link3
file292    file298  file705  file711  file717  link7_1
file293    file299  file706  file712  file718  link7_2
file294    file701  file707  file713  file719
file295    file702  file708  file714  link1
[root@dhcp37-55 data]# cat file297
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
Modified while rebalance is in progress
[root@dhcp37-55 data]#
Rajesh,

Here is my analysis after going through the command history on the client.

This is the script you ran:

<script>
[root@dhcp37-55 data]# pwd
/mnt/dht10/data
[root@dhcp37-55 data]# cat create.sh
for i in {1..10}
do
    echo "Modified while rebalance is in progress" >> /mnt/dht10/data/file297
    echo "Modified while rebalance is in progress" >> /mnt/dht10/data/file710
    sleep 1
done
</script>

Here is the important part of the command history.

174 cd data >>> We are in the directory data
175 ls
176 pwd
177 vi create.sh
178 ./create.sh >>> file297 would have been created with 10 messages of "Modified while rebalance is in progress"
179 tail -10 file710
180 ls
181 ls -lrt
182 pwd
183 ./create.sh
184 mv file710 file710_rename
185 ls
186 ls -lrt
187 cat file297
188 mv file297 asdf >>> The file in question is renamed to asdf
189 echo "asfdsdaf" >> file297 >>> This creates a new file with the data "asfdsdaf"
190 echo "asfdsdaf" >> file710
191 ln file297 link1
192 ln file297 link2
193 ln file297 link3 >>> Created hard links
194 ls file710 link1
195 ls file710 link7_1
196 ln file710 link7_1
197 ln file710 link7_2
198 ls
199 rm -rf file297 >>> File unlinked
200 ls -lrt
201 mv file710 file710_rename
202 ./create.sh >>> This again creates the file with 10 messages of "Modified while rebalance is in progress"
203 ls -lrt
204 cat file297
205 cat link1
206 mount | grep dht10
207 cd /mnt/dht10
208 ls
209 cd data
210 ls
211 rpm -qa | grep glusterfs
212 mkdir /mnt/ECVOL3_one
213 mount -t nfs rhs-client9.lab.eng.blr.redhat.com:/ECVOL3/one /mnt/ECVOL3_one/
214 cd /mnt/ECVOL3_one/
215 ls
216 tar -xvf linux-3.19.tar.gz
217 history

By my analysis, no data corruption happened. Am I missing anything?
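The relevant part of the history (entries 178 to 202, file297 only) can be replayed on an ordinary local directory to see why file297 ends up at 400 bytes. This is a simplified sketch, not the original run: WORK is a temporary stand-in for /mnt/dht10/data, the 1 MB file stands in for the large file, append_lines is a hypothetical helper replacing create.sh without the sleep and without file710, and no rebalance is involved.

```shell
#!/bin/sh
# Hypothetical stand-in for /mnt/dht10/data.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Simplified create.sh: append the message 10 times to file297.
append_lines() {
    for i in $(seq 1 10); do
        echo "Modified while rebalance is in progress" >> file297
    done
}

head -c 1048576 /dev/zero > file297   # stand-in for the original large file
append_lines                          # history 178/183: appends to the large file
mv file297 asdf                       # history 188: original data now lives in asdf
echo "asfdsdaf" >> file297            # history 189: a brand-new, small file297
ln file297 link1                      # history 191-193: links to the NEW file
ln file297 link2
ln file297 link3
rm -rf file297                        # history 199: removes one name only
append_lines                          # history 202: recreates file297 from scratch

size=$(wc -c < file297)
link_data=$(cat link1)
orig_size=$(wc -c < asdf)
echo "file297: $size bytes, link1: $link_data, asdf: $orig_size bytes"
```

Each message is 39 characters plus a newline, so ten of them give exactly 400 bytes, which matches the size reported for file297. The hard links keep the "asfdsdaf" content, and the original data stays under the name asdf.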
I repeated the same steps but could not reproduce the reported problem. However, after deleting the data file, even the link files are missing from the mount point, and they are not available on the back end either.
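One way to confirm whether hard links survive the deletion is to note the inode of the data file before removing it and then search for remaining names with that inode. The sketch below assumes a local file system and GNU find; DIR is a temporary stand-in, and on a real setup it would be the brick directory (for example /rhs/brick1/dht10/data, taken from the volume status above), which is an assumption about where the links should live.

```shell
#!/bin/sh
# Hypothetical stand-in for a brick directory.
DIR=$(mktemp -d)
printf 'data\n' > "$DIR/file297"
ln "$DIR/file297" "$DIR/link1"
ln "$DIR/file297" "$DIR/link2"

# Remember the inode before deleting the data file.
inode=$(stat -c %i "$DIR/file297")
rm -f "$DIR/file297"

# List every name that still refers to the same inode.
survivors=$(find "$DIR" -inum "$inode" -printf '%f\n' | sort | paste -sd' ' -)
echo "names still pointing at inode $inode: $survivors"
```

If the list comes back empty on the brick, the links were removed along with the data file, which is the behavior described in this comment.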