Bug 1028977 - Dist-geo-rep : too many files failed to sync to slave, when the use-tarssh was on.
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: high  Severity: high
Assigned To: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
Keywords: consistency, ZStream
Reported: 2013-11-11 07:04 EST by Vijaykumar Koppad
Modified: 2015-11-25 03:52 EST
Doc Type: Bug Fix
Type: Bug
Attachments: None
Description Vijaykumar Koppad 2013-11-11 07:04:54 EST
Description of problem: Too many files failed to sync to the slave when use-tarssh was on. On the slave side, one of the top-level directories had no files under it; all files beneath it failed to sync. The directory itself, however, was created on the slave with the same gfid as on the master.
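As a hedged illustration only (the brick paths below are hypothetical, not taken from this report), the gfid match between the master and slave copies of such a directory can be compared via the trusted.gfid extended attribute on the bricks:

```shell
# Hypothetical brick paths; a sketch of comparing gfids on master and slave.
# On a master brick:
getfattr -n trusted.gfid -e hex /bricks/master_brick1/level01

# On the corresponding slave brick; the hex values should be identical
# when geo-rep created the directory on the slave with the master's gfid:
getfattr -n trusted.gfid -e hex /bricks/slave_brick1/level01
```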

The slave-side client log showed the following entries for a directory creation under that directory:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-11 07:01:03.979991] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 227: MKDIR() <gfid:15119bf7-6398-44de-a622-ef9a671f401a>/level10 => -1 (File exists)
[2013-11-11 09:46:28.940468] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 842308: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:46:37.826078] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 842835: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:46:45.211938] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 843360: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:46:53.629883] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 843885: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:02.622469] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 844416: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:11.255020] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 844941: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:21.150586] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 845471: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:28.770612] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 845996: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:37.047748] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 846522: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
[2013-11-11 09:47:46.051069] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 847053: MKDIR() <gfid:8adf8971-50df-4063-b6c1-10d2e04a9f9a>/level10 => -1 (Invalid argument)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The log showed similar entries for files as well.


Version-Release number of selected component (if applicable): glusterfs-3.4.0.42rhs-1


How reproducible: Didn't try to reproduce


Steps to Reproduce:
1. Create and start a geo-rep session between the master and the slave.
2. Create some 10K files on the master, let them sync, then delete them.
3. Repeat step 2 about two times.
4. Create some 100K files on the master using the command: "./crefi.py -n 1000 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master"
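The steps above can be sketched with the gluster CLI (volume and host names below are hypothetical; enabling use_tarssh is assumed from the bug title, since the numbered steps do not show it explicitly):

```shell
# Hypothetical names: mastervol, slavehost, slavevol.
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol config use_tarssh true
gluster volume geo-replication mastervol slavehost::slavevol start

# Two rounds of create / sync / delete (~10K files each):
for round in 1 2; do
    ./crefi.py -n 1000 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master
    # ...wait for the files to sync to the slave, then:
    rm -rf /mnt/master/*
done

# Then the larger run from step 4:
./crefi.py -n 1000 --multi -b 10 -d 10 --random --max=500K --min=10 /mnt/master
```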

Actual results: Many files failed to sync to the slave.


Expected results: Geo-replication should sync all files created on the master to the slave.


Additional info:
Comment 2 Vijaykumar Koppad 2013-11-13 05:00:48 EST
I have hit this again with build glusterfs-3.4.0.43rhs-1.

One of the top-level directories created on the master did not get any files underneath it on the slave.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@redmoon slave]# for i in {0..9}; do echo level0$i ; find /mnt/slave/level0$i | wc -l ; done
level00
1010
level01
1
level02
1010
level03
510
level04
1010
level05
1010
level06
1010
level07
1010
level08
1010
level09
1010
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

level01 has no files underneath it.

For the files under that directory, the slave-side geo-rep client log says:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-13 07:24:25.716620] I [fuse-bridge.c:3515:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/52832924%%G04QJWF5JJ with gfid (4c403d8e-a7d4-49a1-931c-fdf1ca668e19): No such file or directory
[2013-11-13 07:24:25.716646] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 98475: MKNOD() <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/52832924%%G04QJWF5JJ => -1 (No such file or directory)
[2013-11-13 07:24:25.716826] W [dht-layout.c:179:dht_layout_search] 0-slave-dht: no subvolume for hash (value) = 2125144236
[2013-11-13 07:24:25.716859] I [fuse-bridge.c:3515:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/52832924%%8VU1FLJ2NS with gfid (c09661dc-6741-42cd-8a09-c626faca3b9d): No such file or directory
[2013-11-13 07:24:25.716885] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 98476: MKNOD() <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/52832924%%8VU1FLJ2NS => -1 (No such file or directory)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


For the directory underneath level01, the slave log has entries like:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-13 07:24:25.720915] W [dht-layout.c:179:dht_layout_search] 0-slave-dht: no subvolume for hash (value) = 1564227000
[2013-11-13 07:24:25.720947] I [fuse-bridge.c:3515:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/level11 with gfid (2bff3c66-4876-4533-b489-392be875c0b9): Invalid argument
[2013-11-13 07:24:25.720974] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 98492: MKDIR() <gfid:a1669059-75d2-4af4-a5cf-beb2bb248548>/level11 => -1 (Invalid argument)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


As the find output above shows, level03 also has some files missing.
For those missing files, the slave-side logs look like:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-11-13 07:26:16.053560] W [dht-layout.c:179:dht_layout_search] 0-slave-dht: no subvolume for hash (value) = 4196023558
[2013-11-13 07:26:16.053584] I [fuse-bridge.c:3515:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:c466cb9a-d353-4591-a4a1-c2036752b17a>/52832933%%QQ4YEAD116 with gfid (5e395fe6-a228-4646-9702-2b1c8d7198b9): No such file or directory
[2013-11-13 07:26:16.053610] W [fuse-bridge.c:1627:fuse_err_cbk] 0-glusterfs-fuse: 176519: MKNOD() <gfid:c466cb9a-d353-4591-a4a1-c2036752b17a>/52832933%%QQ4YEAD116 => -1 (No such file or directory)
:
[2013-11-13 07:26:16.052668] I [fuse-bridge.c:3515:fuse_auxgfid_newentry_cbk] 0-fuse-aux-gfid-mount: failed to create the entry <gfid:c466cb9a-d353-4591-a4a1-c2036752b17a>/52832933%%0MKGJO2P73 with gfid (908f3ff8-7053-4d75-92b9-85f76d72064b): No such file or directory
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>


I have never hit this with a freshly created volume and geo-rep session; these missing files appear only after a few rounds of file creation and deletion. This could be the result of some stale data (just a suspicion).
Comment 5 Aravinda VK 2015-11-25 03:50:55 EST
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.
