Bug 1120108

Summary: [Dist-geo-rep] after restore of hardlink snapshot in geo-rep setup, few files are zero byte files on slave.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication
Assignee: Venky Shankar <vshankar>
Status: CLOSED DUPLICATE
QA Contact: amainkar
Severity: high
Priority: high
Version: rhgs-3.0
CC: aavati, avishwan, csaba, david.macdonald, nlevinki, nsathyan, smohan, ssamanta, vagarwal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-01-05 08:07:01 UTC
Type: Bug

Description Vijaykumar Koppad 2014-07-16 09:57:44 UTC
Description of problem: After restoring a hardlink snapshot in a geo-rep setup (i.e. the snapshot taken while the hardlinks were being created), a few files are zero-byte files on the slave.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# diff master_md5sum slave_md5sum
955d954
< 41c36d250d4cf41f578fbb355d8cde36  ./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%ZNI4V9BTM3
1363,1364d1361
< 5efee195089b1faa79e4824b3aed7bb9  ./thread3/level04/level14/level24/level34/level44/53c3caa1%%JMVYT3SIL8
< 5efee195089b1faa79e4824b3aed7bb9  ./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%7NDUF7STPD
3012a3010,3012
> d41d8cd98f00b204e9800998ecf8427e  ./thread3/level04/level14/level24/level34/level44/53c3caa1%%JMVYT3SIL8
> d41d8cd98f00b204e9800998ecf8427e  ./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%7NDUF7STPD
> d41d8cd98f00b204e9800998ecf8427e  ./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%ZNI4V9BTM3

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Above is the list of files that are empty on the slave side. Two of them are linked to the same data on the master, but one of them is a stale link file, which means it was synced to the slave as a separate file, not as a hardlink.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# ls -i /mnt/master/./thread3/level04/level14/level24/level34/level44/53c3caa1%%JMVYT3SIL8
9305855428730845832 /mnt/master/./thread3/level04/level14/level24/level34/level44/53c3caa1%%JMVYT3SIL8

# ls -i /mnt/master/./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%7NDUF7STPD
9305855428730845832 /mnt/master/./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%7NDUF7STPD
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the above, /mnt/master/./thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%ZNI4V9BTM3 is the stale hardlink on the slave.
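
This can also be cross-checked from the slave side. A minimal sketch, assuming the slave volume is mounted at /mnt/slave (the mount point is an assumption, not taken from this setup); a link count of 1 for the first two paths would confirm they were synced as independent files rather than as a hardlink pair:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# print hard link count, inode, size and name for the suspect files on the slave mount
stat -c '%h %i %s %n' \
    "/mnt/slave/thread3/level04/level14/level24/level34/level44/53c3caa1%%JMVYT3SIL8" \
    "/mnt/slave/thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%7NDUF7STPD" \
    "/mnt/slave/thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%ZNI4V9BTM3"
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::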

If we check this file, it has a sticky-bit entry in one sub-volume and a normal entry in the other sub-volume. The sticky-bit file has the following entries in the XSYNC-CHANGELOG:
============================================================================
# grep "29cad89f-8ffc-4f61-ad55-075f0358af84" *
E 29cad89f-8ffc-4f61-ad55-075f0358af84 MKNOD 33280 4788 31741 98aef124-f945-4fd4-8d51-3947520e8034%2F53c3d433%25%25ZNI4V9BTM3
M 29cad89f-8ffc-4f61-ad55-075f0358af84 SETATTR
D 29cad89f-8ffc-4f61-ad55-075f0358af84
============================================================================
and the normal file has the following entries in the XSYNC-CHANGELOG:
============================================================================
# grep "29cad89f-8ffc-4f61-ad55-075f0358af84" *
XSYNC-CHANGELOG.1405415795:E 29cad89f-8ffc-4f61-ad55-075f0358af84 LINK 57fd3d92-7b4e-469d-8c94-59e1b5907bcb%2F53c3caa1%25%25U6YJSI5IAB
XSYNC-CHANGELOG.1405415795:D 29cad89f-8ffc-4f61-ad55-075f0358af84
XSYNC-CHANGELOG.1405415795:E 29cad89f-8ffc-4f61-ad55-075f0358af84 LINK 98aef124-f945-4fd4-8d51-3947520e8034%2F53c3d433%25%25ZNI4V9BTM3
XSYNC-CHANGELOG.1405415795:D 29cad89f-8ffc-4f61-ad55-075f0358af84
XSYNC-CHANGELOG.1405415897:E 29cad89f-8ffc-4f61-ad55-075f0358af84 LINK 57fd3d92-7b4e-469d-8c94-59e1b5907bcb%2F53c3caa1%25%25U6YJSI5IAB
XSYNC-CHANGELOG.1405415897:D 29cad89f-8ffc-4f61-ad55-075f0358af84
XSYNC-CHANGELOG.1405415897:E 29cad89f-8ffc-4f61-ad55-075f0358af84 LINK 98aef124-f945-4fd4-8d51-3947520e8034%2F53c3d433%25%25ZNI4V9BTM3
XSYNC-CHANGELOG.1405415897:D 29cad89f-8ffc-4f61-ad55-075f0358af84
============================================================================

First of all, the sticky-bit file shouldn't have an entry in the XSYNC-CHANGELOG. Since there is an entry for the sticky-bit file, there is a possibility that rsync picks up that gfid and syncs its data, which is empty in that sub-volume.


One possible explanation for the sticky-bit entry in the XSYNC-CHANGELOG:
==============================================================================
While hardlinks to files are being created on the master, if a snapshot is taken, some of the link files may fail to get the link-to xattr in the backend, because setting the link-to xattr in DHT is not atomic. When this snapshot is restored in a geo-rep setup, the geo-rep xsync crawl walks the backend and captures this sticky-bit file in the XSYNC-CHANGELOG (since we only ignore files that have both the sticky bit and the link-to xattr). A check for such half-baked link files is sketched below the box.
===============================================================================
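
A minimal sketch of that check, to be run on each brick root; the brick path /rhs/brick1/b1 is an assumption, and trusted.glusterfs.dht.linkto is the DHT link-to xattr referred to above:
==============================================================================
# list sticky-bit, zero-byte files on the brick that are missing the DHT
# link-to xattr, i.e. "half-baked" link files left behind by the snapshot
BRICK=/rhs/brick1/b1   # assumed brick path -- adjust per node
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -perm -1000 -size 0 -print |
while read -r f; do
    getfattr -n trusted.glusterfs.dht.linkto "$f" >/dev/null 2>&1 \
        || echo "missing link-to xattr: $f"
done
==============================================================================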



Version-Release number of selected component (if applicable): glusterfs-3.6.0.24-1.el6rhs


How reproducible: didn't try to reproduce. 


Steps to Reproduce:
1. Create and start a geo-rep session between the master and the slave.
2. Create some data on the master and let it sync to the slave.
3. Start creating hardlinks to those files and, in parallel, create a snapshot (follow the steps to create a snapshot in a geo-rep setup).
4. Restore that snapshot (follow the steps to restore a snapshot in a geo-rep setup).
5. Check the md5sum of all the files after the data syncs to the slave (a sketch of the comparison is given below).
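
A minimal sketch of the checksum comparison in step 5, assuming the master and slave volumes are mounted at /mnt/master and /mnt/slave (the mount points are assumptions):
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# build sorted md5sum lists from both mounts and diff them; slave-only lines
# showing d41d8cd98f00b204e9800998ecf8427e (md5 of an empty file) are zero-byte files
(cd /mnt/master && find . -type f -exec md5sum {} + | sort -k2) > /tmp/master_md5sum
(cd /mnt/slave  && find . -type f -exec md5sum {} + | sort -k2) > /tmp/slave_md5sum
diff /tmp/master_md5sum /tmp/slave_md5sum
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::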


Actual results: After restoring the hardlink snapshot in a geo-rep setup, a few files are zero-byte files on the slave.


Expected results: All files should sync to the slave with the correct content; zero-byte files on the slave amount to data loss, which is not acceptable.


Additional info:

Comment 2 Venky Shankar 2014-07-17 12:26:45 UTC
This is possibly due to the snapshot of the brick being taken at a point where the linkfile (distribute's pointer file) was in a "half-baked" state: the link-to xattr was not yet set, as creating a linkfile is a create followed by an xattr set. The snapped volume (the backend brick) would therefore have a missing linkto xattr.

On a volume restore (snapshot restore and geo-rep init), gsyncd starts crawling the brick (xsync mode). There is logic in the file system crawl to ignore these pointer files, which are identified via their mode (01000) and the link-to xattr. Since the snapped volume did not have this xattr, gsyncd queued the pointer file as a file to be replicated. On the other hand, the subvolume on which the actual hardlinked files were present would also be a candidate for replication. Depending on which geo-rep daemon won the race to start replicating the selected entities, we could end up with the file on the slave being replicated as a normal file and not as a hardlink.

[NOTE: On a lookup, DHT would fix the xattr and things would be okay, but gsyncd crawls the backend, thereby not giving a chance for a lookup() from the fuse mount]

The presence of zero-byte files on the slave could possibly be a side effect of the above when the sticky-bit file shows up on the mount (hence rsync truncating the file on the slave).

I've asked vkoppad to reproduce this so that the xattrs can be inspected (before a lookup is triggered).
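
For reference, a minimal sketch of that inspection; the brick path /rhs/brick1/b1 is an assumption, and the file path is the stale link file quoted in the description:
==============================================================================
# dump all trusted.* xattrs on the backend copy before any lookup from the
# fuse mount gets a chance to heal the link-to xattr
getfattr -d -m 'trusted.*' -e hex \
    "/rhs/brick1/b1/thread3/level04/level14/level24/level34/level44/hardlink_to_files/53c3d433%%ZNI4V9BTM3"
==============================================================================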

Vkoppad,

Please let me know if you're able to reproduce this and keep the setup intact.

Comment 3 Vijaykumar Koppad 2014-07-17 12:38:57 UTC
sosreport of slaves and master nodes @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1120108/

Comment 4 Vijaykumar Koppad 2014-07-21 07:11:00 UTC
I tried twice on the build glusterfs-3.6.0.24-1.el6rhs and was not able to reproduce a situation where there are sticky-bit files in the backend with no linkto xattr.

Comment 5 Vijaykumar Koppad 2014-07-21 07:13:33 UTC
Steps I followed:

1. Create and start a geo-rep session between the master and the slave.
2. Create some data on the master and let it sync to the slave.
3. Start creating hardlinks to those files and, in parallel, create a snapshot (follow the steps to create a snapshot in a geo-rep setup).
4. Restore that snapshot (follow the steps to restore a snapshot in a geo-rep setup).
5. Before starting the master or slave volume, check for sticky-bit files with no linkto xattr in the backend.

Comment 6 Satish Mohan 2014-07-25 18:02:04 UTC
Not able to reproduce the race condition; under investigation.

Comment 7 Vijaykumar Koppad 2014-07-31 11:03:33 UTC
I was able to reproduce the issue with the following steps on the build glusterfs-3.6.0.25-1:

1. Create and start a geo-rep session between the master and the slave.
2. Create some data on the master and let it sync to the slave.
3. Start creating hardlinks to those files and, in parallel, create a snapshot (follow the steps to create a snapshot in a geo-rep setup).
4. Restore that snapshot (follow the steps to restore a snapshot in a geo-rep setup).
5. Check the md5sum of all the files after the data syncs to the slave.

Comment 8 Venky Shankar 2014-07-31 11:38:49 UTC
(In reply to Vijaykumar Koppad from comment #7)
> I was able to reproduce the issue with the following steps on the build
> glusterfs-3.6.0.25-1:
> 
> 1. Create and start a geo-rep session between the master and the slave.
> 2. Create some data on the master and let it sync to the slave.
> 3. Start creating hardlinks to those files and, in parallel, create a
> snapshot (follow the steps to create a snapshot in a geo-rep setup).
> 4. Restore that snapshot (follow the steps to restore a snapshot in a
> geo-rep setup).
> 5. Check the md5sum of all the files after the data syncs to the slave.

Which hostnames can I look into?

Comment 12 Aravinda VK 2015-01-05 08:07:01 UTC
A barrier was introduced for all entry ops as part of bz 1127234; since the root cause is the same, this issue will no longer occur. Closing this bug because bz 1127234 is verified and closed.

*** This bug has been marked as a duplicate of bug 1127234 ***