Bug 1128155 - [Dist-geo-rep]: After snapshot restore in a geo-rep setup, the slave had directory entries which the master didn't have on the mount-point.
Summary: [Dist-geo-rep]: After snapshot restore in a geo-rep setup, the slave had directory entries which the master didn't have on the mount-point.
Keywords:
Status: CLOSED DUPLICATE of bug 1127234
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-08-08 12:49 UTC by Vijaykumar Koppad
Modified: 2015-01-05 08:06 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: DHT's mkdir algorithm creates the directory on all non-hashed sub-volumes first and on the hashed sub-volume last. If a snapshot completes in that window, the snapped volume shows no directory at the mount point because the directory is not yet present on the hashed sub-volume. Distributed geo-rep syncs changes from each sub-volume, so it still creates the directory on the slave. Consequence: A few directories show up on the snapped volume's geo-rep slave but not on the master mount point. Fix: The current changelog crash-consistency approach barriers all entry fops, so this inconsistency no longer arises for mkdir. Result: The snapped master and its slave are consistent.
Clone Of:
Environment:
Last Closed: 2015-01-05 08:06:04 UTC
Embargoed:


Attachments

Description Vijaykumar Koppad 2014-08-08 12:49:57 UTC
Description of problem: After snapshot restore in a geo-rep setup, the slave had directory entries which the master didn't have on the mount-point. However, the master had those directories in the backend, on a sub-volume that was the non-hashed sub-volume for those directories.
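A quick way to observe the condition just described, i.e. a directory present in a brick's backend but absent from the mount point, is sketched below; the brick and mount paths are hypothetical placeholders, not values taken from this report.

import os

BRICK_BACKENDS = ["/rhs/brick1/b1", "/rhs/brick2/b2"]  # hypothetical brick backend paths
MOUNT_POINT = "/mnt/master"                            # hypothetical fuse mount of the volume

def backend_only_dirs():
    # Directories that exist on some brick backend but are not listed on the mount.
    on_mount = set(os.listdir(MOUNT_POINT))
    hidden = {}
    for brick in BRICK_BACKENDS:
        for entry in os.listdir(brick):
            if entry == ".glusterfs":
                continue  # internal metadata directory, skip
            if os.path.isdir(os.path.join(brick, entry)) and entry not in on_mount:
                hidden.setdefault(entry, []).append(brick)
    return hidden

if __name__ == "__main__":
    for name, bricks in backend_only_dirs().items():
        print(name, "present on", bricks, "but not on", MOUNT_POINT)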


Copying from Bug 1090986:
===============================================================================
Root Cause of the issue:

Due to a race between snapshot and mkdir, the snapshot captured the directory entry on the non-hashed sub-volume but not on the hashed one. Since the dht_readdirp fop lists an entry only if it is present on the hashed sub-volume, the entry was not shown on the mount point and was also not healed on the master volume.

The changelogs are captured at the brick level on the master, and since one of the bricks has the directory entry, geo-rep syncs the directory to the slave. The resulting inconsistency is that the slave has an entry which the master does not have on the gluster mount point.
===============================================================================
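The root cause above combines two behaviours: dht_readdirp lists a directory only if the hashed sub-volume holds it, while geo-rep syncs whatever each brick's changelog recorded. A minimal Python toy model of that combination is sketched below; it is not GlusterFS code, and the sub-volume names, the hashing stand-in, and the explicit snapshot step are illustrative assumptions.

SUBVOLS = ["subvol-0", "subvol-1", "subvol-2"]

def hashed_subvol(name):
    # Stand-in for DHT hashing the entry name onto one sub-volume.
    return SUBVOLS[hash(name) % len(SUBVOLS)]

def mkdir_in_steps(name, bricks):
    # Mimic the ordering described in the doc text: non-hashed sub-volumes
    # first, hashed sub-volume last, yielding after each per-brick create.
    hashed = hashed_subvol(name)
    for sv in [s for s in SUBVOLS if s != hashed] + [hashed]:
        bricks[sv].add(name)
        yield sv

def listed_dirs(bricks):
    # dht_readdirp-style filter: show an entry only if the hashed sub-volume has it.
    names = set().union(*bricks.values())
    return {n for n in names if n in bricks[hashed_subvol(n)]}

def changelog_synced_dirs(bricks):
    # Geo-rep works per brick: any brick that logged the mkdir syncs it to the slave.
    return set().union(*bricks.values())

bricks = {sv: set() for sv in SUBVOLS}
steps = mkdir_in_steps("dir1", bricks)
next(steps)  # mkdir has reached only one non-hashed sub-volume so far
snapshot = {sv: set(names) for sv, names in bricks.items()}  # snapshot wins the race here

print("visible on snapped master mount:", listed_dirs(snapshot))           # set()
print("synced to the slave by geo-rep:", changelog_synced_dirs(snapshot))  # {'dir1'}

Once the snapshot is restored, that snapped state becomes the live master, so the slave ends up with the directory while the master mount point does not show it.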


Version-Release number of selected component (if applicable): glusterfs-cli-3.6.0.25-1


How reproducible: Doesn't happen every time.
 

Steps to Reproduce:
1. Create and start a geo-rep session between master and slave.
2. Create some data on the master and let it sync to the slave.
3. Start creating symlinks to that data and, in parallel, create a snapshot (follow the documented steps for creating a snapshot in a geo-rep setup).
4. Restore that snapshot (follow the documented steps for restoring a snapshot in a geo-rep setup).
5. After the data syncs to the slave, compare the number of files on the master and the slave (a comparison sketch follows below).
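A rough helper for step 5 is sketched below, assuming the check is simply "walk both mounts and compare entries"; the master and slave mount paths are hypothetical placeholders.

import os

MASTER_MOUNT = "/mnt/master"  # hypothetical fuse mount of the master volume
SLAVE_MOUNT = "/mnt/slave"    # hypothetical fuse mount of the slave volume

def walk_entries(root):
    # Collect all file and directory paths relative to the mount root.
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            entries.add(os.path.normpath(os.path.join(rel, name)))
    return entries

master = walk_entries(MASTER_MOUNT)
slave = walk_entries(SLAVE_MOUNT)
print("only on slave:", sorted(slave - master))   # non-empty output here is the symptom of this bug
print("only on master:", sorted(master - slave))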


Actual results: The slave had directory entries which the master didn't have on the mount-point.


Expected results: Master and slave should always be in sync. 


Additional info:

Comment 3 Aravinda VK 2015-01-05 08:06:04 UTC
A barrier for all entry ops was introduced as part of bz 1127234; since the root cause is the same, this issue will no longer occur. Closing this bug because bz 1127234 is verified and closed.

*** This bug has been marked as a duplicate of bug 1127234 ***
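For reference, a minimal sketch of the barrier idea from bz 1127234, assuming its essence is "hold back entry fops while a snapshot is in progress"; this is an illustration only, not the GlusterFS changelog barrier code (the real implementation also drains in-flight operations before the snapshot is taken).

import threading

class EntryOpBarrier:
    # Toy barrier: entry operations (mkdir, create, rename, ...) pass through
    # normally, but block while the barrier is enabled around a snapshot.
    def __init__(self):
        self._open = threading.Event()
        self._open.set()  # barrier disabled: entry ops pass through

    def enable(self):
        # Called before taking the snapshot.
        self._open.clear()

    def disable(self):
        # Called after the snapshot completes.
        self._open.set()

    def entry_op(self, fn, *args, **kwargs):
        # Wait until no barrier is active, then perform the operation.
        self._open.wait()
        return fn(*args, **kwargs)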

