Bug 1147627 - dist-geo-rep: Few symlinks not synced to slave after an Active node got rebooted
Summary: dist-geo-rep: Few symlinks not synced to slave after an Active node got rebooted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Kotresh HR
QA Contact: Rahul Hinduja
URL:
Whiteboard: consistency, node-failover
Depends On:
Blocks: 1202842 1223636
 
Reported: 2014-09-29 15:27 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC
CC List: 10 users

Fixed In Version: glusterfs-3.7.0-2.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-29 04:36:26 UTC
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2015:1495
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Gluster Storage 3.1 update
Last Updated: 2015-07-29 08:26:26 UTC

Description M S Vishwanath Bhat 2014-09-29 15:27:26 UTC
Description of problem:
A few of the symlinks did not sync to the slave even after 12 hours. I was untarring the Linux kernel sources while the files were being synced, and one of the Active nodes got rebooted. A few symlinks were actually synced as empty files (not linkto files).

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.29-1.el6rhs.x86_64

How reproducible:
Hit once. Not sure if reproducible.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master and slave.
2. Start untarring a few files (e.g., the Linux kernel sources) on the master so they sync to the slave.
3. Take down a Passive node and bring it back after some time. Wait for self-heal to sync all the files to the other replica node.
4. Bring down an Active node and bring it back after some time (see the command sketch below).
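
A rough command sketch of these steps, assuming hypothetical volume names "master" and "slave", a hypothetical slave host "slavehost", and mounts under /mnt; adjust to the actual topology:

# create and start the geo-rep session (master -> slavehost::slave)
gluster volume geo-replication master slavehost::slave create push-pem
gluster volume geo-replication master slavehost::slave start

# mount the master volume and untar a kernel tarball onto it
mount -t glusterfs localhost:/master /mnt/master
tar xf linux-3.17.tar.xz -C /mnt/master

# steps 3 and 4: reboot a Passive node, then later an Active node
reboot

# once syncing settles, compare master and slave checksums
arequal-checksum -p /mnt/master
arequal-checksum -p /mnt/slave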

Actual results:
A few symlinks are not synced properly.

Expected results:
All files should be synced properly, and the arequal checksums of master and slave should match.

Additional info:
I have provided the dev team with access to the setup.

Comment 2 Aravinda VK 2014-11-12 11:06:52 UTC
This bug is not reproducible yet. Kotresh is working on new logic to prevent a node from becoming Active when it gets rebooted. With this new logic, a previously Active node that is rebooted will not become Active again when it comes back online, which helps minimize the race during node reboots.

Deferring this bug from 3.0.3; we plan to fix this issue in a future release.
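
The effect of the new logic can be observed from the session status output; a minimal check, using the same hypothetical volume and host names as the sketch above:

# each brick's worker is reported as Active or Passive; with the new logic,
# a rebooted node should come back Passive instead of reclaiming the Active role
gluster volume geo-replication master slavehost::slave status detail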

Comment 7 Rahul Hinduja 2015-07-17 08:13:33 UTC
Verified with build: glusterfs-3.7.1-10.el6rhs.x86_64

Brought down Passive and Active nodes while data creation, including hardlinks and softlinks, was in progress. After the data creation completed, the arequal checksums at master and slave are the same. Moving this bug to verified state.

[root@wingo scripts]# arequal-checksum -p /mnt/master

Entry counts
Regular files   : 11003
Directories     : 800
Symbolic links  : 172
Other           : 0
Total           : 11975

Metadata checksums
Regular files   : 47c0cc
Directories     : cd9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a362688f8607e3b2eeb401f81892ff9e
Directories     : 271a530a29275034
Symbolic links  : 3638040000
Other           : 0
Total           : 6acc3a4b8fb64c18
[root@wingo scripts]# 


[root@wingo scripts]# arequal-checksum -p /mnt/slave

Entry counts
Regular files   : 11003
Directories     : 800
Symbolic links  : 172
Other           : 0
Total           : 11975

Metadata checksums
Regular files   : 47c0cc
Directories     : cd9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a362688f8607e3b2eeb401f81892ff9e
Directories     : 271a530a29275034
Symbolic links  : 3638040000
Other           : 0
Total           : 6acc3a4b8fb64c18
[root@wingo scripts]#

Comment 9 errata-xmlrpc 2015-07-29 04:36:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

