Bug 1140183
| Field | Value |
|---|---|
| Summary | dist-geo-rep: Concurrent renames and node reboots results in slave having both source and destination of file with destination being 0 byte sticky file |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Version | rhgs-3.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | M S Vishwanath Bhat <vbhat> |
| Assignee | Kotresh HR <khiremat> |
| QA Contact | Rahul Hinduja <rhinduja> |
| CC | aavati, annair, avishwan, bmohanra, csaba, khiremat, mzywusko, nlevinki, nsathyan, rhinduja, smanjara |
| Target Release | RHGS 3.1.0 |
| Whiteboard | node-failover, dht |
| Fixed In Version | glusterfs-3.7.0-2.el6rhs |
| Doc Type | Bug Fix |
| Doc Text | Previously, concurrent renames combined with node reboots could leave the slave volume with both the source and the destination of a renamed file, the destination being a zero-byte sticky-bit file. As a result, the slave kept the old data file alongside an empty sticky-bit file. With this fix, a shared meta volume is introduced to handle brick-down scenarios correctly and rename handling is enhanced, which resolves this issue. |
|  | 1196632 (view as bug list) |
| Last Closed | 2015-07-29 04:35:47 UTC |
| Type | Bug |
| Bug Blocks | 1196632, 1202842, 1223636 |
Description  M S Vishwanath Bhat  2014-09-10 12:37:46 UTC
Root caused the issue.

Without a node reboot, the changelog entries are as follows:

    touch f1
    mv f1 f2    (assuming f2 hashes to subvolume b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log    | b2 repl     |
| -      | -           | -      | -           |
| RENAME | f2 (sticky) | RENAME | f2 (sticky) |

When the b2 replica is down during the RENAME and later comes back:

    mv f1 f2    (assuming f2 hashes to subvolume b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log   | b2 repl     |
| -      | -           | -     | -           |
| RENAME | f2 (sticky) |       |             |
| -      | f2 (sticky) | MKNOD | f2 (sticky) |  <-- self heal

Once the b2 replica comes back, if it becomes the active worker, the RENAME is never processed; instead a sticky file is created on the slave, because only the MKNOD (from self-heal) is recorded in that brick's changelog.

Verified with the build: glusterfs-3.7.1-10.el6rhs.x86_64

While performing renames from the master, the active nodes were brought down. The passive nodes took over, and after the sync the arequal checksums match for master and slave. Moving this bug to verified state.

[root@wingo scripts]# arequal-checksum -p /mnt/master

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060

[root@wingo scripts]# arequal-checksum -p /mnt/slave

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060

[root@wingo scripts]# ls /mnt/slave
linux-3.4.2  linux-3.4.2.tar.bz2_renamed

[root@wingo scripts]# ls /mnt/slave/linux-3.4.2
arch             CREDITS_renamed  Kbuild_renamed   MAINTAINERS_renamed  README_renamed
COPYING_renamed  Documentation    Kconfig_renamed  Makefile_renamed     REPORTING-BUGS_renamed

Hi Kotresh,

The doc text is updated. Please review it and sign off if it looks OK.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

Doc Text is fine.
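For readers following the root-cause analysis above, here is a minimal Python simulation of the per-brick changelog replay involved; the `replay()` function, the tuple format of the entries, and the dict standing in for the slave volume are illustrative assumptions, not gsyncd code.

```python
# Toy model of per-brick changelog replay (illustrative only, not gsyncd code).
def replay(changelog, slave):
    """Apply a list of changelog entries to a dict modelling the slave volume."""
    for entry in changelog:
        op = entry[0]
        if op == "CREATE":
            slave[entry[1]] = "data"
        elif op == "MKNOD":
            # DHT link-to file: a zero-byte file with the sticky bit set.
            slave.setdefault(entry[1], "0-byte sticky file")
        elif op == "RENAME":
            old, new = entry[1], entry[2]
            if old in slave:
                slave[new] = slave.pop(old)
    return slave

# f1 was created earlier and has already been synced to the slave.
slave_ok = {"f1": "data"}
slave_bug = {"f1": "data"}

# Changelog of the brick that stayed up during the rename: RENAME is recorded.
changelog_up = [("RENAME", "f1", "f2")]

# Changelog of the replica that was down during the rename: after it comes
# back, self-heal recreates only the sticky link-to file, so MKNOD is recorded.
changelog_after_heal = [("MKNOD", "f2")]

print(replay(changelog_up, slave_ok))           # {'f2': 'data'}  -> correct
print(replay(changelog_after_heal, slave_bug))  # {'f1': 'data', 'f2': '0-byte sticky file'}  -> the bug
```

Replaying the healed replica's changelog never sees the RENAME, so the slave ends up with the old f1 data file plus a zero-byte sticky f2, which is exactly the state described in the summary.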
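The Doc Text refers to a shared meta volume used to decide which replica worker is active. The sketch below shows the general idea with a POSIX advisory lock on a file shared by all nodes; the mount point, lock-file path, and `try_become_active()` helper are hypothetical, and this is not the actual gsyncd implementation.

```python
# Hypothetical sketch: elect one ACTIVE worker per replica set by taking an
# exclusive, non-blocking lock on a file that lives on a volume mounted on
# every node (the "shared meta volume"). Paths and names are made up.
import errno
import fcntl
import os

META_MOUNT = "/run/gluster/shared_storage"                 # assumed mount point
LOCK_FILE = os.path.join(META_MOUNT, "geo-rep", "replica-set-1.lock")

def try_become_active(lock_path=LOCK_FILE):
    """Return an open fd if this worker won the lock (ACTIVE), else None (PASSIVE)."""
    os.makedirs(os.path.dirname(lock_path), exist_ok=True)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd                      # keep the fd open to keep holding the lock
    except OSError as err:
        if err.errno in (errno.EACCES, errno.EAGAIN):
            os.close(fd)
            return None                # another replica worker is already ACTIVE
        raise

if __name__ == "__main__":
    print("ACTIVE" if try_become_active() is not None else "PASSIVE")
```

A worker that fails to take the lock stays passive and does not replay its brick's changelog; when the node holding the lock goes down, the lock is released and another worker in the replica set can take over.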