Bug 1140183
| Field | Value |
|---|---|
| Summary | dist-geo-rep: Concurrent renames and node reboots results in slave having both source and destination of file with destination being 0 byte sticky file |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Version | rhgs-3.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | M S Vishwanath Bhat <vbhat> |
| Assignee | Kotresh HR <khiremat> |
| QA Contact | Rahul Hinduja <rhinduja> |
| CC | aavati, annair, avishwan, bmohanra, csaba, khiremat, mzywusko, nlevinki, nsathyan, rhinduja, smanjara |
| Target Release | RHGS 3.1.0 |
| Whiteboard | node-failover, dht |
| Fixed In Version | glusterfs-3.7.0-2.el6rhs |
| Doc Type | Bug Fix |
| Doc Text | Previously, concurrent renames combined with node reboots could leave the slave volume with both the source and the destination of a renamed file, the destination being a zero-byte sticky-bit file. As a result, the slave kept the old data file alongside an empty sticky-bit file. With this fix, a shared meta volume is introduced to handle brick-down scenarios correctly and rename handling is enhanced, which resolves this issue. |
|  | 1196632 (view as bug list) |
| Last Closed | 2015-07-29 04:35:47 UTC |
| Type | Bug |
| Bug Blocks | 1196632, 1202842, 1223636 |
Description  M S Vishwanath Bhat  2014-09-10 12:37:46 UTC
Root caused the issue.

Without a node reboot, the changelog entries are as follows:

    touch f1
    mv f1 f2    (assuming f2 hashes to subvolume b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log    | b2 repl     |
| -      | -           | -      | -           |
| RENAME | f2 (sticky) | RENAME | f2 (sticky) |

When the b2 replica is down during the RENAME and later comes back:

    mv f1 f2    (assuming f2 hashes to subvolume b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log   | b2 repl     |
| -      | -           | -     | -           |
| RENAME | f2 (sticky) |       |             |
| -      | f2 (sticky) | MKNOD | f2 (sticky) |  <-- self heal

Once the b2 replica comes back, if it becomes the active worker, the RENAME is never processed; instead a sticky file is created on the slave, because only the MKNOD (from self-heal) is recorded in that brick's changelog.

Verified with the build: glusterfs-3.7.1-10.el6rhs.x86_64

While performing renames from the master, the active nodes were brought down. The passive nodes took over, and after the sync the arequal checksums match for master and slave. Moving this bug to verified state.

[root@wingo scripts]# arequal-checksum -p /mnt/master

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060

[root@wingo scripts]# arequal-checksum -p /mnt/slave

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060

[root@wingo scripts]# ls /mnt/slave
linux-3.4.2  linux-3.4.2.tar.bz2_renamed

[root@wingo scripts]# ls /mnt/slave/linux-3.4.2
arch             CREDITS_renamed  Kbuild_renamed   MAINTAINERS_renamed  README_renamed
COPYING_renamed  Documentation    Kconfig_renamed  Makefile_renamed     REPORTING-BUGS_renamed

Hi Kotresh,

The doc text is updated. Please review it and sign off if it looks OK.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

Doc Text is fine.
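For readers following the root-cause analysis above, here is a minimal Python simulation of the per-brick changelog replay involved; the `replay()` function, the tuple format of the entries, and the dict standing in for the slave volume are illustrative assumptions, not gsyncd code.

```python
# Toy model of per-brick changelog replay (illustrative only, not gsyncd code).
def replay(changelog, slave):
    """Apply a list of changelog entries to a dict modelling the slave volume."""
    for entry in changelog:
        op = entry[0]
        if op == "CREATE":
            slave[entry[1]] = "data"
        elif op == "MKNOD":
            # DHT link-to file: a zero-byte file with the sticky bit set.
            slave.setdefault(entry[1], "0-byte sticky file")
        elif op == "RENAME":
            old, new = entry[1], entry[2]
            if old in slave:
                slave[new] = slave.pop(old)
    return slave

# f1 was created earlier and has already been synced to the slave.
slave_ok = {"f1": "data"}
slave_bug = {"f1": "data"}

# Changelog of the brick that stayed up during the rename: RENAME is recorded.
changelog_up = [("RENAME", "f1", "f2")]

# Changelog of the replica that was down during the rename: after it comes
# back, self-heal recreates only the sticky link-to file, so MKNOD is recorded.
changelog_after_heal = [("MKNOD", "f2")]

print(replay(changelog_up, slave_ok))           # {'f2': 'data'}  -> correct
print(replay(changelog_after_heal, slave_bug))  # {'f1': 'data', 'f2': '0-byte sticky file'}  -> the bug
```

Replaying the healed replica's changelog never sees the RENAME, so the slave ends up with the old f1 data file plus a zero-byte sticky f2, which is exactly the state described in the summary.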
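The Doc Text refers to a shared meta volume used to decide which replica worker is active. The sketch below shows the general idea with a POSIX advisory lock on a file shared by all nodes; the mount point, lock-file path, and `try_become_active()` helper are hypothetical, and this is not the actual gsyncd implementation.

```python
# Hypothetical sketch: elect one ACTIVE worker per replica set by taking an
# exclusive, non-blocking lock on a file that lives on a volume mounted on
# every node (the "shared meta volume"). Paths and names are made up.
import errno
import fcntl
import os

META_MOUNT = "/run/gluster/shared_storage"                 # assumed mount point
LOCK_FILE = os.path.join(META_MOUNT, "geo-rep", "replica-set-1.lock")

def try_become_active(lock_path=LOCK_FILE):
    """Return an open fd if this worker won the lock (ACTIVE), else None (PASSIVE)."""
    os.makedirs(os.path.dirname(lock_path), exist_ok=True)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd                      # keep the fd open to keep holding the lock
    except OSError as err:
        if err.errno in (errno.EACCES, errno.EAGAIN):
            os.close(fd)
            return None                # another replica worker is already ACTIVE
        raise

if __name__ == "__main__":
    print("ACTIVE" if try_become_active() is not None else "PASSIVE")
```

A worker that fails to take the lock stays passive and does not replay its brick's changelog; when the node holding the lock goes down, the lock is released and another worker in the replica set can take over.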