Description of problem:
Renames were being performed from the master mount when one of the nodes was rebooted. After the node came back up, the slave had more files than the master. For a few files, the slave had both the source and the destination name. The destination files were zero-byte linkto files with the sticky bit set.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.28-1.el6rhs.x86_64

How reproducible:
Not sure. Seen once.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2*2 master and a 2*2 slave.
2. Start renaming all the files from the master mount point:
   find /mnt/master -type f -exec mv {} {}_renamed \;
3. Reboot one of the "Active" nodes in the master.

Actual results:
The slave has more files than the master.

[root@rhsauto029 ~]# find /mnt/master/ | wc -l
33494
[root@rhsauto029 ~]# find /mnt/slave/ | wc -l
33561

Both the source and the target files were present on the slave:

[root@rhsauto029 ~]# ls -lh /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile
-rw-rw-r-- 1 root root 622 Jul 22 2011 /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile
[root@rhsauto029 ~]# ls -lh /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile_renamed
---------T 1 root root 0 Sep 9 05:40 /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile_renamed

As shown above, the destination file (*_renamed) has the sticky bit set and a size of zero. The gfid of both files was also the same:

[root@rhsauto029 ~]# getfattr -d -m . -n "glusterfs.gfid.string" /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile 2> /dev/null
# file: mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile
glusterfs.gfid.string="6d613003-a35a-489a-826f-14e4a964134f"

[root@rhsauto029 ~]# getfattr -d -m . -n "glusterfs.gfid.string" /mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile_renamed 2> /dev/null
# file: mnt/slave/linux-3.0/drivers/media/dvb/mantis/Makefile_renamed
glusterfs.gfid.string="6d613003-a35a-489a-826f-14e4a964134f"

Expected results:
All the renames should be synced to the slave.

Additional info:
The changelog entries from the working-dir of the node that was rebooted:

[root@rhsauto048 7d805e4489617ef3f01d944e965cb309]# find . -type f | xargs grep "fa9725cf-e888-4b95-a33b-aa6bc6f83c62"
./.processed/CHANGELOG.1410264742:E fa9725cf-e888-4b95-a33b-aa6bc6f83c62 MKNOD 33280 0 0 bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c_renamed
./.processed/CHANGELOG.1410264742:M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 NULL
./.processed/CHANGELOG.1410264742:D fa9725cf-e888-4b95-a33b-aa6bc6f83c62

The changelog entries from the working-dir of the node that was the replica pair of the node that went down:

[root@rhsauto049 cfdffea3581f40685f18a34384edc263]# find . -type f | xargs grep "fa9725cf-e888-4b95-a33b-aa6bc6f83c62"
./.processing/CHANGELOG.1410264734:M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 SETATTR
./.processing/CHANGELOG.1410264719:M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 NULL
./.processing/CHANGELOG.1410264719:E fa9725cf-e888-4b95-a33b-aa6bc6f83c62 RENAME bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c_renamed

I will keep the setup as it is for some time to debug.
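
The manual ls checks above could be automated with something like the following Python sketch. It is only an illustration, not part of any GlusterFS tooling: the function names are hypothetical, the "_renamed" suffix comes from the reproduction steps, and "looks like a linkto file" is approximated purely by what ls showed (regular file, zero bytes, mode ---------T).

```python
import os
import stat


def looks_like_linkto(mode, size):
    """Heuristic from the ls output in this report: a regular file,
    zero bytes, with only the sticky bit set (---------T)."""
    return (stat.S_ISREG(mode)
            and size == 0
            and stat.S_IMODE(mode) == stat.S_ISVTX)


def find_double_names(root, suffix="_renamed"):
    """Yield (source, target) paths where both names exist in the same
    directory and the target looks like a linkto file.
    'suffix' is the rename suffix used in the reproduction steps."""
    for dirpath, _dirs, files in os.walk(root):
        names = set(files)
        for name in sorted(names):
            if not name.endswith(suffix):
                continue
            source = name[:-len(suffix)]
            if source not in names:
                continue
            target_path = os.path.join(dirpath, name)
            st = os.lstat(target_path)
            if looks_like_linkto(st.st_mode, st.st_size):
                yield os.path.join(dirpath, source), target_path
```

Running list(find_double_names("/mnt/slave")) on the slave mount would list the affected pairs. Comparing gfids would still need getfattr as shown above.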
Root caused the issue.

Without a node reboot, the changelog entries are as follows.

touch f1
mv f1 f2 (assuming the hashed subvolume of f2 is b2)

| log    | b1 | log    | b1 repl || log    | b2          | log    | b2 repl     |
| CREATE | f1 | CREATE | f1      || -      | -           | -      | -           |
| -      | f2 | -      | f2      || RENAME | f2 (sticky) | RENAME | f2 (sticky) |

When the b2 replica is down during the RENAME and then comes back:

mv f1 f2 (assuming the hashed subvolume of f2 is b2)

| log    | b1 | log    | b1 repl || log    | b2          | log   | b2 repl     |
| CREATE | f1 | CREATE | f1      || -      | -           | -     | -           |
| -      | f2 | -      | f2      || RENAME | f2 (sticky) |       |             |
| -      | f2 | -      | f2      || -      | f2 (sticky) | MKNOD | f2 (sticky) | <-- self heal

Once the b2 replica comes back, if it becomes Active, the RENAME is missed. Instead, a sticky file is created on the slave, because only the MKNOD from self-heal is recorded in that brick's changelog.
Reformatted.

Root caused the issue.

Without a node reboot, the changelog entries are as follows.

touch f1
mv f1 f2 (assuming the hashed subvolume of f2 is b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log    | b2 repl     |
| -      | -           | -      | -           |
| RENAME | f2 (sticky) | RENAME | f2 (sticky) |

When the b2 replica is down during the RENAME and then comes back:

mv f1 f2 (assuming the hashed subvolume of f2 is b2)

Brick1
======
| log    | b1 | log    | b1 repl |
| CREATE | f1 | CREATE | f1      |
| -      | f2 | -      | f2      |
| -      | f2 | -      | f2      |

Brick2
======
| log    | b2          | log   | b2 repl     |
| -      | -           | -     | -           |
| RENAME | f2 (sticky) |       |             |
| -      | f2 (sticky) | MKNOD | f2 (sticky) | <-- self heal

Once the b2 replica comes back, if it becomes Active, the RENAME is missed. Instead, a sticky file is created on the slave, because only the MKNOD from self-heal is recorded in that brick's changelog.
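
To make the difference between the two bricks' logs concrete, here is a small Python sketch that classifies the changelog lines quoted in the description. It is not the geo-rep parser. The field layout is assumed from the grep output in this report (with the "file:" prefix stripped), and the function names and labels are made up for illustration. The MKNOD mode 33280 is octal 0101000, that is, a regular file with only the sticky bit set, which matches the ---------T file seen on the slave.

```python
import stat
from urllib.parse import unquote

# Lines as quoted in this report, with the "file:" grep prefix removed.
REBOOTED_NODE = [
    "E fa9725cf-e888-4b95-a33b-aa6bc6f83c62 MKNOD 33280 0 0 "
    "bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c_renamed",
    "M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 NULL",
    "D fa9725cf-e888-4b95-a33b-aa6bc6f83c62",
]
REPLICA_PAIR = [
    "M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 SETATTR",
    "M fa9725cf-e888-4b95-a33b-aa6bc6f83c62 NULL",
    "E fa9725cf-e888-4b95-a33b-aa6bc6f83c62 RENAME "
    "bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c "
    "bafa54a2-d7b6-4124-a0c6-6e1e9bee8442%2Fmantis_dma.c_renamed",
]


def parse_entry(line):
    """Split one line into fields (layout assumed from the report)."""
    fields = line.split()
    entry = {"type": fields[0], "gfid": fields[1],
             "op": fields[2] if len(fields) > 2 else None}
    if entry["op"] == "MKNOD":
        entry["mode"] = int(fields[3])       # decimal st_mode
        entry["path"] = unquote(fields[6])   # <parent gfid>/<name>
    elif entry["op"] == "RENAME":
        entry["old"] = unquote(fields[3])
        entry["new"] = unquote(fields[4])
    return entry


def is_sticky_no_perms(mode):
    """Regular file whose permission bits are exactly the sticky bit."""
    return stat.S_ISREG(mode) and stat.S_IMODE(mode) == stat.S_ISVTX


def classify(lines):
    """Label what one brick's log says about a single gfid."""
    entries = [parse_entry(line) for line in lines]
    if any(e["op"] == "RENAME" for e in entries):
        return "rename"
    if any(e["op"] == "MKNOD" and is_sticky_no_perms(e["mode"])
           for e in entries):
        return "linkto-mknod-only"
    return "other"
```

With these inputs, classify(REPLICA_PAIR) reports a rename while classify(REBOOTED_NODE) reports only the linkto-style MKNOD. That is the situation described above: if the rebooted brick becomes Active, the worker only ever sees the MKNOD.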
Verified with the build: glusterfs-3.7.1-10.el6rhs.x86_64

While performing renames from the master, brought down the active nodes. The passive nodes took over, and after the sync the arequal checksums match for master and slave. Moving this bug to verified state.

[root@wingo scripts]# arequal-checksum -p /mnt/master

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060
[root@wingo scripts]#

[root@wingo scripts]# arequal-checksum -p /mnt/slave

Entry counts
Regular files   : 11706
Directories     : 883
Symbolic links  : 0
Other           : 0
Total           : 12589

Metadata checksums
Regular files   : 1174c
Directories     : 24c719
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e5f524b8a52abf1734a276d516762e4e
Directories     : 5c4b693641626139
Symbolic links  : 0
Other           : 0
Total           : 8d1c3b5bf23ef060
[root@wingo scripts]#

[root@wingo scripts]# ls /mnt/slave
linux-3.4.2  linux-3.4.2.tar.bz2_renamed
[root@wingo scripts]# ls /mnt/slave/linux-3.4.2
arch  CREDITS_renamed  Kbuild_renamed  MAINTAINERS_renamed  README_renamed
COPYING_renamed  Documentation  Kconfig_renamed  Makefile_renamed  REPORTING-BUGS_renamed
[root@wingo scripts]#
Hi Kotresh,

The doc text is updated. Please review it and sign off if it looks OK.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html
Doc Text is fine.