Description of problem:
Created a geo-rep session between a 2x2 distributed-replicated master volume and a slave volume and started the session between them. But when I do a very simple untar of the glusterfs tarball, the files do not get synced. Geo-replication gets into an infinite loop, as can be seen from the logs. I am using the latest version of glusterfs. This used to work before, so this is a regression.

Version-Release number of selected component (if applicable):
[root@ramanujan ~]# rpm -q glusterfs
glusterfs-3.4.0.23rhs-1.el6rhs.x86_64

How reproducible:
Tried this three times and hit the issue all three times. Although I have confirmed it on the 23rhs build, I suspect it is present in 22rhs-2 as well.

Steps to Reproduce:
1. Create and start a 2x2 distributed-replicated master and slave volume and set up a geo-rep session between them (see the command sketch at the end of this comment).
2. Copy the glusterfs tarball to the mount point and untar it:
time tar -xzvf /mnt/master/glusterfs-3.4.0.23rhs.tar.gz -C /mnt/master/

Actual results:
Only the initial file created (the tarball in our case) is synced. The contents untarred from the tarball are not synced.

Expected results:
The untarred contents of the tarball should be synced as well.

Additional info:
I can see from the log messages below that it has entered an infinite loop.

[2013-08-26 19:24:09.10138] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/dcd117c8-0f64-449d-9279-d692d8c194fe [errcode: 23]
[2013-08-26 19:24:09.12722] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/3af0be86-31c7-43a7-a60f-2c93809a120b [errcode: 23]
[2013-08-26 19:24:09.14480] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/4f361c15-86cc-4d9c-a885-bce8617e9041 [errcode: 23]
[2013-08-26 19:24:09.16259] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/a26436c5-f310-4db3-abd3-17fc14c9d749 [errcode: 23]
[2013-08-26 19:24:09.17718] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/2255b375-d95d-457a-a5e7-7c6b8c34a157 [errcode: 23]
[2013-08-26 19:24:09.19309] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/fe70eb24-d241-4194-aa06-45852997203e [errcode: 23]
[2013-08-26 19:24:09.20987] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/442f6561-8bb7-4d41-b1d6-e7ef95c67c00 [errcode: 23]
[2013-08-26 19:24:09.26143] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/e3acd8f7-fd6b-4840-a393-9cecce1542ad [errcode: 23]
[2013-08-26 19:24:09.28171] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/c002f1ce-5340-43ce-b85d-1c8a18906c2f [errcode: 23]
[2013-08-26 19:24:09.31312] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/f56d54c4-f5c7-4880-b9e0-77608d4fc89f [errcode: 23]
[2013-08-26 19:24:09.32800] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/88b1982d-fae8-4c2c-b416-a87d56973b79 [errcode: 23]
[2013-08-26 19:24:09.36612] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/11a8eeb0-d1d7-46de-a90d-6535fb8991e1 [errcode: 23]
[2013-08-26 19:24:09.39102] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/404f754a-e8e4-4c46-ac98-b1a569ba72ae [errcode: 23]
[2013-08-26 19:24:09.40699] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/43c89dcc-2e67-4ca0-92ba-00f3259dfc65 [errcode: 23]
[2013-08-26 19:24:09.42232] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/b9006d8c-6c87-4eb2-a63f-8c0443df7b6e [errcode: 23]
[2013-08-26 19:24:09.43760] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/883aa39d-2def-4850-9c72-2e5439bb3591 [errcode: 23]
[2013-08-26 19:24:09.45663] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/a43eaa50-c1b4-4fa6-9db8-3040547914d5 [errcode: 23]
[2013-08-26 19:24:09.47601] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/b4f08821-6fbd-4d84-bd96-03f046df6dc4 [errcode: 23]
[2013-08-26 19:24:09.48479] W [master(/rhs/bricks/brick2):748:process] _GMaster: incomplete sync, retrying changelog: /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.35.90%3Agluster%3A%2F%2F127.0.0.1%3Aslave/68fa5cc90f61530aea097cdc78c2b376/.processing/CHANGELOG.1377524162

This message keeps repeating continuously in the logs.

Also, when you now start geo-replication, an option called "geo-replication.ignore-pid-check" gets set on the volume. This was not the case before.

[root@pythagoras ~]# gluster v i
Volume Name: master
Type: Distributed-Replicate
Volume ID: 111b3e60-e224-41b5-83f7-67daa9ff5445
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: pythagoras:/rhs/bricks/brick0
Brick2: ramanujan:/rhs/bricks/brick1
Brick3: pythagoras:/rhs/bricks/brick2
Brick4: ramanujan:/rhs/bricks/brick3
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
changelog.fsync-interval: 5
changelog.rollover-time: 20

I am not sure whether that option is related to this issue; I do not know what it does or how it impacts geo-replication.
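Command sketch for step 1 (a minimal sketch, not a verbatim transcript: host names and brick paths are taken from the volume info above, the slave host and volume name from the changelog path in the logs, and the geo-rep create syntax may differ depending on how passwordless ssh was prepared):

# Master volume (2x2 distributed-replicate); the slave volume is created the same way on the slave cluster
gluster volume create master replica 2 \
    pythagoras:/rhs/bricks/brick0 ramanujan:/rhs/bricks/brick1 \
    pythagoras:/rhs/bricks/brick2 ramanujan:/rhs/bricks/brick3
gluster volume start master

# Geo-rep session from the master volume to the slave volume, then status check
gluster volume geo-replication master 10.70.35.90::slave create push-pem
gluster volume geo-replication master 10.70.35.90::slave start
gluster volume geo-replication master 10.70.35.90::slave status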
This issue persists on both the fuse and glusterfs mounts.
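For reference, the two mount variants referred to above can be obtained roughly as follows (a sketch only; the mount point and server host are assumptions based on the steps in this bug, and both ultimately use the fuse client):

mount -t glusterfs pythagoras:/master /mnt/master                         # fuse mount via mount(8)
glusterfs --volfile-server=pythagoras --volfile-id=master /mnt/master     # direct glusterfs client invocation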
https://code.engineering.redhat.com/gerrit/#/c/12027/

This is one patch fixing many similar issues, hence you see just one bug number in the commit message.
This is working now. Both a simple Linux kernel untar and a glusterfs source untar are being synced to the slave volume.
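A rough verification sketch (assumed commands, not necessarily the exact checks used; the slave host is taken from the changelog path in the logs):

mount -t glusterfs 10.70.35.90:/slave /mnt/slave
find /mnt/master -type f | wc -l      # file count on the master mount
find /mnt/slave -type f | wc -l       # should match on the slave mount once geo-rep goes idle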
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html