Bug 1001224 - Dist-geo-rep: Even the simple glusterfs tarball untar gets into an infinite loop and doesn't sync files.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-26 19:28 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC (History)
6 users

Fixed In Version: glusterfs-3.4.0.24rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:29:54 UTC
Target Upstream Version:



Description M S Vishwanath Bhat 2013-08-26 19:28:11 UTC
Description of problem:
Created a geo-rep session between a 2x2 distributed-replicated master volume and a slave volume, and started the geo-rep session between them. But I did a very simple glusterfs untar and the files don't get synced. It gets into an infinite loop, as can be seen from the logs. I am using the latest version of glusterfs. This used to work before; this is a regression.

Version-Release number of selected component (if applicable):
[root@ramanujan ~]# rpm -q glusterfs
glusterfs-3.4.0.23rhs-1.el6rhs.x86_64


How reproducible:
I did this three times and hit it all three times. Although I have confirmed it on the 23rhs version, I suspect it is present even in 22rhs-2.

Steps to Reproduce:
1. Create and start a 2*2 distributed-replicated master and slave volume and setup a geo-rep session between them.
2. Copy the glusterfs tarball to the mount point and untar it.
 time tar -xzvf /mnt/master/glusterfs-3.4.0.23rhs.tar.gz -C /mnt/master/
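The reproduction setup can be sketched as follows. This is a hedged outline, not an exact transcript: the host and brick names are taken from the volume info below, the slave host is a placeholder, and a running two-node trusted storage pool with the slave volume already created is assumed.

```shell
# Assumed hosts: pythagoras, ramanujan (as in the volume info below).
# Create and start a 2x2 distributed-replicated master volume.
gluster volume create master replica 2 \
    pythagoras:/rhs/bricks/brick0 ramanujan:/rhs/bricks/brick1 \
    pythagoras:/rhs/bricks/brick2 ramanujan:/rhs/bricks/brick3
gluster volume start master

# (Create and start a matching 2x2 "slave" volume on the slave cluster.)

# Start the geo-rep session between the two volumes.
gluster volume geo-replication master <slave-host>::slave start

# Mount the master volume and untar the tarball into it.
mount -t glusterfs pythagoras:/master /mnt/master
time tar -xzvf /mnt/master/glusterfs-3.4.0.23rhs.tar.gz -C /mnt/master/
```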

Actual results:
Only the initial file created (which is the tarball in our case) is synced. The untarred contents of the tarball are not synced.

Expected results:
The untarred contents of the tarball should be synced as well.

Additional info:
I see that it has entered an infinite loop from the log messages below.

[2013-08-26 19:24:09.10138] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/dcd117c8-0f64-449d-9279-d692d8c194fe [errcode: 23]
[2013-08-26 19:24:09.12722] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/3af0be86-31c7-43a7-a60f-2c93809a120b [errcode: 23]
[2013-08-26 19:24:09.14480] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/4f361c15-86cc-4d9c-a885-bce8617e9041 [errcode: 23]
[2013-08-26 19:24:09.16259] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/a26436c5-f310-4db3-abd3-17fc14c9d749 [errcode: 23]
[2013-08-26 19:24:09.17718] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/2255b375-d95d-457a-a5e7-7c6b8c34a157 [errcode: 23]
[2013-08-26 19:24:09.19309] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/fe70eb24-d241-4194-aa06-45852997203e [errcode: 23]
[2013-08-26 19:24:09.20987] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/442f6561-8bb7-4d41-b1d6-e7ef95c67c00 [errcode: 23]
[2013-08-26 19:24:09.26143] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/e3acd8f7-fd6b-4840-a393-9cecce1542ad [errcode: 23]
[2013-08-26 19:24:09.28171] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/c002f1ce-5340-43ce-b85d-1c8a18906c2f [errcode: 23]
[2013-08-26 19:24:09.31312] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/f56d54c4-f5c7-4880-b9e0-77608d4fc89f [errcode: 23]
[2013-08-26 19:24:09.32800] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/88b1982d-fae8-4c2c-b416-a87d56973b79 [errcode: 23]
[2013-08-26 19:24:09.36612] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/11a8eeb0-d1d7-46de-a90d-6535fb8991e1 [errcode: 23]
[2013-08-26 19:24:09.39102] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/404f754a-e8e4-4c46-ac98-b1a569ba72ae [errcode: 23]
[2013-08-26 19:24:09.40699] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/43c89dcc-2e67-4ca0-92ba-00f3259dfc65 [errcode: 23]
[2013-08-26 19:24:09.42232] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/b9006d8c-6c87-4eb2-a63f-8c0443df7b6e [errcode: 23]
[2013-08-26 19:24:09.43760] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/883aa39d-2def-4850-9c72-2e5439bb3591 [errcode: 23]
[2013-08-26 19:24:09.45663] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/a43eaa50-c1b4-4fa6-9db8-3040547914d5 [errcode: 23]
[2013-08-26 19:24:09.47601] W [master(/rhs/bricks/brick2):618:regjob] <top>: Rsync: .gfid/b4f08821-6fbd-4d84-bd96-03f046df6dc4 [errcode: 23]
[2013-08-26 19:24:09.48479] W [master(/rhs/bricks/brick2):748:process] _GMaster: incomplete sync, retrying changelog: /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.35.90%3Agluster%3A%2F%2F127.0.0.1%3Aslave/68fa5cc90f61530aea097cdc78c2b376/.processing/CHANGELOG.1377524162


These messages keep repeating in the logs continuously.
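Rsync exit code 23 means "partial transfer due to error", so every sync job here is failing and the changelog gets retried forever. As an aside, a small hypothetical helper (not part of geo-replication itself, written just to illustrate the log format) can pull the failing gfids out of log lines like the ones above:

```python
import re

# Matches the "Rsync: <gfid-path> [errcode: N]" tail of geo-rep regjob warnings.
LINE_RE = re.compile(r"Rsync: (\.gfid/[0-9a-f-]+) \[errcode: (\d+)\]")

def failed_gfids(lines):
    """Return the gfid paths whose rsync job exited with a non-zero errcode."""
    out = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(2) != "0":
            out.append(m.group(1))
    return out

sample = [
    "[2013-08-26 19:24:09.10138] W [master(/rhs/bricks/brick2):618:regjob] "
    "<top>: Rsync: .gfid/dcd117c8-0f64-449d-9279-d692d8c194fe [errcode: 23]",
]
print(failed_gfids(sample))  # ['.gfid/dcd117c8-0f64-449d-9279-d692d8c194fe']
```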

Also, now when you start geo-replication, an option called "geo-replication.ignore-pid-check" is set. This was not the case before.

[root@pythagoras ~]# gluster v i
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: 111b3e60-e224-41b5-83f7-67daa9ff5445
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: pythagoras:/rhs/bricks/brick0
Brick2: ramanujan:/rhs/bricks/brick1
Brick3: pythagoras:/rhs/bricks/brick2
Brick4: ramanujan:/rhs/bricks/brick3
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
changelog.fsync-interval: 5
changelog.rollover-time: 20


I'm not sure if that is the issue here; I am not sure what that option does or what impact it has.

Comment 1 M S Vishwanath Bhat 2013-08-26 19:48:48 UTC
This issue persists with both fuse and glusterfs mounts.

Comment 3 Amar Tumballi 2013-08-27 10:32:11 UTC
https://code.engineering.redhat.com/gerrit/#/c/12027/

This is one patch fixing many similar issues, hence you see just one bug number in the commit message.

Comment 4 M S Vishwanath Bhat 2013-08-27 16:13:30 UTC
This is working now. Both the simple Linux kernel untar and the glusterfs source untar are being synced to the slave volume.

Comment 5 Scott Haines 2013-09-23 22:29:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

