Bug 1646896 - [Geo-Replication] Geo-rep faulty session because directories are not synced to slave.
Summary: [Geo-Replication] Geo-rep faulty session because directories are not s...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 5
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On: 1638069 1643402
Blocks: 1644518
 
Reported: 2018-11-06 09:22 UTC by Kotresh HR
Modified: 2018-11-29 15:20 UTC
CC: 11 users

Fixed In Version: glusterfs-5.1
Clone Of: 1643402
Environment:
Last Closed: 2018-11-28 05:59:41 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links:
Gluster.org Gerrit 21498 (status: None, last updated 2018-11-06 09:22:08 UTC)
Gluster.org Gerrit 21566: Merged, "geo-rep: Add more intelligence to automatic error handling" (last updated 2018-11-09 18:47:34 UTC)

Description Kotresh HR 2018-11-06 09:22:09 UTC
+++ This bug was initially created as a clone of Bug #1643402 +++

+++ This bug was initially created as a clone of Bug #1638069 +++

Description of problem:

geo-replication becomes 'Faulty' with ENTRY FAILUREs.


----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.052415, 'gid': 0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS', 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.225416, 'gid': 0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS', 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------


Analysis:

The error number logged is 2, which is ENOENT: the parent directory does not exist on the slave, so the file could not be created under it. As you mentioned, some directories are missing on the slave, and that is why these errors occur.
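The numeric values in the failure tuples can be decoded with Python's standard library. A quick sanity check (the values 2, 16832, and 33152 are taken from the log excerpt above; this snippet is illustrative, not part of gsyncd):

```python
import errno
import stat

# The "2" in each failure tuple is the errno: ENOENT, "No such file or directory".
assert errno.ENOENT == 2

# The failed MKDIR carries mode 16832: a directory with permissions 0700.
assert stat.S_ISDIR(16832)
assert stat.S_IMODE(16832) == 0o700

# The failed CREATEs carry mode 33152: a regular file with permissions 0600.
assert stat.S_ISREG(33152)
assert stat.S_IMODE(33152) == 0o600
```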

I see this on all nodes. geo-rep logs an ENTRY FAILURE for files and proceeds, but if a directory hits an ENTRY FAILURE because of ENOENT of its parent, the session goes Faulty until it is fixed. I will look into handling this kind of error in the code itself. For now, those directories have to be created on the slave with the exact gfids logged in the errors; I will come up with the steps to create them on the slave and share them with you.
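Until those steps are available, the gfids and paths of the failing directories can be pulled out of the geo-rep log for manual repair on the slave. A minimal sketch, assuming the log lines look exactly like the excerpt above (the parsing approach here is mine, not part of gsyncd):

```python
import ast
import re

# Matches the payload of an "ENTRY FAILED ... data=(...)" line as logged by
# _GMaster.log_failures: a tuple of (entry dict, errno, slave-side info dict).
FAILURE_RE = re.compile(r"ENTRY FAILED\s+data=(\(.*\))")

def missing_dir_failures(lines):
    """Yield (gfid, entry_path) for MKDIR failures with errno 2 (ENOENT)."""
    for line in lines:
        m = FAILURE_RE.search(line)
        if not m:
            continue
        # The logged tuple contains only dict/str/int/None/bool literals,
        # so ast.literal_eval can parse it safely.
        entry, err, _slave_info = ast.literal_eval(m.group(1))
        if err == 2 and entry.get("op") == "MKDIR":
            yield entry["gfid"], entry["entry"]
```

Fed the MKDIR line from the excerpt, this yields the gfid 63e4a4ec-b5aa-4f14-834c-2751247a1262 and the path .gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712, i.e. the directory that must exist on the slave with that exact gfid.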

Note that automatic gfid conflict resolution only handles gfid mismatch scenarios and this does not fall under that. Please change the topic to

"geo-rep faulty  because of the directories are not synced to slave"

Version-Release number of selected component (if applicable):

mainline


How reproducible:

--- Additional comment from Worker Ant on 2018-10-26 04:04:21 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#1) for review on master by Kotresh HR

--- Additional comment from Worker Ant on 2018-10-30 09:14:15 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on master by Amar Tumballi

Comment 1 Worker Ant 2018-11-06 09:28:00 UTC
REVIEW: https://review.gluster.org/21566 (geo-rep: Add more intelligence to automatic error handling) posted (#1) for review on release-5 by Kotresh HR

Comment 2 Worker Ant 2018-11-09 18:47:31 UTC
REVIEW: https://review.gluster.org/21566 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on release-5 by Shyamsundar Ranganathan

Comment 3 Kotresh HR 2018-11-28 05:59:41 UTC
Fixed in v5.1

Comment 4 Shyamsundar 2018-11-29 15:20:56 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.1, please open a new bug report.

glusterfs-5.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/

