Bug 1644518

Summary: [Geo-Replication] Geo-rep faulty session because directories are not synced to the slave.
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.1
CC: abhishku, avishwan, bkunal, bugs, csaba, khiremat, rhinduja, rhs-bugs, sankarshan, sarora, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-4.1.6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1643402
Environment:
Last Closed: 2018-11-28 06:16:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1638069, 1643402, 1646896
Bug Blocks:

Description Kotresh HR 2018-10-31 04:29:20 UTC
+++ This bug was initially created as a clone of Bug #1643402 +++

+++ This bug was initially created as a clone of Bug #1638069 +++

Description of problem:

geo-replication becomes 'Faulty' with ENTRY FAILUREs.


----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.052415, 'gid': 0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS', 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.225416, 'gid': 0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS', 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------


Analysis:

The error number logged is 2, which is ENOENT for the parent directory on the slave, so the entry cannot be created under it on the slave. As you mentioned, some directories are missing on the slave, and that is why these errors occur.
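
For reference, here is a small standalone Python snippet (not gsyncd code; the variable names are illustrative) that decodes one of the failure tuples logged above under the interpretation given here: the middle element is the errno, and 2 is ENOENT for the parent path on the slave.

----------------------------
import errno

# One of the failure tuples logged above: (entry_data, error_number, slave_info).
failure = (
    {'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0,
     'mode': 16832,
     'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712',
     'op': 'MKDIR'},
    2,
    {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None,
     'slave_gfid': None, 'name_mismatch': False, 'dst': False},
)

entry_data, err, slave_info = failure
if err == errno.ENOENT:
    # errno 2 (ENOENT): the parent gfid in the '.gfid/<parent-gfid>/<name>'
    # path does not exist on the slave, so the CREATE/MKDIR/RENAME fails.
    parent_gfid = entry_data['entry'].split('/')[1]
    print("missing parent on slave: gfid %s (op %s)"
          % (parent_gfid, entry_data['op']))
----------------------------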

I see this on all nodes. Geo-rep logs the ENTRY FAILURE for files and proceeds further, but if it is a directory ENTRY FAILURE caused by ENOENT of the parent, the session goes Faulty until it is fixed. I will check how we can handle this kind of error in the code itself. For now, those directories have to be created on the slave with the exact gfid logged in the errors. I will come up with the steps to create them on the slave and share them with you.
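
To make the file-vs-directory distinction concrete, here is a minimal sketch (a hypothetical helper, not the actual gsyncd logic and not the eventual fix) of the behaviour described above:

----------------------------
import errno

def handle_entry_failure(entry_data, err):
    """Sketch of the behaviour described above: file failures are logged and
    skipped, but a directory failure caused by a missing parent on the slave
    makes the worker go Faulty and retry until the parent exists."""
    if err == errno.ENOENT and entry_data['op'] == 'MKDIR':
        # The directory cannot be created because its parent is missing on
        # the slave; everything below it would fail too, so bail out here
        # (the worker restarts and retries).
        raise RuntimeError("missing parent on slave for %s"
                           % entry_data['entry'])
    # File entry failures are only logged, and syncing proceeds.
    print("ENTRY FAILED (logged, skipped): op=%s entry=%s"
          % (entry_data['op'], entry_data['entry']))
----------------------------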

Note that automatic gfid conflict resolution only handles gfid mismatch scenarios, and this case does not fall under that. Please change the topic to

"Geo-rep faulty because directories are not synced to slave"

Version-Release number of selected component (if applicable):

mainline


How reproducible:

--- Additional comment from Worker Ant on 2018-10-26 04:04:21 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#1) for review on master by Kotresh HR

--- Additional comment from Worker Ant on 2018-10-30 09:14:15 EDT ---

REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on master by Amar Tumballi

Comment 1 Worker Ant 2018-10-31 04:32:26 UTC
REVIEW: https://review.gluster.org/21519 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on release-4.1 by Kotresh HR

Comment 2 Worker Ant 2018-11-05 19:10:35 UTC
REVIEW: https://review.gluster.org/21519 (geo-rep: Add more intelligence to automatic error handling) posted (#3) for review on release-4.1 by Shyamsundar Ranganathan

Comment 3 Kotresh HR 2018-11-28 06:16:45 UTC
Fixed in 4.1.6

Comment 4 Shyamsundar 2018-11-29 15:26:07 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.1.6, please open a new bug report.

glusterfs-4.1.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/