Bug 1643402

Summary: [Geo-Replication] Geo-rep faulty session because directories are not synced to the slave.
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: mainline
CC: abhishku, avishwan, bkunal, csaba, khiremat, rhinduja, rhs-bugs, sankarshan, sarora, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
URL: d9faa0fab651135bfc706175c815a421b320cb10
Whiteboard:
Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1638069
: 1644518 1646896
Environment:
Last Closed: 2018-11-28 04:43:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1638069
Bug Blocks: 1644518, 1646896

Description Kotresh HR 2018-10-26 07:56:48 UTC
+++ This bug was initially created as a clone of Bug #1638069 +++

Description of problem:

geo-replication becomes 'Faulty' with ENTRY FAILUREs.


----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.052415, 'gid': 0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS', 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.225416, 'gid': 0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS', 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------


Analysis:

The error number logged is 2, which is ENOENT: the parent directory does not exist on the slave, so the file could not be created under it on the slave. As you had mentioned, some directories are missing on the slave, which is why these errors occur.
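
For reference, each failure record above is a Python literal of the form (entry data, error number, slave verification info), so the failing operations and the gfids of their missing parents can be pulled straight out of the log. A minimal sketch, assuming the log format quoted above; the script itself is illustrative and not part of gsyncd:

----------------------------
#!/usr/bin/env python
# Illustrative helper: summarize ENTRY FAILED records from a geo-rep master log.
import ast
import errno
import re
import sys

PATTERN = re.compile(r"ENTRY FAILED\s+data=(\(.*\))\s*$")

def summarize(log_path):
    for line in open(log_path):
        m = PATTERN.search(line)
        if not m:
            continue
        entry_data, err, slave_info = ast.literal_eval(m.group(1))
        # entries look like '.gfid/<parent-gfid>/<basename>'
        parent_gfid = entry_data['entry'].split('/')[1]
        print("%-6s errno=%d (%s) gfid=%s missing-parent=%s" % (
            entry_data['op'], err, errno.errorcode.get(err, '?'),
            entry_data['gfid'], parent_gfid))

if __name__ == '__main__':
    summarize(sys.argv[1])  # e.g. the gsyncd master log quoted above
----------------------------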

I see this on all nodes. Geo-rep logs the ENTRY FAILURE for files and proceeds further, but if a directory entry fails because its parent is missing on the slave (ENOENT), the worker goes Faulty until it is fixed. I will check how this kind of error can be handled in the code itself. For now, we have to create those directories on the slave with the exact gfids logged in the errors. I will come up with the steps to create them on the slave and share them with you.
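
For what it's worth, the path of such a missing parent directory can usually be recovered from a master brick: for directories, the backend keeps a symlink at .glusterfs/<aa>/<bb>/<gfid> that points to the parent gfid's handle plus the directory's basename, so the chain can be walked up to the root gfid (00000000-0000-0000-0000-000000000001). A rough sketch, assuming direct access to a master brick root; the helper name and invocation are mine, not part of gsyncd, and stamping the exact gfid on the slave is left to the promised steps:

----------------------------
#!/usr/bin/env python
# Rough sketch: resolve a directory gfid to its path relative to the brick root
# by following the .glusterfs/<aa>/<bb>/<gfid> symlink chain (directories only).
import os
import sys

ROOT_GFID = '00000000-0000-0000-0000-000000000001'

def gfid_to_path(brick_root, gfid):
    parts = []
    while gfid != ROOT_GFID:
        handle = os.path.join(brick_root, '.glusterfs', gfid[0:2], gfid[2:4], gfid)
        target = os.readlink(handle)   # e.g. '../../<aa>/<bb>/<parent-gfid>/<name>'
        parent_gfid, name = target.split('/')[-2:]
        parts.append(name)
        gfid = parent_gfid
    return '/'.join(reversed(parts)) or '/'

if __name__ == '__main__':
    # e.g. gfid_to_path('/rhgs/brick2/data', '35e4b3ce-0485-47ae-8163-df8a5b45bb3f')
    print(gfid_to_path(sys.argv[1], sys.argv[2]))
----------------------------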

Note that automatic gfid conflict resolution only handles gfid-mismatch scenarios, and this case does not fall under that. Please change the topic to

"Geo-rep faulty because directories are not synced to the slave"

Version-Release number of selected component (if applicable):

mainline


How reproducible:

Comment 1 Worker Ant 2018-10-26 08:04:21 UTC
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#1) for review on master by Kotresh HR
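
The patch title only hints at the approach. Purely to illustrate what "more intelligence" in the error handling could mean here (a conceptual sketch, not the code posted at the review above), the handler could classify a missing-parent ENOENT on an entry op as recoverable instead of always turning the worker Faulty:

----------------------------
# Illustrative only; not the change from the review above. Sketch of treating
# a missing-parent ENOENT on an entry operation as recoverable rather than fatal.
import errno

def classify_entry_failure(failure):
    """failure is an (entry_data, errno_value, slave_info) tuple as seen in the logs."""
    entry_data, err, _slave_info = failure
    if err == errno.ENOENT:
        # Parent gfid from '.gfid/<parent-gfid>/<name>': re-sync/re-create the
        # missing ancestor chain on the slave and retry the operation.
        parent_gfid = entry_data['entry'].split('/')[1]
        return ('retry-after-parent-sync', parent_gfid)
    # Anything else keeps the current behaviour: log the failure and move on.
    return ('log-and-continue', None)
----------------------------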

Comment 2 Worker Ant 2018-10-30 13:14:15 UTC
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on master by Amar Tumballi

Comment 4 Shyamsundar 2018-11-29 15:20:34 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.1, please open a new bug report.

glusterfs-5.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 5 Shyamsundar 2019-03-25 16:31:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/