Bug 1643402 - [Geo-Replication] Geo-rep faulty session because the directories are not synced to the slave.
Summary: [Geo-Replication] Geo-rep faulty session because the directories are not s...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL: d9faa0fab651135bfc706175c815a421b320cb10
Whiteboard:
Depends On: 1638069
Blocks: 1644518 1646896
 
Reported: 2018-10-26 07:56 UTC by Kotresh HR
Modified: 2019-03-25 16:31 UTC (History)

Fixed In Version: glusterfs-6.0
Clone Of: 1638069
: 1644518 1646896 (view as bug list)
Environment:
Last Closed: 2018-11-28 04:43:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System ID: Gluster.org Gerrit 21498 | Private: 0 | Priority: None | Status: Merged | Summary: geo-rep: Add more intelligence to automatic error handling | Last Updated: 2018-10-30 13:14:18 UTC

Description Kotresh HR 2018-10-26 07:56:48 UTC
+++ This bug was initially created as a clone of Bug #1638069 +++

Description of problem:

Geo-replication becomes 'Faulty' with ENTRY FAILED errors.


----------------------------
[2018-10-08 12:24:23.669809] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'gid': 0, 'mode': 33152, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
[2018-10-08 12:24:23.670046] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.052415, 'gid': 0, 'mtime': 1512030372.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/B77357E56302BB9311E7D5A81DAC4349.XLS', 'gfid': '2f863cd4-bb15-488c-bb35-d2df007b689c', 'link': None, 'entry': '.gfid/ecb73262-b6a7-417e-81d6-b7b2ab8eef06/.B77357E56302BB9311E7D5A81DAC4349.XLS.0nepGW', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671149] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'gid': 0, 'mode': 33152, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671329] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'stat': {'atime': 1512662436.225416, 'gid': 0, 'mtime': 1512611725.0, 'mode': 33188, 'uid': 0}, 'entry1': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/2970861B16119A5611E7DAF1AEDCD19B.XLS', 'gfid': 'a1c70b62-ae66-4f66-9f18-f620c543526e', 'link': None, 'entry': '.gfid/dd818b54-335c-4108-9c93-b748e9d61fc5/.2970861B16119A5611E7DAF1AEDCD19B.XLS.40iqzc', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2018-10-08 12:24:23.671480] E [master(/rhgs/brick2/data):785:log_failures] _GMaster: ENTRY FAILED      data=({'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', 'gid': 0, 'mode': 16832, 'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', 'op': 'MKDIR'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
-------------------------------------------


Analysis:

The error number logged is 2, which is ENOENT: the parent directory does not exist on the slave, so the file could not be created under it there. As you had mentioned, some directories are missing on the slave, and that is what causes these errors.
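
As an illustration only (not geo-rep's own parsing code), one of the MKDIR failures above can be unpacked to show that the second element of the logged tuple is the errno and that the parent gfid is embedded in the 'entry' path. A minimal Python sketch:

----------------------------
import ast
import errno
import os

# One ENTRY FAILED payload from the log above, pasted verbatim:
# (entry data, errno, slave-side lookup info).
logged = ("({'uid': 0, 'gfid': '63e4a4ec-b5aa-4f14-834c-2751247a1262', "
          "'gid': 0, 'mode': 16832, "
          "'entry': '.gfid/35e4b3ce-0485-47ae-8163-df8a5b45bb3f/201712', "
          "'op': 'MKDIR'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, "
          "'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, "
          "'dst': False})")

entry, err, slave_info = ast.literal_eval(logged)

print(errno.errorcode[err])          # 'ENOENT' -- errno 2
print(os.strerror(err))              # 'No such file or directory'
print(entry['op'])                   # 'MKDIR'
print(entry['entry'].split('/')[1])  # parent gfid that is missing on the slave
----------------------------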

I see this on all nodes. Geo-rep logs the ENTRY FAILURE for files and proceeds further, but if a directory entry fails because its parent is missing (ENOENT), the session goes Faulty until it is fixed. I will check how we can handle this kind of error in the code itself. For now, those directories have to be created on the slave with exactly the gfids logged in the errors; I will come up with the steps to create them on the slave and share them with you (a rough sketch of reading a directory's gfid from the master brick follows below).
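
A rough, hypothetical sketch of the verification part of that workaround: print the gfid a directory carries on the master brick so the same directory can be recreated on the slave with that exact gfid. This is an assumption-laden illustration, not the promised steps; the trusted.gfid xattr is read from the brick backend (e.g. under /rhgs/brick2/data), and the example path is a placeholder.

----------------------------
import os
import uuid

# Hypothetical helper: return the gfid stored on a brick backend path.
# Run as root on a master brick; trusted.gfid is a 16-byte binary xattr.
def brick_gfid(path):
    raw = os.getxattr(path, "trusted.gfid")
    return str(uuid.UUID(bytes=raw))

# Placeholder path -- substitute the directories reported missing on the slave.
for p in ["/rhgs/brick2/data/path/to/201712"]:
    print(p, brick_gfid(p))
----------------------------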

Note that automatic gfid conflict resolution only handles gfid-mismatch scenarios, and this case does not fall under that (a small sketch of that distinction follows the suggested topic below). Please change the topic to

"geo-rep faulty  because of the directories are not synced to slave"

Version-Release number of selected component (if applicable):

mainline


How reproducible:

Comment 1 Worker Ant 2018-10-26 08:04:21 UTC
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#1) for review on master by Kotresh HR

Comment 2 Worker Ant 2018-10-30 13:14:15 UTC
REVIEW: https://review.gluster.org/21498 (geo-rep: Add more intelligence to automatic error handling) posted (#2) for review on master by Amar Tumballi

Comment 4 Shyamsundar 2018-11-29 15:20:34 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.1, please open a new bug report.

glusterfs-5.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 5 Shyamsundar 2019-03-25 16:31:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

