Bug 1644163 - geo-rep: geo-replication gets stuck after file rename and gfid conflict
Summary: geo-rep: geo-replication gets stuck after file rename and gfid conflict
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On: 1640347 1642865
Blocks: 1644158
Reported: 2018-10-30 06:26 UTC by Kotresh HR
Modified: 2018-11-29 15:26 UTC (History)

Fixed In Version: glusterfs-4.1.6
Doc Text:
Clone Of: 1642865
Environment:
Last Closed: 2018-11-28 06:13:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Links
System ID: Gluster.org Gerrit 21512
Private: 0
Priority: None
Status: Merged
Summary: geo-rep: Fix issue in gfid-conflict-resolution
Last Updated: 2018-10-30 19:21:28 UTC

Description Kotresh HR 2018-10-30 06:26:57 UTC
+++ This bug was initially created as a clone of Bug #1642865 +++

+++ This bug was initially created as a clone of Bug #1640347 +++

Description of problem:


Version-Release number of selected component (if applicable):

master

How reproducible:

Rename the file on master while geo-replication is in place

Steps to Reproduce:
1. Create file
2. Geo-replicate the file
3. Rename file on master

Actual results:

Geo-replication gets stuck with errors:

[2018-10-17 14:59:44.454014] I [master(/gluster/brick1/brick1):814:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry        retry_count=1   entry=({'stat': {'atime': 1539323311.2722738, 'gid': 0, 'mtime': 1539323311.2792735, 'mode': 33277, 'uid': 0}, 'entry1': '.gfid/08bcd5e4-b2f9-459d-a549-3fd4a303aa25/koala.jpg', 'gfid': '917fd4ff-c476-46b2-a805-b8212bd3635a', 'link': None, 'entry': '.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala.jpg', 'op': 'RENAME'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '61e447a7-ad37-4acf-a0ba-3d570368803d', 'name_mismatch': False, 'dst': False})
[2018-10-17 14:59:44.455204] I [master(/gluster/brick1/brick1):814:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry        retry_count=1   entry=({'stat': {'atime': 1539617414.1750674, 'gid': 0, 'mtime': 1539617414.1960669, 'mode': 33204, 'uid': 0}, 'entry1': '.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala.jpg', 'gfid': '8899e426-7709-4351-8181-b663eb57a6a7', 'link': None, 'entry': '.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala_0.jpg', 'op': 'RENAME'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '61e447a7-ad37-4acf-a0ba-3d570368803d', 'name_mismatch': False, 'dst': True})
[2018-10-17 14:59:44.457129] I [master(/gluster/brick1/brick1):834:fix_possible_entry_failures] _GMaster: Fixing gfid mismatch in slave.  Safe to ignore, take out entry        retry_count=1   entry=({'stat': {'atime': 1539323311.2722738, 'gid': 0, 'mtime': 1539323311.2792735, 'mode': 33277, 'uid': 0}, 'entry1': '.gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala_0.jpg', 'gfid': '917fd4ff-c476-46b2-a805-b8212bd3635a', 'link': None, 'entry': '.gfid/08bcd5e4-b2f9-459d-a549-3fd4a303aa25/koala.jpg', 'op': 'RENAME'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '8899e426-7709-4351-8181-b663eb57a6a7', 'name_mismatch': False, 'dst': True})
[2018-10-17 14:59:44.457343] E [syncdutils(/gluster/brick1/brick1):349:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 805, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1588, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1535, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1435, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1269, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1171, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 926, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 835, in fix_possible_entry_failures
    entries.remove(failure[0])
ValueError: list.remove(x): x not in list


Expected results:

Geo-replication proceeds without getting stuck.

Additional info:

--- Additional comment from Worker Ant on 2018-10-25 05:01:34 EDT ---

REVIEW: https://review.gluster.org/21483 (geo-rep: Fix issue in gfid-conflict-resolution) posted (#1) for review on master by Kotresh HR

--- Additional comment from Kotresh HR on 2018-10-26 01:00 EDT ---

Testcase:

1. Set up a geo-rep session and mount the master volume at "/mastermnt"
2. Create a directory and change its ownership to a normal user
      mkdir -p /mastermnt/logrotate
      chown geoaccount:geoaccount /mastermnt/logrotate
3. Log in as the normal user (geoaccount) and run the logrotate_simulate.sh script

Observation:
  geo-rep will crash without the fix.
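
The logrotate_simulate.sh script named in step 3 is not reproduced in this report. As a rough illustration only, the sketch below shows the kind of rename pattern a logrotate-style workload produces: the same file name is repeatedly renamed away and recreated, so one name maps to a new gfid after every rotation. The function name rotate, the file name app.log, and the temporary directory (standing in for /mastermnt/logrotate) are all assumptions made for this example, not taken from the actual script.

```python
import os
import tempfile


def rotate(directory, name="app.log", keep=3):
    """Hypothetical logrotate-style step: shift older copies, then recreate the file."""
    base = os.path.join(directory, name)
    # Shift name.(n-1) -> name.n, oldest first, so no copy is overwritten early.
    for n in range(keep, 1, -1):
        older = f"{base}.{n - 1}"
        if os.path.exists(older):
            os.replace(older, f"{base}.{n}")
    # Rename the live file away, then create a fresh file under the same name.
    if os.path.exists(base):
        os.replace(base, f"{base}.1")
    with open(base, "w") as fh:
        fh.write("new log\n")


# Assumption: a temporary directory stands in for the mounted master volume.
workdir = tempfile.mkdtemp()
for _ in range(4):
    rotate(workdir)

names = sorted(os.listdir(workdir))
print(names)
```

Run against a directory on the mounted master volume, a loop like this would generate a stream of RENAME and CREATE operations on recurring names, which is the situation the testcase above is meant to exercise.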

--- Additional comment from Worker Ant on 2018-10-26 05:26:21 EDT ---

COMMIT: https://review.gluster.org/21483 committed in master by "Sunny Kumar" <sunkumar@redhat.com> with a commit message- geo-rep: Fix issue in gfid-conflict-resolution

Problem:
During gfid-conflict-resolution, geo-rep crashes
with 'ValueError: list.remove(x): x not in list'

Cause and Analysis:
During gfid-conflict-resolution, the entry blob is
passed back to master along with additional
information to verify its integrity. If everything
looks fine, the entry creation is ignored and the
entry is deleted from the original list. But the
removal crashes, reporting that the entry is not in
the list. The reason is that the stat information
in the entry blob, if present, was modified before
being sent back to master.

Fix:
Send back the correct stat information for
gfid-conflict-resolution.

fixes: bz#1642865
Change-Id: I47a6aa60b2a495465aa9314eebcb4085f0b1c4fd
Signed-off-by: Kotresh HR <khiremat@redhat.com>
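
The failure mode described in the commit message can be sketched in isolation. Python's list.remove() compares elements by equality, so a dict whose nested stat values differ from the original no longer matches and ValueError is raised. The snippet below is a simplified illustration, not the syncdaemon code: the helper names (echo_with_modified_stat, echo_unmodified, try_remove) and the particular change made to atime are hypothetical, and the entry dict is a cut-down version of the one in the log above.

```python
import copy

# Simplified stand-in for one RENAME entry in the master's batch.
entry = {
    "op": "RENAME",
    "gfid": "917fd4ff-c476-46b2-a805-b8212bd3635a",
    "entry": ".gfid/08bcd5e4-b2f9-459d-a549-3fd4a303aa25/koala.jpg",
    "entry1": ".gfid/d2de43dd-07c1-4413-b7eb-e793e66dc610/koala_0.jpg",
    "stat": {"uid": 0, "gid": 0, "mode": 33277,
             "atime": 1539323311.2722738, "mtime": 1539323311.2792735},
}


def echo_with_modified_stat(e):
    """Hypothetical slave side: returns the blob with its stat altered."""
    blob = copy.deepcopy(e)
    blob["stat"]["atime"] = 0.0  # assumed modification, for illustration only
    return blob


def echo_unmodified(e):
    """Hypothetical slave side after the fix: returns the blob unchanged."""
    return copy.deepcopy(e)


def try_remove(entries, blob):
    """Return True if blob was found (by equality) and removed from entries."""
    try:
        entries.remove(blob)
        return True
    except ValueError:  # "list.remove(x): x not in list"
        return False


pending = [entry]
removed_modified = try_remove(pending, echo_with_modified_stat(entry))
removed_exact = try_remove(pending, echo_unmodified(entry))
print(removed_modified, removed_exact)
```

With the altered stat the removal fails, while an unmodified copy compares equal and is removed, which is consistent with the stated fix of sending back the correct stat information.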

Comment 1 Worker Ant 2018-10-30 06:29:10 UTC
REVIEW: https://review.gluster.org/21512 (geo-rep: Fix issue in gfid-conflict-resolution) posted (#2) for review on release-4.1 by Kotresh HR

Comment 2 Worker Ant 2018-10-30 19:21:24 UTC
REVIEW: https://review.gluster.org/21512 (geo-rep: Fix issue in gfid-conflict-resolution) posted (#3) for review on release-4.1 by Shyamsundar Ranganathan

Comment 3 Kotresh HR 2018-11-28 06:13:56 UTC
Fixed in 4.1.6

Comment 4 Shyamsundar 2018-11-29 15:26:07 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.1.6, please open a new bug report.

glusterfs-4.1.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/

