Bug 1694637 - Geo-rep: Rename to an existing file name destroys its content on slave
Summary: Geo-rep: Rename to an existing file name destroys its content on slave
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 5
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Sunny Kumar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-01 10:03 UTC by homma
Modified: 2019-06-18 09:29 UTC (History)
3 users

Fixed In Version: glusterfs-6.x
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-18 09:29:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description homma 2019-04-01 10:03:52 UTC
Description of problem:
Renaming a file to an existing file name on master results in an empty file on slave.

Version-Release number of selected component (if applicable):
glusterfs 5.5-1.el7 from centos-gluster5 repository

How reproducible:
Always

Steps to Reproduce:
1. On the geo-rep master, create temporary files and rename them over existing files repeatedly:
for n in {0..9}; do for i in {0..9}; do printf "%04d\n" $n > file$i.tmp; mv file$i.tmp file$i; done; done

2. List the created files on master and slave.
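The write pattern in step 1 can be exercised locally, outside Gluster, to confirm what every file should contain once the loop finishes; a minimal sketch (using seq instead of brace expansion for portability):

```shell
# Simulate the repro loop from step 1 in a scratch directory:
# each file is rewritten through a temp file and renamed over the target, ten times.
dir=$(mktemp -d)
for n in $(seq 0 9); do
  for i in $(seq 0 9); do
    printf "%04d\n" "$n" > "$dir/file$i.tmp"   # write the new content to a temp file
    mv "$dir/file$i.tmp" "$dir/file$i"         # rename over the existing file
  done
done
# Every fileN should now hold the last value written, "0009" (5 bytes),
# which matches the 5-byte sizes seen on the master below.
cat "$dir/file0"       # -> 0009
wc -c < "$dir/file0"   # -> 5
```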

Actual results:

On master
$ ls -l
total 6
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file0
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file1
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file2
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file3
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file4
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file5
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file6
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file7
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file8
-rw-rw-r-- 1 centos centos   5 Apr  1 18:08 file9

On slave
$ ls -l
total 1
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file0
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file0.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file1
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file1.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file2
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file2.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file3
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file3.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file4
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file4.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file5
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file5.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file6
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file6.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file7
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file7.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file8
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file8.tmp
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file9
-rw-rw-r-- 1 centos centos   0 Apr  1 18:08 file9.tmp


Expected results:

Files on the slave are renamed successfully and retain the correct contents.
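A quick way to detect this kind of master/slave divergence is to compare per-file checksums from both mounts. A minimal local sketch, with two scratch directories standing in for the master and slave mount points (the file contents here just recreate the mismatch seen above):

```shell
# Stand-in directories for the master and slave mount points.
master_dir=$(mktemp -d)
slave_dir=$(mktemp -d)
printf "0009\n" > "$master_dir/file0"   # healthy 5-byte copy on the master
: > "$slave_dir/file0"                  # empty copy on the slave, as in this bug

# Checksum every file on each side; the listings match iff contents match.
m=$(cd "$master_dir" && md5sum *)
s=$(cd "$slave_dir" && md5sum *)
if [ "$m" = "$s" ]; then
  status="in sync"
else
  status="content mismatch"
fi
echo "$status"   # -> content mismatch
```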

Additional info:

I have a 2-node replicated volume on master, and a single-node volume on slave.

Master volume:

Volume Name: www
Type: Replicate
Volume ID: bc99bbd2-20f9-4440-b51e-a1e105adfdf3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs01.localdomain:/glusterfs/www/brick1/brick
Brick2: fs02.localdomain:/glusterfs/www/brick1/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.build-pgfid: on
server.manage-gids: on
network.ping-timeout: 3
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on


Slave volume:

Volume Name: www
Type: Distribute
Volume ID: 026a58f5-9696-4d9e-9674-74771526e880
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: fs21.localdomain:/glusterfs/www/brick1/brick
Options Reconfigured:
storage.build-pgfid: on
server.manage-gids: on
network.ping-timeout: 3
transport.address-family: inet
nfs.disable: on


Many messages as follows appear in gsyncd.log on master:

[2019-04-01 09:08:06.994154] I [master(worker /glusterfs/www/brick1/brick):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry    retry_count=1   entry=({'stat': {}, 'entry1': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0', 'gfid': '54ff5e4c-8565-4246-aa1d-0b2b59a8d577', 'link': None, 'entry': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0.tmp', 'op': 'RENAME'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': 'df891073-b19c-481c-9916-f96790ff4d31', 'name_mismatch': False, 'dst': True})

[2019-04-01 09:08:07.33778] I [master(worker /glusterfs/www/brick1/brick):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry     retry_count=1   entry=({'uid': 1000, 'gfid': 'c2836641-1000-48b0-865e-2c9ea6815baf', 'gid': 1000, 'mode': 4294934964, 'entry': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0.tmp', 'op': 'CREATE'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '54ff5e4c-8565-4246-aa1d-0b2b59a8d577', 'name_mismatch': False, 'dst': False})

[2019-04-01 09:08:07.319814] I [master(worker /glusterfs/www/brick1/brick):904:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Create parent directory on slave.   retry_count=1   entry=({'stat': {'atime': 1554109682.6345513, 'gid': 1000, 'mtime': 1554109682.6455512, 'mode': 33204, 'uid': 1000}, 'entry1': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0', 'gfid': '5755b878-9ba6-4da4-aa27-28cf6defd06e', 'link': None, 'entry': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0.tmp', 'op': 'RENAME'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})

[2019-04-01 09:08:13.855005] E [master(worker /glusterfs/www/brick1/brick):784:log_failures] _GMaster: ENTRY FAILED     data=({'uid': 1000, 'gfid': '5755b878-9ba6-4da4-aa27-28cf6defd06e', 'gid': 1000, 'mode': 4294934964, 'entry': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0.tmp', 'op': 'CREATE'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '54ff5e4c-8565-4246-aa1d-0b2b59a8d577', 'name_mismatch': False, 'dst': False})
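The entry tuples in these log lines carry a gfid_mismatch flag, so affected entries can be counted in bulk when triaging a large gsyncd.log. A minimal sketch (the sample line is copied from this report):

```shell
# Count geo-rep entry failures that report a gfid mismatch on the slave.
log=$(mktemp)
cat > "$log" <<'EOF'
[2019-04-01 09:08:13.855005] E [master(worker /glusterfs/www/brick1/brick):784:log_failures] _GMaster: ENTRY FAILED     data=({'uid': 1000, 'gfid': '5755b878-9ba6-4da4-aa27-28cf6defd06e', 'gid': 1000, 'mode': 4294934964, 'entry': '.gfid/1915ab69-f1cd-42bf-8e75-0507ac765b58/file0.tmp', 'op': 'CREATE'}, 17, {'slave_isdir': False, 'gfid_mismatch': True, 'slave_name': None, 'slave_gfid': '54ff5e4c-8565-4246-aa1d-0b2b59a8d577', 'name_mismatch': False, 'dst': False})
EOF
mismatches=$(grep -c "'gfid_mismatch': True" "$log")
echo "$mismatches"   # -> 1
```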

Comment 1 Sunny Kumar 2019-04-15 05:30:59 UTC
We are working on this issue; a bug is already filed against mainline and can be tracked here:

https://bugzilla.redhat.com/show_bug.cgi?id=1694820

