Bug 1366818 - rename of a file can cause data loss in an arbiter volume configuration
Summary: rename of a file can cause data loss in an arbiter volume configuration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: arbiter
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1357000
Blocks: 1362129
Reported: 2016-08-13 02:04 UTC by Pranith Kumar K
Modified: 2018-08-29 03:36 UTC

Fixed In Version: glusterfs-4.1.3 (or later)
Doc Text:
Clone Of: 1357000
Environment:
Last Closed: 2018-08-29 03:36:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2016-08-13 02:04:25 UTC
+++ This bug was initially created as a clone of Bug #1357000 +++

Description of problem:
=========================
In an arbiter volume, renaming a file while the data bricks go down and come back up in a particular order can lead to data loss.



Steps to Reproduce:
===================
1. Create a 1 x (2 + 1) volume with bricks named, say, db1 and db2 (data) and ab1 (arbiter).
2. Mount the volume over FUSE.
3. Create a directory, say dir1.
4. Bring down the first data brick (db1).
5. Create a file, say f1, under dir1 with some contents.
6. Note the getfattr output for f1 on both db2 and ab1.
7. Bring down db2 and bring up db1.
8. Trigger a heal.
9. Rename f1 to f2.
10. Bring up db2 and trigger a heal.
11. From the mount, cat f2.

Actual results: the cat fails with EIO.
[root@dhcp42-93 db1_Down]# cat renamdatafile 
cat: renamdatafile: Input/output error
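
The reproduction steps above can be laid out as a command sequence. The sketch below only builds and prints the commands; it does not run anything. The function name, host names, brick paths and mount point are hypothetical. How a brick is stopped and restarted is left as a manual step, because the report does not say how it was done. One thing to keep in mind if bricks are restarted with `gluster volume start <volume> force`: that command restarts every stopped brick, so the ordering in step 7 needs care.

```python
def repro_commands(volume, db1, db2, ab1, mount_point):
    """Return the reproduction steps as (description, command) pairs.

    Each brick is a "host:/path" string. Nothing is executed. A command of
    None marks a step that is left to the operator.
    """
    server = db1.split(":", 1)[0]
    heal = f"gluster volume heal {volume}"

    def xattrs(brick):
        # Meant to run on the brick host, against the brick's backend path.
        path = brick.split(":", 1)[1]
        return f"getfattr -d -m . -e hex {path}/dir1/f1"

    return [
        ("create a 1 x (2 + 1) arbiter volume",
         f"gluster volume create {volume} replica 3 arbiter 1 {db1} {db2} {ab1}"),
        ("start the volume", f"gluster volume start {volume}"),
        ("mount the volume over FUSE",
         f"mount -t glusterfs {server}:/{volume} {mount_point}"),
        ("create dir1", f"mkdir {mount_point}/dir1"),
        ("bring down db1", None),
        ("create f1 with some contents",
         f"echo data > {mount_point}/dir1/f1"),
        ("note the xattrs of f1 on db2", xattrs(db2)),
        ("note the xattrs of f1 on ab1", xattrs(ab1)),
        ("bring down db2 and bring up db1", None),
        ("trigger a heal", heal),
        ("rename f1 to f2",
         f"mv {mount_point}/dir1/f1 {mount_point}/dir1/f2"),
        ("bring up db2", None),
        ("trigger a heal", heal),
        ("read f2 from the mount (EIO in this report)",
         f"cat {mount_point}/dir1/f2"),
    ]


if __name__ == "__main__":
    # Hypothetical hosts and paths, for illustration only.
    steps = repro_commands("arbit", "host1:/bricks/db1", "host2:/bricks/db2",
                           "host3:/bricks/ab1", "/mnt/arbit")
    for description, command in steps:
        print(f"# {description}")
        print(command if command is not None else "#   (manual step)")
```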

client logs:
[2016-07-15 12:25:40.299090] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-arbit-replicate-0: Unreadable subvolume -1 found with event generation 7 for gfid 091d29dd-f4e1-49da-8353-1686e59818de. (Possible split-brain)
[2016-07-15 12:25:40.301196] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing FGETXATTR on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
[2016-07-15 12:25:40.302017] W [MSGID: 108027] [afr-common.c:2245:afr_discover_done] 0-arbit-replicate-0: no read subvols for (null)
[2016-07-15 12:25:40.305693] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 796: READ => -1 gfid=091d29dd-f4e1-49da-8353-1686e59818de fd=0x7fcbf801579c (Input/output error)
[2016-07-15 12:25:40.303768] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-arbit-replicate-0: Unreadable subvolume -1 found with event generation 7 for gfid 091d29dd-f4e1-49da-8353-1686e59818de. (Possible split-brain)
[2016-07-15 12:25:40.305666] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing READ on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
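
To see which operations were failed and on which gfid, the error lines above can be filtered mechanically. This is a minimal sketch that assumes the line layout shown in this report (timestamp, level, MSGID, source location, translator name, message); the regular expression and the function name are illustrative and not part of GlusterFS.

```python
import re

# Matches the "Failing <FOP> on gfid <uuid>: split-brain observed" errors
# in the layout seen in the client log above.
FAILED_FOP = re.compile(
    r"^\[(?P<time>[^\]]+)\] E \[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<source>[^\]]+)\] (?P<xlator>\S+): "
    r"Failing (?P<fop>\w+) on gfid (?P<gfid>[0-9a-f-]{36}): "
    r"split-brain observed"
)

# Three lines copied from the client log in this report.
SAMPLE = """\
[2016-07-15 12:25:40.299090] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-arbit-replicate-0: Unreadable subvolume -1 found with event generation 7 for gfid 091d29dd-f4e1-49da-8353-1686e59818de. (Possible split-brain)
[2016-07-15 12:25:40.301196] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing FGETXATTR on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
[2016-07-15 12:25:40.305666] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing READ on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
"""


def failed_fops(lines):
    """Yield (time, fop, gfid) for each operation failed with split-brain."""
    for line in lines:
        match = FAILED_FOP.match(line)
        if match:
            yield match.group("time"), match.group("fop"), match.group("gfid")


if __name__ == "__main__":
    for time, fop, gfid in failed_fops(SAMPLE.splitlines()):
        print(time, fop, gfid)
```

On the sample, this picks out the FGETXATTR and READ failures, both on the same gfid, and skips the warning line.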



db1 getfattr:
[root@dhcp43-157 ~]# getfattr -d -m . -e hex /bricks/brick2/arbit/db1_Down/renamdatafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.arbit-client-0=0x000000030000000000000000
trusted.afr.arbit-client-1=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005788cb3000085f14
trusted.gfid=0x091d29ddf4e149da83531686e59818de


db2:
[root@dhcp43-153 ~]# getfattr -d -m . -e hex /bricks/brick1/arbit/db1_Down/renamdatafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x091d29ddf4e149da83531686e59818de

ab1:
[root@dhcp43-157 ~]# getfattr -d -m . -e hex /bricks/brick0/arbit/db1_Down/renamdatafile
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x091d29ddf4e149da83531686e59818de
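
The hex values above are easier to compare once decoded. A trusted.afr.* value is 12 bytes, read here as three big-endian 32-bit pending counters in the order data, metadata, entry, which is the layout AFR is documented to use. The sketch below decodes the values from the db1 output and converts trusted.gfid into the dashed form used in the client logs. The function names are illustrative.

```python
import struct
import uuid


def _strip_0x(hex_value):
    """Drop the leading "0x" that `getfattr -e hex` prints."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def decode_afr_pending(hex_value):
    """Return the (data, metadata, entry) counters of a trusted.afr.* value."""
    raw = bytes.fromhex(_strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


def gfid_to_str(hex_value):
    """Return a trusted.gfid value in the dashed UUID form used in logs."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


if __name__ == "__main__":
    # Values copied from the db1 getfattr output in this report.
    db1_xattrs = {
        "trusted.afr.arbit-client-0": "0x000000030000000000000000",
        "trusted.afr.arbit-client-1": "0x000000010000000000000000",
        "trusted.afr.dirty": "0x000000000000000000000000",
    }
    for name, value in db1_xattrs.items():
        data, metadata, entry = decode_afr_pending(value)
        print(f"{name}: data={data} metadata={metadata} entry={entry}")
    print(gfid_to_str("0x091d29ddf4e149da83531686e59818de"))
```

Decoded this way, the db1 copy records 3 pending data operations against arbit-client-0 and 1 against arbit-client-1, while the db2 and ab1 copies carry no trusted.afr.* xattrs at all. The gfid converts to 091d29dd-f4e1-49da-8353-1686e59818de, the same gfid the client logs report.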


Volume Name: arbit
Type: Replicate
Volume ID: 0069b5a7-bfdf-4f59-86ec-851f500ed902
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.43.129:/bricks/brick0/arbit
Brick2: 10.70.43.153:/bricks/brick1/arbit
Brick3: 10.70.43.129:/bricks/brick2/arbit (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet





Expected results:


Additional info:

--- Additional comment from Vijay Bellur on 2016-07-27 00:20:31 EDT ---

REVIEW: http://review.gluster.org/15017 (afr: some coverity fixes) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)

--- Additional comment from Ravishankar N on 2016-07-27 00:22:42 EDT ---

Ignore comment #1; that patch is for a different bug.

Comment 1 Vijay Bellur 2016-08-21 17:53:59 UTC
REVIEW: http://review.gluster.org/15226 (afr/posix: anoninode logic for entyr-self-heal) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2016-09-14 11:03:36 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 3 Worker Ant 2016-09-15 11:40:00 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#3) for review on master by Ravishankar N (ravishankar)

Comment 4 Worker Ant 2016-09-20 11:48:46 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#4) for review on master by Ravishankar N (ravishankar)

Comment 5 Worker Ant 2016-09-20 11:55:37 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#5) for review on master by Ravishankar N (ravishankar)

Comment 6 Worker Ant 2016-09-20 12:06:48 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#6) for review on master by Ravishankar N (ravishankar)

Comment 7 Worker Ant 2016-09-20 12:19:14 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#7) for review on master by Ravishankar N (ravishankar)

Comment 8 Worker Ant 2016-09-20 12:29:21 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#8) for review on master by Ravishankar N (ravishankar)

Comment 9 Worker Ant 2016-09-26 04:46:55 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#9) for review on master by Ravishankar N (ravishankar)

Comment 10 Worker Ant 2016-10-04 10:24:34 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#10) for review on master by Ravishankar N (ravishankar)

Comment 11 Worker Ant 2016-10-13 13:53:09 UTC
REVIEW: http://review.gluster.org/15226 (afr, posix: anoninode logic for entry selfheal) posted (#11) for review on master by Ravishankar N (ravishankar)

Comment 12 Amar Tumballi 2018-08-29 03:36:29 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

