Bug 1357000

Summary: rename of a file can cause data loss in an arbiter volume configuration
Product: [Community] GlusterFS
Component: arbiter
Version: 3.7.9
Status: CLOSED EOL
Severity: urgent
Priority: unspecified
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Ravishankar N <ravishankar>
CC: bugs
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-03-08 10:52:33 UTC
Clones: 1362129 1366818
Bug Blocks: 1362129, 1366818

Description Nag Pavan Chilakam 2016-07-15 13:05:18 UTC
Description of problem:
=========================
There is a case where renaming a file leads to data loss: after the sequence of brick outages and heals below, the renamed file can no longer be read (EIO).



Steps to Reproduce:
===================
1. Create a 1 x (2 + 1) arbiter volume with bricks, say, db1, db2 and ab1 (ab1 is the arbiter).
2. Mount the volume over FUSE.
3. Create a directory, say dir1.
4. Bring down the first data brick (db1).
5. Create a file, say f1, under dir1 with some contents.
6. Note down the getfattr output from both db2 and ab1.
7. Bring down db2 and bring up db1.
8. Trigger a heal.
9. Rename f1 to f2.
10. Bring up db2 and trigger a heal.
11. From the mount, cat f2.
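The steps above can be sketched as a dry-run shell script. This is a hedged sketch, not the reporter's actual procedure: the brick host:path values, the mount point and the `run`, `manual` and `repro` helpers are hypothetical, and the brick outages in steps 4 and 7 are left as manual actions because the report does not say how the bricks were brought down. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch of steps 1-11. Brick specs, mount point and helper
# names are hypothetical; brick up/down is left as a manual step.
set -euo pipefail

VOL=arbit
DB1=${DB1:-server1:/bricks/db1}   # hypothetical host:path of data brick 1
DB2=${DB2:-server2:/bricks/db2}   # hypothetical host:path of data brick 2
AB1=${AB1:-server3:/bricks/ab1}   # hypothetical host:path of the arbiter
MNT=${MNT:-/mnt/arbit}            # hypothetical FUSE mount point
DRY_RUN=${DRY_RUN:-1}             # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ $DRY_RUN == 1 ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Brick outages are performed by hand; this only prints a reminder.
manual() { echo "# manual: $*"; }

repro() {
    # 1-2. Create a 1 x (2 + 1) volume (the last brick is the arbiter),
    # start it and mount it over FUSE.
    run gluster volume create "$VOL" replica 3 arbiter 1 "$DB1" "$DB2" "$AB1"
    run gluster volume start "$VOL"
    run mount -t glusterfs "${DB1%%:*}:/$VOL" "$MNT"
    # 3-5. Create dir1, take db1 down, then write f1.
    run mkdir "$MNT/dir1"
    manual "kill the glusterfsd process of db1 (PID from 'gluster volume status $VOL')"
    run sh -c "echo some-data > '$MNT/dir1/f1'"
    # 6. Record the xattrs of f1 on db2 and ab1.
    run ssh "${DB2%%:*}" getfattr -d -m . -e hex "${DB2#*:}/dir1/f1"
    run ssh "${AB1%%:*}" getfattr -d -m . -e hex "${AB1#*:}/dir1/f1"
    # 7. 'volume start ... force' restarts every offline brick, so db1
    # has to be brought back on its own while db2 stays down.
    manual "kill the glusterfsd process of db2, then bring only db1 back online"
    # 8-9. Heal, then rename from the mount.
    run gluster volume heal "$VOL"
    run mv "$MNT/dir1/f1" "$MNT/dir1/f2"
    # 10. Only db2 is offline now, so 'start force' restarts just db2.
    run gluster volume start "$VOL" force
    run gluster volume heal "$VOL"
    # 11. Read the renamed file; the report sees EIO at this point.
    run cat "$MNT/dir1/f2"
}

repro
```

Setting DRY_RUN=0 would execute the commands, which is only sensible on a disposable test cluster.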

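Result: reading the renamed file fails with EIO. (In the session below, the directory is db1_Down and the renamed file is renamdatafile.)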
[root@dhcp42-93 db1_Down]# cat renamdatafile 
cat: renamdatafile: Input/output error

Client logs:
[2016-07-15 12:25:40.299090] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-arbit-replicate-0: Unreadable subvolume -1 found with event generation 7 for gfid 091d29dd-f4e1-49da-8353-1686e59818de. (Possible split-brain)
[2016-07-15 12:25:40.301196] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing FGETXATTR on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
[2016-07-15 12:25:40.302017] W [MSGID: 108027] [afr-common.c:2245:afr_discover_done] 0-arbit-replicate-0: no read subvols for (null)
[2016-07-15 12:25:40.305693] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 796: READ => -1 gfid=091d29dd-f4e1-49da-8353-1686e59818de fd=0x7fcbf801579c (Input/output error)
[2016-07-15 12:25:40.303768] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-arbit-replicate-0: Unreadable subvolume -1 found with event generation 7 for gfid 091d29dd-f4e1-49da-8353-1686e59818de. (Possible split-brain)
[2016-07-15 12:25:40.305666] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-arbit-replicate-0: Failing READ on gfid 091d29dd-f4e1-49da-8353-1686e59818de: split-brain observed. [Input/output error]
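The gfid in these messages (091d29dd-f4e1-49da-8353-1686e59818de) is the trusted.gfid value from the getfattr output of each brick, printed in UUID form. On a brick, a regular file is also hardlinked under .glusterfs/<first two hex digits>/<next two hex digits>/<gfid>, which makes it possible to inspect the same inode on every brick regardless of its current name. A minimal sketch of both conversions follows; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a trusted.gfid value as printed by `getfattr -e hex` into the
# UUID form used in the client log.
gfid_to_uuid() {
    local h=${1#0x}
    h=${h,,}   # lowercase; needs bash 4+
    if [[ ! $h =~ ^[0-9a-f]{32}$ ]]; then
        echo "not a 16-byte hex gfid: $1" >&2
        return 1
    fi
    echo "${h:0:8}-${h:8:4}-${h:12:4}-${h:16:4}-${h:20:12}"
}

# Build the backend hardlink path of a regular file on a brick:
# <brick>/.glusterfs/<uuid chars 1-2>/<uuid chars 3-4>/<uuid>
gfid_backend_path() {
    local brick=$1 uuid=$2
    echo "$brick/.glusterfs/${uuid:0:2}/${uuid:2:2}/$uuid"
}

uuid=$(gfid_to_uuid 0x091d29ddf4e149da83531686e59818de)
echo "$uuid"                                   # → 091d29dd-f4e1-49da-8353-1686e59818de
gfid_backend_path /bricks/brick0/arbit "$uuid" # → /bricks/brick0/arbit/.glusterfs/09/1d/091d29dd-f4e1-49da-8353-1686e59818de
```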



db1:
[root@dhcp43-157 ~]#  getfattr -d -m . -e hex /bricks/brick2/arbit/db1_Down/renamdatafile 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.arbit-client-0=0x000000030000000000000000
trusted.afr.arbit-client-1=0x000000010000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005788cb3000085f14
trusted.gfid=0x091d29ddf4e149da83531686e59818de


db2:
[root@dhcp43-153 ~]# getfattr -d -m . -e hex /bricks/brick1/arbit/db1_Down/renamdatafile 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x091d29ddf4e149da83531686e59818de

ab1:

[root@dhcp43-157 ~]#  getfattr -d -m . -e hex /bricks/brick0/arbit/db1_Down/renamdatafile 
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/arbit/db1_Down/renamdatafile
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x091d29ddf4e149da83531686e59818de
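Only the db1 copy carries AFR changelog xattrs (trusted.afr.arbit-client-0 and trusted.afr.arbit-client-1); the db2 and ab1 copies have none. As described in the GlusterFS split-brain documentation, each changelog value is 12 bytes read as three 32-bit counters of pending data, metadata and entry operations recorded against the named client (brick). The sketch below splits a value accordingly; the function name is hypothetical. Read this way, the db1 copy holds non-zero data counters (3 and 1) for client-0 and client-1, with no metadata or entry operations pending.

```shell
#!/usr/bin/env bash
# Decode an AFR changelog xattr (trusted.afr.<volume>-client-<N>) printed by
# `getfattr -e hex`: three big-endian 32-bit counters (data, metadata, entry).
decode_afr_changelog() {
    local hex=${1#0x}
    if [[ ! $hex =~ ^[0-9a-fA-F]{24}$ ]]; then
        echo "expected 24 hex digits, got: $1" >&2
        return 1
    fi
    # 16#NNNN is bash arithmetic syntax for a base-16 literal.
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

# Values copied from the db1 output above.
decode_afr_changelog 0x000000030000000000000000   # → data=3 metadata=0 entry=0
decode_afr_changelog 0x000000010000000000000000   # → data=1 metadata=0 entry=0
```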


Volume Name: arbit
Type: Replicate
Volume ID: 0069b5a7-bfdf-4f59-86ec-851f500ed902
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.43.129:/bricks/brick0/arbit
Brick2: 10.70.43.153:/bricks/brick1/arbit
Brick3: 10.70.43.129:/bricks/brick2/arbit (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
[root@dhcp43-157 ~]# 





Expected results:
f2 is readable and returns the contents written to f1 in step 5.

Comment 1 Vijay Bellur 2016-07-27 04:20:31 UTC
REVIEW: http://review.gluster.org/15017 (afr: some coverity fixes) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 2 Ravishankar N 2016-07-27 04:22:42 UTC
Ignore comment #1, that patch is for a different bug.

Comment 3 Kaushal 2017-03-08 10:52:33 UTC
This bug is being closed because GlusterFS 3.7 has reached its end of life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.