Description of problem:
-----------------------
Attempted to write a 200G file on a FUSE mount. The brick it was hashed to became full, and I/O errored out on the mount point:

[root@gqac019 gluster-mount]# sh abc1.sh
dd: writing to ‘100GClient21’: Input/output error
dd: closing output file ‘100GClient21’: Input/output error
^C225195651+0 records in
225195651+0 records out
115300173312 bytes (115 GB) copied, 33243.9 s, 3.5 MB/s

The file got deleted by the test script.

From mount logs:

[2017-06-05 19:47:07.617492] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 9-khal-client-20: remote operation failed [No space left on device]
[2017-06-05 19:47:07.670204] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 9-khal-client-21: remote operation failed [No space left on device]
[2017-06-05 19:47:07.707305] E [MSGID: 108008] [afr-transaction.c:2616:afr_write_txn_refresh_done] 9-khal-replicate-10: Failing WRITE on gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54: split-brain observed. [Input/output error]
[2017-06-05 19:47:07.707763] W [MSGID: 108008] [afr-read-txn.c:229:afr_read_txn] 9-khal-replicate-10: Unreadable subvolume -1 found with event generation 2 for gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54. (Possible split-brain)
[2017-06-05 19:47:07.724757] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 9-khal-replicate-10: Failing FGETXATTR on gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54: split-brain observed. [Input/output error]

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-26.el7rhgs.x86_64

How reproducible:
-----------------
Reporting the first occurrence.

Actual results:
---------------
EIO.

Expected results:
-----------------
ENOSPC.
Additional info:
----------------
[root@gqas013 d_009]# gluster v info

Volume Name: khal
Type: Distributed-Replicate
Volume ID: 86c9b338-70dd-407d-ab69-a40184064ce7
Status: Started
Snapshot Count: 0
Number of Bricks: 16 x 2 = 32
Transport-type: tcp
Bricks:
Brick1: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick25: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick26: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick29: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick30: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick31: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick32: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: off
transport.address-family: inet
nfs.disable: on
The EIO error is genuine here. The file was in data split-brain while the client was getting the Input/output error on the mount. Since the file was deleted by the test script, no split-brain was reported by the heal info command afterwards.

Cause of the data split-brain: I reproduced the bug with a 1x2 volume. As the data on the bricks neared the threshold, I observed short writes on alternate bricks, each marking pending xattrs against the other brick. Once space was exhausted, subsequent writes therefore failed with EIO.

Fixing this behaviour may need some changes in AFR. We don't know the complexity of the solution yet, and it might take some time to decide on it and fix the issue.
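To make the cause above concrete, here is a minimal, hypothetical model (not actual AFR code) of how alternating short writes can leave each replica holding pending marks against the other, which AFR then treats as data split-brain:

```python
# Hypothetical sketch of AFR-style pending-xattr accounting, for illustration
# only. Each brick keeps a pending counter per peer; a write that succeeds on
# one brick but fails on the other makes the successful brick "blame" the peer.

def apply_write(pending, ok):
    """ok[i] is True if the write succeeded on brick i."""
    for i, i_ok in enumerate(ok):
        if not i_ok:
            continue
        for j, j_ok in enumerate(ok):
            if i != j and not j_ok:
                pending[i][j] += 1  # brick i records a pending mark on brick j

def is_data_split_brain(pending):
    # Split-brain: both bricks hold non-zero pending marks against each other,
    # so neither copy can be chosen as the good source.
    return pending[0][1] > 0 and pending[1][0] > 0

pending = [[0, 0], [0, 0]]
# Bricks near full: short writes fail on alternate bricks.
apply_write(pending, [True, False])   # write lands on brick 0 only
apply_write(pending, [False, True])   # next write lands on brick 1 only
print(is_data_split_brain(pending))   # -> True; further I/O fails with EIO
```

This is why heal info showed nothing once the file was removed: the pending marks lived on the file's xattrs, which disappeared with the file.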
Giving a meaningful summary.
*** Bug 1474736 has been marked as a duplicate of this bug. ***
Karthik - We need an update/decision on the plan for this bug. Are we going to work on this in the near future? Is it critical enough to address, considering we now have the storage.reserve option, which reserves 1% of disk space by default?
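For reference, a sketch of how the storage.reserve option mentioned above is set per volume (the volume name "khal" is taken from this report; the value 5 is just an example, the default is 1):

```
# Reserve 5% of brick space; writes beyond that fail with ENOSPC
# instead of filling the brick completely.
gluster volume set khal storage.reserve 5

# Check the current value.
gluster volume get khal storage.reserve
```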
Tested the brick-full scenario on a 2 x (2 + 1) arbiter volume, Gluster version 6.0.7. Did not notice split-brain when the disk was full. However, metadata entries were still being created on the arbiter brick for failed ops that returned "no sufficient space" at the mount point; that issue is tracked in BZ#1589829.