Description of problem:
-----------------------
Attempted to write a 200G file on a FUSE mount. The brick it was hashed to became full, and I/O errored out on the mount point:

[root@gqac019 gluster-mount]# sh abc1.sh
dd: writing to ‘100GClient21’: Input/output error
dd: closing output file ‘100GClient21’: Input/output error
^C225195651+0 records in
225195651+0 records out
115300173312 bytes (115 GB) copied, 33243.9 s, 3.5 MB/s

The file got deleted by the test script.

From mount logs:

[2017-06-05 19:47:07.617492] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 9-khal-client-20: remote operation failed [No space left on device]
[2017-06-05 19:47:07.670204] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 9-khal-client-21: remote operation failed [No space left on device]
[2017-06-05 19:47:07.707305] E [MSGID: 108008] [afr-transaction.c:2616:afr_write_txn_refresh_done] 9-khal-replicate-10: Failing WRITE on gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54: split-brain observed. [Input/output error]
[2017-06-05 19:47:07.707763] W [MSGID: 108008] [afr-read-txn.c:229:afr_read_txn] 9-khal-replicate-10: Unreadable subvolume -1 found with event generation 2 for gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54. (Possible split-brain)
[2017-06-05 19:47:07.724757] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 9-khal-replicate-10: Failing FGETXATTR on gfid 0bc295e6-d97d-4337-8817-6c2cffa75f54: split-brain observed. [Input/output error]

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-26.el7rhgs.x86_64

How reproducible:
-----------------
Reporting the first occurrence.

Actual results:
---------------
EIO.

Expected results:
-----------------
ENOSPC.
Additional info:
----------------
[root@gqas013 d_009]# gluster v info

Volume Name: khal
Type: Distributed-Replicate
Volume ID: 86c9b338-70dd-407d-ab69-a40184064ce7
Status: Started
Snapshot Count: 0
Number of Bricks: 16 x 2 = 32
Transport-type: tcp
Bricks:
Brick1: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick2: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/A1
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick4: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/A1
Brick5: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick6: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/A1
Brick7: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick8: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/A1
Brick9: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick10: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/A1
Brick11: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick12: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/A1
Brick13: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick14: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/A1
Brick15: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick16: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/A1
Brick17: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick18: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick19: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick20: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick21: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick22: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick23: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick24: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick25: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick26: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/A1
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/A1
Brick29: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick30: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/A1
Brick31: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Brick32: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/A1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
network.inode-lru-limit: 50000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: off
transport.address-family: inet
nfs.disable: on
The EIO error is genuine here. The file was in data split-brain while the client was getting the Input/output error on the mount. Since the file was deleted by the test script, no split-brain was reported by the heal info command afterwards.

Cause of the data split-brain: I reproduced the bug with a 1x2 volume. As the data on the bricks neared the threshold, I observed short writes on alternate bricks, each marking pending xattrs against the other brick. Once space was exhausted, subsequent writes therefore failed with EIO.

Fixing this behaviour may need some changes in AFR. We don't know the complexity of the solution yet, and it might take some time to decide on it and fix the issue.
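To make the cause above concrete, here is a minimal, hypothetical model (not actual AFR code) of how alternating short writes can leave each replica holding pending marks against the other, which AFR then treats as data split-brain:

```python
# Hypothetical sketch of AFR-style pending-xattr accounting, for illustration
# only. Each brick keeps a pending counter per peer; a write that succeeds on
# one brick but fails on the other makes the successful brick "blame" the peer.

def apply_write(pending, ok):
    """ok[i] is True if the write succeeded on brick i."""
    for i, i_ok in enumerate(ok):
        if not i_ok:
            continue
        for j, j_ok in enumerate(ok):
            if i != j and not j_ok:
                pending[i][j] += 1  # brick i records a pending mark on brick j

def is_data_split_brain(pending):
    # Split-brain: both bricks hold non-zero pending marks against each other,
    # so neither copy can be chosen as the good source.
    return pending[0][1] > 0 and pending[1][0] > 0

pending = [[0, 0], [0, 0]]
# Bricks near full: short writes fail on alternate bricks.
apply_write(pending, [True, False])   # write lands on brick 0 only
apply_write(pending, [False, True])   # next write lands on brick 1 only
print(is_data_split_brain(pending))   # -> True; further I/O fails with EIO
```

This is why heal info showed nothing once the file was removed: the pending marks lived on the file's xattrs, which disappeared with the file.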
Giving a meaningful summary.
*** Bug 1474736 has been marked as a duplicate of this bug. ***
Karthik - We need an update/decision on the plan for this bug. Are we going to work on this in the near future? Is it critical enough to address, considering we now have the storage.reserve option, which reserves 1% of disk space by default?
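For reference, a sketch of how the storage.reserve option mentioned above is set per volume (the volume name "khal" is taken from this report; the value 5 is just an example, the default is 1):

```
# Reserve 5% of brick space; writes beyond that fail with ENOSPC
# instead of filling the brick completely.
gluster volume set khal storage.reserve 5

# Check the current value.
gluster volume get khal storage.reserve
```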
Tested the brick-full scenario on a 2 x (2 + 1) arbiter volume, Gluster version 6.0.7. Did not notice split-brain when the disk was full. However, metadata entries were still being created on the arbiter brick for failed ops that returned "no sufficient space" at the mount point; that issue is tracked in BZ#1589829.