Description of problem:

Noticed a split-brain on a volume hosting VM images. All RHS servers had been up for more than 58 days.

Volume type: 6x2 distributed-replicate
The client was using a FUSE mount.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0.6rhs-2.el6

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
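(Triage note: on this release, files flagged as split-brain can be listed with the heal-info CLI. A minimal sketch, not the exact commands used for this report; the volume name is taken from the volume info below.)

# List entries the self-heal daemon has flagged as split-brain
gluster volume heal vmstore0 info split-brain

# Overall view of entries still pending heal on the same volume
gluster volume heal vmstore0 info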
Additional information
======================

1. Volume information

[root@rhsvm1 ~]# gluster volume info

Volume Name: vmstore0
Type: Distributed-Replicate
Volume ID: 2a718178-98a6-4cdf-90e5-f97632cc32fc
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsvm1:/data/store-0
Brick2: rhsvm2:/data/store-0
Brick3: rhsvm1:/data/store-1
Brick4: rhsvm2:/data/store-1
Brick5: rhsvm1:/data/store-2
Brick6: rhsvm2:/data/store-2
Brick7: rhsvm3:/data/store-0
Brick8: rhsvm4:/data/store-0
Brick9: rhsvm3:/data/store-1
Brick10: rhsvm4:/data/store-1
Brick11: rhsvm3:/data/store-2
Brick12: rhsvm4:/data/store-2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
storage.owner-uid: 36
storage.owner-gid: 36
cluster.subvols-per-directory: 1

2. gluster volume status

[root@rhsvm1 ~]# gluster volume status
Status of volume: vmstore0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhsvm1:/data/store-0                              24009   Y       1441
Brick rhsvm2:/data/store-0                              24009   Y       1425
Brick rhsvm1:/data/store-1                              24010   Y       1447
Brick rhsvm2:/data/store-1                              24010   Y       1432
Brick rhsvm1:/data/store-2                              24011   Y       1452
Brick rhsvm2:/data/store-2                              24011   Y       1438
Brick rhsvm3:/data/store-0                              24009   Y       1422
Brick rhsvm4:/data/store-0                              24009   Y       1441
Brick rhsvm3:/data/store-1                              24010   Y       1427
Brick rhsvm4:/data/store-1                              24010   Y       1446
Brick rhsvm3:/data/store-2                              24011   Y       1433
Brick rhsvm4:/data/store-2                              24011   Y       1452
NFS Server on localhost                                 38467   Y       1459
Self-heal Daemon on localhost                           N/A     Y       1465
NFS Server on rhsvm2                                    38467   Y       1445
Self-heal Daemon on rhsvm2                              N/A     Y       1451
NFS Server on rhsvm4                                    38467   Y       1459
Self-heal Daemon on rhsvm4                              N/A     Y       1465
NFS Server on rhsvm3                                    38467   Y       1441
Self-heal Daemon on rhsvm3                              N/A     Y       1447

3. The volume is mounted at /var/lib/libvirt/images/

4. sosreports available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/974183/
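To confirm which copy is blamed, the AFR changelog xattrs of the two brick copies can be compared directly on the servers. A sketch only: vm1.img is a hypothetical image name, and the trusted.afr.vmstore0-client-* keys follow the standard <volume>-client-<index> naming for the two bricks of replicate-0.

# On rhsvm1 (brick 1 of replicate-0): dump the AFR changelog xattrs
getfattr -d -m trusted.afr -e hex /data/store-0/vm1.img   # vm1.img is hypothetical

# Repeat on rhsvm2 (brick 2 of replicate-0) for the same path
getfattr -d -m trusted.afr -e hex /data/store-0/vm1.img

# If each copy shows non-zero pending-data counters against the other brick
# (trusted.afr.vmstore0-client-0 vs trusted.afr.vmstore0-client-1), the two
# copies blame each other, i.e. data split-brain.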
Noticing error messages in /var/log/glusterfs/glustershd.log:

[2013-06-13 20:38:24.424964] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:9ea38a1b-d26e-4d47-bd5a-d79ecb5e9f8c>
[2013-06-13 20:38:24.425886] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:385f0c89-480e-4532-bbd3-8473437deb19>
[2013-06-13 20:38:24.428804] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-2: no active sinks for performing self-heal on file <gfid:6406d17c-258b-492a-a447-1cb2e48cb868>
[2013-06-13 20:38:24.429595] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:44d59a64-dde1-4aba-9f87-ec0445781aa4>
[2013-06-13 20:38:24.430737] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:bcd38330-1de5-40a1-a444-861fb6f129ae>
[2013-06-13 20:38:24.434542] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:b8aee156-3a0e-4daf-bd3a-7267c3d9b4ae>
[2013-06-13 20:38:24.435415] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:b4c9504e-e004-4f2d-aef0-8c747637bafd>
[2013-06-13 20:48:24.625174] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:a108421f-1da6-46d0-a29d-39ef22e2195d>
[2013-06-13 20:48:24.625387] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:51e27ea6-ce0b-4148-8586-ca6cd39974f5>
[2013-06-13 20:48:24.627056] E [afr-self-heal-data.c:765:afr_sh_data_fxattrop_fstat_done] 0-vmstore0-replicate-2: Unable to self-heal contents of '<gfid:7602e77a-6db4-4f00-bbd8-16c5a8ec6db8>' (possible split-brain). Please delete the file from all but the preferred subvolume.
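The last message describes the manual fix expected on this release: delete the bad copy, together with its .glusterfs gfid hard link, from every brick except the preferred one, then let self-heal restore it. A sketch for the gfid in that message, assuming (purely for illustration) that the copy on rhsvm2 is the one to discard; replicate-2 maps to the /data/store-2 bricks, and the actual file path is whatever the find below prints.

# On rhsvm2: locate the regular file sharing an inode with the gfid link
find /data/store-2 -samefile \
    /data/store-2/.glusterfs/76/02/7602e77a-6db4-4f00-bbd8-16c5a8ec6db8

# Remove the file found above and the gfid hard link from this brick only
rm /data/store-2/<path-printed-by-find>
rm /data/store-2/.glusterfs/76/02/7602e77a-6db4-4f00-bbd8-16c5a8ec6db8

# Trigger self-heal so the surviving copy on rhsvm1 is replicated back
gluster volume heal vmstore0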
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], so this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report against the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html