Bug 974183

Summary: Split-brain occurred while using RHS volume as image store
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: replicate
Version: 2.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: SATHEESARAN <sasundar>
Assignee: Krutika Dhananjay <kdhananj>
QA Contact: SATHEESARAN <sasundar>
CC: lmohanty, nsathyan, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-03-23 07:37:51 UTC

Description SATHEESARAN 2013-06-13 15:14:44 UTC
Description of problem:
Noticed a split-brain on a volume hosting VM images. All RHS servers had been up for more than 58 days.

Volume type: 6x2 distributed-replicate
Client access: FUSE mount

Version-Release number of selected component (if applicable):
glusterfs-3.3.0.6rhs-2.el6


How reproducible:


Steps to Reproduce:

Actual results:


Expected results:


Additional info:
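For reference, the entries that AFR flags as being in split-brain can be listed from any of the RHS servers with the heal-info commands below. This is a minimal sketch, assuming the volume name vmstore0 used in this report; the exact output format differs between glusterfs releases.

# list all files with pending self-heal on the volume
gluster volume heal vmstore0 info

# list only the entries AFR considers to be in split-brain
gluster volume heal vmstore0 info split-brain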

Comment 1 SATHEESARAN 2013-06-13 15:20:39 UTC
Additional information
======================

1. volume information
[root@rhsvm1 ~]# gluster volume info
 
Volume Name: vmstore0
Type: Distributed-Replicate
Volume ID: 2a718178-98a6-4cdf-90e5-f97632cc32fc
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsvm1:/data/store-0
Brick2: rhsvm2:/data/store-0
Brick3: rhsvm1:/data/store-1
Brick4: rhsvm2:/data/store-1
Brick5: rhsvm1:/data/store-2
Brick6: rhsvm2:/data/store-2
Brick7: rhsvm3:/data/store-0
Brick8: rhsvm4:/data/store-0
Brick9: rhsvm3:/data/store-1
Brick10: rhsvm4:/data/store-1
Brick11: rhsvm3:/data/store-2
Brick12: rhsvm4:/data/store-2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
storage.owner-uid: 36
storage.owner-gid: 36
cluster.subvols-per-directory: 1
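The reconfigured options above appear to match the tuning recommended for VM image stores (client-side caching off, eager locking and remote O_DIRECT on, bricks owned by uid/gid 36 for vdsm/kvm). For reference, a sketch of how they would be applied with gluster volume set, assuming the vmstore0 volume shown here:

# disable client-side caching translators for the VM image workload
gluster volume set vmstore0 performance.quick-read off
gluster volume set vmstore0 performance.read-ahead off
gluster volume set vmstore0 performance.io-cache off
gluster volume set vmstore0 performance.stat-prefetch off

# enable eager locking and remote O_DIRECT
gluster volume set vmstore0 cluster.eager-lock enable
gluster volume set vmstore0 network.remote-dio on

# bricks owned by vdsm/kvm (uid/gid 36) so the hypervisor can use the volume
gluster volume set vmstore0 storage.owner-uid 36
gluster volume set vmstore0 storage.owner-gid 36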

2. gluster volume status

[root@rhsvm1 ~]# gluster volume status
Status of volume: vmstore0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhsvm1:/data/store-0                              24009   Y       1441
Brick rhsvm2:/data/store-0                              24009   Y       1425
Brick rhsvm1:/data/store-1                              24010   Y       1447
Brick rhsvm2:/data/store-1                              24010   Y       1432
Brick rhsvm1:/data/store-2                              24011   Y       1452
Brick rhsvm2:/data/store-2                              24011   Y       1438
Brick rhsvm3:/data/store-0                              24009   Y       1422
Brick rhsvm4:/data/store-0                              24009   Y       1441
Brick rhsvm3:/data/store-1                              24010   Y       1427
Brick rhsvm4:/data/store-1                              24010   Y       1446
Brick rhsvm3:/data/store-2                              24011   Y       1433
Brick rhsvm4:/data/store-2                              24011   Y       1452
NFS Server on localhost                                 38467   Y       1459
Self-heal Daemon on localhost                           N/A     Y       1465
NFS Server on rhsvm2                                    38467   Y       1445
Self-heal Daemon on rhsvm2                              N/A     Y       1451
NFS Server on rhsvm4                                    38467   Y       1459
Self-heal Daemon on rhsvm4                              N/A     Y       1465
NFS Server on rhsvm3                                    38467   Y       1441
Self-heal Daemon on rhsvm3                              N/A     Y       1447

3. Volume is mounted at /var/lib/libvirt/images/
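   For reference, a typical FUSE mount invocation for this layout would look like the following; treating rhsvm1 as the volfile server is an assumption, since any of the four servers can serve the volume.

   mount -t glusterfs rhsvm1:/vmstore0 /var/lib/libvirt/images/

   # equivalent /etc/fstab entry:
   # rhsvm1:/vmstore0  /var/lib/libvirt/images  glusterfs  defaults,_netdev  0 0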

4. sosreports available @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/974183/

Comment 2 SATHEESARAN 2013-06-13 15:22:11 UTC
Noticed the following error messages in /var/log/glusterfs/glustershd.log:

[2013-06-13 20:38:24.424964] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:9ea38a1b-d26e-4d47-bd5a-d79ecb5e9f8c>
[2013-06-13 20:38:24.425886] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:385f0c89-480e-4532-bbd3-8473437deb19>
[2013-06-13 20:38:24.428804] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-2: no active sinks for performing self-heal on file <gfid:6406d17c-258b-492a-a447-1cb2e48cb868>
[2013-06-13 20:38:24.429595] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:44d59a64-dde1-4aba-9f87-ec0445781aa4>
[2013-06-13 20:38:24.430737] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:bcd38330-1de5-40a1-a444-861fb6f129ae>
[2013-06-13 20:38:24.434542] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:b8aee156-3a0e-4daf-bd3a-7267c3d9b4ae>
[2013-06-13 20:38:24.435415] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:b4c9504e-e004-4f2d-aef0-8c747637bafd>
[2013-06-13 20:48:24.625174] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:a108421f-1da6-46d0-a29d-39ef22e2195d>
[2013-06-13 20:48:24.625387] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:51e27ea6-ce0b-4148-8586-ca6cd39974f5>
[2013-06-13 20:48:24.627056] E [afr-self-heal-data.c:765:afr_sh_data_fxattrop_fstat_done] 0-vmstore0-replicate-2: Unable to self-heal contents of '<gfid:7602e77a-6db4-4f00-bbd8-16c5a8ec6db8>' (possible split-brain). Please delete the file from all but the preferred subvolume.
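
The last message is the actual split-brain: both replicas of the file accuse each other in their trusted.afr changelog xattrs, so self-heal cannot pick a source. A rough sketch of the manual resolution the log asks for, assuming gfid 7602e77a-6db4-4f00-bbd8-16c5a8ec6db8 lives on the replicate-2 pair (rhsvm1:/data/store-2 and rhsvm2:/data/store-2, going by brick order); all paths below are placeholders and the "good" copy must be confirmed from the xattrs before deleting anything.

# resolve the gfid to its path on the brick (run on each brick of the replica pair)
GFID=7602e77a-6db4-4f00-bbd8-16c5a8ec6db8
find /data/store-2 -samefile /data/store-2/.glusterfs/76/02/$GFID ! -path '*/.glusterfs/*'

# inspect the AFR changelog xattrs on both copies to decide which one to keep
getfattr -d -m . -e hex /data/store-2/.glusterfs/76/02/$GFID

# on the brick holding the bad copy only: remove the file and its gfid hard link
# (path/to/bad/file is a placeholder for the path returned by find above)
rm -f /data/store-2/path/to/bad/file /data/store-2/.glusterfs/76/02/$GFID

# trigger self-heal so the surviving copy is re-replicated
gluster volume heal vmstore0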

Comment 4 Vivek Agarwal 2015-03-23 07:37:51 UTC
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
