Description of problem:

Noticed a split-brain on a volume hosting VM images. All RHS servers had been up for more than 58 days.

Volume type: 6x2 distributed-replicate
The client was using a FUSE mount.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0.6rhs-2.el6

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
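(Triage note: on this release, files flagged as split-brain can be listed with the heal-info CLI. A minimal sketch, not the exact commands used for this report; the volume name is taken from the volume info below.)

# List entries the self-heal daemon has flagged as split-brain
gluster volume heal vmstore0 info split-brain

# Overall view of entries still pending heal on the same volume
gluster volume heal vmstore0 info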
Additional information
======================

1. Volume information

[root@rhsvm1 ~]# gluster volume info

Volume Name: vmstore0
Type: Distributed-Replicate
Volume ID: 2a718178-98a6-4cdf-90e5-f97632cc32fc
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsvm1:/data/store-0
Brick2: rhsvm2:/data/store-0
Brick3: rhsvm1:/data/store-1
Brick4: rhsvm2:/data/store-1
Brick5: rhsvm1:/data/store-2
Brick6: rhsvm2:/data/store-2
Brick7: rhsvm3:/data/store-0
Brick8: rhsvm4:/data/store-0
Brick9: rhsvm3:/data/store-1
Brick10: rhsvm4:/data/store-1
Brick11: rhsvm3:/data/store-2
Brick12: rhsvm4:/data/store-2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
storage.owner-uid: 36
storage.owner-gid: 36
cluster.subvols-per-directory: 1

2. gluster volume status

[root@rhsvm1 ~]# gluster volume status
Status of volume: vmstore0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhsvm1:/data/store-0                              24009   Y       1441
Brick rhsvm2:/data/store-0                              24009   Y       1425
Brick rhsvm1:/data/store-1                              24010   Y       1447
Brick rhsvm2:/data/store-1                              24010   Y       1432
Brick rhsvm1:/data/store-2                              24011   Y       1452
Brick rhsvm2:/data/store-2                              24011   Y       1438
Brick rhsvm3:/data/store-0                              24009   Y       1422
Brick rhsvm4:/data/store-0                              24009   Y       1441
Brick rhsvm3:/data/store-1                              24010   Y       1427
Brick rhsvm4:/data/store-1                              24010   Y       1446
Brick rhsvm3:/data/store-2                              24011   Y       1433
Brick rhsvm4:/data/store-2                              24011   Y       1452
NFS Server on localhost                                 38467   Y       1459
Self-heal Daemon on localhost                           N/A     Y       1465
NFS Server on rhsvm2                                    38467   Y       1445
Self-heal Daemon on rhsvm2                              N/A     Y       1451
NFS Server on rhsvm4                                    38467   Y       1459
Self-heal Daemon on rhsvm4                              N/A     Y       1465
NFS Server on rhsvm3                                    38467   Y       1441
Self-heal Daemon on rhsvm3                              N/A     Y       1447

3. The volume is mounted at /var/lib/libvirt/images/

4. sosreports available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/974183/
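To confirm which copy is blamed, the AFR changelog xattrs of the two brick copies can be compared directly on the servers. A sketch only: vm1.img is a hypothetical image name, and the trusted.afr.vmstore0-client-* keys follow the standard <volume>-client-<index> naming for the two bricks of replicate-0.

# On rhsvm1 (brick 1 of replicate-0): dump the AFR changelog xattrs
getfattr -d -m trusted.afr -e hex /data/store-0/vm1.img   # vm1.img is hypothetical

# Repeat on rhsvm2 (brick 2 of replicate-0) for the same path
getfattr -d -m trusted.afr -e hex /data/store-0/vm1.img

# If each copy shows non-zero pending-data counters against the other brick
# (trusted.afr.vmstore0-client-0 vs trusted.afr.vmstore0-client-1), the two
# copies blame each other, i.e. data split-brain.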
Noticing error messages in /var/log/glusterfs/glustershd.log:

[2013-06-13 20:38:24.424964] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:9ea38a1b-d26e-4d47-bd5a-d79ecb5e9f8c>
[2013-06-13 20:38:24.425886] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:385f0c89-480e-4532-bbd3-8473437deb19>
[2013-06-13 20:38:24.428804] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-2: no active sinks for performing self-heal on file <gfid:6406d17c-258b-492a-a447-1cb2e48cb868>
[2013-06-13 20:38:24.429595] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:44d59a64-dde1-4aba-9f87-ec0445781aa4>
[2013-06-13 20:38:24.430737] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:bcd38330-1de5-40a1-a444-861fb6f129ae>
[2013-06-13 20:38:24.434542] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:b8aee156-3a0e-4daf-bd3a-7267c3d9b4ae>
[2013-06-13 20:38:24.435415] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:b4c9504e-e004-4f2d-aef0-8c747637bafd>
[2013-06-13 20:48:24.625174] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-1: no active sinks for performing self-heal on file <gfid:a108421f-1da6-46d0-a29d-39ef22e2195d>
[2013-06-13 20:48:24.625387] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-vmstore0-replicate-0: no active sinks for performing self-heal on file <gfid:51e27ea6-ce0b-4148-8586-ca6cd39974f5>
[2013-06-13 20:48:24.627056] E [afr-self-heal-data.c:765:afr_sh_data_fxattrop_fstat_done] 0-vmstore0-replicate-2: Unable to self-heal contents of '<gfid:7602e77a-6db4-4f00-bbd8-16c5a8ec6db8>' (possible split-brain). Please delete the file from all but the preferred subvolume.
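The last message describes the manual fix expected on this release: delete the bad copy, together with its .glusterfs gfid hard link, from every brick except the preferred one, then let self-heal restore it. A sketch for the gfid in that message, assuming (purely for illustration) that the copy on rhsvm2 is the one to discard; replicate-2 maps to the /data/store-2 bricks, and the actual file path is whatever the find below prints.

# On rhsvm2: locate the regular file sharing an inode with the gfid link
find /data/store-2 -samefile \
    /data/store-2/.glusterfs/76/02/7602e77a-6db4-4f00-bbd8-16c5a8ec6db8

# Remove the file found above and the gfid hard link from this brick only
rm /data/store-2/<path-printed-by-find>
rm /data/store-2/.glusterfs/76/02/7602e77a-6db4-4f00-bbd8-16c5a8ec6db8

# Trigger self-heal so the surviving copy on rhsvm1 is replicated back
gluster volume heal vmstore0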
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], so this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report against the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html