Bug 861314 - [RHEV-RHS]: split-brains while self-healing the Virtual Machines
Status: CLOSED WORKSFORME
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Assigned To: Pranith Kumar K
Reporter: spandura
Blocks: 865324
Reported: 2012-09-28 02:55 EDT by spandura
Modified: 2012-11-15 02:21 EST (History)

Doc Type: Bug Fix
Last Closed: 2012-11-15 02:21:24 EST
Type: Bug

Attachments
glustershd log. (315.00 KB, text/x-log)
2012-09-28 02:55 EDT, spandura

Description spandura 2012-09-28 02:55:20 EDT
Created attachment 618437 [details]
glustershd log.

Description of problem:
------------------------
On a pure replicate volume used as a VM store, split-brains appeared while self-heal of the virtual machine images was in progress, which subsequently left the VMs non-functional.

[2012-09-28 12:00:32.327877] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-replicate-rhevh2-replicate-0: Unable to self-heal contents of '<gfid:c2e13dc5-572d-4ee1-8bb9-ce0e32177a74>' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2012-09-28 12:00:32.330369] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-replicate-rhevh2-replicate-0: Unable to self-heal contents of '<gfid:2844ea62-845e-42ce-b33f-dcb93edb4700>' (possible split-brain). Please delete the file from all but the preferred subvolume.

[09/28/12 - 12:01:32 root@rhs-client7 ~]# getfattr -d -e hex -m . /replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74 
getfattr: Removing leading '/' from absolute path names
# file: replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.replicate-rhevh2-client-0=0x000446170000000000000000
trusted.afr.replicate-rhevh2-client-1=0x000000000000000000000000
trusted.gfid=0xc2e13dc5572d4ee18bb9ce0e32177a74

[09/28/12 - 12:00:36 root@rhs-client6 ~]# getfattr -d -e hex -m . /replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74 
getfattr: Removing leading '/' from absolute path names
# file: replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.replicate-rhevh2-client-0=0x000000000000000000000000
trusted.afr.replicate-rhevh2-client-1=0x000001c60000000000000000
trusted.gfid=0xc2e13dc5572d4ee18bb9ce0e32177a74
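The trusted.afr.* values above are AFR changelogs: three big-endian 32-bit counters of pending data, metadata and entry operations that this brick believes are outstanding on the named client's brick. The split-brain is visible because each brick holds a non-zero data counter against the other one. A minimal decoding sketch (the helper name is ours, not a GlusterFS API):

```python
import struct

def decode_afr(hex_xattr):
    """Decode a trusted.afr.<client> changelog value into its three
    big-endian 32-bit counters: pending data, metadata and entry ops."""
    raw = bytes.fromhex(hex_xattr[2:])  # strip the leading "0x"
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# rhs-client7 holds pending ops against client-0 (the rhs-client6 brick):
blame_c0 = decode_afr("0x000446170000000000000000")
# ...while rhs-client6 holds pending ops against client-1 (rhs-client7):
blame_c1 = decode_afr("0x000001c60000000000000000")

# Both bricks accuse each other of pending data ops -> data split-brain.
print(blame_c0)  # {'data': 280087, 'metadata': 0, 'entry': 0}
print(blame_c1)  # {'data': 454, 'metadata': 0, 'entry': 0}
```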


Version-Release number of selected component (if applicable):
---------------------------------------------------------------
[09/28/12 - 12:02:53 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch


Steps to Reproduce:
------------------
1. Create a pure replicate volume (1x2) with 2 servers and 1 brick on each server. This is the storage for the VMs. Start the volume.
2. Set up KVM to use the volume as the VM store.
3. Create 2 virtual machines (vm1 and vm2).
4. Power off server1 (one of the servers in the replicate pair).
5. Delete virtual machine vm2.
6. Perform the following operations while server1 is down:
   a. rhn_register vm1
   b. yum update on vm1
   c. create new virtual machine vm3
   d. start virtual machine vm3
   e. rhn_register vm3
   f. create new virtual machine vm4

7. Power on server1.
8. When server1 is back up, perform the following operations:
   a. yum update on vm3
   b. start vm4
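Step 1 of the reproduction can be sketched as the following gluster CLI invocations; the helper function is illustrative only, and the hostnames and brick paths are taken from the volume info later in this report:

```python
def step1_commands(volname, bricks):
    """Hypothetical helper that builds the CLI commands for creating and
    starting a 1x2 pure-replicate volume (2 servers, 1 brick each)."""
    brick_args = " ".join(bricks)
    return [
        f"gluster volume create {volname} replica 2 {brick_args}",
        f"gluster volume start {volname}",
    ]

cmds = step1_commands(
    "replicate-rhevh2",
    ["rhs-client6.lab.eng.blr.redhat.com:/replicate-disk",
     "rhs-client7.lab.eng.blr.redhat.com:/replicate-disk"],
)
for c in cmds:
    print(c)
```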

Actual results:
---------------
vm1, vm3 and vm4 moved from running to paused state.

Expected results:
----------------
Virtual machines should continue to run smoothly.

Additional info:
---------------
Volume Name: replicate-rhevh2
Type: Replicate
Volume ID: 1e697968-2e90-4589-8225-f596fee8af97
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/replicate-disk
Brick2: rhs-client7.lab.eng.blr.redhat.com:/replicate-disk
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
cluster.eager-lock: enable
storage.linux-aio: disable
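For reference, the reconfigured options above can be pulled out of the `gluster volume info` text programmatically; this parser is a minimal sketch written against the exact output shown here:

```python
VOLUME_INFO = """\
Volume Name: replicate-rhevh2
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
cluster.eager-lock: enable
storage.linux-aio: disable
"""

def parse_reconfigured(text):
    """Return the options listed after 'Options Reconfigured:' as a dict."""
    opts, in_opts = {}, False
    for line in text.splitlines():
        if line.startswith("Options Reconfigured:"):
            in_opts = True
        elif in_opts and ": " in line:
            key, value = line.split(": ", 1)
            opts[key] = value
    return opts

opts = parse_reconfigured(VOLUME_INFO)
# eager-lock is enabled, which is what comment 2 below asks to toggle off:
print(opts["cluster.eager-lock"])  # enable
```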
Comment 2 Pranith Kumar K 2012-10-03 11:45:39 EDT
Shwetha,
   Could you please confirm whether the same happens with eager-lock off? We would like to isolate this problem to eager-locking.

Pranith.
Comment 6 Pranith Kumar K 2012-10-11 03:09:36 EDT
Shwetha,
    Could you attach client, brick logs for this?

Pranith.
Comment 7 spandura 2012-10-18 02:52:35 EDT
With the steps mentioned above, we are not able to recreate the bug.

Even performing graph changes during the self-heal, we are unable to recreate the bug.
Comment 8 Vidya Sakar 2012-10-19 03:33:28 EDT
Since it is not reproducible anymore, removing the blocker flag.
Comment 9 spandura 2012-10-19 05:13:05 EDT
VS, 

Yesterday, while recreating Bug 866459 and Bug 863333, we were able to recreate this bug as well, along with the above-mentioned bugs. Waiting for more information about the root cause from Pranith.
Comment 12 Brian Foster 2012-10-23 15:35:58 EDT
I ran a similar sequence to that described in the report:

- Create a rep 2 volume.
- Create a couple VMs, start one.
- Kill glusterfsd on one server.
- Delete a VM, update the other, create two more new VMs (start one).
- Re-enable glusterfsd.
- With self-heal in progress, update the running VM and start the other.

The self-heal contributed most of the load to the server (I'm also running the VMs on one server, FWIW), yet I didn't reproduce any major problems with the update. It was sluggish at times, but not paused or locked.

The VM that was started with self-heal in progress paused and did not start until the heal completed (presumably on that image file). According to a task state dump, this appears to be the same flush fop behavior described in bug 853680 (i.e., libvirtd is waiting on a fuse flush request).
Comment 13 Brian Foster 2012-10-24 09:43:30 EDT
The blocked VM start appears to be an intentional side effect of self-heal locking. Depending on whether I'm in a debugger or not, I see a pause either in flush or setattr in libvirtd, perhaps simply due to timing. From the locks tracing logs, I see behavior akin to the following on the source side of a self-heal:

- All self-heal ranges == 131072.
- User driven traffic enclosed in [].

... self-heal starts
lock offset 0
lock offset 131072
unlock offset 0
lock offset 262144
unlock offset 131072
[lock offset 0, len 0, BLOCKED]
... self heal continues, completes
[lock offset 0, len 0, GRANTED]
... VM starts

The lock requested due to either the flush or setattr ends up blocked across the entire self-heal, as the self-heal algorithm only unlocks one region of the file after acquiring the lock for the next (sh_loop_lock_success() invokes sh_loop_finish() on the old_loop_frame, which includes an unlock).
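The pipelined locking described in this comment can be illustrated with a toy model (the function and the lock bookkeeping are ours, not GlusterFS internals): because each 128 KiB region is locked before the previous one is released, the file is never fully unlocked, so a competing full-file lock (offset 0, len 0) stays queued until the heal finishes.

```python
REGION = 131072          # self-heal window size seen in the trace above
FILE_SIZE = 4 * REGION   # toy file: four heal regions

def heal_lock_trace(file_size):
    """Emit the lock/unlock sequence of the self-heal loop: acquire the
    next region's lock before releasing the previous one."""
    events, held, prev = [], set(), None
    for off in range(0, file_size, REGION):
        held.add(off)
        events.append(f"lock offset {off}")
        if prev is not None:
            held.discard(prev)
            events.append(f"unlock offset {prev}")
        prev = off
        # At every point in the loop at least one region is still held,
        # so a full-file lock request would remain blocked here.
        assert held
    held.discard(prev)
    events.append(f"unlock offset {prev}")
    return events, held

events, held = heal_lock_trace(FILE_SIZE)
print(events[:4])  # matches the trace: lock 0, lock 131072, unlock 0, ...
assert not held    # only now can the queued full-file lock be granted
```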
Comment 15 Pranith Kumar K 2012-11-15 02:21:24 EST
We are not able to re-create the issue. Closing as works for me.
