Created attachment 618437 [details]
glustershd log

Description of problem:
------------------------
In a pure replicate volume used as a VM store, split-brains were found while self-heal of the virtual machine images was in progress, which subsequently left the VMs non-functional.

[2012-09-28 12:00:32.327877] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-replicate-rhevh2-replicate-0: Unable to self-heal contents of '<gfid:c2e13dc5-572d-4ee1-8bb9-ce0e32177a74>' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2012-09-28 12:00:32.330369] E [afr-self-heal-data.c:763:afr_sh_data_fxattrop_fstat_done] 0-replicate-rhevh2-replicate-0: Unable to self-heal contents of '<gfid:2844ea62-845e-42ce-b33f-dcb93edb4700>' (possible split-brain). Please delete the file from all but the preferred subvolume.

[09/28/12 - 12:01:32 root@rhs-client7 ~]# getfattr -d -e hex -m . /replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
getfattr: Removing leading '/' from absolute path names
# file: replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.replicate-rhevh2-client-0=0x000446170000000000000000
trusted.afr.replicate-rhevh2-client-1=0x000000000000000000000000
trusted.gfid=0xc2e13dc5572d4ee18bb9ce0e32177a74

[09/28/12 - 12:00:36 root@rhs-client6 ~]# getfattr -d -e hex -m . /replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
getfattr: Removing leading '/' from absolute path names
# file: replicate-disk/.glusterfs/c2//e1/c2e13dc5-572d-4ee1-8bb9-ce0e32177a74
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.replicate-rhevh2-client-0=0x000000000000000000000000
trusted.afr.replicate-rhevh2-client-1=0x000001c60000000000000000
trusted.gfid=0xc2e13dc5572d4ee18bb9ce0e32177a74

Version-Release number of selected component (if applicable):
---------------------------------------------------------------
[09/28/12 - 12:02:53 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

Steps to Reproduce:
------------------
1. Create a pure replicate volume (1x2) with 2 servers and 1 brick on each server; this is the storage for the VMs. Start the volume.
2. Set up KVM to use the volume as the VM store.
3. Create 2 virtual machines (vm1 and vm2).
4. Power off server1 (one server of the replicate pair).
5. Delete virtual machine vm2.
6. Perform the following operations while server1 is down:
   a. rhn_register vm1
   b. yum update on vm1
   c. create new virtual machine vm3
   d. start virtual machine vm3
   e. rhn_register vm3
   f. create new virtual machine vm4
7. Power on server1.
8. When server1 is back up, perform the following operations:
   a. yum update vm3
   b. start vm4

Actual results:
---------------
vm1, vm3 and vm4 moved from running to paused state.

Expected results:
----------------
Virtual machines should keep running smoothly.

Additional info:
---------------
Volume Name: replicate-rhevh2
Type: Replicate
Volume ID: 1e697968-2e90-4589-8225-f596fee8af97
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/replicate-disk
Brick2: rhs-client7.lab.eng.blr.redhat.com:/replicate-disk
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
cluster.eager-lock: enable
storage.linux-aio: disable
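For reference, the trusted.afr values in the getfattr output above can be decoded with a short sketch. This is my own helper, not GlusterFS code; it assumes the standard AFR changelog layout of three big-endian 32-bit pending-operation counters (data, metadata, entry) per xattr value.

```python
# Hedged sketch (not GlusterFS source): decode trusted.afr changelog xattrs.
# Each 12-byte value packs three big-endian 32-bit pending-op counters --
# data, metadata, entry -- that a brick keeps for the named client/brick.
import struct

def decode_afr(hex_value):
    """Return (data, metadata, entry) pending counts from a hex xattr string."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    return struct.unpack(">III", raw)

# Values captured above: rhs-client7 (Brick2) shows pending data ops for
# client-0, while rhs-client6 (Brick1) shows pending data ops for client-1.
# Each brick accuses the other of unsynced writes -- the data split-brain
# that the glustershd log complains about.
on_client7 = decode_afr("0x000446170000000000000000")  # pending against client-0
on_client6 = decode_afr("0x000001c60000000000000000")  # pending against client-1
split_brain = on_client7[0] > 0 and on_client6[0] > 0
```

Since both bricks carry a nonzero data counter blaming the other, self-heal has no authoritative copy to pick, hence the "delete the file from all but the preferred subvolume" message.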
Shwetha, could you please confirm whether the same happens with eager-lock off? We would like to isolate this problem to eager-locking. Pranith.
Shwetha, could you attach the client and brick logs for this? Pranith.
Following the steps mentioned above, we are not able to recreate the bug. Even when performing graph changes during the self-heal we are unable to recreate it.
Since it is not reproducible anymore, removing the blocker flag.
VS, yesterday while recreating Bug 866459 and Bug 863333, we were able to recreate this bug as well. Waiting for more information about the root cause from Pranith.
I ran a similar sequence to that described in the report:

- Create a rep 2 volume.
- Create a couple of VMs, start one.
- Kill glusterfsd on one server.
- Delete a VM, update the other, create two more new VMs (start one).
- Re-enable glusterfsd.
- With self-heal in progress, update the running VM and start the other.

The self-heal contributed most of the load on the server (I'm also running the VMs on one server, FWIW), yet I didn't reproduce any major problems with the update. It was sluggish at times, but not paused or locked. The VM that was started while self-heal was in progress paused and did not start until the heal completed (presumably on that image file). According to a task state dump, this appears to be the same flush fop behavior described in bug 853680 (i.e., libvirtd is waiting on a fuse flush request).
The blocked VM start appears to be an intentional side effect of self-heal locking. Depending on whether I'm in a debugger or not, I see the pause either in flush or in setattr in libvirtd, perhaps simply due to timing. From the locks tracing logs, I see behavior akin to the following on the source side of a self-heal (all self-heal ranges == 131072; user-driven traffic enclosed in []):

... self-heal starts
lock offset 0
lock offset 131072
unlock offset 0
lock offset 262144
unlock offset 131072
[lock offset 0, len 0, BLOCKED]
... self-heal continues, completes
[lock offset 0, len 0, GRANTED]
... VM starts

The lock requested due to either the flush or the setattr ends up blocked across the entire self-heal, as the self-heal algorithm only unlocks one region of the file after acquiring the lock for the next (sh_loop_lock_success() invokes sh_loop_finish() on the old_loop_frame, which includes an unlock).
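The handoff in the trace above can be sketched as a small simulation. This is not GlusterFS code (the function names and event model here are hypothetical); it only mirrors the ordering from the trace: the healer takes the lock on the next 128 KiB region before releasing the previous one, so it always holds at least one range lock and a competing whole-file lock (len 0) never finds a window until the heal ends.

```python
# Simplified simulation of the lock-ahead/unlock-behind handoff seen in
# the trace above. NOT GlusterFS code -- the names are made up; only the
# event ordering matches the logged behavior.

CHUNK = 131072  # self-heal range size seen in the trace

def heal_events(file_size):
    """Yield (op, offset) events in lock-ahead/unlock-behind order."""
    prev = None
    for off in range(0, file_size, CHUNK):
        yield ("lock", off)          # acquire the next region first...
        if prev is not None:
            yield ("unlock", prev)   # ...then release the previous one
        prev = off
    if prev is not None:
        yield ("unlock", prev)       # final region released at heal end

def full_lock_windows(events):
    """After each event, record whether zero range locks are held --
    the only moments a conflicting whole-file lock could be granted."""
    held = 0
    windows = []
    for op, _off in events:
        held += 1 if op == "lock" else -1
        windows.append(held == 0)
    return windows

windows = full_lock_windows(heal_events(4 * CHUNK))
# The held-lock count only drops to zero after the very last unlock, so
# the user's whole-file lock stays BLOCKED for the entire self-heal.
```

This matches the observation that the VM's flush/setattr lock is granted only once the heal completes, consistent with sh_loop_lock_success() unlocking the old loop frame only after the next lock is already held.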
We are not able to re-create the issue. Closing as works for me.