Bug 865672

Summary: [RHEV-RHS] Can't pause VMs while remove-brick is running and even after commit
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: glusterfs
Assignee: shishir gowda <sgowda>
Status: CLOSED WONTFIX
QA Contact: shylesh <shmohan>
Severity: medium
Priority: medium
Version: unspecified
CC: amarts, asriram, grajaiya, nsathyan, rhs-bugs, sgowda, shaines, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Doc Type: Known Issue
Doc Text:
Cause: Changing a volume with just one brick to multiple bricks (using add-brick) is not supported.
Consequence: After add-brick and rebalance, the volume does not function properly (many operations on the volume fail).
Workaround (if any): Start with a volume that has at least 2 bricks.
Result: If the volume has at least 2 bricks to start with, this issue never occurs.
Story Points: ---
Last Closed: 2012-12-18 06:16:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments: mnt, vdsm, rebalance, brick logs

Description shylesh 2012-10-12 05:44:58 UTC
Created attachment 625803 [details]
mnt, vdsm, rebalance, brick logs

Description of problem:
Started remove-brick on a pure distribute volume. While it is running, VMs cannot be paused, and they still cannot be paused even after remove-brick commit.

Version-Release number of selected component (if applicable):

[root@rhs-gp-srv4 ~]# rpm -qa | grep gluster
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-devel-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

How reproducible:


Steps to Reproduce:
1. Created a distribute volume with one brick.
2. Created VMs on this volume.
3. Kept running add-brick followed by rebalance, up to 4 bricks.
4. Ran remove-brick start on one of the bricks.
5. While remove-brick was running, tried to pause a VM (see the sketch after this list).
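
A minimal sketch of the reproduction using the standard gluster CLI. The volume name and brick hosts/paths are taken from this report; the exact sequence of add-brick calls is an assumption based on the steps above:

# Step 1: create a distribute volume with a single brick (the unsupported starting point)
gluster volume create vmstore rhs-gp-srv4.lab.eng.blr.redhat.com:/another
gluster volume start vmstore

# Step 3: grow the volume one brick at a time, rebalancing after each add
gluster volume add-brick vmstore rhs-gp-srv11.lab.eng.blr.redhat.com:/another
gluster volume rebalance vmstore start
# ... repeat add-brick + rebalance until the volume has 4 bricks ...

# Step 4: start removing one of the bricks
gluster volume remove-brick vmstore rhs-gp-srv12.lab.eng.blr.redhat.com:/another start

# Step 5: while remove-brick is running, attempt to pause/suspend a VM from RHEV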
  
Actual results:

Pausing the VM failed with the error "Error while executing action: Cannot hibernate VM. Low disk space on relevant Storage Domain."

After remove-brick completed, I tried:
1. Restarting the volume
2. Rebalancing
3. remove-brick commit

but the problem still persists.
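
For reference, a sketch of those recovery attempts with the gluster CLI (volume name from this report; the brick path is the removed brick noted below):

# 1. Restart the volume
gluster volume stop vmstore
gluster volume start vmstore

# 2. Rebalance
gluster volume rebalance vmstore start

# 3. Commit the earlier remove-brick
gluster volume remove-brick vmstore rhs-gp-srv12.lab.eng.blr.redhat.com:/another commit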
 

Additional info:

Volume Name: vmstore
Type: Distribute
Volume ID: 91aa3e01-6330-44b7-acf1-9e5a20570cc8
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/another
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/another
Brick3: rhs-gp-srv15.lab.eng.blr.redhat.com:/another
Options Reconfigured:
cluster.eager-lock: enable
storage.linux-aio: off
performance.read-ahead: disable
performance.stat-prefetch: disable
performance.io-cache: disable
performance.quick-read: disable
performance.write-behind: enable

The brick that was actually removed was
"rhs-gp-srv12.lab.eng.blr.redhat.com:/another"

Logs are attached.

Comment 3 shylesh 2012-11-19 10:43:15 UTC
I tried this and was able to reproduce it; this is because of the bug
https://bugzilla.redhat.com/show_bug.cgi?id=875076



[2012-11-19 15:36:28.633688] I [fuse-bridge.c:4222:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-11-19 15:36:28.634269] I [client-handshake.c:453:client_set_lk_version_cbk] 0-vmstore-client-2: Server lk version = 1
[2012-11-19 15:36:28.634431] I [fuse-bridge.c:3405:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-11-19 15:36:28.635474] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /. holes=0 overlaps=2
[2012-11-19 15:36:28.648024] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176. holes=1 overlaps=1
[2012-11-19 15:36:28.649690] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/dom_md. holes=1 overlaps=2
[2012-11-19 15:36:28.665534] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images. holes=1 overlaps=2
[2012-11-19 15:36:32.386537] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/master. holes=1 overlaps=2
[2012-11-19 15:36:32.388456] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/master/vms. holes=1 overlaps=1
[2012-11-19 15:36:32.390367] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/master/tasks. holes=1 overlaps=2
[2012-11-19 15:36:50.603522] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/70fd5fac-801d-47d9-8616-5d8ddb24fe72. holes=1 overlaps=2
[2012-11-19 15:37:01.891797] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/7562ef05-154a-4810-80ac-c4a26e21131d. holes=0 overlaps=3
[2012-11-19 15:37:05.748705] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/765bcf39-49b2-426c-b730-6711ab09cf1a. holes=1 overlaps=2
[2012-11-19 15:37:11.262812] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/e1374745-babc-4aa6-bfb9-521194baaa92. holes=1 overlaps=2
[2012-11-19 15:37:18.855592] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/c1b734dc-a146-422c-b418-5e6c40719476. holes=0 overlaps=2
[2012-11-19 15:37:27.908198] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/507048f5-0ca7-445a-a06a-15d281c7fe2e. holes=0 overlaps=2
[2012-11-19 15:37:30.363179] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/f167dc29-bfd8-4763-9fb9-654069c4a2f9. holes=1 overlaps=2
[2012-11-19 15:37:36.364333] I [dht-layout.c:593:dht_layout_normalize] 0-vmstore-dht: found anomalies in /13a3d358-65bd-4d03-bfcf-e6bcb6c8a176/images/8d8a07f3-2f6a-42a5-a658-7e873b457ce5. holes=1 overlaps=2
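
The holes/overlaps reported above refer to gaps and collisions in the DHT hash ranges assigned to each brick for a directory. As a sketch of how such anomalies can be inspected (the brick path is from this setup; the interpretation comment is an assumption about the on-disk layout format), one can read the trusted.glusterfs.dht xattr on the affected directory on every brick:

# On each brick server, dump the DHT layout xattr for a directory
getfattr -n trusted.glusterfs.dht -e hex /another/13a3d358-65bd-4d03-bfcf-e6bcb6c8a176
# The trailing bytes encode the hash range this brick owns for the directory;
# across all bricks the ranges should tile 0x00000000-0xffffffff with no
# holes (uncovered ranges) or overlaps (ranges claimed by more than one brick).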

Comment 4 shishir gowda 2012-11-21 03:33:57 UTC
In release 3.3, a volume with a single brick is not a distribute volume. A fix to include distribute as a default xlator was merged upstream with bug 815227.
That might explain the failures seen here.
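
A sketch of the workaround captured in the doc text: create the volume with at least two bricks, so the distribute xlator is part of the volume graph from the start (hostnames and brick paths here are illustrative, not from this setup):

gluster volume create vmstore server1:/brick1 server2:/brick1
gluster volume start vmstore
# subsequent add-brick + rebalance on a volume that started with 2+ bricks
# does not hit this issue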

Comment 5 Amar Tumballi 2012-11-26 14:17:52 UTC
As per comment #4, reducing the priority of the bug. Also marked the bug as a known issue, which will take care of documenting it.

Comment 6 Vijay Bellur 2012-12-11 06:32:31 UTC
Anjana: Can you please document this known issue?

Comment 7 Amar Tumballi 2012-12-18 06:16:24 UTC
Documented as a known issue for the 2.0plus Beta; not a valid bug on the RHS-2.1.0 stream. Closing as WONTFIX for 2.0.z.