Bug 1255691 - When vol creation detects a problem, it already assigns the local bricks to the new/nonexistent volume
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
Keywords: ZStream
Depends On:
Blocks:
 
Reported: 2015-08-21 06:51 EDT by Chris Blum
Modified: 2016-03-29 05:26 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-29 05:26:25 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Chris Blum 2015-08-21 06:51:38 EDT
Description of problem:
When a vol create command fails with an error message, glusterd has already assigned the local bricks to the new volume, even though that volume is never created. This causes subsequent create attempts to fail, since gluster thinks the bricks are already assigned to a volume.

Version-Release number of selected component (if applicable):
RHGS 3.1 on RHEL7

nagios-plugins-1.4.16-12.el7rhgs.x86_64
vdsm-xmlrpc-4.16.20-1.2.el7rhgs.noarch
nrpe-2.15-4.1.el7rhgs.x86_64
nagios-plugins-ide_smart-1.4.16-12.el7rhgs.x86_64
gluster-nagios-common-0.2.0-2.el7rhgs.noarch
nfs-ganesha-gluster-2.2.0-5.el7rhgs.x86_64
vdsm-yajsonrpc-4.16.20-1.2.el7rhgs.noarch
libtdb-1.3.4-1.el7rhgs.x86_64
vdsm-cli-4.16.20-1.2.el7rhgs.noarch
glusterfs-cli-3.7.1-11.el7rhgs.x86_64
vdsm-reg-4.16.20-1.2.el7rhgs.noarch
libmcrypt-2.5.8-9.3.el7rhgs.x86_64
libtalloc-2.1.1-4.el7rhgs.x86_64
vdsm-python-zombiereaper-4.16.20-1.2.el7rhgs.noarch
glusterfs-client-xlators-3.7.1-11.el7rhgs.x86_64
tdb-tools-1.3.4-1.el7rhgs.x86_64
vdsm-4.16.20-1.2.el7rhgs.x86_64
glusterfs-ganesha-3.7.1-11.el7rhgs.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
libtevent-0.9.23-1.el7rhgs.x86_64
vdsm-python-4.16.20-1.2.el7rhgs.noarch
gluster-nagios-addons-0.2.4-4.el7rhgs.x86_64
glusterfs-3.7.1-11.el7rhgs.x86_64
glusterfs-fuse-3.7.1-11.el7rhgs.x86_64
glusterfs-geo-replication-3.7.1-11.el7rhgs.x86_64
nsca-client-2.9.1-11.2.el7rhgs.x86_64
glusterfs-rdma-3.7.1-11.el7rhgs.x86_64
ldb-tools-1.1.20-1.el7rhgs.x86_64
nfs-ganesha-2.2.0-5.el7rhgs.x86_64
glusterfs-libs-3.7.1-11.el7rhgs.x86_64
glusterfs-server-3.7.1-11.el7rhgs.x86_64
libldb-1.1.20-1.el7rhgs.x86_64
vdsm-gluster-4.16.20-1.2.el7rhgs.noarch
redhat-storage-server-3.1.0.4-1.el7rhgs.noarch
nfs-ganesha-nullfs-2.2.0-5.el7rhgs.x86_64
redhat-storage-logos-70.0.4-1.el7rhgs.noarch
nagios-plugins-procs-1.4.16-12.el7rhgs.x86_64
glusterfs-api-3.7.1-11.el7rhgs.x86_64
swiftonfile-1.13.1-2.el7rhgs.noarch
vdsm-jsonrpc-4.16.20-1.2.el7rhgs.noarch

How reproducible:
Always

Steps to Reproduce:
1. gluster vol create vol1 disperse-data 4 redundancy 2 transport tcp 192.168.35.{100..102}:/rhs/brick1/vol1 192.168.35.200:/rhs/brick1/vol1 192.168.35.200:/rhs/brick2/vol1 192.168.35.100:/rhs/brick2/vol1
2. gluster vol create vol1 disperse-data 4 redundancy 2 transport tcp 192.168.35.{100..102}:/rhs/brick1/vol1 192.168.35.200:/rhs/brick1/vol1 192.168.35.200:/rhs/brick2/vol1 192.168.35.100:/rhs/brick2/vol1

Actual results:
The first execution correctly detects that multiple bricks of the disperse volume are on the same server, which should not be the case:

volume create: vol1: failed: Multiple bricks of a disperse volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

The second execution reports the local bricks as already being part of a volume:
volume create: vol1: failed: /rhs/brick1/vol1 is already part of a volume
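Presumably the "already part of a volume" check trips on the trusted.glusterfs.volume-id extended attribute that glusterd stamps on the brick root during the failed create (this is an assumption about the mechanism, not something verified here). One way to check this on the local node, reusing the brick paths from the reproducer:

# Dump the trusted.* xattrs on the local brick roots (run as root).
# If trusted.glusterfs.volume-id is present even though no volume exists,
# the bricks have been claimed by the failed create.
getfattr -d -m . -e hex /rhs/brick1/vol1 /rhs/brick2/vol1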

Expected results:
Bricks should not be allocated to a volume that was never created, and the second execution should output the same error/warning as the first.

Additional info:

It seems that only the local bricks get allocated to the new/nonexistent volume. I cannot see any /rhs/brick1/vol1 folders on the other hosts, and when I reformat the bricks, I can start over again (see the sketch after the directory listings below for one way to verify this per host).

[root@rhgs1 ~]# ls -al /rhs/brick{1..2}
/rhs/brick1:
total 0
drwxr-xr-x. 3 root root 17 Aug 21 12:36 .
drwxr-xr-x. 5 root root 45 Aug 21 12:18 ..
drwxr-xr-x. 2 root root  6 Aug 21 12:36 vol1

/rhs/brick2:
total 0
drwxr-xr-x. 3 root root 17 Aug 21 12:36 .
drwxr-xr-x. 5 root root 45 Aug 21 12:18 ..
drwxr-xr-x. 2 root root  6 Aug 21 12:36 vol1

[root@rhgs1 ~]# ls -al /rhs/brick{1..2}/vol1
/rhs/brick1/vol1:
total 0
drwxr-xr-x. 2 root root  6 Aug 21 12:36 .
drwxr-xr-x. 3 root root 17 Aug 21 12:36 ..

/rhs/brick2/vol1:
total 0
drwxr-xr-x. 2 root root  6 Aug 21 12:36 .
drwxr-xr-x. 3 root root 17 Aug 21 12:36 ..

Host RHGS1 is IP 192.168.35.200, RHGS2 is 192.168.35.100, RHGS3 is 192.168.35.101, RHGS4 is 192.168.35.102
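A minimal sketch of how the per-host claim could be verified, assuming root ssh access to all four nodes and the brick paths from the reproducer (the IP list comes from the host mapping above):

# Query the volume-id xattr on both brick roots of every node.
# Nodes that were never touched should report that the attribute
# (or the directory) does not exist.
for h in 192.168.35.100 192.168.35.101 192.168.35.102 192.168.35.200; do
    echo "== $h =="
    ssh root@"$h" 'getfattr -n trusted.glusterfs.volume-id -e hex /rhs/brick1/vol1 /rhs/brick2/vol1' 2>&1
done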
Comment 2 Atin Mukherjee 2016-03-29 05:26:25 EDT
Currently GlusterD doesn't have any rollback mechanism. So if a volume creation request fails after the xattrs have been set on the bricks, we don't unset them, and that's why you can't reuse the bricks unless the force option is explicitly provided.

GlusterD 2.0 should be able to deal with this and have a rollback mechanism in place, so you can expect not to see this behaviour once GlusterD 2.0 lands.

We don't have any plans to bring a rollback mechanism into the current glusterd codebase.
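For anyone hitting this before GlusterD 2.0 lands, the options are to re-run the create with 'force' (as the first error message suggests), or to clear the stale metadata from the affected brick roots by hand instead of reformatting them. A rough sketch of the manual cleanup, assuming the bricks were never part of a real volume and using the paths from this report (trusted.gfid and .glusterfs may not exist after a merely staged create; the sketch tolerates that):

# On each affected node, drop the stale xattrs and any leftover
# .glusterfs directory, then retry the volume create.
setfattr -x trusted.glusterfs.volume-id /rhs/brick1/vol1 /rhs/brick2/vol1
setfattr -x trusted.gfid /rhs/brick1/vol1 /rhs/brick2/vol1 2>/dev/null || true
rm -rf /rhs/brick1/vol1/.glusterfs /rhs/brick2/vol1/.glusterfs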
