Description of problem:
=======================
Glusterd: volume creation fails if one of the bricks on the server is down.

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:

Steps to Reproduce:
===================
1. Make sure one of the bricks is down due to an XFS crash.
2. Create a new volume with other, existing bricks. Volume creation fails with:

   volume create: test_123: failed: Staging failed on transformers.lab.eng.blr.redhat.com. Error: Brick: transformers:/rhs/brick7/dv2-3_rajesh_22 not available. Brick may be containing or be contained by an existing brick

Actual results:

Expected results:

Additional info:
================
gdb session in glusterd_is_brickpath_available():

Breakpoint 1, glusterd_is_brickpath_available (uuid=0x7f2b4000d8c0 "Z\323\066^n\020J\b\202\273\361\346\tQ\344\026", path=0x7f2b40009fb0 "/rhs/brick11/test123") at glusterd-utils.c:1166
1166    {
(gdb) n
1171            char tmp_path[PATH_MAX+1] = {0};
(gdb)
1166    {
(gdb)
1171            char tmp_path[PATH_MAX+1] = {0};
(gdb)
1172            char tmp_brickpath[PATH_MAX+1] = {0};
(gdb)
1176            strncpy (tmp_path, path, PATH_MAX);
(gdb)
1171            char tmp_path[PATH_MAX+1] = {0};
(gdb)
1172            char tmp_brickpath[PATH_MAX+1] = {0};
(gdb)
1174            priv = THIS->private;
(gdb)
1176            strncpy (tmp_path, path, PATH_MAX);
(gdb)
1174            priv = THIS->private;
(gdb)
1176            strncpy (tmp_path, path, PATH_MAX);
(gdb)
1178            if (!realpath (path, tmp_path)) {
(gdb)
1179                    if (errno != ENOENT) {
(gdb)
1183            strncpy(tmp_path,path,PATH_MAX);
(gdb)
1186            cds_list_for_each_entry (volinfo, &priv->volumes, vol_list) {
(gdb) p tmp_path
$1 = "/rhs/brick11/test123", '\000' <repeats 4076 times>
(gdb) n
1200                            if (_is_prefix (tmp_brickpath, tmp_path))
(gdb)
1186            cds_list_for_each_entry (volinfo, &priv->volumes, vol_list) {
(gdb)
1187                    cds_list_for_each_entry (brickinfo, &volinfo->bricks,
(gdb)
1189                            if (gf_uuid_compare (uuid, brickinfo->uuid))
(gdb)
1189                            if (gf_uuid_compare (uuid, brickinfo->uuid))
(gdb) p brickinfo
$2 = (glusterd_brickinfo_t *) 0x7f2b6d5d1120
(gdb) p brickinfo.hostname
$3 = "transformers.lab.eng.blr.redhat.com", '\000' <repeats 988 times>
(gdb) p brickinfo.path
$4 = "/rhs/brick1/afr1x2_attach_hot", '\000' <repeats 4066 times>
(gdb) n
1192                            if (!realpath (brickinfo->path, tmp_brickpath)) {
(gdb) n
1193                                    if (errno == ENOENT)
(gdb) p errno
$5 = 5
(gdb) n
1170            gf_boolean_t available = _gf_false;
(gdb)
1207    }
(gdb) p available
$6 = _gf_false
(gdb)

Filesystem                                     Size  Used Avail Use% Mounted on
/dev/mapper/rhel_transformers-root              50G   20G   31G  39% /
devtmpfs                                        32G     0   32G   0% /dev
tmpfs                                           32G   36M   32G   1% /dev/shm
tmpfs                                           32G  3.4G   28G  11% /run
tmpfs                                           32G     0   32G   0% /sys/fs/cgroup
/dev/sda1                                      494M  159M  336M  33% /boot
/dev/mapper/rhel_transformers-home             477G   13G  464G   3% /home
tmpfs                                          6.3G     0  6.3G   0% /run/user/0
/dev/mapper/RHS_vg1-RHS_lv1                    1.9T  323G  1.5T  18% /rhs/brick1
/dev/mapper/RHS_vg2-RHS_lv2                    1.9T   57G  1.8T   4% /rhs/brick2
/dev/mapper/RHS_vg3-RHS_lv3                    1.9T   57G  1.8T   4% /rhs/brick3
/dev/mapper/RHS_vg4-RHS_lv4                    1.9T   57G  1.8T   4% /rhs/brick4
/dev/mapper/RHS_vg5-RHS_lv5                    1.9T   57G  1.8T   4% /rhs/brick5
/dev/mapper/RHS_vg6-RHS_lv6                    1.9T   57G  1.8T   4% /rhs/brick6
/dev/mapper/RHS_vg7-RHS_lv7                    1.9T  1.4G  1.9T   1% /rhs/brick7
/dev/mapper/RHS_vg8-RHS_lv8                    1.9T  1.4G  1.9T   1% /rhs/brick8
/dev/mapper/RHS_vg9-RHS_lv9                    1.9T  1.4G  1.9T   1% /rhs/brick9
/dev/mapper/RHS_vg10-RHS_lv10                  1.9T  4.2G  1.8T   1% /rhs/brick10
/dev/mapper/RHS_vg11-RHS_lv11                  1.9T  4.2G  1.8T   1% /rhs/brick11
/dev/mapper/RHS_vg12-RHS_lv12                  1.9T  4.2G  1.8T   1% /rhs/brick12
ninja.lab.eng.blr.redhat.com:afr2x2_tier       1.9T  567G  1.3T  31% /mnt/glusterfs
ninja.lab.eng.blr.redhat.com:/afr2x2_tier_new  1.9T  567G  1.3T  31% /mnt/glusterfs2
ninja.lab.eng.blr.redhat.com:/disperse_vol2    4.2T  3.6T  596G  86% /mnt/glusterfs_EC
ninja.lab.eng.blr.redhat.com:/disperse_vol2    4.2T  3.6T  596G  86% /mnt/glusterfs_EC_NO
ninja.lab.eng.blr.redhat.com:/afr2x2_tier_new  1.9T  567G  1.3T  31% /mnt/glusterfs2_new
ninja.lab.eng.blr.redhat.com:/afr2x2_tier_new  1.9T  567G  1.3T  31% /mnt/glusterfs2_new2
ninja.lab.eng.blr.redhat.com:/afr2x2_tier_mod  1.9T  564G  1.3T  31% /mnt/glusterfs2_mod
ninja.lab.eng.blr.redhat.com:afr2x2_tier_new   1.9T  567G  1.3T  31% /mnt/afr2x2_tier_new
Hi Rajesh, can you verify whether the brick was already used by another volume, i.e. confirm that no .glusterfs directory is present on the brick while creating the new volume?
As Gaurav mentioned in #c2, it seems you have tried to reuse a brick which is, or was earlier, used for another gluster volume; that is exactly what the error message says. I strongly believe this is not a bug. Please confirm.
After going through the code, it looks like a bug. If the realpath() call fails with EIO (which indicates the underlying filesystem of an existing brick may have a problem), we report the candidate path as not available instead of skipping that brick path and continuing the scan.
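A minimal sketch of that fix (hypothetical names and shape, simplified from the real per-volume brick loop; the authoritative change is the upstream patch): a brick whose realpath() fails with anything other than ENOENT is skipped, and only a genuine prefix match against a healthy brick rejects the candidate path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the fixed loop body.
 * realpath_errs[i] is the errno realpath() would set for existing brick i
 * (0 = success); conflicts[i] says whether that brick's resolved path
 * would prefix-match the candidate path (_is_prefix() in the real code). */
static bool is_brickpath_available_fixed(const int realpath_errs[],
                                         const bool conflicts[], int nbricks)
{
    for (int i = 0; i < nbricks; i++) {
        /* A brick whose filesystem is unreadable (e.g. EIO after an XFS
         * crash) is skipped rather than treated as a conflict. */
        if (realpath_errs[i] != 0 && realpath_errs[i] != ENOENT)
            continue;
        if (conflicts[i])
            return false;          /* genuine overlap with the new path */
    }
    return true;                   /* no conflicting brick found */
}
```

In this model, a crashed brick no longer blocks creation, while a real path overlap on a healthy brick still does.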
Upstream patch http://review.gluster.org/13258 is posted for review.
The development team was able to re-create the problem.
The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.
Verified this bug using the build "glusterfs-3.7.9-1".

Steps followed:
===============
1. Created a 1x2 volume using a one-node cluster and started it.
2. Crashed the underlying XFS for one of the volume's bricks using the "godown" tool.
3. Created a new volume using bricks not part of the volume created in step 1; the new volume was created successfully.

With this fix, the reported issue works fine. Moving to the Verified state.

Note: Issues found around this fix will be tracked in separate bugs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240