Bug 1406569 - Element missing for arbiter bricks in XML volume status details output
Summary: Element missing for arbiter bricks in XML volume status details output
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: cli
Version: 3.7.17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1410283
 
Reported: 2016-12-20 23:36 UTC by Giuseppe Ragusa
Modified: 2017-03-08 11:00 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 11:00:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:

Description Giuseppe Ragusa 2016-12-20 23:36:57 UTC
Description of problem:

Output of 'gluster volume status <vol-name> details --xml' should have a <device> element for all the bricks in the volume, but it is missing for the arbiter bricks.

Version-Release number of selected component (if applicable):

3.7.17-1

How reproducible:

Always.


Steps to Reproduce:
1. Create a replica 3 arbiter 1 volume named myvol
2. Try to get XML volume status details output from commandline with: "gluster volume status myvol details --xml"

Actual results:

The output does not contain the <device> element for the arbiter bricks.

Expected results:

The output does contain the <device> element for all the bricks in the volume, including the arbiter bricks.

Additional info:

This output format error causes VDSM to fail with a Python stack trace.
The bug was discussed and diagnosed (by Ramesh Nachimuthu) on the oVirt and GlusterFS users' mailing lists; see:

http://www.gluster.org/pipermail/gluster-users/2016-December/029485.html

Comment 1 Ravishankar N 2017-01-04 10:44:38 UTC
Trial run on my setup:

0:root@vm3 ~$ gluster --version
glusterfs 3.7.17 built on Jan  4 2017 12:39:59
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
0:root@vm3 ~$ 

============================================
root@vm3 ~$ gluster v create testvol replica 3 arbiter 1 127.0.0.2:/bricks/brick{1..3} force

root@vm3 ~$ gluster v start testvol

130:root@vm3 ~$ gluster v status testvol detail --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volStatus>
    <volumes>
      <volume>
        <volName>testvol</volName>
        <nodeCount>3</nodeCount>
        <node>
          <hostname>127.0.0.2</hostname>
          <path>/bricks/brick1</path>
          <peerid>78097816-11e9-4445-ae9a-dd1c5a8e89e3</peerid>
          <status>1</status>
          <port>49153</port>
          <ports>
            <tcp>49153</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>13736</pid>
          <sizeTotal>4283432960</sizeTotal>
          <sizeFree>4249432064</sizeFree>
          <device>/dev/vdb1</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>127.0.0.2</hostname>
          <path>/bricks/brick2</path>
          <peerid>78097816-11e9-4445-ae9a-dd1c5a8e89e3</peerid>
          <status>1</status>
          <port>49155</port>
          <ports>
            <tcp>49155</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>13755</pid>
          <sizeTotal>4283432960</sizeTotal>
          <sizeFree>4249432064</sizeFree>
          <device>/dev/vdb1</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>127.0.0.2</hostname>
          <path>/bricks/brick3</path>
          <peerid>78097816-11e9-4445-ae9a-dd1c5a8e89e3</peerid>
          <status>1</status>
          <port>49156</port>
          <ports>
            <tcp>49156</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>13774</pid>
          <sizeTotal>4283432960</sizeTotal>
          <sizeFree>4249432064</sizeFree>
          <device>/dev/vdb1</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
      </volume>
    </volumes>
  </volStatus>
</cliOutput>

Comment 2 Ravishankar N 2017-01-04 10:47:35 UTC
Hi Giuseppe, I am not able to recreate this issue on 3.7.17 or later. Are you sure there isn't something wrong with your setup? Can you check whether the glusterd logs on the node where you don't get the <device> element show any errors when you run the command?

Comment 3 Giuseppe Ragusa 2017-01-05 12:44:46 UTC
Hi,
I checked /var/log/glusterfs/etc-glusterfs-glusterd.vol.log and you're right:

[2017-01-05 12:21:52.343032] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : No brick details in volume home

while getting this:

[root@shockley tmp]# gluster volume status home detail --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volStatus>
    <volumes>
      <volume>
        <volName>home</volName>
        <nodeCount>9</nodeCount>
        <node>
          <hostname>read.gluster.private</hostname>
          <path>/srv/glusterfs/disk0/home_brick</path>
          <peerid>f1be76be-dec9-46be-98cb-a89c65aebde9</peerid>
          <status>1</status>
          <port>49152</port>
          <ports>
            <tcp>49152</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>2773</pid>
          <sizeTotal>3767015563264</sizeTotal>
          <sizeFree>3045302603776</sizeFree>
          <device>/dev/sdb4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>hall.gluster.private</hostname>
          <path>/srv/glusterfs/disk0/home_brick</path>
          <peerid>e391505d-372f-4148-9d3f-7dbdb8ad0366</peerid>
          <status>1</status>
          <port>49152</port>
          <ports>
            <tcp>49152</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>5052</pid>
          <sizeTotal>3767015563264</sizeTotal>
          <sizeFree>3045302202368</sizeFree>
          <device>/dev/sdb4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>shockley.gluster.private</hostname>
          <path>/srv/glusterfs/disk0/home_arbiter_brick</path>
          <peerid>3075fdea-4bb6-4fad-94b3-b09b13d7d6a7</peerid>
          <status>1</status>
          <port>49152</port>
          <ports>
            <tcp>49152</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>10557</pid>
          <sizeTotal>1767605006336</sizeTotal>
          <sizeFree>1764696260608</sizeFree>
          <device>/dev/sda4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>read.gluster.private</hostname>
          <path>/srv/glusterfs/disk1/home_brick</path>
          <peerid>f1be76be-dec9-46be-98cb-a89c65aebde9</peerid>
          <status>1</status>
          <port>49153</port>
          <ports>
            <tcp>49153</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>2763</pid>
          <sizeTotal>3767015563264</sizeTotal>
          <sizeFree>3059324719104</sizeFree>
          <device>/dev/sdc4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>hall.gluster.private</hostname>
          <path>/srv/glusterfs/disk1/home_brick</path>
          <peerid>e391505d-372f-4148-9d3f-7dbdb8ad0366</peerid>
          <status>1</status>
          <port>49153</port>
          <ports>
            <tcp>49153</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>5058</pid>
          <sizeTotal>3767015563264</sizeTotal>
          <sizeFree>3059324432384</sizeFree>
          <device>/dev/sdc4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>shockley.gluster.private</hostname>
          <path>/srv/glusterfs/disk1/home_arbiter_brick</path>
          <peerid>3075fdea-4bb6-4fad-94b3-b09b13d7d6a7</peerid>
          <status>1</status>
          <port>49153</port>
          <ports>
            <tcp>49153</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>10568</pid>
          <sizeTotal>1767605006336</sizeTotal>
          <sizeFree>1766170935296</sizeFree>
          <device>/dev/sdb4</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>read.gluster.private</hostname>
          <path>/srv/glusterfs/disk2/home_brick</path>
          <peerid>f1be76be-dec9-46be-98cb-a89c65aebde9</peerid>
          <status>1</status>
          <port>49171</port>
          <ports>
            <tcp>49171</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>2768</pid>
          <sizeTotal>3998831407104</sizeTotal>
          <sizeFree>3233375506432</sizeFree>
          <device>/dev/sda2</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>hall.gluster.private</hostname>
          <path>/srv/glusterfs/disk2/home_brick</path>
          <peerid>e391505d-372f-4148-9d3f-7dbdb8ad0366</peerid>
          <status>1</status>
          <port>49171</port>
          <ports>
            <tcp>49171</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>5064</pid>
          <sizeTotal>3998831407104</sizeTotal>
          <sizeFree>3233376612352</sizeFree>
          <device>/dev/sda2</device>
          <blockSize>4096</blockSize>
          <mntOptions>rw,seclabel,relatime,attr2,inode64,noquota</mntOptions>
          <fsName>xfs</fsName>
        </node>
        <node>
          <hostname>shockley.gluster.private</hostname>
          <path>/srv/glusterfs/disk2/home_arbiter_brick</path>
          <peerid>3075fdea-4bb6-4fad-94b3-b09b13d7d6a7</peerid>
          <status>1</status>
          <port>49171</port>
          <ports>
            <tcp>49171</tcp>
            <rdma>N/A</rdma>
          </ports>
          <pid>10579</pid>
          <sizeTotal>1767605006336</sizeTotal>
          <sizeFree>1764696260608</sizeFree>
          <blockSize>4096</blockSize>
        </node>
      </volume>
    </volumes>
  </volStatus>
</cliOutput>


Now that I think about it, I suspect that there is one trick in my setup that is not interpreted correctly: let me explain (it is a bit involved, sorry ;-) )

Since it is well known that the arbiter does not need as much disk space as the "full replica" nodes, I used smaller disks on the arbiter node (all of my arbiter bricks are confined to the same node, so I call it the "arbiter node"). Then, when I needed more storage, I kept adding disks to the "full replica" nodes but not to the arbiter one, since its existing disks should already be more than enough for the arbiter bricks.

Note also that I use a separate mount point /srv/glusterfs/diskN for each new disk that I add, and then I create individual brick subdirectories for my GlusterFS volumes on each disk: /srv/glusterfs/diskN/volname_brick or /srv/glusterfs/diskN/volname_arbiter_brick.

Now all is well, but I also like to have consistent paths on all nodes (arbiter or full replica), so I used a higher-level symlink on the arbiter to fake the presence of the additional disk. Here is the actual XFS filesystem view of my arbiter bricks:

[root@shockley tmp]# tree -L 3 /srv/
/srv/
├── ctdb
│   └── lockfile
└── glusterfs
    ├── disk0
    │   ├── ctdb_arbiter_brick
    │   ├── disk2
    │   ├── enginedomain_arbiter_brick
    │   ├── exportdomain_arbiter_brick
    │   ├── home_arbiter_brick
    │   ├── isodomain_arbiter_brick
    │   ├── share_arbiter_brick
    │   ├── software_arbiter_brick
    │   ├── src_arbiter_brick
    │   ├── tmp_arbiter_brick
    │   └── vmdomain_arbiter_brick
    ├── disk1
    │   ├── ctdb_arbiter_brick
    │   ├── enginedomain_arbiter_brick
    │   ├── exportdomain_arbiter_brick
    │   ├── home_arbiter_brick
    │   ├── isodomain_arbiter_brick
    │   ├── share_arbiter_brick
    │   ├── software_arbiter_brick
    │   ├── src_arbiter_brick
    │   ├── tmp_arbiter_brick
    │   └── vmdomain_arbiter_brick
    └── disk2 -> disk0/disk2

26 directories, 1 file
[root@shockley tmp]# tree -L 1 /srv/glusterfs/disk0/disk2
/srv/glusterfs/disk0/disk2
├── enginedomain_arbiter_brick
├── exportdomain_arbiter_brick
├── home_arbiter_brick
├── share_arbiter_brick
├── software_arbiter_brick
├── src_arbiter_brick
└── vmdomain_arbiter_brick

7 directories, 0 files

At this point you can guess that I added the bricks by specifying the "fake" (symlinked) path, for the sake of uniformity.

I thought that this would not cause issues, but it seems I was wrong :-(

Is this kind of setup really unsupported?

Comment 4 Ravishankar N 2017-01-06 06:08:08 UTC
It is generally advisable to use separate paths for bricks, but I don't think the symlink is the issue here. FWIW, I created a similar setup to yours and volume status printed everything:
-------------------------
0:root@vm1 bricks$ gluster v info
Volume Name: testvol
Type: Replicate
Volume ID: 822fdbac-6c7d-460c-8d89-3e6222c0ec11
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 127.0.0.2:/bricks/disk0/brick1
Brick2: 127.0.0.2:/bricks/disk1/brick2
Brick3: 127.0.0.2:/bricks/disk2/brick3 (arbiter)


0:root@vm1 bricks$ ll /bricks/
total 0
drwxr-xr-x. 4 root root 33 Jan  6 11:26 disk0
drwxr-xr-x. 3 root root 20 Jan  6 11:26 disk1
lrwxrwxrwx. 1 root root 11 Jan  6 11:23 disk2 -> disk0/disk2

0:root@vm1 bricks$ tree /bricks/
/bricks/
├── disk0
│   ├── brick1
│   └── disk2
│       └── brick3
├── disk1
│   └── brick2
└── disk2 -> disk0/disk2

7 directories, 0 files

-------------------------

Changing the BZ component to glusterd for the glusterd folks to look at the error message in comment #3.

Comment 5 Gaurav Yadav 2017-01-10 07:07:35 UTC
Hi,

We tried to recreate the issue once again and were not able to.

We checked /var/log/glusterfs/etc-glusterfs-glusterd.vol.log and also looked into the code for the log message "Staging of operation 'Volume Status' failed on localhost : No brick details in volume home".

Here is my observation:
The above-mentioned log message is generated when we issue an incorrect CLI command, for example:
[root@dhcp42-174 ~]# gluster v status testvol details
The log generated in the log file:

[2017-01-10 06:47:37.473855] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : No brick details in volume testvol

One thing I would like you to check is whether the above log message is actually related to the issue you raised, because from my debugging it appears that this log is generated whenever an incorrect command is issued.

Comment 6 Giuseppe Ragusa 2017-01-10 21:36:16 UTC
Hi,

I double-checked, and it seems that my shell history actually contains a gluster command with the exact syntax error ("details" vs "detail") that you hinted at, probably issued while investigating the problem: I mistakenly attributed that error message to the correct command. I apologize for the confusion.

This time I checked better by issuing a:

date; gluster volume status home detail --xml; echo $? ; date

and actually the only log in that time interval is:

[2017-01-10 21:19:29.085296] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume home

while the command still produces no <device> XML element for the arbiter brick in the third distributed component of the distributed-replicated volume (and the command still exits with status 0).

Now: what kind of tests/logs should I produce to fully trace the XML output anomaly?

Please consider that this setup is now in production (so tests must not disrupt current activity, but maintenance windows can be arranged).

Comment 7 Giuseppe Ragusa 2017-02-17 14:36:32 UTC
Changed the component back to cli, since it seems that glusterd/arbiter is not the culprit here.
I'm still willing to investigate the issue further with any logging or debugging actions needed.

Many thanks.

Comment 8 Kaushal 2017-03-08 11:00:33 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

