When I issue heal statistics on an EC volume, it displays a wrong message saying it is unable to fetch from bricks which are offline. However, all the bricks are online.

[root@dhcp35-179 glusterfs]# gluster v heal dist-ec statistics
Gathering crawl statistics on volume dist-ec has been unsuccessful on bricks that are down. Please check if all brick processes are running.

[root@dhcp35-179 glusterfs]# gluster v status
Status of volume: dist-ec
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.179:/rhs/brick1/dist-ec      49152     0          Y       19313
Brick 10.70.35.180:/rhs/brick1/dist-ec      49152     0          Y       7020
Brick 10.70.35.86:/rhs/brick1/dist-ec       49152     0          Y       31806
Brick 10.70.35.9:/rhs/brick1/dist-ec        49152     0          Y       14347
Brick 10.70.35.153:/rhs/brick1/dist-ec      49152     0          Y       12841
Brick 10.70.35.79:/rhs/brick1/dist-ec       49152     0          Y       3774
Brick 10.70.35.179:/rhs/brick2/dist-ec      49153     0          Y       19332
Brick 10.70.35.180:/rhs/brick2/dist-ec      49153     0          Y       5787
Brick 10.70.35.86:/rhs/brick2/dist-ec       49153     0          Y       3770
Brick 10.70.35.9:/rhs/brick2/dist-ec        49153     0          Y       14366
Brick 10.70.35.153:/rhs/brick2/dist-ec      49153     0          Y       12860
Brick 10.70.35.79:/rhs/brick2/dist-ec       49153     0          Y       3793
Snapshot Daemon on localhost                49154     0          Y       21888
Self-heal Daemon on localhost               N/A       N/A        Y       22044
Snapshot Daemon on 10.70.35.180             49154     0          Y       15831
Self-heal Daemon on 10.70.35.180            N/A       N/A        Y       15934
Snapshot Daemon on 10.70.35.86              49154     0          Y       14409
Self-heal Daemon on 10.70.35.86             N/A       N/A        Y       14511
Snapshot Daemon on dhcp35-79.lab.eng.blr.redhat.com   49154  0   Y       12729
Self-heal Daemon on dhcp35-79.lab.eng.blr.redhat.com  N/A   N/A  Y       12833
Snapshot Daemon on 10.70.35.9               49154     0          Y       14254
Self-heal Daemon on 10.70.35.9              N/A       N/A        Y       14386
Snapshot Daemon on 10.70.35.153             49154     0          Y       12738
Self-heal Daemon on 10.70.35.153            N/A       N/A        Y       12881

Task Status of Volume dist-ec
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 1b482140-fd66-4d2e-9bae-903c74bcd875
Status               : completed

[root@dhcp35-179 glusterfs]# rpm -qa|grep gluste
glusterfs-libs-3.8.4-3.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-3.el7rhgs.x86_64
glusterfs-server-3.8.4-3.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-fuse-3.8.4-3.el7rhgs.x86_64
glusterfs-api-3.8.4-3.el7rhgs.x86_64
glusterfs-3.8.4-3.el7rhgs.x86_64

[root@dhcp35-179 glusterfs]# r
Volume Name: dist-ec
Type: Distributed-Disperse
Volume ID: 3bcd582c-f0cd-446c-afce-bfa3b0b8e316
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.179:/rhs/brick1/dist-ec
Brick2: 10.70.35.180:/rhs/brick1/dist-ec
Brick3: 10.70.35.86:/rhs/brick1/dist-ec
Brick4: 10.70.35.9:/rhs/brick1/dist-ec
Brick5: 10.70.35.153:/rhs/brick1/dist-ec
Brick6: 10.70.35.79:/rhs/brick1/dist-ec
Brick7: 10.70.35.179:/rhs/brick2/dist-ec
Brick8: 10.70.35.180:/rhs/brick2/dist-ec
Brick9: 10.70.35.86:/rhs/brick2/dist-ec
Brick10: 10.70.35.9:/rhs/brick2/dist-ec
Brick11: 10.70.35.153:/rhs/brick2/dist-ec
Brick12: 10.70.35.79:/rhs/brick2/dist-ec
Options Reconfigured:
features.uss: enable
disperse.shd-max-threads: 3
disperse.heal-wait-qlength: 3
cluster.shd-max-threads: 4
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp35-179 glusterfs]#
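Since the CLI message disagrees with the actual brick state here, the "Online" column of `gluster v status` can be cross-checked directly. A minimal sketch of such a check, using an abridged sample of the captured output above in place of the live command (in practice you would pipe `gluster v status dist-ec` into the awk filter instead of the heredoc):

```shell
#!/bin/sh
# Count bricks whose Online column is "N" in `gluster v status` output.
# The heredoc below is sample data taken from this report; swap it for:
#   gluster v status dist-ec | awk '...'
offline=$(awk '/^Brick / && $(NF-1) == "N" { n++ } END { print n+0 }' <<'EOF'
Brick 10.70.35.179:/rhs/brick1/dist-ec      49152     0          Y       19313
Brick 10.70.35.180:/rhs/brick1/dist-ec      49152     0          Y       7020
Brick 10.70.35.86:/rhs/brick1/dist-ec       49152     0          Y       31806
EOF
)
echo "offline bricks: $offline"
# -> offline bricks: 0
```

If this prints 0 while `gluster v heal <vol> statistics` still claims bricks are down, the error message itself is wrong, which is exactly the behaviour reported here.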
The same behaviour can occasionally be observed on 3.7.x as well (3-node cluster, replica 3, type replicate).
Just to give a small update: I've encountered a similar issue in a 2-node cluster during an upgrade. Unfortunately, I didn't save the output of the commands.

One node was running 3.11.3-1 and the other one was still running 3.7.20-1. Although all bricks were online and everything was 5-by-5, I couldn't issue `% gluster volume heal`. The reason given was as above: "[...] has been unsuccessful on bricks that are down. Please check if all brick processes are running." As stated previously, I could see all bricks and PIDs from the remote node, everything as it should be. Once the whole cluster had been updated to 3.11.3, heal became available again.

There is one thing I don't understand about this bug, though. The original reporter reported the issue against 3.8.x (based on the provided output), yet the version set in the bug is 3.2. That's why I originally posted that this issue can be seen in 3.7.x as well.

Thanks.
(In reply to Zdenek Styblik from comment #7)
> Just to give a small update. I've encountered similar issue in 2 node
> cluster during upgrade. Unfortunately, I didn't save output of the commands.
>
> One node was running 3.11.3-1 and the other one was still running 3.7.20-1.
> Despite all bricks were online and everything was 5-by-5, I couldn't issue %
> gluster volume heal;. The reason given was as above ``[...] has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.''. As stated previously, I could see all bricks and PIDs from
> remote node and everything as you should. Once the whole cluster has been
> updated to 3.11.3, heal became available.
>
> There is one thing I don't understand about this bug, though. Original
> reporter reported issue against 3.8.x(based on provided output), yet version
> set in bug is 3.2. That's why I've originally posted this issue can be seen
> in 3.7.x as well.
>
> Thanks.

I understand your confusion. The 3.2 in this bug is the version of enterprise Gluster (i.e. the Red Hat paid subscription product). That release is based on the 3.8.4 codeline (the same as the 3.8.4 community version). Yes, it is possible that this bug existed even before 3.8.x, as you mentioned. If a bug is raised against the product "Red Hat Gluster Storage", it means it was reported while testing the Red Hat paid subscription release. In such a case, kindly look into the description to identify the corresponding community version (in this case 3.8.4).
(In reply to Zdenek Styblik from comment #7)
> Just to give a small update. I've encountered similar issue in 2 node
> cluster during upgrade. Unfortunately, I didn't save output of the commands.
>
> One node was running 3.11.3-1 and the other one was still running 3.7.20-1.
> Despite all bricks were online and everything was 5-by-5, I couldn't issue %
> gluster volume heal;. The reason given was as above ``[...] has been
> unsuccessful on bricks that are down. Please check if all brick processes
> are running.''. As stated previously, I could see all bricks and PIDs from
> remote node and everything as you should. Once the whole cluster has been
> updated to 3.11.3, heal became available.
>
> There is one thing I don't understand about this bug, though. Original
> reporter reported issue against 3.8.x(based on provided output), yet version
> set in bug is 3.2. That's why I've originally posted this issue can be seen
> in 3.7.x as well.

One thing you'd need to keep an eye on while searching for any Gluster issues is the product type in Bugzilla. If you see a bug filed against "Red Hat Gluster Storage", please consider it a Red Hat product bug, not the community version. Bugs with "GlusterFS" as the product type are the ones reported by community users.

>
> Thanks.
The same issue is happening with 3.11.3.