Bug 1517077

Summary: [RFE] Grafana dashboard not showing all the volumes in "Up" state when the brick path uses "short names"
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: web-admin-tendrl-monitoring-integration
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Version: rhgs-3.3
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: All
OS: Linux
Reporter: Karan Sandha <ksandha>
Assignee: Nishanth Thomas <nthomas>
QA Contact: Daniel Horák <dahorak>
CC: amukherj, dahorak, mbukatov, negupta, nthomas, rallan, rcyriac, rhinduja, rhs-bugs, sanandpa, sankarshan, sarora
Keywords: FutureFeature
Fixed In Version: tendrl-commons-1.6.3-2.el7rhgs, tendrl-monitoring-integration-1.6.3-1.el7rhgs
Last Closed: 2018-09-04 06:58:45 UTC
Type: Bug
Bug Blocks: 1503132
No code fix is required; this is basically an issue with sync intervals. The issue won't be seen with the latest builds, which have reduced default sync intervals, when run on the recommended hardware configuration. Restoring the qa ack.

Could you review this BZ, which we discussed at the "RHGS WA with RHS One testing" meeting on 2018-03-05, and add a comment here with its severity from the RHSOne perspective?

Thanks a lot for the feedback. Could you recheck the status of this BZ and add a Fixed In Version if possible, to move it to the ON_QA state?

I've tested it with various scenarios (different volume configurations; bricks addressed by FQDN, short hostname, or IP), and in all cases the Tendrl "Volumes" page and the various Gluster dashboards properly show all the volumes in the "Up" state.
>> VERIFIED
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Version-Release number of selected components:
Tendrl Server
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
# rpm -qa | grep -e tendrl -e collectd -e gluster -e etcd | sort
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
etcd-3.2.7-1.el7.x86_64
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
python-etcd-0.4.5-2.el7rhgs.noarch
rubygem-etcd-0.3.0-2.el7rhgs.noarch
tendrl-ansible-1.6.3-3.el7rhgs.noarch
tendrl-api-1.6.3-2.el7rhgs.noarch
tendrl-api-httpd-1.6.3-2.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-1.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch
Gluster Storage Server
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
# cat /etc/redhat-storage-release
Red Hat Gluster Storage Server 3.4.0
# rpm -qa | grep -e tendrl -e collectd -e gluster | sort
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
glusterfs-3.12.2-8.el7rhgs.x86_64
glusterfs-api-3.12.2-8.el7rhgs.x86_64
glusterfs-cli-3.12.2-8.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-8.el7rhgs.x86_64
glusterfs-events-3.12.2-8.el7rhgs.x86_64
glusterfs-fuse-3.12.2-8.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-8.el7rhgs.x86_64
glusterfs-libs-3.12.2-8.el7rhgs.x86_64
glusterfs-rdma-3.12.2-8.el7rhgs.x86_64
glusterfs-server-3.12.2-8.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.2.x86_64
python2-gluster-3.12.2-8.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

*** Bug 1616194 has been marked as a duplicate of this bug. ***
Created attachment 1358503 [details]
FIle1

Description of problem:
Created the volume using hostnames resolvable via /etc/hosts:

[root@gqas001 ~]# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.96.140  gqas001.sbu.lab.eng.bos.redhat.com gqas001
192.168.96.141  gqas004.sbu.lab.eng.bos.redhat.com gqas004
192.168.96.142  gqas010.sbu.lab.eng.bos.redhat.com gqas010
192.168.96.143  gqas012.sbu.lab.eng.bos.redhat.com gqas012
192.168.96.144  gqac006.sbu.lab.eng.bos.redhat.com gqac006
192.168.96.145  gqac025.sbu.lab.eng.bos.redhat.com gqac025
192.168.96.146  gqac026.sbu.lab.eng.bos.redhat.com gqac026
192.168.96.147  gqac027.sbu.lab.eng.bos.redhat.com gqac027

Version-Release number of selected component (if applicable):
3.8.4-52

How reproducible:
100%

Steps to Reproduce:
1. Created the initial volume using the FQDN (gqas001.sbu.lab.eng.bos.redhat.com); it was active and green on the dashboard.
2. Created an arbiter volume using the short hostname (gqas001).
3. After waiting for half an hour, the arbiter volume created using gqas001 still did not show up.

Actual results:
The arbiter volume was shown as down the whole time.

Expected results:
The volume should show up within minutes.

Additional info:
[root@gqas001 ~]# gluster v info

Volume Name: arbiter
Type: Replicate
Volume ID: 83b4cb59-c903-44c4-abdb-a2c6294c24e4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gqas012:/bricks/arbiter_1
Brick2: gqas010:/bricks/arbiter_1
Brick3: gqas004:/bricks/arbiter_1 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Volume Name: arbiter-2
Type: Replicate
Volume ID: aadda253-b450-403c-8115-e8d8f41d0feb
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gqas012.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_21
Brick2: gqas010.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2
Brick3: gqas004.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 14cec5af-14b3-407a-b849-b0e68e979ea1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: on
transport.address-family: inet
nfs.disable: on
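The volume listing above shows the trigger: the "arbiter" volume's bricks were created with short names, while "arbiter-2" and "testvol" use FQDNs. As a quick sanity check before creating a volume, one can confirm that a short name and its FQDN resolve to the same address. This is an illustrative sketch, not part of the original report; check_pair is a hypothetical helper, and the gqas001 names are taken from the /etc/hosts listing above.

```shell
#!/bin/sh
# Sketch: confirm that a short host name and its FQDN resolve to the same
# address before using either form in "gluster volume create".
# getent consults /etc/hosts as well as DNS.
check_pair() {
    short=$1
    fqdn=$2
    # take the first address returned for each name
    a_short=$(getent hosts "$short" | awk '{print $1; exit}')
    a_fqdn=$(getent hosts "$fqdn" | awk '{print $1; exit}')
    if [ -n "$a_short" ] && [ "$a_short" = "$a_fqdn" ]; then
        echo "$short and $fqdn both resolve to $a_short"
    else
        echo "mismatch: $short -> ${a_short:-<none>}, $fqdn -> ${a_fqdn:-<none>}"
    fi
}

# Host pair from this report's /etc/hosts; substitute your own cluster nodes.
check_pair gqas001 gqas001.sbu.lab.eng.bos.redhat.com
```

A mismatch (or a name that does not resolve at all) is a hint that the brick hosts may end up registered under inconsistent identities, which is the situation this bug describes.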