Bug 1581789 - Connection trends panel information can be misunderstood by customers.
Summary: Connection trends panel information can be misunderstood by customers.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Ankush Behl
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-05-23 15:48 UTC by Anand Paladugu
Modified: 2018-09-04 07:08 UTC

Fixed In Version: tendrl-monitoring-integration-1.6.3-6.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 07:07:10 UTC
Embargoed:


Attachments
screenshot of Connection Trend chart with its info text (43.08 KB, image/png)
2018-05-24 14:31 UTC, Martin Bukatovic


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | https://github.com/Tendrl/monitoring-integration/issues/490 | 0 | None | None | None | 2018-07-03 10:54:04 UTC
Red Hat Product Errata | RHSA-2018:2616 | 0 | None | None | None | 2018-09-04 07:08:05 UTC

Description Anand Paladugu 2018-05-23 15:48:27 UTC
Description of problem:  I have a sandbox setup where I am fairly sure that only one client is driving I/O to the RHGS cluster.  Yet WA shows close to 200 connections when I drive the load (copying files to an RHGS volume) with one process on the client machine.  The help text on the panel also says that the panel displays the total number of client connections.  From a customer's perspective, a client is the machine that is accessing storage.


Version-Release number of selected component (if applicable): 3.4 


How reproducible:  Generate some traffic from a single client to the RHGS cluster and watch the Connections trend metric in the WA cluster dashboard.



Comment 2 Martin Bukatovic 2018-05-24 14:31:24 UTC
Created attachment 1441076
screenshot of Connection Trend chart with its info text

Using tendrl-monitoring-integration-1.6.3-3.el7rhgs.noarch

Comment 3 Martin Bukatovic 2018-05-24 14:40:25 UTC
(In reply to Anand Paladugu from comment #0)
> Yet WA
> shows close to 200 connections when I drive the load (copying files
> to an RHGS volume) with one process on the client machine.  The help text on the
> panel also says that the panel displays the total number of client
> connections.

The problem here is that this feature is not described correctly.

If I recall correctly, there is no easy or straightforward way to get the number
of client connections in GlusterFS, so what this chart shows is all gluster
connections, which include both client connections (from the user's perspective)
and internal ones.

I think that a fix for this BZ should include:

* a fix in the info text of the chart
* documentation for this chart, describing exactly what the value means,
  why someone should care about it, and how and when to read it

That said, this feature and its use case are neither described in a requirement
(so we don't know what the intent is) nor documented (so we don't know
what the user will think about it). Without that, the QE team can't test it
properly.

Comment 4 Martin Bukatovic 2018-05-24 14:41:40 UTC
(In reply to Martin Bukatovic from comment #3)
> If I recall right, there is no (easy or straightforward) way to get number of
> client connections in GlusterFS, so what this chart shows are all gluster
> connections, which includes both client (from user perspective) and internal
> ones.

Is this description correct?

Comment 5 Martin Bukatovic 2018-05-24 14:42:56 UTC
(In reply to Martin Bukatovic from comment #3)
> If I recall right, there is no (easy or straightforward) way to get number of
> client connections in GlusterFS, so what this chart shows are all gluster
> connections, which includes both client (from user perspective) and internal
> ones.

Is this description correct?

Comment 7 Nishanth Thomas 2018-05-28 09:23:58 UTC
We need to fix the help text here and make it clear what the panel really represents.
Ju, could you please help here?

Comment 8 Anand Paladugu 2018-05-30 16:38:38 UTC
Thanks, Nishanth.  As discussed, please also open an RFE asking Gluster to provide a way to determine real clients.

Comment 9 Ju Lim 2018-06-05 14:14:02 UTC
I see a few different action items here:

(1) fix text for the WA 3.4 release (dashboard panel description and Docs) and RHGS docs (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/displaying_volume_status where it's not documented)

(2) RFE to determine real clients in RHGS

(3) Add and/or improve the panel in WA Dashboards to include visibility of the client connections in a future WA release

(4) Add panel on # of clients in WA Dashboard (though ideally it would be nice to see the specific clients vs. the ports the clients are connecting on), e.g.

# gluster volume status test-volume clients

Brick : arch:/export/1
Clients connected : 2
Hostname          Bytes Read   BytesWritten
--------          ---------    ------------
127.0.0.1:1013    776          676
127.0.0.1:1012    50440        51200


I think we should be consistent on how we describe this in WA and RHGS core docs, so we should get Docs to suggest the text into RHGS docs (for volume status explanation), and then describe it in WA accordingly.
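
For illustration only, here is a rough Python sketch of how the per-brick listing in the example above could be reduced to a connection count and a count of distinct hosts. The parsing rules are assumptions based solely on the sample text in this comment (real `gluster volume status <volume> clients` output may differ between versions), and `summarize_clients` is a hypothetical helper, not part of Gluster or Tendrl.

```python
import re
from collections import defaultdict

# Sample text copied from the comment above. Real output may be laid out
# differently, so the parsing below is an assumption, not a specification.
SAMPLE = """\
Brick : arch:/export/1
Clients connected : 2
Hostname          Bytes Read   BytesWritten
--------          ---------    ------------
127.0.0.1:1013    776          676
127.0.0.1:1012    50440        51200
"""

# A connection row is assumed to look like:
#   <host>:<port>  <bytes read>  <bytes written>
ROW = re.compile(r"^(\S+):(\d+)\s+(\d+)\s+(\d+)\s*$")


def summarize_clients(text):
    """Return {brick: {"connections": int, "hosts": set of host strings}}.

    Hypothetical helper: counts one connection per host:port row and
    collects the distinct hosts seen for each brick.
    """
    summary = defaultdict(lambda: {"connections": 0, "hosts": set()})
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick"):
            # "Brick : arch:/export/1" -> "arch:/export/1"
            brick = line.split(":", 1)[1].strip()
            summary[brick]  # register the brick even if it has no rows
            continue
        match = ROW.match(line)
        if match and brick is not None:
            summary[brick]["connections"] += 1
            summary[brick]["hosts"].add(match.group(1))
    return dict(summary)


if __name__ == "__main__":
    for brick, info in summarize_clients(SAMPLE).items():
        print(brick, info["connections"], "connections from",
              len(info["hosts"]), "distinct host(s)")
```

On the sample above this reports two connections from a single host for the brick, which is the gap this bug is about: a connection count is not the same as a count of client machines. Entries from internal Gluster services may also appear in such a listing, so even the distinct-host count should be read with care.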

Comment 10 Nishanth Thomas 2018-06-06 11:01:06 UTC
(In reply to Ju Lim from comment #9)
> I see a few different action items here:
> 
> (1) fix text for the WA 3.4 release (dashboard panel description and Docs)
> and RHGS docs
> (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/
> html/administration_guide/displaying_volume_status where it's not documented)
> 
> (2) RFE to determine real clients in RHGS


Please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1581789#c6. This will be available in the next RHGS rebase.


> 
> (3) Add and/or improve the panel in WA Dashboards to include visibility of
> the client connections in a future WA release
> 
> (4) Add panel on # of clients in WA Dashboard (though ideally it would be
> nice to see the specific clients vs. the ports the clients are connecting
> on), e.g.
> 
> # gluster volume status test-volume clients
> 
> Brick : arch:/export/1
> Clients connected : 2
> Hostname          Bytes Read   BytesWritten
> --------          ---------    ------------
> 127.0.0.1:1013    776          676
> 127.0.0.1:1012    50440        51200
> 

Items 3 and 4 can be worked on once the RHGS feature is available, probably in the next release.

> 
> I think we should be consistent on how we describe this in WA and RHGS core
> docs, so we should get Docs to suggest the text into RHGS docs (for volume
> status explanation), and then describe it in WA accordingly.

Comment 11 Rakesh 2018-06-06 13:45:44 UTC
My thoughts and queries:

Ju's suggestion of documenting the Connections parameter in the RHGS Admin guide is a good one, but the enhancement mentioned by Atin in comment 6 is not yet available. The RHGS docs can document it only once the enhancement arrives in the next rebase. I will ask the Gluster Content Strategist, Kenneth H, for his action plan.

So I guess the immediate action to address this issue is to modify the Connections Trend info tip in the Grafana dashboard panel so that it clearly describes what the 100-200 or more connections mean, and to update the RHGS docs once the rebase is done.

As Anand astutely observed, the chart shows 200+ connections for a single client, which does not reflect the real number of clients. We need help from engineering to learn which other services contribute to these large numbers.

Ju indicated that the numbers are a combination of various gluster services talking to the object, e.g. the volumes.

Nishanth, could you confirm which other services besides clients contribute to the numbers? Once we have clarity on this, CCS will suggest clear and meaningful text to avoid any potential confusion. Thanks!

Comment 12 Nishanth Thomas 2018-06-06 14:49:33 UTC
This is what I know of

fuse
snapd
quotad
glustershd
rebalance
tierd
gfapi

Atin, Could you please confirm?

Comment 13 Atin Mukherjee 2018-06-08 18:37:55 UTC
(In reply to Nishanth Thomas from comment #12)
> This is what I know of
> 
> fuse
> snapd
> quotad
> glustershd
> rebalance
> tierd
> gfapi
> 
> Atin, Could you please confirm?

You nailed it. Nothing missed out.

Comment 14 Rakesh 2018-06-12 17:48:49 UTC
Thanks, Nishanth. 

As the Connection panel displays the total number of client connections and Gluster services, I suggest the following info tip that accurately conveys what the numbers are without any potential confusion:

"The Connection panel displays the total number of client connections and Gluster services connected to the bricks in the volumes of a given cluster over a period of time."

Ju acknowledged this version. The Gluster services provided by Nishanth will be added in WA docs.

Comment 18 Ankush Behl 2018-07-03 10:56:57 UTC
@martin I have updated the tool-tip of the connections panel with  
"The Connection panel displays the total number of client connections and Gluster services connected to the bricks in the volumes of a given cluster over a period of time."

https://github.com/Tendrl/monitoring-integration/pull/491/files

Comment 23 Martin Bukatovic 2018-08-13 19:29:43 UTC
Testing with tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch

On Cluster dashboard, I see that the description of Connections panel states:

> The Connections panel displays the total number of client connections and
> Gluster services connected to the bricks in the volumes of a given cluster
> over a period of time.

This matches the expected result, as noted in comment 15 and as drafted in
comment 14.

Comment 25 errata-xmlrpc 2018-09-04 07:07:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

