Description of problem:
I have a sandbox setup where I am fairly sure that only one client is driving I/O to the RHGS cluster. Yet in WA we see connections close to 200 when I drive the load (copying files to an RHGS volume) with one process on the client machine. The help text on the panel also says that the panel displays the total number of client connections. From a customer's perspective, a client is the machine that is accessing storage.
Version-Release number of selected component (if applicable): 3.4
How reproducible:
Generate some traffic from a single client to the RHGS cluster and watch the Connections Trend metric in the WA cluster dashboard.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Created attachment 1441076 [details] screenshot of Connection Trend chart with its info text
Using tendrl-monitoring-integration-1.6.3-3.el7rhgs.noarch
(In reply to Anand Paladugu from comment #0)
> Yet in WA we
> should connections being close to 200 when I drive the load (copying files
> to RHGS volume) with one process on the client machine. The help on the
> panel also says that the panel display the total number of client
> connections.
The problem here is that this feature is not described correctly. If I recall correctly, there is no easy or straightforward way to get the number of client connections in GlusterFS, so what this chart shows is all gluster connections, which includes both client connections (from the user's perspective) and internal ones.
I think the fix for this BZ should include:
* a fix to the info text of the chart
* documentation for this chart, describing exactly what the value means, why someone should care about it, and how and when to read it
That said, this feature and its use case are neither described in a requirement (so we do not know what the intent is) nor documented (so we do not know how the user will interpret it). Without that, the QE team can't test it properly.
(In reply to Martin Bukatovic from comment #3)
> If I recall right, there is no (easy or straightforward) way to get number of
> client connections in GlusterFS, so what this chart shows are all gluster
> connections, which includes both client (from user perspective) and internal
> ones.
Is this description correct?
We need to fix the help text here so that it makes clear what the panel really represents. Ju, could you please help here?
Thanks, Nishanth. As discussed, please also open an RFE for support from Gluster for a way to determine real clients.
I see a few different action items here:
(1) Fix the text for the WA 3.4 release (dashboard panel description and Docs) and the RHGS docs (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/displaying_volume_status where it's not documented)
(2) RFE to determine real clients in RHGS
(3) Add and/or improve the panel in WA Dashboards to include visibility of the client connections in a future WA release
(4) Add a panel on the number of clients in the WA Dashboard (though ideally it would be nice to see the specific clients vs. the ports the clients are connecting on), e.g.
    # gluster volume status test-volume clients
    Brick : arch:/export/1
    Clients connected : 2
    Hostname          Bytes Read   BytesWritten
    --------          ---------    ------------
    127.0.0.1:1013    776          676
    127.0.0.1:1012    50440        51200
I think we should be consistent in how we describe this in the WA and RHGS core docs, so we should get Docs to suggest the text for the RHGS docs (for the volume status explanation), and then describe it in WA accordingly.
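To illustrate the distinction raised in item (4) between specific clients and the ports they connect on, here is a minimal Python sketch that parses the sample output quoted above and reports, per brick, both the number of connections and the number of distinct hosts. It assumes the output has exactly the layout shown in the sample; real `gluster volume status <volume> clients` output may differ between versions. The names `summarize_clients`, `ROW`, and `SAMPLE` are hypothetical and not part of any Gluster or Tendrl API. As discussed in this bug, the listed connections may also include internal Gluster services, so a distinct-host count is not necessarily a count of real user clients.

```python
import re

# Sample text copied from the comment above; the column layout is an assumption.
SAMPLE = """\
Brick : arch:/export/1
Clients connected : 2
Hostname          Bytes Read   BytesWritten
--------          ---------    ------------
127.0.0.1:1013    776          676
127.0.0.1:1012    50440        51200
"""

# A data row is "host:port" followed by two integer byte counters.
ROW = re.compile(r"^(?P<host>\S+):(?P<port>\d+)\s+(?P<read>\d+)\s+(?P<written>\d+)$")


def summarize_clients(text):
    """Return {brick: {"connections": int, "hosts": set_of_hostnames}}."""
    summary = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick"):
            # "Brick : arch:/export/1" -> "arch:/export/1"
            brick = line.split(":", 1)[1].strip()
            summary[brick] = {"connections": 0, "hosts": set()}
            continue
        match = ROW.match(line)
        if match and brick is not None:
            summary[brick]["connections"] += 1
            summary[brick]["hosts"].add(match.group("host"))
    return summary


if __name__ == "__main__":
    for brick, info in summarize_clients(SAMPLE).items():
        print(brick, info["connections"], "connections from",
              len(info["hosts"]), "distinct host(s)")
```

For the sample, both rows come from the same host on different ports, so two connections collapse to one distinct host. This is the kind of gap between a connection count and a client count that the reporter ran into.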
(In reply to Ju Lim from comment #9)
> I see a few different action items here:
>
> (1) fix text for the WA 3.4 release (dashboard panel description and Docs)
> and RHGS docs
> (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/displaying_volume_status
> where it's not documented)
>
> (2) RFE to determine real clients in RHGS
Please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1581789#c6. This will be available in the next RHGS rebase.
>
> (3) Add and/or improve the panel in WA Dashboards to include visibility of
> the client connections in a future WA release
>
> (4) Add panel on # of clients in WA Dashboard (though ideally it would be
> nice to see the specific clients vs. the ports the clients are connecting
> on), e.g.
>
> # gluster volume status test-volume clients
>
> Brick : arch:/export/1
> Clients connected : 2
> Hostname          Bytes Read   BytesWritten
> --------          ---------    ------------
> 127.0.0.1:1013    776          676
> 127.0.0.1:1012    50440        51200
>
Items 3 and 4 can be worked on once the RHGS feature is available, probably in the next release.
>
> I think we should be consistent on how we describe this in WA and RHGS core
> docs, so we should get Docs to suggest the text into RHGS docs (for volume
> status explanation), and then describe it in WA accordingly.
My thoughts and queries:
Ju's suggestion of documenting the Connections parameter in the RHGS Admin guide is a good one but, at this point, the enhancement mentioned by Atin in comment 6 is not yet available. The RHGS docs can document it only once the enhancement is available in the next rebase. I will query this with the Gluster Content Strategist, Kenneth H, for his action plan.
So I guess the immediate action to address this issue would be to modify the Connections Trend info tip in the Grafana dashboard panel so that it clearly describes what the 100-200 and upward connection count means and, once the rebase is done, to update the RHGS docs.
As Anand astutely observed, the chart shows 200+ connections for a single client, which does not reflect the real number of clients. Here, we need help from engineering to know what other services are contributing to these large numbers. Ju indicated that the numbers are a combination of various gluster services talking to the object, e.g., the volumes.
Nishanth, could you confirm which other services contribute to the numbers besides clients? Once we have clarity on this, CCS will suggest a clear and meaningful text to avoid any potential confusion.
Thanks!
This is what I know of:
* fuse
* snapd
* quotad
* glustershd
* rebalance
* tierd
* gfapi
Atin, could you please confirm?
(In reply to Nishanth Thomas from comment #12)
> This is what I know of
>
> fuse
> snapd
> quotad
> glustershd
> rebalance
> tierd
> gfapi
>
> Atin, Could you please confirm?
You nailed it. Nothing missed out.
Thanks, Nishanth. As the Connection panel displays the total number of client connections and Gluster services, I suggest the following info tip that accurately conveys what the numbers are without any potential confusion: "The Connection panel displays the total number of client connections and Gluster services connected to the bricks in the volumes of a given cluster over a period of time." Ju acknowledged this version. The Gluster services provided by Nishanth will be added in WA docs.
@martin I have updated the tool-tip of the connections panel with "The Connection panel displays the total number of client connections and Gluster services connected to the bricks in the volumes of a given cluster over a period of time." https://github.com/Tendrl/monitoring-integration/pull/491/files
Testing with tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
On the Cluster dashboard, I see that the description of the Connections panel states:
> The Connections panel displays the total number of client connections and
> Gluster services connected to the bricks in the volumes of a given cluster
> over a period of time.
This matches the expected result, as noted in comment 15 and as drafted in comment 14.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616