Bug 1652896 - [ceph-metrics]clients values are not reflecting in ceph mds performance
Summary: [ceph-metrics]clients values are not reflecting in ceph mds performance
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Metrics
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 3.3
Assignee: Boris Ranto
QA Contact: ymane
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1629656 1726135
 
Reported: 2018-11-23 12:36 UTC by ymane
Modified: 2020-02-17 02:28 UTC (History)

Fixed In Version: cephmetrics-2.0.3-1.el7cp
Doc Type: Bug Fix
Doc Text:
.The _MDS Performance_ dashboard now displays the correct number of CephFS clients
The _MDS Performance_ dashboard displayed an incorrect value for _Clients_ after increasing and decreasing the number of active Metadata Servers (MDS) and clients multiple times. This bug has been fixed, and the _MDS Performance_ dashboard now displays the correct number of Ceph File System (CephFS) clients as expected.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
Screenshot of Grafana output (99.34 KB, image/png)
2018-11-23 12:36 UTC, ymane

Description ymane 2018-11-23 12:36:14 UTC
Created attachment 1508264
Screenshot of Grafana output

Description of problem:
After the number of active MDS daemons and the number of clients have been changed several times, decreasing the number of clients causes Grafana to show a wrong client count.


Version-Release number of selected component (if applicable):
Cephmetrics-ansible-2.0.1-1.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Increase the number of active MDS daemons.
2. Decrease the number of active MDS daemons.
3. Increase the number of clients.
4. Decrease the number of clients.

Actual results:
Grafana shows a wrong value for the number of clients.

Expected results:
Grafana should show the correct number of clients.

Additional info:
# ceph fs status cephfs
cephfs - 2 clients
======
+------+--------+----------+---------------+-------+-------+
| Rank | State  |   MDS    |    Activity   |  dns  |  inos |
+------+--------+----------+---------------+-------+-------+
|  0   | active | magna049 | Reqs:    0 /s |   11  |   13  |
|  1   | active | magna046 | Reqs:    0 /s |   10  |   12  |
+------+--------+----------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  398k | 2331G |
|   cephfs_data   |   data   | 9.76G | 2331G |
+-----------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
|   magna060  |
+-------------+

Comment 3 Christina Meno 2018-11-26 22:43:53 UTC
That page updates every 15s. Is it possible that you caught it before the update to reflect the change to 2 clients?

When I look at it now it's correct at zero.

Comment 4 ymane 2018-11-27 13:04:37 UTC
Hi Gregory, the value actually increases sometimes when you decrease the number of clients.

Comment 5 Christina Meno 2018-11-27 16:05:51 UTC
OK, we'll have to investigate. I'm moving this to z1 because it is not a blocker.

Comment 7 Madhavi Kasturi 2018-12-18 10:19:15 UTC
Hi John,

Please find below the Doc Text content.

The 'MDS Performance' dashboard does not reflect the correct value for 'Clients' after increasing and decreasing the number of active MDS servers and clients multiple times.
Workaround: Not available.

Comment 8 Christina Meno 2019-01-09 23:04:41 UTC
Boris, would you please investigate this issue?

Comment 9 Boris Ranto 2019-01-31 07:36:28 UTC
It looks like we are using 'ceph_mds_sessions_session_count' to get the clients count. However, this probably includes the stale sessions. We should probably switch to 'ceph_mds_sessions_sessions_open' for the clients metric instead. Even then, the page may take some time to correct itself because of how the data is exposed, monitored, and queried (especially if you take down an MDS node to test this).
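
To illustrate the metric switch proposed above, here is a minimal Python sketch that rewrites the query expressions in a Grafana dashboard definition. Only the two metric names come from this report. The dashboard layout (a "panels" list whose entries carry "targets" with an "expr" string), the example query, and the helper name switch_clients_metric are assumptions for illustration, and this is not necessarily how the upstream PR implements the change.

```python
import copy

# Metric names taken from the comment above.
OLD_METRIC = "ceph_mds_sessions_session_count"
NEW_METRIC = "ceph_mds_sessions_sessions_open"


def switch_clients_metric(dashboard, old=OLD_METRIC, new=NEW_METRIC):
    """Return a copy of a dashboard dict with `old` replaced by `new`
    in every target expression. The input dict is left unmodified.

    Hypothetical helper: assumes panels live under "panels" and each
    query under "targets" with an "expr" string.
    """
    updated = copy.deepcopy(dashboard)
    for panel in updated.get("panels", []):
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if isinstance(expr, str):
                target["expr"] = expr.replace(old, new)
    return updated


# Assumed example panel; the real dashboard query may differ.
example = {
    "title": "MDS Performance",
    "panels": [
        {
            "title": "Clients",
            "targets": [{"expr": "sum(ceph_mds_sessions_session_count)"}],
        }
    ],
}

patched = switch_clients_metric(example)
print(patched["panels"][0]["targets"][0]["expr"])
```

The helper works on a deep copy so the original definition stays available for comparison. As noted in the comment, the displayed value can still lag after such a change until the next scrape and dashboard refresh.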

Comment 10 Boris Ranto 2019-06-19 19:47:42 UTC
Upstream PR: https://github.com/ceph/cephmetrics/pull/237

