Created attachment 1508264 [details]
Screenshot of grafana output

Description of problem:
After the number of active MDS daemons and clients has been changed several times, decreasing the number of clients causes Grafana to show the wrong number of clients.

Version-Release number of selected component (if applicable):
Cephmetrics-ansible-2.0.1-1.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Increase the number of active MDS daemons
2. Decrease the number of active MDS daemons
3. Increase the number of clients
4. Decrease the number of clients

Actual results:
Grafana shows the wrong number of clients

Expected results:
Grafana should show the correct number of clients

Additional info:
# ceph fs status cephfs
cephfs - 2 clients
======
+------+--------+----------+---------------+-------+-------+
| Rank | State  | MDS      | Activity      | dns   | inos  |
+------+--------+----------+---------------+-------+-------+
| 0    | active | magna049 | Reqs: 0 /s    | 11    | 13    |
| 1    | active | magna046 | Reqs: 0 /s    | 10    | 12    |
+------+--------+----------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 398k  | 2331G |
| cephfs_data     | data     | 9.76G | 2331G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| magna060    |
+-------------+
That page updates every 15s. Is it possible that you caught it before it updated to reflect the change to 2 clients? When I look at it now, it's correct at zero.
Hi Gregory, the value actually increases sometimes when you decrease the number of clients.
OK, we'll have to investigate. I'm moving this to z1 since it is not a blocker.
Hi John, please find below the Doc Text content.

The 'MDS Performance' dashboard does not reflect the correct value for 'Clients' after the number of active MDS servers and clients has been increased and decreased multiple times.

Workaround: Not available.
Boris, would you please investigate this issue?
It looks like we are using 'ceph_mds_sessions_session_count' to get the client count. However, this probably includes stale sessions. We should probably switch to 'ceph_mds_sessions_sessions_open' for the clients metric instead. Even then, the page may take some time to correct itself because of how exposing, monitoring, and querying the data works (especially if you take down an MDS node to test this).
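To check whether the two metrics actually diverge after clients are removed, they can be compared directly rather than through the dashboard. The following is only a rough sketch, assuming the metrics are stored in a Prometheus server that exposes the standard HTTP instant-query endpoint; the base URL is a hypothetical placeholder, the helper names are made up for illustration, and no network call is made here.

```python
import json
from urllib.parse import urlencode

# Metric names are taken from the comment above; everything else is illustrative.
CURRENT_METRIC = "ceph_mds_sessions_session_count"
PROPOSED_METRIC = "ceph_mds_sessions_sessions_open"


def build_query_url(base_url: str, metric: str) -> str:
    """Return a Prometheus instant-query URL for the raw (unaggregated) metric."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def parse_instant_values(body: str) -> list:
    """Return (labels, value) pairs from a Prometheus instant-query response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each vector sample carries its labels and a [timestamp, "value"] pair;
    # the value is encoded as a string.
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]
```

Fetching both URLs (for example with `urllib.request.urlopen`) after step 4 of the reproducer and comparing the per-MDS values against the client count reported by `ceph fs status` would show which metric tracks it. How the values should be combined across active MDS daemons is left open here.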
Upstream PR: https://github.com/ceph/cephmetrics/pull/237
It seems that the modifications did not pass QA. There is no time to address this issue, and since it is not a regression, I am moving it to z6.