Bug 1647812 - Ceph-Mgr does not properly feed mgr's perf_counter schema
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Dashboard
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: z1
Target Release: 4.0
Assignee: Boris Ranto
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-08 12:15 UTC by Ernesto Puerta
Modified: 2019-12-05 20:32 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1647115
Environment:
Last Closed: 2019-12-05 20:32:22 UTC
Embargoed:


Attachments
Mgr appears not attached to any host, and with empty information (ceph version) (74.59 KB, image/png)
2018-11-26 11:13 UTC, Ernesto Puerta


Links
System ID Private Priority Status Summary Last Updated
Github rhcs-dashboard/ceph/commit/073fc2c2ab 0 None None None 2019-12-05 20:26:52 UTC

Comment 2 Ernesto Puerta 2018-11-08 12:18:05 UTC
@branto found it:
"The backport lacks this commit:

cf68ce511f31d3746ca1cb13a9d3b39c25bb06a8

    mgr: have MgrStandby send mgr reports to active mgr (even self!)
    
    This allows the mgr daemons to register state with the active mgr just
    like the rest of the cluster, including perfcounters and current config.
"

Comment 3 Boris Ranto 2018-11-08 20:14:56 UTC
The commit referenced in Comment 2 does help somewhat: the mgr nodes do get registered, but the perf schema stays empty indefinitely. Either there are no perf schemas to show for ceph-mgr in luminous, or they are not being sent properly.

IIRC, in mimic (or master) we only have a few perf counters (about 4), and they are all related to objecter, so that may be relevant here.

I have checked both the luminous and mimic/master branches, and there are no direct counters for ceph-mgr (at least not yet): there is no add_u64_counter call in the src/mgr folder. There only seem to be the few counters sent by objecter in mimic/master. I suspect we might need to register ceph-mgr with objecter somehow to get them.
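
For anyone who wants to repeat the source check described above, here is a minimal sketch of one way to do it. The helper name find_symbol and the choice of file extensions are assumptions made for illustration; only the symbol add_u64_counter and the src/mgr folder come from this comment.

```python
import os


def find_symbol(root, symbol="add_u64_counter", exts=(".cc", ".h")):
    """Return sorted (relative_path, line_number) pairs where symbol appears.

    Hypothetical helper: walks the tree under root and does a plain
    substring match, so it also reports hits inside comments.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if symbol in line:
                        hits.append((os.path.relpath(path, root), lineno))
    return sorted(hits)
```

Calling find_symbol("src/mgr") from the top of a Ceph checkout should return an empty list if the folder really has no direct counters, as reported above.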

Btw, the perf_schema for other daemons is also empty at the beginning, but it gets filled with the perf counter data after a couple of seconds.
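
Because the schema starts out empty and fills in later, a single read right after startup cannot tell "not reported yet" apart from "never reported". A minimal polling sketch follows. Everything in it is hypothetical: wait_for_schema is an illustrative name, and fetch stands for whatever call returns the schema in your setup (for example a wrapper around the mgr module's perf schema getter, which is an assumption here).

```python
import time


def wait_for_schema(fetch, timeout=10.0, interval=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns a non-empty schema or timeout expires.

    fetch is a caller-supplied callable (hypothetical); returns the first
    non-empty result, or None if the schema is still empty at the deadline.
    """
    deadline = clock() + timeout
    while True:
        schema = fetch()
        if schema:
            return schema
        if clock() >= deadline:
            return None
        sleep(interval)
```

A None result after a generous timeout matches the ceph-mgr symptom described in this comment, while other daemons would be expected to return a populated schema within a few seconds.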

Comment 4 Boris Ranto 2018-11-08 22:01:17 UTC
OK, it turned out I back-ported this incorrectly, and that is why it did not work. Commit f04190ed117 had already been back-ported downstream, but I did not apply it, thinking it was an old one.

After fixing the merge order conflict, this started working properly in my testing.

Comment 5 Boris Ranto 2018-11-08 23:42:03 UTC
I have pushed the (tested) commit to the luminous-dashboard branch in rhcs ceph.git:

https://github.com/rhcs-dashboard/ceph/commit/a0f040c3390360da43e64e7026547f09097c5507

Comment 6 Ernesto Puerta 2018-11-26 11:13:52 UTC
Created attachment 1508534
Mgr appears not attached to any host, and with empty information (ceph version)

Comment 7 Ernesto Puerta 2018-11-26 11:20:16 UTC
Ceph-mgr does not show up as properly attached to a host. This is the output of the /host endpoint, which just dumps the result of the mgr.list_servers() method, which in turn invokes the Mgr C++ _ceph_get_server() method:

> 0: {services: [{type: "mgr", id: "x"}], hostname: "", ceph_version: ""}
>   ceph_version: ""
>   hostname: ""
>   services: [{type: "mgr", id: "x"}]
>   0: {type: "mgr", id: "x"}
>     id: "x"
>     type: "mgr"

Versus other daemons:
> 1: {,…}
>   ceph_version: "ceph version Development (no_version) luminous (stable)"
>   hostname: "ceph.dev"
>   services: [{type: "mds", id: "a"}, {type: "mds", id: "b"}, {type: "mds", id: "c"}, {type: "mon", id: "a"},…]
>   0: {type: "mds", id: "a"}
>     id: "a"
>     type: "mds"
>   1: {type: "mds", id: "b"}
>     id: "b"
>     type: "mds"
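
The difference between the two entries above can be checked mechanically. The following is a minimal sketch, assuming the payload is a list of dicts with the hostname, ceph_version and services keys shown in the dumps; the function name incomplete_servers is hypothetical and the sample data is abbreviated from the output above.

```python
def incomplete_servers(servers):
    """Return (type, id) pairs for services on entries that lack a
    hostname or a ceph_version (empty string or missing key)."""
    bad = []
    for server in servers:
        if server.get("hostname") and server.get("ceph_version"):
            continue
        for svc in server.get("services", []):
            bad.append((svc.get("type"), svc.get("id")))
    return bad


# Sample entries abbreviated from the /host output quoted above.
servers = [
    {"services": [{"type": "mgr", "id": "x"}],
     "hostname": "", "ceph_version": ""},
    {"services": [{"type": "mds", "id": "a"}, {"type": "mds", "id": "b"}],
     "hostname": "ceph.dev",
     "ceph_version": "ceph version Development (no_version) luminous (stable)"},
]

for svc_type, svc_id in incomplete_servers(servers):
    print(f"{svc_type}.{svc_id} has no hostname or ceph_version")
```

With the sample data, only the mgr service is flagged, which is the symptom reported in this comment.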

Comment 8 Boris Ranto 2018-11-28 17:31:15 UTC
I found the culprit:

http://tracker.ceph.com/issues/23286

It should be fixed upstream by the following PR:

https://github.com/ceph/ceph/pull/20875

It applies cleanly on top of rhcs-3.2. I am currently doing a build to test. I will let you know once I know more.

Comment 9 Boris Ranto 2018-11-28 22:25:37 UTC
We also need the fix for the following issue:

https://tracker.ceph.com/issues/23330

Otherwise, ceph-mgr will crash when it tries to decode an unexpected/incorrect message. It should be fixed by the following PR:

https://github.com/ceph/ceph/pull/20866

Comment 10 Boris Ranto 2018-11-28 22:42:28 UTC
The rhcs-3.2 branch PR:

https://github.com/rhcs-dashboard/ceph/pull/27

Comment 11 Boris Ranto 2018-11-29 21:27:50 UTC
The patches were merged; awaiting the new build.

