1647812 – Ceph-Mgr does not properly feed mgr's perf_counter schema

Summary: Ceph-Mgr does not properly feed mgr's perf_counter schema

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Dashboard
Sub Component:
Version:	3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	z1
Target Release:	4.0
Assignee:	Boris Ranto
QA Contact:	Madhavi Kasturi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-11-08 12:15 UTC by Ernesto Puerta
Modified:	2019-12-05 20:32 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1647115
Environment:
Last Closed:	2019-12-05 20:32:22 UTC
Embargoed:
Dependent Products:

Attachments
Mgr appears not attached to any host, and with empty information (ceph version) (74.59 KB, image/png) 2018-11-26 11:13 UTC, Ernesto Puerta

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	rhcs-dashboard/ceph/commit/073fc2c2ab	0	None	None	None	2019-12-05 20:26:52 UTC

Comment 2 Ernesto Puerta 2018-11-08 12:18:05 UTC

@branto found it:
"The backport lacks this commit:

cf68ce511f31d3746ca1cb13a9d3b39c25bb06a8

    mgr: have MgrStandby send mgr reports to active mgr (even self!)
    
    This allows the mgr daemons to register state with the active mgr just
    like the rest of the cluster, including perfcounters and current config.
"

Comment 3 Boris Ranto 2018-11-08 20:14:56 UTC

The commit referenced in Comment#2 does help somewhat. The mgr nodes do get registered, but their perf schema stays empty indefinitely. Either there are no perf schemas to show for ceph-mgr in luminous, or they are not being sent properly.

IIRC, in mimic (or master) we only have a few perf counters (about 4), and they are all related to objecter, so that may be connected to this.

I have checked both the luminous and mimic/master branches and there are no direct perf counters for ceph-mgr itself (at least not yet) -- there is no add_u64_counter call in the src/mgr folder. There only seem to be the few counters sent by objecter in mimic/master. I suspect we might need to register ceph-mgr with objecter somehow to get them.
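The source check described above could be reproduced with a small script along these lines. This is only a sketch: the function name find_counter_registrations, the file-extension filter, and the default search string are assumptions made for illustration, not part of the Ceph tree.

```python
import os


def find_counter_registrations(root, needle="add_u64_counter", exts=(".cc", ".h")):
    """Return (path, line_number, stripped_line) for each source line containing `needle`.

    `root` is the directory to scan (for example a checkout's src/mgr folder).
    The helper name and the extension filter are illustrative assumptions.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at C++ sources and headers.
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling find_counter_registrations("src/mgr") from the top of a checkout returns an empty list when no such call exists under that folder, which is the situation described in this comment.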

btw: The perf_schema for other daemons is also empty in the beginning but it gets filled with the perf counter data after a couple of seconds.
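Because the schema of other daemons is empty at first and only fills in after a few seconds, a check for "empty forever" needs to poll for a while before giving up. A minimal sketch follows; fetch_schema is a hypothetical callable standing in for whatever returns the daemon's perf schema, and the timeout and interval values are arbitrary assumptions.

```python
import time


def wait_for_perf_schema(fetch_schema, timeout=10.0, interval=0.5,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_schema()` until it returns a non-empty value or `timeout` expires.

    `fetch_schema` is a hypothetical callable supplied by the caller.
    Returns the schema, or None if it was still empty when the timeout expired.
    """
    deadline = clock() + timeout
    while True:
        schema = fetch_schema()
        if schema:
            return schema
        if clock() >= deadline:
            return None
        sleep(interval)
```

A None result after a generous timeout would correspond to the ceph-mgr behaviour described in this comment, while other daemons would be expected to return a populated schema after a few polls.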

Comment 4 Boris Ranto 2018-11-08 22:01:17 UTC

OK, it turned out I back-ported this incorrectly, and that is why it did not work. Commit f04190ed117 had already been back-ported downstream, but I did not apply it, thinking it was an old one.

After fixing the merge order conflict, this started working properly in my testing.

Comment 5 Boris Ranto 2018-11-08 23:42:03 UTC

I have pushed the (tested) commit to the luminous-dashboard branch in rhcs ceph.git:

https://github.com/rhcs-dashboard/ceph/commit/a0f040c3390360da43e64e7026547f09097c5507

Comment 6 Ernesto Puerta 2018-11-26 11:13:52 UTC

Created attachment 1508534
Mgr appears not attached to any host, and with empty information (ceph version)

Comment 7 Ernesto Puerta 2018-11-26 11:20:16 UTC

Ceph-mgr does not show up as properly attached to a host in the /host endpoint (which just dumps the output of the mgr.list_servers() method, which in turn invokes the Mgr C++ _ceph_get_server() method):

> 0: {services: [{type: "mgr", id: "x"}], hostname: "", ceph_version: ""}
>   ceph_version: ""
>   hostname: ""
>   services: [{type: "mgr", id: "x"}]
>   0: {type: "mgr", id: "x"}
>     id: "x"
>     type: "mgr"

Versus other daemons:
> 1: {,…}
>   ceph_version: "ceph version Development (no_version) luminous (stable)"
>   hostname: "ceph.dev"
>   services: [{type: "mds", id: "a"}, {type: "mds", id: "b"}, {type: "mds", id: "c"}, {type: "mon", id: "a"},…]
>   0: {type: "mds", id: "a"}
>     id: "a"
>     type: "mds"
>   1: {type: "mds", id: "b"}
>     id: "b"
>     type: "mds"
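The difference between the two entries above can be checked mechanically. The sketch below assumes the endpoint output has been parsed into a list of dicts shaped like the dumps in this comment; the helper name services_missing_host_info is hypothetical, and the sample data is abridged from the output quoted above.

```python
def services_missing_host_info(servers):
    """Return (type, id) for every service on an entry lacking hostname or ceph_version."""
    missing = []
    for server in servers:
        # An entry is considered healthy only if both fields are non-empty.
        if server.get("hostname") and server.get("ceph_version"):
            continue
        for svc in server.get("services", []):
            missing.append((svc["type"], svc["id"]))
    return missing


# Sample data abridged from the output quoted in this comment.
servers = [
    {"services": [{"type": "mgr", "id": "x"}], "hostname": "", "ceph_version": ""},
    {
        "services": [{"type": "mds", "id": "a"}, {"type": "mds", "id": "b"}],
        "hostname": "ceph.dev",
        "ceph_version": "ceph version Development (no_version) luminous (stable)",
    },
]

print(services_missing_host_info(servers))  # [('mgr', 'x')]
```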

Comment 8 Boris Ranto 2018-11-28 17:31:15 UTC

I found the culprit:

http://tracker.ceph.com/issues/23286

It should be fixed upstream by the following PR:

https://github.com/ceph/ceph/pull/20875

It applies cleanly on top of rhcs-3.2. I am currently doing a build to test. I will let you know once I know more.

Comment 9 Boris Ranto 2018-11-28 22:25:37 UTC

We also need the fix for the following issue:

https://tracker.ceph.com/issues/23330

Otherwise, ceph-mgr will crash when it tries to decode an unexpected/incorrect message. It should be fixed by the following PR:

https://github.com/ceph/ceph/pull/20866

Comment 10 Boris Ranto 2018-11-28 22:42:28 UTC

The rhcs-3.2 branch PR:

https://github.com/rhcs-dashboard/ceph/pull/27

Comment 11 Boris Ranto 2018-11-29 21:27:50 UTC

The patches were merged, awaiting the new build.
