@branto found it: "Backport lacks this commits: cf68ce511f31d3746ca1cb13a9d3b39c25bb06a8 mgr: have MgrStandby send mgr reports to active mgr (even self!) This allows the mgr daemons to register state with the active mgr just like the rest of the cluster, including perfcounters and current config. "
The commit referenced in Comment#2 helps somewhat: the mgr nodes now get registered, but their perf schema stays empty forever. Either there are no perf schemas to show for ceph-mgr in luminous, or they are not being sent properly. IIRC, mimic (or master) only has a handful of perf counters (about 4), and these are all related to objecter, so that may be relevant here. I have checked both the luminous and mimic/master branches and there are no direct counters for ceph-mgr (at least not yet): there is no add_u64_counter in the src/mgr folder. The only counters seem to be the few sent by objecter in mimic/master. I suspect we might need to register ceph-mgr with objecter somehow to get them. BTW: the perf_schema for other daemons is also empty at first, but it gets filled with perf counter data after a couple of seconds.
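For anyone who wants to repeat the add_u64_counter check on another branch, here is a minimal sketch of the search. The helper name find_counter_registrations and the restriction to .cc/.h files are my own assumptions for illustration; a plain grep over the checkout does the same job.

```python
import os


def find_counter_registrations(root, needle="add_u64_counter"):
    """Return (path, line_number, line) for each source line under root containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Assumption: only C++ sources and headers are of interest.
            if not name.endswith((".cc", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling find_counter_registrations("src/mgr") from the top of a ceph checkout and getting an empty list back would match the observation above that ceph-mgr registers no direct counters.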
OK, it turned out I back-ported this incorrectly, and that is why it did not work. Commit f04190ed117 had already been back-ported downstream, but I did not apply it, thinking it was an old one. After fixing the merge order conflict, this started working properly in my testing.
I have pushed the (tested) commit to the luminous-dashboard branch in rhcs ceph.git: https://github.com/rhcs-dashboard/ceph/commit/a0f040c3390360da43e64e7026547f09097c5507
Created attachment 1508534: Mgr appears not attached to any host, and with empty information (ceph version)
Ceph-mgr does not show up as properly attached to a host. This is from the /host endpoint, which just dumps the mgr.list_servers() method, which in turn invokes the Mgr C++ _ceph_get_server() method:
> 0: {services: [{type: "mgr", id: "x"}], hostname: "", ceph_version: ""}
>   ceph_version: ""
>   hostname: ""
>   services: [{type: "mgr", id: "x"}]
>     0: {type: "mgr", id: "x"}
>       id: "x"
>       type: "mgr"
Versus other daemons:
> 1: {,…}
>   ceph_version: "ceph version Development (no_version) luminous (stable)"
>   hostname: "ceph.dev"
>   services: [{type: "mds", id: "a"}, {type: "mds", id: "b"}, {type: "mds", id: "c"}, {type: "mon", id: "a"},…]
>     0: {type: "mds", id: "a"}
>       id: "a"
>       type: "mds"
>     1: {type: "mds", id: "b"}
>       id: "b"
>       type: "mds"
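To spot the affected entries quickly, something like the following could be run over the /host output. This is only a sketch: it assumes the response is a list of dicts shaped like the dump above, servers_missing_metadata is a hypothetical helper (not part of the dashboard), and the sample data is a trimmed copy of the two entries shown.

```python
def servers_missing_metadata(servers):
    """Return the entries whose hostname or ceph_version is empty or absent."""
    return [s for s in servers if not s.get("hostname") or not s.get("ceph_version")]


# Trimmed sample modelled on the two entries in the dump above.
sample = [
    {"services": [{"type": "mgr", "id": "x"}], "hostname": "", "ceph_version": ""},
    {
        "services": [{"type": "mds", "id": "a"}, {"type": "mds", "id": "b"}],
        "hostname": "ceph.dev",
        "ceph_version": "ceph version Development (no_version) luminous (stable)",
    },
]

for entry in servers_missing_metadata(sample):
    print(entry["services"])  # prints [{'type': 'mgr', 'id': 'x'}]
```

With the sample above only the mgr entry is reported, which is the symptom seen in the attachment.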
I found the culprit: http://tracker.ceph.com/issues/23286 It should be fixed upstream by the following PR: https://github.com/ceph/ceph/pull/20875 It applies cleanly on top of rhcs-3.2. I am currently doing a build to test. I will let you know once I know more.
We also need the fix for the following issue: https://tracker.ceph.com/issues/23330 Otherwise, ceph-mgr will crash when it tries to decode an unexpected/incorrect message. It should be fixed by the following PR: https://github.com/ceph/ceph/pull/20866
The rhcs-3.2 branch PR: https://github.com/rhcs-dashboard/ceph/pull/27
The patches were merged, awaiting the new build.