Bug 1745607

Summary: Backport to Nautilus: Add mgr module for kubernetes event integration
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Anmol Sachan <asachan>
Component: Ceph-Mgr Plugins
Assignee: Boris Ranto <branto>
Status: CLOSED ERRATA
QA Contact: Sidhant Agrawal <sagrawal>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.0
CC: bancinco, branto, ceph-eng-bugs, ceph-qe-bugs, ebenahar, etamir, gmeno, kdreyer, mkasturi, pcuzner, ratamir, sagrawal, tchandra, tserlin
Target Milestone: rc
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-14.2.4-14.el8cp, ceph-14.2.4-2.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-31 12:47:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1745617

Description Anmol Sachan 2019-08-26 13:36:48 UTC
Description of problem:

mgr/k8sevents: Add new mgr module for kubernetes event integration

New mgr module to provide a means of sending Ceph-related events to the Kubernetes
events API, and to retrieve all Kubernetes events from the rook-ceph namespace.
Events can be viewed with commands under the ceph k8sevents namespace, which show
output at the Ceph CLI similar to the native Kubernetes client's kubectl get events command.

Since the events are cached by the module, it would also be possible to expose them to
other mgr modules, e.g. the dashboard.
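
For reference, a minimal usage sketch based on the commands mentioned later in this BZ (comment 16); the exact subcommand set may differ between builds, so treat this as illustrative rather than authoritative:

# enable the module on the active mgr (run wherever the ceph CLI has admin access)
ceph mgr module enable k8sevents
# check that the module can reach the kubernetes API
ceph k8sevents status
# list the cached events; output resembles 'kubectl get events'
ceph k8sevents ls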

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Yaniv Kaul 2019-09-15 13:52:59 UTC
The backport was merged more than a week ago. What's the latest on this BZ?

Comment 5 Boris Ranto 2019-09-16 18:33:38 UTC
The back-port hasn't been merged upstream, yet. In fact, it is currently in a DNM state because it is awaiting a fix for the k8sevents module. I did a partial back-port downstream to make some progress though.

Comment 7 Yaniv Kaul 2019-09-18 06:27:02 UTC
(In reply to Boris Ranto from comment #5)
> The back-port hasn't been merged upstream, yet. In fact, it is currently in
> a DNM state because it is awaiting a fix for the k8sevents module. I did a
> partial back-port downstream to make some progress though.

What fix is it waiting for? Please add the issue here so we can track it in one place (https://bugzilla.redhat.com/show_bug.cgi?id=1745617 is waiting for this BZ, so I'm trying to track the whole chain of deps...)

Comment 8 Boris Ranto 2019-09-18 11:49:06 UTC
The upstream nautilus back-port PR is in the DNM state:

https://github.com/ceph/ceph/pull/30215

It is waiting for a fix for:

https://tracker.ceph.com/issues/41737

i.e. module currently crashes in non-k8s environments.

Comment 9 Yaniv Kaul 2019-09-23 06:44:50 UTC
(In reply to Boris Ranto from comment #8)
> The upstream nautilus back-port PR is in the DNM state:
> 
> https://github.com/ceph/ceph/pull/30215
> 
> It is waiting for a fix for:
> 
> https://tracker.ceph.com/issues/41737

PR - https://github.com/ceph/ceph/pull/30482 (easier to track by PR if available)

> 
> i.e. module currently crashes in non-k8s environments.

Comment 11 Boris Ranto 2019-10-15 19:55:05 UTC
There was some progress upstream, so I have cherry-picked the rest of the necessary commits for this BZ. However, we do have an issue with our downstream builds at the moment, as one of the back-ported patches (not for this BZ) introduced a build failure on s390x. The s390x build failure will need to be fixed before we can move this to ON_QA.

Comment 14 Raz Tamir 2019-11-06 07:41:34 UTC
Hi Anmol,

Could you please provide steps for testing this functionality?
(QE needs this in order to qa_ack this BZ.)

Comment 15 Anmol Sachan 2019-11-06 08:25:48 UTC
pcuzner can provide the information as he developed the feature.

Comment 16 Paul Cuzner 2019-11-06 21:07:40 UTC
The merge is still pending upstream - but if we have it downstream in our container build, that's great.

Ultimately the module should be enabled by rook - but until this is done, you'll need to enable it manually through the tools pod:
ceph mgr module enable k8sevents

This registers a couple of commands:
ceph k8sevents status | ls

So once it's loaded, you can use those commands to check that it's operational. When storage classes get created, OSDs get added or removed, or a healthcheck fails, you should see Kubernetes events show up in the OCS dashboard.

Raz, if you can confirm the above, I'll start the process to auto-enable the module within rook.
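
A hedged sketch of the manual enablement described above; the rook-ceph namespace comes from the bug description, while the tools pod label is an assumption based on a typical Rook install and may differ in OCS:

# locate and enter the tools pod (label is an assumption; adjust to your cluster)
kubectl -n rook-ceph get pods -l app=rook-ceph-tools
kubectl -n rook-ceph exec -it <tools-pod-name> -- bash
# inside the pod: enable the module and confirm it is operational
ceph mgr module enable k8sevents
ceph k8sevents status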

Comment 18 Raz Tamir 2019-11-13 16:20:55 UTC
Hi Paul,

Is it just for testing?
I'm asking because the tools pod is not going to be part of OCS

Comment 24 Paul Cuzner 2019-11-22 00:05:59 UTC
The tools pod is not required for this. k8sevents is a mgr module, so as long as it's in the image we can enable it manually to test it, and then raise a PR for rook to automatically enable the module - like we do for prometheus.
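
As a rough cross-check once the module is enabled (a sketch, not a prescribed test plan), the events should also be visible to native Kubernetes tooling, since the module posts to the events API in the rook-ceph namespace:

# events emitted by the mgr module should appear alongside native ones
kubectl -n rook-ceph get events --sort-by=.lastTimestamp
# and the module's own cache can be listed from the ceph side
ceph k8sevents ls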

Comment 25 Raz Tamir 2019-11-28 10:48:18 UTC
@Elad, can you please assign this for verification?

Comment 36 Eran Tamir 2020-01-15 08:34:37 UTC
verified

Comment 39 errata-xmlrpc 2020-01-31 12:47:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312