Description of problem:
Some events related to OSD, mon, and cluster state changes are emitted from all calamari-lite instances. Since USM will be listening to multiple calamari-lite instances for events, each event must be sent (pushed to the salt event bus) from only one instance. Which instance sends it can be decided by whether that instance resides on the leader mon node.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Have a Ceph setup with multiple mon nodes (hence multiple calamari-lite instances).
2. Simulate an event for an OSD state change.
3.

Actual results:
The same event is sent from multiple calamari-lite instances to the salt event bus.

Expected results:
A single event should be sent from only one calamari-lite instance to the salt event bus.

Additional info:
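A minimal sketch of the leader-based selection proposed in the description. It assumes each calamari-lite instance knows the name of its local mon and can get the parsed JSON output of `ceph quorum_status`, which includes a `quorum_leader_name` field. The helper names `is_leader_mon` and `emit_if_leader` and the `emit` callback are hypothetical and not part of calamari. How the event is actually pushed to the salt bus is left to the caller.

```python
import json


def is_leader_mon(quorum_status, local_mon_name):
    """Return True if the local mon is the current quorum leader.

    quorum_status is the parsed JSON from `ceph quorum_status`;
    only its `quorum_leader_name` field is used here.
    """
    return quorum_status.get("quorum_leader_name") == local_mon_name


def emit_if_leader(event, quorum_status, local_mon_name, emit):
    """Call emit(event) only on the leader mon; return whether it was sent.

    `emit` is a hypothetical callback standing in for whatever pushes
    the event to the salt event bus.
    """
    if is_leader_mon(quorum_status, local_mon_name):
        emit(event)
        return True
    return False


# Illustrative input only: a trimmed-down quorum status document.
sample_status = json.loads(
    '{"quorum_names": ["mon-a", "mon-b", "mon-c"],'
    ' "quorum_leader_name": "mon-a"}'
)

sent = []
event = {"type": "osd_state_change", "osd": 3}
for mon in sample_status["quorum_names"]:
    # Each iteration stands in for one calamari-lite instance.
    emit_if_leader(event, sample_status, mon, sent.append)
```

With this check, three instances seeing the same event would produce one entry in `sent` rather than three. The sketch does not handle a leader election that happens between the check and the send, so a real implementation would still need to tolerate an occasional duplicate or dropped event.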
I believe that we should address this issue in a different way. Running calamari-lite on all Ceph monitors will be a risk to cluster stability and data integrity. I think that the storage-console should choose a single monitor on which to enable calamari in the first release. That way, when the inevitable happens, we only lose management temporarily and do not trigger data loss. If we organize it this way, event filtering won't be needed until we design calamari to be HA. Mrugesh, what do you think about this approach as risk mitigation?