Bug 1313305

Summary: Calamari must filter duplicate events before pushing them to the salt event bus.
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Darshan <dnarayan>
Component: Calamari Assignee: Christina Meno <gmeno>
Calamari sub component: Back-end QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: unspecified CC: ceph-eng-bugs, flucifre, mkarnik
Version: 1.3.1   
Target Milestone: rc   
Target Release: 2.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-05 18:41:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1291304    

Description Darshan 2016-03-01 11:20:00 UTC
Description of problem:
Some events related to osd, mon, and cluster state changes are emitted from all the calamari-lite instances. Since USM will be listening to multiple calamari-lite instances for events, Calamari must send each event (push it to the salt event bus) from only one instance. Which instance sends it can be decided based on whether it resides on the leader mon node.
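
A minimal sketch of the leader-based gating proposed here, not Calamari's actual code. It assumes each instance already has the parsed output of `ceph quorum_status --format json` (which, as far as I know, reports the leader under `quorum_leader_name`) and knows the name of its local mon. The function names and the `push` callback are hypothetical stand-ins for whatever publishes to the salt event bus.

```python
def is_leader_instance(quorum_status, local_mon_name):
    """Return True only if this instance sits on the current leader mon.

    quorum_status: dict parsed from `ceph quorum_status --format json`
                   (assumed to carry a `quorum_leader_name` key).
    local_mon_name: name of the mon on this node (assumed to be known).
    """
    leader = quorum_status.get("quorum_leader_name")
    # With no known leader (e.g. no quorum), stay silent rather than guess.
    return leader is not None and leader == local_mon_name


def maybe_push_event(event, quorum_status, local_mon_name, push):
    """Call push(event) only from the instance on the leader mon.

    push: hypothetical callable that publishes to the salt event bus.
    Returns True if the event was pushed, False if it was suppressed.
    """
    if not is_leader_instance(quorum_status, local_mon_name):
        return False
    push(event)
    return True
```

With three instances evaluating the same osd state change, only the one whose local mon matches the leader calls `push`, so the listener receives a single event. One caveat with this approach: around a leader election, instances may briefly disagree about who the leader is, so an event could still be dropped or duplicated in that window.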

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Have a ceph setup with multiple mon nodes (and hence multiple calamari-lite instances).
2. Simulate an event for osd state change.
3. 

Actual results:
The same event is sent from multiple calamari-lite instances to the salt event bus.

Expected results:
A single event should be sent from only one calamari-lite instance to the salt event bus.

Additional info:

Comment 2 Christina Meno 2016-04-06 21:30:18 UTC
I believe that we should address this issue in a different way.
Running calamari-lite on all ceph monitors would be a risk to cluster stability and data integrity.

I think that the storage-console should choose a single monitor on which to enable calamari in the first release. That way, when the inevitable happens, we only lose management temporarily and do not trigger data loss.

If we organize it this way, event filtering won't be needed until we design calamari to be HA.

Mrugesh, what do you think about this approach as risk mitigation?