Bug 1028090

Summary: Implement event queue in the Indication manager
Product: Red Hat Enterprise Linux 7 Reporter: Tomáš Bžatek <tbzatek>
Component: openlmi-providers Assignee: Vitezslav Crhonek <vcrhonek>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: amahdal, ovasik, tsmetana
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-04 11:17:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1026663, 1033026    

Description Tomáš Bžatek 2013-11-07 16:09:02 UTC
The Indication manager we use as a helper/wrapper needs some kind of queue to which incoming events are sent to await further processing. This is to ensure an uninterrupted event flow. With the current consecutive watcher() and gather() parts, there is a high chance of missing events, because watcher() is not always running.

Additionally, when events are received in batches in some buffer, all of them should be fully processed. With no queue, the rest of the buffer is thrown away, because the watcher needs to end and return success.
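
For illustration, here is a minimal sketch of the kind of queue meant above, assuming plain pthreads. All names (event_t, event_queue_t, queue_push() and so on) are hypothetical and not part of the existing indmanager code; a real event would carry whatever the watcher collected instead of a bare id.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical event record, used only for this sketch. */
typedef struct event {
    int id;
    struct event *next;
} event_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    event_t *head, *tail;
    bool closed;
} event_queue_t;

void queue_init(event_queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
    q->closed = false;
}

/* Producer side (watcher). Returns false on allocation failure or
 * when the queue has already been closed. */
bool queue_push(event_queue_t *q, int id)
{
    event_t *ev = malloc(sizeof *ev);   /* allocate outside the lock */
    if (ev == NULL)
        return false;
    ev->id = id;
    ev->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->closed) {
        pthread_mutex_unlock(&q->lock);
        free(ev);
        return false;
    }
    if (q->tail != NULL)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return true;
}

/* Consumer side (gather). Blocks until an event is available.
 * Returns false once the queue is closed and fully drained. */
bool queue_pop(event_queue_t *q, int *id)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL && !q->closed)
        pthread_cond_wait(&q->nonempty, &q->lock);
    event_t *ev = q->head;
    if (ev != NULL) {
        q->head = ev->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);

    if (ev == NULL)
        return false;
    *id = ev->id;
    free(ev);                           /* free outside the lock */
    return true;
}

/* Stop accepting new events and wake up any blocked consumer. */
void queue_close(event_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```

With something like this, the watcher can push a whole batch without returning in between, and the gather side keeps draining events that were queued before queue_close() was called.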

The third requirement is a kind of rate limiting: merging multiple equal events of the same type or for the same object into one when they are sent within a short (user-defined) interval. This will naturally cause a slight delay, which may be desirable for some applications; think of it as a settle timeout.
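
A rough sketch of such a settle timeout, under the assumption that events can be reduced to an integer key (event type or object id). The names coalescer_t, coalescer_add() and coalescer_flush() are made up for this example, and the caller passes in the current time in milliseconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

typedef struct {
    int key;            /* hypothetical: event type or object id */
    uint64_t last_ms;   /* time of the most recent occurrence */
    unsigned merged;    /* number of raw events merged into this entry */
    bool used;
} pending_t;

typedef struct {
    pending_t slots[MAX_PENDING];
    uint64_t settle_ms; /* user-defined settle interval */
} coalescer_t;

void coalescer_init(coalescer_t *c, uint64_t settle_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        c->slots[i].used = false;
    c->settle_ms = settle_ms;
}

/* Record a raw event. An event with a key that is already pending is
 * merged into the existing entry and restarts its settle timer.
 * Returns false when the table is full. */
bool coalescer_add(coalescer_t *c, int key, uint64_t now_ms)
{
    pending_t *free_slot = NULL;

    for (size_t i = 0; i < MAX_PENDING; i++) {
        pending_t *p = &c->slots[i];
        if (p->used && p->key == key) {
            p->last_ms = now_ms;
            p->merged++;
            return true;
        }
        if (!p->used && free_slot == NULL)
            free_slot = p;
    }
    if (free_slot == NULL)
        return false;
    *free_slot = (pending_t){ .key = key, .last_ms = now_ms,
                              .merged = 1, .used = true };
    return true;
}

/* Move keys whose settle interval has elapsed into out[] and drop
 * them from the table. Returns the number of keys written. */
size_t coalescer_flush(coalescer_t *c, uint64_t now_ms,
                       int *out, size_t out_len)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_PENDING && n < out_len; i++) {
        pending_t *p = &c->slots[i];
        if (p->used && now_ms - p->last_ms >= c->settle_ms) {
            out[n++] = p->key;
            p->used = false;
        }
    }
    return n;
}
```

Note that restarting the timer on every merged event means a constant stream of equal events is never flushed. Whether that is acceptable, or whether a maximum delay is needed on top, is left open here.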

As a side effect, the reworked watcher() should deal better with errors when a critical issue prevents it from working. Some kind of self-recovery would be nice too.

Comment 1 Tomáš Bžatek 2013-11-07 16:09:58 UTC
This currently prevents e.g. bug 1026663 from being made fully race-free.

Comment 3 Tomáš Bžatek 2013-11-26 11:23:17 UTC
Further debugging showed that there are more rules we should obey to ensure thread safety and play nicely with memory allocations:

 - Any CMPI call that manipulates instances should be made from the same thread, i.e. don't free allocated instances from a thread other than the one that collected them.
 - Hold locks for as short a time as possible: operate on local variables, then update shared memory in one quick operation with the lock held.
 - Perform proper thread shutdown and cleanup: free all data, release all locks and properly detach threads from the CIMOM. No forced thread cancellation.
 - Handle the scenario where new filters are registered while worker threads are already running (for the first poll case).
 - Implement a clean way to cancel worker threads from outside. The idea was to use poll()/select() with a side fd acting as a cancellation channel, similar to how GCancellable works. This should be exposed to the watcher/gather callbacks, and indmanager users should integrate it into their code.
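
The cancellation channel from the last point could look roughly like this. It is a sketch only, assuming a plain pipe() as the side fd; cancel_t, cancel_trigger() and wait_for_event() are hypothetical names, and event_fd stands for whatever descriptor the watcher would normally block on.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Side channel: fds[0] is polled by the worker, fds[1] is written
 * to from outside to request cancellation. */
typedef struct {
    int fds[2];
} cancel_t;

typedef enum {
    WAIT_EVENT,
    WAIT_CANCELLED,
    WAIT_TIMEOUT,
    WAIT_ERROR
} wait_result_t;

int cancel_init(cancel_t *c)
{
    return pipe(c->fds);
}

/* Request cancellation. Only uses write(), so it is safe to call
 * from another thread or from a signal handler. */
void cancel_trigger(cancel_t *c)
{
    char b = 1;
    ssize_t r;

    do {
        r = write(c->fds[1], &b, 1);
    } while (r < 0 && errno == EINTR);
}

void cancel_destroy(cancel_t *c)
{
    close(c->fds[0]);
    close(c->fds[1]);
}

/* Block until event_fd becomes readable, cancellation is requested
 * or timeout_ms elapses (-1 waits forever). For simplicity the
 * timeout restarts if poll() is interrupted by a signal. */
wait_result_t wait_for_event(int event_fd, cancel_t *c, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = event_fd,  .events = POLLIN },
        { .fd = c->fds[0], .events = POLLIN },
    };
    int rc;

    do {
        rc = poll(pfd, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return WAIT_ERROR;
    if (rc == 0)
        return WAIT_TIMEOUT;
    /* Cancellation takes priority over pending events. */
    if (pfd[1].revents & (POLLIN | POLLHUP | POLLERR))
        return WAIT_CANCELLED;
    return WAIT_EVENT;
}
```

A watcher callback would call wait_for_event() instead of blocking directly on its descriptor and return cleanly on WAIT_CANCELLED, so that the thread can do its own cleanup and detach from the CIMOM without forced cancellation.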

Comment 4 Tomáš Bžatek 2013-12-03 16:07:45 UTC
Note to myself: GCancellable principle explained: http://blog.verbum.org/2013/12/03/cancelling-computation-gcancellable-or-sigint-versus-threads-versus-exceptions/

Comment 5 Tomáš Bžatek 2014-01-06 16:05:31 UTC
- Be sure to check the error codes and rc statuses of CMPI calls.

Comment 11 Vitezslav Crhonek 2020-03-04 11:17:39 UTC
Red Hat Enterprise Linux version 7 entered the Maintenance Support 1 Phase in August 2019. In this phase only qualified Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released as they become available. Other errata advisories may be delivered as appropriate.

This bug has been reviewed by Support and Engineering representatives and does not meet the inclusion criteria for Maintenance Support 1 Phase. If this issue still exists in a newer major version of Red Hat Enterprise Linux, it has been cloned there and work will continue in the cloned bug.

For more information about Red Hat Enterprise Linux Lifecycle, please see https://access.redhat.com/support/policy/updates/errata/