Bug 1494599

Summary: [PRD][RFE] Add CloudForms Alerts for OpenShift Provider based on Kubernetes Event
Product: Red Hat CloudForms Management Engine Reporter: Loic Avenel <lavenel>
Component: Providers Assignee: Oved Ourfali <oourfali>
Status: CLOSED WONTFIX QA Contact: Einat Pacifici <epacific>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.8.0 CC: cben, fsimonce, gblomqui, jfrey, jhardy, lavenel, obarenbo, oourfali, saali
Target Milestone: GA Keywords: FutureFeature
Target Release: 5.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-26 10:51:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: Feature
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1480786, 1511957    

Description Loic Avenel 2017-09-22 15:20:13 UTC
Description of problem: CloudForms should allow customers to create Alerts based on Kubernetes Events.

Alerts should be based on the following categories of Events:

Pods
Containers
Container Nodes

For each category, it should be possible to define the notification frequency and a custom expression for filtering.

Alert Profile Assignment should offer:

Pods: Providers, Tagged Providers, Tagged Projects
Containers: Providers, Tagged Providers, Projects, Tagged Projects
Container Nodes: Providers, Tagged Providers, Nodes, Tagged Nodes

Comment 2 Dave Johnson 2017-09-22 15:44:49 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 3 Beni Paskin-Cherniavsky 2017-09-26 11:46:50 UTC
The UI does allow creating regular Node alerts with What to Evaluate: Expression (custom) and Driving Event: Hourly timer.

Such an alert doesn't work.  Here is the log of the evaluation being queued:

[----] I, [2017-09-14T04:44:33.455577 #5795:737134]  INFO -- : MIQ(MiqScheduleWorker::Runner#do_work) Number of scheduled items to be processed: 2.
[----] I, [2017-09-14T04:44:33.462695 #5795:737134]  INFO -- : MIQ(MiqQueue.put) Message id: [101000021485536],  id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAlert.evaluate_hourly_timer], Timeout: [600], Priority: [90], State: [ready], Deliver On: [], Data: [], Args: []

and here is the stack trace of the evaluation crashing:

[----] I, [2017-09-14T04:44:36.582339 #22039:737134]  INFO -- : MIQ(MiqQueue#deliver) Message id: [101000021485536], Delivering...
[----] I, [2017-09-14T04:44:36.582666 #22039:737134]  INFO -- : MIQ(MiqAlert.evaluate_hourly_timer) Starting
[----] E, [2017-09-14T04:44:36.644221 #22039:737134] ERROR -- : MIQ(MiqQueue#deliver) Message id: [101000021485536], Error: [undefined method `container_nodes' for #<Zone:0x0000000af8bf88>]
[----] E, [2017-09-14T04:44:36.644701 #22039:737134] ERROR -- : [NoMethodError]: undefined method `container_nodes' for #<Zone:0x0000000af8bf88>  Method:[rescue in deliver]
[----] E, [2017-09-14T04:44:36.645494 #22039:737134] ERROR -- : /opt/rh/cfme-gemset/gems/activemodel-5.0.3/lib/active_model/attribute_methods.rb:433:in `method_missing'
/var/www/miq/vmdb/app/models/miq_alert.rb:137:in `public_send'
/var/www/miq/vmdb/app/models/miq_alert.rb:137:in `block (2 levels) in evaluate_hourly_timer'
/var/www/miq/vmdb/app/models/miq_alert.rb:131:in `each'
/var/www/miq/vmdb/app/models/miq_alert.rb:131:in `block in evaluate_hourly_timer'
/var/www/miq/vmdb/app/models/miq_alert.rb:130:in `each'
/var/www/miq/vmdb/app/models/miq_alert.rb:130:in `evaluate_hourly_timer'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:107:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:135:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:153:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:147:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:147:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:340:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:337:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:337:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:160:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:134:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:339:in `block in start_runner'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
...


This is all from 5.8.1, found in customer logs from bug 1489616.  I have the DB dump restored locally, if anyone wants to debug further.
I didn't test on master.
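
Reading the traceback, the hourly timer evaluation appears to ask each Zone for its targets by calling a method chosen at run time (the `public_send` frame at miq_alert.rb:137), and Zone has no `container_nodes`. A minimal sketch of that failure mode in plain Ruby follows. The classes are stand-ins, not the real ManageIQ models, and the `TARGET_METHODS` table and the `targets_for` / `try_targets` helpers are hypothetical names introduced only for illustration:

```ruby
# Stand-in for the real Zone model; it only knows how to list VMs.
class Zone
  def vms
    ["vm-1"]
  end
end

# Hypothetical table: alert target type => Zone method that lists targets.
TARGET_METHODS = { "Vm" => :vms, "ContainerNode" => :container_nodes }.freeze

# Look up targets dynamically, the way a public_send-based dispatch would.
def targets_for(zone, target_type)
  zone.public_send(TARGET_METHODS.fetch(target_type))
end

# Returns the targets, or [:missing, method_name] when the zone lacks the method.
def try_targets(zone, target_type)
  targets_for(zone, target_type)
rescue NoMethodError => e
  [:missing, e.name]
end

p try_targets(Zone.new, "Vm")
p try_targets(Zone.new, "ContainerNode")
```

The second call fails the same way the queue message does: the dispatch itself is fine, but the receiver does not implement the method for that target type.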

Comment 4 Loic Avenel 2017-09-26 21:17:30 UTC
I was looking at different parameters:

Based On: Node
What to Evaluate: Nothing
Driving Event: Kubernetes Event..

I think the issue is not in the definition of the Alert but in the Alert Profiles: you cannot assign the profile to a Node or a Provider, because these are not available for Node Alert Profiles.

We can try at the Provider level.

Comment 5 Beni Paskin-Cherniavsky 2017-09-27 07:29:07 UTC
Sorry, maybe I commented on the wrong BZ (or should I open a new one)?

What I'm talking about are "good old" MiqExpression-based, periodically evaluated alerts, not externally evaluated ones.  We only worked on external alerts for container providers, and didn't make sure that internally evaluated MiqExpression-based alerts work in the backend.  AFAIK we didn't intend the UI to allow those; it's an accidental side effect of allowing externally evaluated Node alerts.
(Not sure about the profile UI, but I see in the customer DB that they did define MiqExpression Node alerts, and the log proves the evaluation tries to run and fails with an error.)

The error in the traceback sounds very easy to fix (add a few lines in zone.rb so it can find targets to evaluate).
Of course, that could reveal more work to do. (Specifically, I heard that if the expression looks at metrics, there is a complex mechanism to collect those metrics frequently, and that's something we wanted to neglect in the upcoming effort to scale metric collection?)

The other option is to forbid these in the UI for now.
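
As a rough illustration of the "few lines in zone.rb" idea, the sketch below gives a Zone a `container_nodes` method that gathers nodes from the providers in that zone. It is plain Ruby with stand-in classes. How the real Zone model relates to providers and nodes is an assumption here, and `ContainerProvider`, `OtherProvider`, and the constructor arguments are hypothetical:

```ruby
# Hypothetical stand-ins; the real ManageIQ models are ActiveRecord classes.
ContainerNode = Struct.new(:name)
OtherProvider = Struct.new(:name) # a provider type with no container nodes

class ContainerProvider
  attr_reader :container_nodes

  def initialize(nodes)
    @container_nodes = nodes
  end
end

class Zone
  def initialize(providers)
    @providers = providers
  end

  # Collect nodes from every provider in the zone that exposes them,
  # so a dynamic lookup of :container_nodes has something to call.
  def container_nodes
    @providers.select { |provider| provider.respond_to?(:container_nodes) }
              .flat_map(&:container_nodes)
  end
end

zone = Zone.new([
  ContainerProvider.new([ContainerNode.new("node-a")]),
  OtherProvider.new("infra"),
  ContainerProvider.new([ContainerNode.new("node-b")])
])

p zone.public_send(:container_nodes).map(&:name)
```

This only addresses target lookup. Whether the expression can then be evaluated against each node, especially when it depends on metrics, is the open question raised above.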

Comment 8 Dave Johnson 2017-09-28 15:04:05 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 10 Dave Johnson 2017-09-28 15:45:02 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 11 Loic Avenel 2018-01-25 13:28:45 UTC
*** Bug 1438002 has been marked as a duplicate of this bug. ***