We are generating an MiqEvent "evm_worker_memory_exceeded" It looks like it is working but am diving deeper to see if a bug is preventing this from surfacing
https://github.com/ManageIQ/manageiq/pull/17640
dmetzger57: circling back before closing > Defining a new synthetic event which is triggered when a worker has been identified as exceeding its associated memory threshold. That sounds like we want to raise an event if the worker goes over memory threshold. We currently raise an event and apply a policy. I had read this as adding a ui notification. (now I'm not so sure) > Should de-duplication be implemented so that at most 1 event is generated when a worker exceeds the threshold Don't think we do this anywhere. > It is possible for a single [worker] to be identified as exceeding the threshold if it does not complete termination This sounds like doubling down on the de-duplication statement. Serious concern that that there will be way too many over memory exceptions. ? Is this a request, or an optimization to reduce the number of events generated?
Dennis, see Keenan's comment #8.
It was a request to ensure we don't spam the user with multiple memory exceeded events for the same worker once it has exceeded the threshold but does not terminate for a long time - in which case it could be found multiple times in the "exceeded" state. From out of band conversations, it sounds like the code already handles this in the check logic.
https://github.com/ManageIQ/manageiq/pull/17673
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/ebd613a5a6deecd0abf0e47b6bf8a620e516a749 commit ebd613a5a6deecd0abf0e47b6bf8a620e516a749 Author: Keenan Brock <keenan> AuthorDate: Thu Jun 21 17:04:31 2018 -0400 Commit: Keenan Brock <keenan> CommitDate: Thu Jun 21 17:04:31 2018 -0400 notify users of killing workers when exceed memory - Provide more complete information for worker out of memory errors - Provide notification messages for out of memory errors - Introduce ability to disable notification - Include notification.message in alert emails https://bugzilla.redhat.com/show_bug.cgi?id=1535177 app/models/miq_action.rb | 1 + app/models/miq_event.rb | 7 +- app/models/miq_server/worker_management/monitor/validation.rb | 10 +- app/models/notification.rb | 11 +- app/models/notification_type.rb | 10 + db/fixtures/notification_types.yml | 5 + spec/models/miq_action_spec.rb | 46 + spec/models/miq_event_spec.rb | 22 +- spec/models/notification_spec.rb | 21 + 9 files changed, 121 insertions(+), 12 deletions(-)
Merged into master Notifications have a template to display information for some EventsStream events. This notification event is now included in alert policy emails. More information can be found in https://github.com/ManageIQ/manageiq/pull/17673
Tested on 5.10.0.14. The alert emails do not include the additional information mentioned in https://github.com/ManageIQ/manageiq/pull/17673 : **** Before Alert Triggered Alert 'aKB Worker killed', triggered Event: Alert condition met Entity: (MiqServer) local After Alert Triggered Alert 'aKB Worker killed', triggered Event: Alert condition met Entity: (MiqServer) local Details: Killing worker MiqPriorityWorker due to excessive memory usage. 144.6 MB used memory exceeds limit of 100 MB. **** My alert emails still look like: **** Alert Triggered Alert 'tpa', triggered Event: Alert condition met Entity: (MiqServer) EVM **** I'm guessing this is because the evm_worker_memory_exceeded notification type added to db/fixtures/notification_types.yml in this commit did not actually get added to the database: vmdb_production=# select * from notification_types where name like 'evm_worker_memory_exceeded'; id | name | level | audience | message | expires_in ----+------+-------+----------+---------+------------ (0 rows)
Tasos, I was expecting those changes to get seeded to the database. Thanks for the very clear issue description - I'm looking into this now
Tasos, A new database, or a database that comes with a nightly build should have these new values. If you have an existing database, the templates will not get automatically updated at server startup. We do auto update a few of the tables, but this is not one of them. Is it possible to run the following script from a shell prompt and then restart/start the server? This will not change any EMS or VM values that you already have in the database. It will only bring the default values up to date. vmdb # change to the manageiq directory rake db:seed # populate the default tables. This will populate the database with the proper templates. Thank you, Keenan
Tested on the latest build 5.10.0.15 (9/14), and the notification type record is still missing from the db. Running 'rake db:seed' completes successfully but doesn't resolve the issue either.
Sorry for the delay Tasos. Was spinning my wheels because I did not properly assign this to a server. I found the issue and have put in a PR. My reproduction steps: (reproduction steps are so I don't spin my wheels again) You probably want to just use the UI for this configuration instead of my hack vmdb/config/settings/production.local.yml ######### :smtp: :authentication: 'plain' :domain: mydomain.com :from: cfadmin :host: 127.0.0.1 :port: '1025' :workers: :worker_base: :queue_worker_base: :priority_worker: # this will force priority worker to keep restarting :memory_threshold: 100.megabytes :count: 1 ###### In the UI: Administrator Menu: Configuration (Upper Right) Tree view: Settings > Zones > Zone: Default > Server: EVM[1] Current (this is default for me) Server Control Server Roles Notifier: enable (Provider Inventory: disable - if you want to cut down on server work) Left Menu: Control > Explorer Tree view: Alert Menu: Configuration > Create Alert name: aaRestartWorker enabled: true based upon: Server evaluate: Nothing Driving Event: Appliance Operation: Worker Exceeded Memory Notification Frequency: 1 minute Send Email: true To: problem Use "Add (enter Manually)" to enter Ensure the name shows up in the To box You probably have a different value save Tree view: Alert Profile Tree view: All Alert Profiles > Server Alert Profiles Menu: Configuration > Add Alert Profile Description: aaRestartWorkerProfile Move aaRestart to right save Tree view: aaRestartWorkerProfile Menu: Configuration > Edit Alert Assignments for this profile Choose all servers (only 1 server "EVM" for me) save
https://github.com/ManageIQ/manageiq/pull/18035
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/acd0a192b30b09850595ccda6d20daa348384286 commit acd0a192b30b09850595ccda6d20daa348384286 Author: Keenan Brock <keenan> AuthorDate: Fri Sep 28 13:11:20 2018 -0400 Commit: Keenan Brock <keenan> CommitDate: Fri Sep 28 13:11:20 2018 -0400 Fix Notifications None fixes ebd613a5 The notification type "NONE" was introduced, but it was not part of basic validations https://bugzilla.redhat.com/show_bug.cgi?id=1535177 app/models/notification_type.rb | 2 +- spec/models/notification_type_spec.rb | 8 + 2 files changed, 9 insertions(+), 1 deletion(-)
New commit detected on ManageIQ/manageiq/hammer: https://github.com/ManageIQ/manageiq/commit/33aa42d74dafb5cc00206c4af28dfef3ca079ace commit 33aa42d74dafb5cc00206c4af28dfef3ca079ace Author: Brandon Dunne <brandondunne> AuthorDate: Fri Sep 28 18:26:33 2018 -0400 Commit: Brandon Dunne <brandondunne> CommitDate: Fri Sep 28 18:26:33 2018 -0400 Merge pull request #18035 from kbrock/monitor_notifier_fix Fix Notifications None (cherry picked from commit 8489078c17dcc236541e3c3cebdf20bb2e6291c1) https://bugzilla.redhat.com/show_bug.cgi?id=1535177 app/models/notification_type.rb | 2 +- spec/models/notification_type_spec.rb | 8 + 2 files changed, 9 insertions(+), 1 deletion(-)
Verified on 5.10.0.18.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0212