Description of problem:
RHEV-M was upgraded from 3.2 (3.3.x -> 3.4.x) to 3.5.7. After the upgrade, the 'Host memory usage exceeded defined threshold' event notification is not generated, so no email notifications are sent. The engine logs do not mention the memory threshold being exceeded.

According to the audit_log, the log_type_name is USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN, which is not correct. It should be VDS_HIGH_MEM_USE.

~~~
log_time      | 2016-04-07 13:51:09.987-04
log_type_name | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN
log_type      | 532
severity      | 1
message       | Used memory of host XXXX [71%] exceeded defined threshold [70%].
processed     | t
~~~

~~~
VDS_HIGH_MEM_USE(532, AuditLogSeverity.WARNING, AuditLogTimeInterval.MINUTE.getValue() * 30),
@Deprecated
USER_SUSPEND_VM_FINISH_SUCCESS(512),
USER_SUSPEND_VM_FINISH_FAILURE(521, AuditLogSeverity.ERROR),
USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(130, AuditLogSeverity.ERROR),
~~~

LogMaxPhysicalMemoryUsedThresholdInPercentage setting in the database:

~~~
 option_id |                  option_name                  | option_value | version
-----------+-----------------------------------------------+--------------+---------
       462 | LogMaxPhysicalMemoryUsedThresholdInPercentage | 70           | general
~~~

Hosts with >70% memory usage

~~~
         vds_name          | usage_mem_percent
---------------------------+-------------------
 XXXX                      |                70
 YYYY                      |                71
 ZZZZ                      |                72
~~~

Version-Release number of selected component (if applicable):
rhevm-3.5.7-0.1.el6ev.noarch
vdsm-4.10.2-22.0.el6ev - RHEV Hypervisor - 6.4 - 20130528.0.el6_4

How reproducible:
100% reproduced on end user system

Steps to Reproduce:
1. Assign Host memory usage threshold in the RHEV-M GUI
2. Configure ovirt-engine-notifier
3. Increase memory usage on the host

Actual results:
No 'Host memory exceeded threshold' email messages are generated.

Expected results:
'Host memory exceeded threshold' email messages are generated.

Additional info:
The patch https://gerrit.ovirt.org/29740 fixed the duplicate audit_log event ids, but unfortunately it did not include a database upgrade script. The audit_log and event_notification_hist tables store both event_name and event_id, so we need to provide an upgrade script that fixes the rows affected by the audit_log event_id changes.
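To show what "rows affected" means in practice, here is a minimal sketch of a check that lists audit_log rows whose stored id disagrees with the id the enum now assigns to the stored name. It is not part of the patch. It runs against an in-memory SQLite table with a simplified, hypothetical subset of columns (audit_log_id, log_type_name, log_type) rather than the real engine database, and the helper name `find_mismatched_rows` is made up for illustration.

```python
import sqlite3

# Name -> id pairs copied from the enum excerpt in the bug description.
CANONICAL_IDS = {
    "VDS_HIGH_MEM_USE": 532,
    "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN": 130,
}


def find_mismatched_rows(conn, canonical_ids):
    """Return (audit_log_id, name, stored_id, expected_id) for stale rows.

    Names that are missing from canonical_ids are skipped, not flagged.
    """
    query = (
        "SELECT audit_log_id, log_type_name, log_type "
        "FROM audit_log ORDER BY audit_log_id"
    )
    mismatches = []
    for row_id, name, stored_id in conn.execute(query):
        expected_id = canonical_ids.get(name)
        if expected_id is not None and expected_id != stored_id:
            mismatches.append((row_id, name, stored_id, expected_id))
    return mismatches


# Demo data: simplified stand-in for audit_log (assumption, not the real schema).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE audit_log ("
    "audit_log_id INTEGER PRIMARY KEY, log_type_name TEXT, log_type INTEGER)"
)
demo.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?)",
    [
        (850, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532),
        (1101, "VDS_HIGH_MEM_USE", 532),
    ],
)

stale_rows = find_mismatched_rows(demo, CANONICAL_IDS)
for row in stale_rows:
    print(row)
```

The same comparison would have to be repeated for event_notification_hist, whose columns are not shown in this report.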
Not considered a blocker for the 3.6.6 release.
I tested several upgrade scenarios between 3.4 and 3.5. Here are the results:

1. In 3.4 (and most probably also 3.3), 'Host memory usage exceeded defined threshold' is never sent by email, because the event is stored in the audit_log table with type 'USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN' instead of 'VDS_HIGH_MEM_USE'. This issue was caused by duplicate event ids for audit log events, and it was fixed in 3.5.
2. After an upgrade from 3.4.5 to 3.5.8, 'Host memory usage exceeded defined threshold' events are always sent by email without any issues.
3. On a clean 3.5.8 installation, 'Host memory usage exceeded defined threshold' events are also sent by email without any issues.
4. The only issue is that we did not update the audit_log records that were created before the upgrade, in previous RHEV versions (< 3.5), so those events are not displayed correctly in 3.5. This issue is handled by the patch attached to this bug.

Bimal, are you sure that the engine upgrade from 3.2 -> 3.5.7 finished successfully and that you are really using engine version 3.5.7?
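As an illustration of item 1, here is a minimal sketch of one way a duplicate id can end up producing the wrong event name. It assumes an id -> name reverse lookup in which a later registration overwrites an earlier one. The engine's actual lookup code is not shown in this report, so the registration order and the helper names below are hypothetical.

```python
# Hypothetical pre-3.5 registrations: two event names sharing id 532.
# The order is an assumption made for illustration only.
REGISTRATIONS = [
    ("VDS_HIGH_MEM_USE", 532),
    ("USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532),
]


def build_reverse_lookup(registrations):
    """Map id -> name; a later duplicate id silently overwrites the earlier one."""
    lookup = {}
    for name, event_id in registrations:
        lookup[event_id] = name
    return lookup


def find_duplicate_ids(registrations):
    """Return {id: [names]} for every id claimed by more than one name."""
    names_by_id = {}
    for name, event_id in registrations:
        names_by_id.setdefault(event_id, []).append(name)
    return {i: names for i, names in names_by_id.items() if len(names) > 1}


lookup = build_reverse_lookup(REGISTRATIONS)
duplicates = find_duplicate_ids(REGISTRATIONS)
print(lookup[532])
print(duplicates)
```

Under that assumption, a memory-threshold event raised with id 532 resolves to the suspend-failure name, which matches the audit_log row in the description. A check like `find_duplicate_ids` run at build or test time would flag the collision before it reaches the database.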
What has been tested:

The engine was reconfigured with:
  engine-config -s LogMaxPhysicalMemoryUsedThresholdInPercentage=50
  engine-config -s LogPhysicalMemoryThresholdInMB=128
and the engine was restarted. An SMTP server was configured as well.

On 3.4:
trigger from host: ran pig stress
result:
1. no mail was sent
2. audit_log did show correct event
3. DB presents the wrong message:
   log_type_name | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN
   log_type      | 532

On 3.5:
trigger: stress by "pig"
results:
1. mail was sent
2. audit_log did show correct event
3. DB id and alert showed VDS_HIGH_MEM_USE with 532 id

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-23 09:03:24.446+03
         1101 |         | VDS_HIGH_MEM_USE                              |      532 | 2016-06-23 13:35:21.475+03
(3 rows)

On 3.6.7:
trigger: stress by "pig"
results:
1. mail was sent
2. audit_log did show correct event
3. new id and alert were indeed changed as expected:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |  log_type_name   | log_type |          log_time
--------------+---------+------------------+----------+----------------------------
         1101 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 13:35:21.475+03
         1327 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 15:35:16.754+03
(2 rows)

and:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=130;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-23 09:03:24.446+03
(2 rows)
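The before/after state in these query results can be reproduced with a small sketch. It is not the upgrade script shipped with the fix, which is not quoted in this report. It only mirrors the observed outcome, where rows 850 and 857 keep their log_type_name and move from log_type 532 to 130. It uses an in-memory SQLite table with a simplified, hypothetical subset of columns, and `realign_event_id` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for audit_log (assumption, not the real engine schema).
conn.execute(
    "CREATE TABLE audit_log ("
    "audit_log_id INTEGER PRIMARY KEY, log_type_name TEXT, log_type INTEGER)"
)
# Rows as they looked on 3.5, taken from the first query output above.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?)",
    [
        (850, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532),
        (857, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532),
        (1101, "VDS_HIGH_MEM_USE", 532),
    ],
)


def realign_event_id(conn, name, old_id, new_id):
    """Move rows stored under (name, old_id) to new_id; return rows changed."""
    cursor = conn.execute(
        "UPDATE audit_log SET log_type = ? "
        "WHERE log_type_name = ? AND log_type = ?",
        (new_id, name, old_id),
    )
    conn.commit()
    return cursor.rowcount


changed = realign_event_id(
    conn, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532, 130
)


def ids_with_type(conn, log_type):
    """Return the audit_log_id values stored under the given log_type."""
    rows = conn.execute(
        "SELECT audit_log_id FROM audit_log WHERE log_type = ? "
        "ORDER BY audit_log_id",
        (log_type,),
    )
    return [row[0] for row in rows]


print(changed, ids_with_type(conn, 532), ids_with_type(conn, 130))
```

Filtering on both the name and the old id keeps the update from touching rows such as 1101, which already carry the correct pair, and makes a second run a no-op.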
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1364