Bug 1331186 - [events] Host memory usage exceeded defined threshold email message not generated.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-3.6.7
Target Release: 3.6.7
Assignee: Martin Perina
QA Contact: eberman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-27 22:49 UTC by Bimal Chollera
Modified: 2019-11-14 07:54 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-29 16:19:55 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:1364 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Virtualization Manager (rhevm) bug fix 3.6.7 | 2016-06-29 20:18:44 UTC
oVirt gerrit | 56960 | 0 | 'None' | MERGED | core: Fix audit event type ids in database | 2020-06-08 01:48:01 UTC
oVirt gerrit | 58306 | 0 | 'None' | MERGED | core: Fix audit event type ids in database | 2020-06-08 01:48:01 UTC
oVirt gerrit | 58307 | 0 | 'None' | MERGED | core: Fix audit event type ids in database | 2020-06-08 01:48:01 UTC
oVirt gerrit | 58308 | 0 | 'None' | MERGED | core: Fix audit event type ids in database | 2020-06-08 01:48:01 UTC

Description Bimal Chollera 2016-04-27 22:49:26 UTC
Description of problem:

RHEV-M was upgraded from 3.2 (via 3.3.x -> 3.4.x) to 3.5.7.  After the upgrade, the 'Host memory usage exceeded defined threshold' event notification is not generated, so no email notifications are sent.  The engine logs contain no mention of the memory threshold being exceeded.

According to audit_log, the log_type_name for this event is USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN, which is not correct.  It should be VDS_HIGH_MEM_USE.

~~~
log_time            | 2016-04-07 13:51:09.987-04
log_type_name       | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN  
log_type            | 532
severity            | 1
message             | Used memory of host XXXX [71%] exceeded defined threshold [70%].
processed           | t|
~~~


~~~    
    VDS_HIGH_MEM_USE(532, AuditLogSeverity.WARNING,
            AuditLogTimeInterval.MINUTE.getValue() * 30),

    @Deprecated
    USER_SUSPEND_VM_FINISH_SUCCESS(512),
    USER_SUSPEND_VM_FINISH_FAILURE(521, AuditLogSeverity.ERROR),
    USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(130, AuditLogSeverity.ERROR),
~~~
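
The mismatch between the stored row and the enum can be checked mechanically. The following is a minimal sketch, not engine code: the class name `AuditLogConsistencySketch`, the trimmed-down `EventType` enum, and the `isConsistent` helper are hypothetical, and only the four name/id pairs from the excerpt above are taken from the report.

```java
import java.util.Arrays;

final class AuditLogConsistencySketch {

    // Ids copied from the enum excerpt above; the engine's real enum has many more constants.
    enum EventType {
        VDS_HIGH_MEM_USE(532),
        USER_SUSPEND_VM_FINISH_SUCCESS(512),
        USER_SUSPEND_VM_FINISH_FAILURE(521),
        USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(130);

        final int id;

        EventType(int id) {
            this.id = id;
        }
    }

    // True when the stored name exists in the enum and its id equals the stored id.
    static boolean isConsistent(String logTypeName, int logType) {
        return Arrays.stream(EventType.values())
                .anyMatch(t -> t.name().equals(logTypeName) && t.id == logType);
    }

    public static void main(String[] args) {
        // The row from the description: name and id disagree.
        System.out.println(isConsistent("USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532));
        // What a correctly stored row would look like.
        System.out.println(isConsistent("VDS_HIGH_MEM_USE", 532));
    }
}
```

With the values from this report, the first check fails and the second passes, which matches the inconsistency described above.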


LogMaxPhysicalMemoryUsedThresholdInPercentage setting in the database:

~~~
 option_id |                  option_name                  | option_value | version 
-----------+-----------------------------------------------+--------------+---------
       462 | LogMaxPhysicalMemoryUsedThresholdInPercentage | 70           | general
~~~

Hosts at or above 70% memory usage:

~~~
         vds_name          | usage_mem_percent 
---------------------------+-------------------
 XXXX                      |                70
 YYYY                      |                71
 ZZZZ                      |                72
~~~


Version-Release number of selected component (if applicable):

rhevm-3.5.7-0.1.el6ev.noarch
vdsm-4.10.2-22.0.el6ev - RHEV Hypervisor - 6.4 - 20130528.0.el6_4

How reproducible:

100% reproduced on end user system

Steps to Reproduce:
1.  Assign Host memory usage threshold in the RHEV-M GUI
2.  Configure ovirt-engine-notifier
3.  Increase memory usage on the host

Actual results:

No Host memory exceeded threshold email messages are generated.

Expected results:

Host memory exceeded threshold email messages should be generated.

Additional info:

Comment 6 Martin Perina 2016-04-28 17:44:11 UTC
Patch https://gerrit.ovirt.org/29740 fixed the duplicate audit_log event ids, but unfortunately it did not include a database upgrade script. The audit_log and event_notification_hist tables store both event_name and event_id, so we need to provide an upgrade script that fixes the changed event ids in existing audit_log records.
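
As an illustration only, the effect such an upgrade script needs to have can be modelled in memory. This is a sketch under stated assumptions, not the merged patch (which is a database script): `AuditLogRepairSketch`, `Row`, `CURRENT_IDS`, `repair`, and `sampleRows` are hypothetical names, the name-to-id pairs come from the enum excerpt in the description, and the sample audit_log_id values are the ones that appear in the verification comment further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AuditLogRepairSketch {

    // Hypothetical in-memory stand-in for one audit_log row.
    static final class Row {
        final long auditLogId;
        final String logTypeName;
        int logType;

        Row(long auditLogId, String logTypeName, int logType) {
            this.auditLogId = auditLogId;
            this.logTypeName = logTypeName;
            this.logType = logType;
        }
    }

    // Name-to-id pairs taken from the enum excerpt in the description.
    static final Map<String, Integer> CURRENT_IDS = Map.of(
            "VDS_HIGH_MEM_USE", 532,
            "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 130);

    // Rewrites log_type for rows whose stored name now maps to a different id.
    // Returns the number of rows changed; rows with unknown names are left alone.
    static int repair(List<Row> rows) {
        int changed = 0;
        for (Row row : rows) {
            Integer expected = CURRENT_IDS.get(row.logTypeName);
            if (expected != null && expected != row.logType) {
                row.logType = expected;
                changed++;
            }
        }
        return changed;
    }

    // Sample rows mirroring the pre-fix state shown in the verification comment.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(850, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532));
        rows.add(new Row(857, "USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 532));
        rows.add(new Row(1101, "VDS_HIGH_MEM_USE", 532));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println("rows changed: " + repair(rows));
        for (Row row : rows) {
            System.out.println(row.auditLogId + " " + row.logTypeName + " " + row.logType);
        }
    }
}
```

The sketch keys the repair on the stored name and rewrites only the id, which is the direction of change visible in the 3.6.7 verification output.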

Comment 8 Martin Perina 2016-05-04 10:32:28 UTC
Considered not to be a blocker for the 3.6.6 release.

Comment 10 Martin Perina 2016-05-31 12:07:59 UTC
I tested several upgrade scenarios between 3.4 and 3.5. Here are the results:

1. In 3.4 (and most probably also 3.3), 'Host memory usage exceeded defined threshold' is never sent by email, because the event is stored in the audit_log table with type 'USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN' instead of 'VDS_HIGH_MEM_USE'. This issue was caused by duplicate event ids for audit log events, and it was fixed in 3.5.

2. After an upgrade from 3.4.5 to 3.5.8, 'Host memory usage exceeded defined threshold' events are always sent by email without any issues.

3. On a clean 3.5.8 installation, 'Host memory usage exceeded defined threshold' events are also sent by email without any issues.

4. The only issue is that we did not update the audit_log records that were created before the upgrade in previous RHEV versions (< 3.5), so those events are not displayed correctly in 3.5. This issue is handled by the patch attached to this bug.

Bimal, are you sure that the engine upgrade from 3.2 -> 3.5.7 finished successfully and that you are really using engine version 3.5.7?
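
One way duplicate ids can lead to the wrong stored name, as in point 1, is a reverse lookup from id to enum constant. The sketch below is an assumption-laden illustration, not the engine's code: it assumes both constants shared id 532 before the fix (as the stored rows suggest), that names are resolved through an id-keyed map, and that the notifier matches subscriptions on the event name. `DuplicateEventIdSketch`, `BY_ID`, `nameForId`, and `wouldNotify` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DuplicateEventIdSketch {

    // Assumed pre-fix state: two constants share id 532.
    enum EventType {
        VDS_HIGH_MEM_USE(532),
        USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(532);

        final int id;

        EventType(int id) {
            this.id = id;
        }
    }

    // Reverse lookup keyed by id; with duplicates, the later constant silently wins.
    static final Map<Integer, EventType> BY_ID = new HashMap<>();

    static {
        for (EventType type : EventType.values()) {
            BY_ID.put(type.id, type);
        }
    }

    // Resolves an id back to a name, or null when the id is unknown.
    static String nameForId(int id) {
        EventType type = BY_ID.get(id);
        return type == null ? null : type.name();
    }

    // Assumed notifier behaviour: a subscription fires only when the names match.
    static boolean wouldNotify(String subscribedEvent, int storedId) {
        return subscribedEvent.equals(nameForId(storedId));
    }

    public static void main(String[] args) {
        System.out.println(nameForId(532));
        System.out.println(wouldNotify("VDS_HIGH_MEM_USE", 532));
    }
}
```

Under these assumptions, id 532 resolves to the suspend event, so a subscription to VDS_HIGH_MEM_USE never matches and no mail goes out.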

Comment 14 eberman 2016-06-23 18:04:07 UTC
What has been tested:

The engine was reconfigured with:
engine-config -s LogMaxPhysicalMemoryUsedThresholdInPercentage=50
engine-config -s LogPhysicalMemoryThresholdInMB=128
The engine was then restarted.
An SMTP server was configured as well.

Tested on:

3.4:
Trigger: memory stress run on the host with "pig"

Results:
1. No mail was sent.
2. audit_log showed the correct event.
3. The DB stored the wrong event name:
log_type_name       | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN
log_type            | 532

3.5:
Trigger: memory stress with "pig"

Results:
1. Mail was sent.
2. audit_log showed the correct event.
3. The DB id and alert showed VDS_HIGH_MEM_USE with id 532:
engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time          
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-23 09:03:24.446+03
         1101 |         | VDS_HIGH_MEM_USE                              |      532 | 2016-06-23 13:35:21.475+03
(3 rows)

3.6.7:
Trigger: memory stress with "pig"

Results:
1. Mail was sent.
2. audit_log showed the correct event.
3. The ids and alerts were changed as expected:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |  log_type_name   | log_type |          log_time          
--------------+---------+------------------+----------+----------------------------
         1101 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 13:35:21.475+03
         1327 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 15:35:16.754+03
(2 rows)

and:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=130;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time          
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-23 09:03:24.446+03
(2 rows)

Comment 16 errata-xmlrpc 2016-06-29 16:19:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1364

