Bug 1331186

Summary: [events] Host memory usage exceeded defined threshold email message not generated.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Bimal Chollera <bcholler>
Component: ovirt-engine
Assignee: Martin Perina <mperina>
Status: CLOSED ERRATA
QA Contact: eberman
Severity: high
Docs Contact:
Priority: high
Version: 3.5.7
CC: bcholler, gklein, lsurette, mgoldboi, mperina, oourfali, pstehlik, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ovirt-3.6.7
Keywords: Regression, ZStream
Target Release: 3.6.7
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-29 16:19:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Bimal Chollera 2016-04-27 22:49:26 UTC
Description of problem:

RHEV-M was upgraded from 3.2 (via 3.3.x -> 3.4.x) to 3.5.7.  After the upgrade, the 'Host memory usage exceeded defined threshold' event notification is not generated, so no email notifications are sent.  The engine logs do not mention the memory threshold being exceeded.

According to the audit_log, the log_type_name is USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN, which is not correct.  It should be VDS_HIGH_MEM_USE.

~~~
log_time            | 2016-04-07 13:51:09.987-04
log_type_name       | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN  
log_type            | 532
severity            | 1
message             | Used memory of host XXXX [71%] exceeded defined threshold [70%].
processed           | t
~~~


~~~    
    VDS_HIGH_MEM_USE(532, AuditLogSeverity.WARNING,
            AuditLogTimeInterval.MINUTE.getValue() * 30),

    @Deprecated
    USER_SUSPEND_VM_FINISH_SUCCESS(512),
    USER_SUSPEND_VM_FINISH_FAILURE(521, AuditLogSeverity.ERROR),
    USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(130, AuditLogSeverity.ERROR),
~~~


LogMaxPhysicalMemoryUsedThresholdInPercentage setting in the database:

~~~
 option_id |                  option_name                  | option_value | version 
-----------+-----------------------------------------------+--------------+---------
       462 | LogMaxPhysicalMemoryUsedThresholdInPercentage | 70           | general
~~~

Hosts with >=70% memory usage:

~~~
         vds_name          | usage_mem_percent 
---------------------------+-------------------
 XXXX                      |                70
 YYYY                      |                71
 ZZZZ                      |                72
~~~


Version-Release number of selected component (if applicable):

rhevm-3.5.7-0.1.el6ev.noarch
vdsm-4.10.2-22.0.el6ev - RHEV Hypervisor - 6.4 - 20130528.0.el6_4

How reproducible:

100% reproduced on end user system

Steps to Reproduce:
1.  Assign Host memory usage threshold in the RHEV-M GUI
2.  Configure ovirt-engine-notifier
3.  Increase memory usage on the host

Actual results:

No 'Host memory exceeded threshold' email messages are generated.

Expected results:

'Host memory exceeded threshold' email messages are generated.

Additional info:

Comment 6 Martin Perina 2016-04-28 17:44:11 UTC
The patch https://gerrit.ovirt.org/29740 fixed duplicate audit_log event ids, but unfortunately it didn't contain a database upgrade script. The audit_log and event_notification_hist tables store both event_name and event_id, so we need to provide an upgrade script that fixes the changed audit_log event ids.
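
To illustrate why duplicate event ids lead to the wrong log_type_name, here is a minimal sketch. It assumes a pre-fix enum in which both constants carried id 532 and an id-to-name lookup built by iterating the enum; `LegacyEventType`, `indexById` and `hasDuplicateIds` are hypothetical names for illustration and are not taken from the engine source.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateEventIdDemo {

    // Hypothetical pre-fix state: two constants share the id 532.
    enum LegacyEventType {
        VDS_HIGH_MEM_USE(532),
        USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN(532);

        final int id;

        LegacyEventType(int id) {
            this.id = id;
        }
    }

    // Builds an id -> constant index. With duplicate ids, the constant
    // declared later silently overwrites the earlier one.
    static Map<Integer, LegacyEventType> indexById() {
        Map<Integer, LegacyEventType> byId = new HashMap<>();
        for (LegacyEventType type : LegacyEventType.values()) {
            byId.put(type.id, type);
        }
        return byId;
    }

    // Reports whether any id is used by more than one constant.
    static boolean hasDuplicateIds() {
        Map<Integer, LegacyEventType> seen = new HashMap<>();
        for (LegacyEventType type : LegacyEventType.values()) {
            if (seen.put(type.id, type) != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A high-memory event with id 532 resolves to the suspend-VM name.
        System.out.println(indexById().get(532));
        System.out.println(hasDuplicateIds());
    }
}
```

A check like `hasDuplicateIds` could run as a unit test so that a colliding id is caught at build time rather than in stored audit_log rows.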

Comment 8 Martin Perina 2016-05-04 10:32:28 UTC
Considered not to be a blocker for the 3.6.6 release.

Comment 10 Martin Perina 2016-05-31 12:07:59 UTC
I tested several upgrade scenarios between 3.4 and 3.5; here are the results:

1. In 3.4 (and most probably also 3.3) 'Host memory usage exceeded defined threshold' is never sent by email, because the event is stored with the 'USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN' type instead of 'VDS_HIGH_MEM_USE' in the audit_log table. This issue was caused by duplicate event ids for audit log events, and it was fixed in 3.5.

2. After an upgrade from 3.4.5 to 3.5.8, 'Host memory usage exceeded defined threshold' events are always sent by email without any issues.

3. On a clean 3.5.8 installation, 'Host memory usage exceeded defined threshold' events are also sent by email without any issues.

4. The only issue is that we didn't update the records in the audit_log table that were created before the upgrade in previous RHEV versions (< 3.5), so those events are not displayed correctly in 3.5. This issue is handled by a patch attached to this bug.
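
The attached patch itself is not shown in this report. As a rough sketch of the kind of repair item 4 describes, the snippet below realigns the stored id of each legacy row with the canonical id for its stored name, which is consistent with the verification results reported later in this bug. The `Row` class, `CANONICAL_IDS` map and `repair` method are hypothetical stand-ins; a real upgrade script would run against the database rather than in Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AuditLogIdRepairSketch {

    // Hypothetical stand-in for an audit_log row; only the relevant columns.
    static final class Row {
        final long auditLogId;
        final String logTypeName;
        final int logType;

        Row(long auditLogId, String logTypeName, int logType) {
            this.auditLogId = auditLogId;
            this.logTypeName = logTypeName;
            this.logType = logType;
        }
    }

    // Name -> id pairs as listed in the enum excerpt in the description.
    static final Map<String, Integer> CANONICAL_IDS = new HashMap<>();
    static {
        CANONICAL_IDS.put("VDS_HIGH_MEM_USE", 532);
        CANONICAL_IDS.put("USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN", 130);
    }

    // Returns copies of the rows in which a mismatched id has been replaced
    // by the canonical id for the stored name; unknown names are left alone.
    static List<Row> repair(List<Row> rows) {
        List<Row> repaired = new ArrayList<>();
        for (Row row : rows) {
            Integer canonical = CANONICAL_IDS.get(row.logTypeName);
            if (canonical != null && canonical != row.logType) {
                repaired.add(new Row(row.auditLogId, row.logTypeName, canonical));
            } else {
                repaired.add(row);
            }
        }
        return repaired;
    }
}
```

Note that this approach trusts the stored name over the stored id; whether that is the right choice for rows whose message text belongs to a different event is an assumption of the sketch, not something this report settles.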

Bimal, are you sure that the engine upgrade from 3.2 -> 3.5.7 finished successfully and that you are really using engine version 3.5.7?

Comment 14 eberman 2016-06-23 18:04:07 UTC
What was tested:

The engine was reconfigured with:
engine-config -s LogMaxPhysicalMemoryUsedThresholdInPercentage=50
engine-config -s LogPhysicalMemoryThresholdInMB=128
and then restarted.
An SMTP server was configured as well.

Tested on:

3.4:
trigger from host: ran "pig" stress

results:
1. no mail was sent
2. audit_log did show the correct event
3. DB shows the wrong event name:
log_type_name       | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN
log_type            | 532

3.5:
trigger: stress by "pig"

results:
1. mail was sent
2. audit_log did show the correct event
3. DB id and alert showed VDS_HIGH_MEM_USE with id 532:
engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time          
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      532 | 2016-06-23 09:03:24.446+03
         1101 |         | VDS_HIGH_MEM_USE                              |      532 | 2016-06-23 13:35:21.475+03
(3 rows)

3.6.7:
trigger: stress by "pig"

results:
1. mail was sent
2. audit_log did show the correct event
3. the new id and alert were indeed changed as expected:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=532;
 audit_log_id | vm_name |  log_type_name   | log_type |          log_time          
--------------+---------+------------------+----------+----------------------------
         1101 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 13:35:21.475+03
         1327 |         | VDS_HIGH_MEM_USE |      532 | 2016-06-23 15:35:16.754+03
(2 rows)

and:

engine=# select audit_log_id, vm_name, log_type_name, log_type, log_time from audit_log where log_type=130;
 audit_log_id | vm_name |                 log_type_name                 | log_type |          log_time          
--------------+---------+-----------------------------------------------+----------+----------------------------
          850 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-22 15:01:42.714+03
          857 |         | USER_SUSPEND_VM_FINISH_FAILURE_WILL_TRY_AGAIN |      130 | 2016-06-23 09:03:24.446+03
(2 rows)

Comment 16 errata-xmlrpc 2016-06-29 16:19:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1364