1042829 – [RFE] Add new event notifiers for VM_DOWN_ERROR and VDS_INITIATED_RUN_VM_FAIL

Summary: [RFE] Add new event notifiers for VM_DOWN_ERROR and VDS_INITIATED_RUN_VM_FAIL

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	oVirt
Classification:	Retired
Component:	ovirt-engine-core
Sub Component:
Version:	3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	3.4.0
Assignee:	Adam Litke
QA Contact:	Jiri Belka
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:	1072527

Reported:	2013-12-13 13:33 UTC by Adam Litke
Modified:	2014-03-31 15:05 UTC
CC List:	7 users
Fixed In Version:	ovirt-3.4.0-alpha1
Clone Of:
Environment:
Last Closed:	2014-03-31 15:05:42 UTC
oVirt Team:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	22247	0	None	None	None	Never
oVirt gerrit	24397	0	None	None	None	Never

Description Adam Litke 2013-12-13 13:33:38 UTC

There are existing audit log entries for VM_DOWN_ERROR (when a VM exits due to an error condition) and VDS_INITIATED_RUN_VM_FAIL (when a VM being migrated by an internal process fails to start on the selected host). Both of these events should have associated event notifiers so that users can be alerted when they occur.

Comment 1 Sandro Bonazzola 2014-01-13 13:57:07 UTC

oVirt 3.4.0 alpha has been released including the fix for this issue.

Comment 2 Jiri Belka 2014-01-27 11:37:54 UTC

Failed QA on ovirt-engine-3.4.0-0.2.master.20140112020439.git9ad8529.el6.noarch, ovirt-engine-dbscripts-3.4.0-0.2.master.20140112020439.git9ad8529.el6.noarch

* While assigning the event to a user:

-%-
Error while executing action: The notification event VDS_INITIATED_RUN_VM_FAILED is unsupported.
-%-

There is a spelling mismatch in the diffs: FAIL vs. FAILED.

packaging/dbscripts/upgrade/03_04_0230_vm_down_error_and_vds_initiated_run_fail_event_map.sql

+insert into event_map(event_up_name, event_down_name) values('VDS_INITIATED_RUN_VM_FAIL', ''); 

but in *.java diffs you use 'VDS_INITIATED_RUN_VM_FAILED'.
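To make the mismatch concrete, here is a minimal Python sketch, not the engine's code. It uses an in-memory SQLite table as a stand-in for the engine's PostgreSQL `event_map` table (simplified to the two columns the upgrade script writes), runs the quoted insert verbatim, and compares the result with a hypothetical set of names standing in for what the Java diffs use. Any name used in code that has no `event_map` row is reported.

```python
import sqlite3

# The row inserted by the upgrade script quoted above, copied verbatim.
UPGRADE_SQL = (
    "insert into event_map(event_up_name, event_down_name) "
    "values('VDS_INITIATED_RUN_VM_FAIL', '');"
)

# Hypothetical stand-in for the event name used in the *.java diffs.
JAVA_EVENT_NAMES = {"VDS_INITIATED_RUN_VM_FAILED"}


def unsupported_names(conn, names):
    """Return the names that have no matching event_up_name row in event_map."""
    known = {row[0] for row in conn.execute("select event_up_name from event_map")}
    return set(names) - known


conn = sqlite3.connect(":memory:")
# Simplified schema: only the two columns the upgrade script writes.
conn.execute(
    "create table event_map (event_up_name text primary key, event_down_name text)"
)
conn.execute(UPGRADE_SQL)

print(unsupported_names(conn, JAVA_EVENT_NAMES))  # prints {'VDS_INITIATED_RUN_VM_FAILED'}
```

A check of this shape, run as a unit test against the upgrade scripts, could flag a FAIL/FAILED typo before the build reaches QA.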

Also, please provide a way to trigger the mentioned events. Thank you.

Comment 3 Jiri Belka 2014-01-29 09:21:34 UTC

Also, please provide a way to trigger the mentioned events. Thank you. We are under time pressure to finalize our test cases.

Comment 4 Adam Litke 2014-01-29 13:32:08 UTC

In which log are you finding the message:
-%-
Error while executing action: The notification event VDS_INITIATED_RUN_VM_FAILED is unsupported.
-%-

When I assign the event to the admin user I only see the following message in engine.log:

-%-
INFO  [org.ovirt.engine.core.bll.AddEventSubscriptionCommand] (org.ovirt.thread.pool-6-thread-50) [791ef0f5] Running command: AddEventSubscriptionCommand internal: false. Entities affected :  ID: aaa00000-0000-0000-0000-123456789aaa Type: System
-%-

Comment 5 Adam Litke 2014-01-29 14:01:29 UTC

Here is how I am generating the events:

Start with a cluster of at least two hosts.

"Break" a host
0. Place a host into Maintenance mode
1. Install vdsm-hook-qemucmdline
2. sudo systemctl set-environment qemu_cmdline='["-badopt"]'
3. sudo systemctl restart vdsmd.service
4. Activate host

Try to run a VM on the broken host.  VM_DOWN_ERROR should result.

Run the VM on a good host. Place all free hosts except the broken host into maintenance mode. Finally, place the host that runs the VM into maintenance mode. The engine will try to migrate the VM to the broken host, and VDS_INITIATED_RUN_VM_FAIL will result.
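Steps 2 and 3 can be scripted. The following Python sketch only wraps the two commands given above; it assumes a systemd host and that step 1 (installing vdsm-hook-qemucmdline) and the maintenance/activate steps are done separately. `break_host_commands` and `run` are hypothetical helper names, and `run` defaults to a dry run that just prints the commands.

```python
import json
import shlex
import subprocess


def break_host_commands(bad_args):
    """Build the commands from steps 2 and 3 for the given bogus qemu arguments."""
    # The comment passes a JSON-style list; json.dumps(["-badopt"]) gives '["-badopt"]'.
    value = json.dumps(bad_args)
    return [
        # No shell is involved, so the single quotes used on the command line are not needed.
        ["sudo", "systemctl", "set-environment", f"qemu_cmdline={value}"],
        ["sudo", "systemctl", "restart", "vdsmd.service"],
    ]


def run(commands, dry_run=True):
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(break_host_commands(["-badopt"]))
```

Calling `run(break_host_commands(["-badopt"]), dry_run=False)` applies the change. To repair the host afterwards, `sudo systemctl unset-environment qemu_cmdline` followed by another vdsmd restart should undo it.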

Comment 6 Adam Litke 2014-01-29 15:19:47 UTC

(In reply to Adam Litke from comment #5)
> Here is how I am generating the events:
> 
> Start with a cluster of at least two hosts.
> 
> "Break" a host
> 0. Place a host into Maintenance mode
> 1. Install vdsm-hook-qemucmdline
> 2. sudo systemctl set-environment qemu_cmdline='["-badopt"]'
> 3. sudo systemctl restart vdsmd.service
> 4. Activate host
> 
> Try to run a VM on the broken host.  VM_DOWN_ERROR should result.
> 
> Run the VM on a good host.  Place all free hosts except the broken host into
> maintenance mode.  Finally, place the host that runs the VM into maintenance
> mode.  engine will try to migrate to the bad host and
> VDS_INITIATED_RUN_VM_FAIL will result.

Actually there is an easier way:

For VM_DOWN_ERROR: Just kill the qemu process on the host.
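The comment does not say how to pick out the right qemu process. The following Python sketch is an assumption, not an oVirt or vdsm tool: it scans `/proc/<pid>/cmdline` for a qemu binary whose `-name` option matches the VM name and sends it SIGKILL (equivalent to `kill -9`; the choice of signal is also an assumption). It accepts both `-name NAME` and the `-name guest=NAME,...` form that some libvirt versions use, and it ignores qemu's `,,` comma escaping.

```python
import os
import signal


def vm_name_from_argv(argv):
    """Return the guest name given to qemu's -name option, or None if absent."""
    for i, arg in enumerate(argv[:-1]):
        if arg == "-name":
            value = argv[i + 1].split(",")[0]
            # Accept both "-name NAME" and "-name guest=NAME,...".
            if value.startswith("guest="):
                value = value[len("guest="):]
            return value
    return None


def qemu_pids_for_vm(vm_name, proc="/proc"):
    """List PIDs of qemu processes whose -name matches vm_name."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or cmdline not readable
        # cmdline is NUL-separated; drop empty fields such as the trailing one.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv and "qemu" in os.path.basename(argv[0]) and vm_name_from_argv(argv) == vm_name:
            pids.append(int(entry))
    return pids


def kill_vm_qemu(vm_name):
    """Send SIGKILL to every matching qemu process; return the PIDs signalled."""
    pids = qemu_pids_for_vm(vm_name)
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
    return pids
```

Run as root on the host, `kill_vm_qemu("myvm")` (where `myvm` is a placeholder VM name) should then lead to VM_DOWN_ERROR as described above.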

Comment 7 Adam Litke 2014-01-29 15:32:05 UTC

For VDS_INITIATED_RUN_VM_FAIL:

Start a highly available VM on a cluster with a single host.
On the host:
    sudo mv /usr/bin/qemu-system-x86_64 /usr/bin/qemu-system-x86_64.moved
    Kill the qemu process

The VM will fail to restart and VDS_INITIATED_RUN_VM_FAIL will be raised.
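Moving the qemu binary away leaves the host unable to start any VM until it is moved back. A small Python context manager (a hypothetical helper, meant to run as root on the host) keeps the rename and the restore together, so the host is repaired even if the test is interrupted:

```python
import contextlib
import os


@contextlib.contextmanager
def binary_moved(path, suffix=".moved"):
    """Temporarily rename *path* (e.g. /usr/bin/qemu-system-x86_64) and restore it on exit.

    Example (run as root on the host):

        with binary_moved("/usr/bin/qemu-system-x86_64"):
            # kill the HA VM's qemu process here, then wait for the event
            input("Press Enter once VDS_INITIATED_RUN_VM_FAIL has been raised...")
    """
    moved = path + suffix
    os.rename(path, moved)  # same effect as: mv path path.moved
    try:
        yield moved
    finally:
        os.rename(moved, path)  # always put the binary back
```

The `finally` clause is the point of the design: a Ctrl-C while waiting for the event still restores the binary.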

Comment 8 Jiri Belka 2014-01-30 08:45:20 UTC

As clarified on #rhev-devel, the error is seen in the Admin Portal. Adam said he can't reproduce it anymore on newer code, but the versions listed in comment #2 still have this issue.

When a new build is done, check that the fix is part of that build and move the bug back to ON_QA with the 'Fixed in Version' value set. Thank you.

Comment 9 Barak 2014-02-11 13:01:51 UTC

Moving this RFE to 3.5

Comment 10 Barak 2014-02-11 13:04:07 UTC

It was merged in December and included in 3.4.
Moving to ON_QA.

Comment 11 Jiri Belka 2014-02-12 08:08:41 UTC

Please re-read comment #2. The issue still exists in Admin Portal (beta2).

Comment 12 Jiri Belka 2014-02-12 15:06:16 UTC

As I told alitke@ on IRC, I have a problem getting the VM_INITIATED_RUN_VM_FAIL event even though I manually inserted a subscriber for this event into the DB. So the reproduction steps in comment #7 do not work. Please move this to ON_QA only when it is fully working and there are valid reproduction steps (so there's no ping-pong). Thank you.

Comment 13 Adam Litke 2014-02-12 15:45:32 UTC

The ovirt-engine-3.4 branch needs a backport of 7e379 (http://gerrit.ovirt.org/#/c/23633/) in order to correct the event map.

Comment 14 Adam Litke 2014-02-13 15:22:59 UTC

(In reply to Jiri Belka from comment #12)
> As I told alitke@ on IRC, I have problem to get VM_INITIATED_RUN_VM_FAIL
> event even I manually inserted subscriber for this event into DB. Thus I
> mean reproduce steps as written in comment #7 do not work. Please send to
> ON_QA only when you have it fully working and have valid reproducing steps
> (so there's no ping-pong). Thank you.

The patch is posted and I have followed the steps in comment #7 to generate the email notification.  Please use the UI to subscribe to the event rather than hacking the DB directly.  Also, make sure your setup is properly configured to send event notifications by verifying that you can receive a VM console connected event.

As soon as this patch is merged I would like to see this moved back to ON_QA.

Comment 15 Itamar Heim 2014-02-13 18:31:27 UTC

pushing to target release 3.5, assuming it's not planned for 3.4 at this point...

Comment 16 Adam Litke 2014-02-13 19:53:12 UTC

(In reply to Itamar Heim from comment #15)
> pushing to target release 3.5, assuming its not planned for 3.4 at this
> point...

The problem is that the feature is already merged but has a small bug that prevents it from working properly.  If the user tries to subscribe to the event for 'Failed to restart a VM on a different host' an error message about an unsupported event will be displayed.

The fix is a simple DB row insert and is already merged upstream.  Given the simple nature of the fix and the fact that it's not really an RFE anymore, but a bug fix, I recommend that we do this in 3.4.0.  Itamar, if you agree, please reset to 3.4.0.
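The effect of that one-row fix can be sketched in Python, with an in-memory SQLite database standing in for the engine's PostgreSQL tables. Two assumptions: the backported row registers `VDS_INITIATED_RUN_VM_FAILED` (which matches the `event_map` query later in this bug), and `add_subscription` is a hypothetical stand-in for the engine's validation, modeled only on the Admin Portal error text.

```python
import sqlite3


class UnsupportedEventError(Exception):
    """Raised when a subscription names an event missing from event_map."""


def make_db():
    conn = sqlite3.connect(":memory:")
    # Simplified schemas: only the columns this sketch needs.
    conn.execute("create table event_map (event_up_name text primary key, event_down_name text)")
    conn.execute("create table event_subscriber (event_up_name text, method_address text)")
    # State after the original upgrade script: only the misspelled FAIL row.
    conn.execute(
        "insert into event_map(event_up_name, event_down_name) "
        "values('VDS_INITIATED_RUN_VM_FAIL', '')"
    )
    return conn


def add_subscription(conn, event_up_name, address):
    """Hypothetical check modeled on the Admin Portal error; not the engine's code."""
    row = conn.execute(
        "select 1 from event_map where event_up_name = ?", (event_up_name,)
    ).fetchone()
    if row is None:
        raise UnsupportedEventError(f"The notification event {event_up_name} is unsupported.")
    conn.execute(
        "insert into event_subscriber(event_up_name, method_address) values (?, ?)",
        (event_up_name, address),
    )


def apply_fix(conn):
    """The one-row fix: register the name the Java code actually uses."""
    conn.execute(
        "insert into event_map(event_up_name, event_down_name) "
        "values('VDS_INITIATED_RUN_VM_FAILED', '')"
    )
```

Before `apply_fix`, subscribing to `VDS_INITIATED_RUN_VM_FAILED` raises the "unsupported" error; afterwards the same call succeeds, which is why no code change is needed beyond the row insert.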

Comment 17 Sandro Bonazzola 2014-03-03 15:24:08 UTC

This is an automated message.

This BZ should be fixed in oVirt 3.4.0 RC repository, assignee please update the status of the bug to ON_QA once you've verified that the change is included in the build.

Comment 18 Adam Litke 2014-03-03 15:35:50 UTC

Merged as commit 549f27f446fad508a83c89b47bd05260d197411a

Comment 19 Jiri Belka 2014-03-04 17:23:37 UTC

OK, beta3 / av2.

I have doubts about the solution, though:

engine=# select * from event_subscriber ;
            subscriber_id             |        event_up_name        |  method_address   | tag_name | notification_method 
--------------------------------------+-----------------------------+-------------------+----------+---------------------
 fdfc627c-d875-11e0-90f0-83df133b58cc | VDS_INITIATED_RUN_VM_FAILED | jbelka |          | EMAIL
(1 row)

engine=# select * from event_map where event_up_name ilike '%run_vm%';
        event_up_name        | event_down_name 
-----------------------------+-----------------
 VDS_INITIATED_RUN_VM_FAIL   | 
 VDS_INITIATED_RUN_VM_FAILED | 
(2 rows)

So now there are _TWO_ event_up_name rows :D

Comment 20 Adam Litke 2014-03-04 18:23:34 UTC

Yeah, I wanted to keep a simple backport of what was merged upstream.  We can remove the bad one in a subsequent patch upstream, but the erroneous event in the DB is harmless.
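The later cleanup mentioned here could look like the following Python sketch (SQLite standing in for PostgreSQL, tables trimmed to the columns in the psql output above, `drop_unreferenced_event` a hypothetical helper). It assumes the unused row is `VDS_INITIATED_RUN_VM_FAIL`, since both the subscriber row and the Java code use `VDS_INITIATED_RUN_VM_FAILED`, and it deletes an `event_map` row only when no `event_subscriber` row still references it.

```python
import sqlite3


def make_db_like_comment_19():
    """Recreate (in SQLite) the rows shown in the psql output of comment 19."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table event_map (event_up_name text primary key, event_down_name text)")
    conn.execute(
        "create table event_subscriber (subscriber_id text, event_up_name text, "
        "method_address text, tag_name text, notification_method text)"
    )
    conn.executemany(
        "insert into event_map values (?, '')",
        [("VDS_INITIATED_RUN_VM_FAIL",), ("VDS_INITIATED_RUN_VM_FAILED",)],
    )
    conn.execute(
        "insert into event_subscriber values (?, ?, ?, ?, ?)",
        ("fdfc627c-d875-11e0-90f0-83df133b58cc", "VDS_INITIATED_RUN_VM_FAILED",
         "jbelka", None, "EMAIL"),
    )
    return conn


def drop_unreferenced_event(conn, event_up_name):
    """Delete an event_map row only if no subscriber still uses it; return rows deleted."""
    cur = conn.execute(
        "delete from event_map where event_up_name = ? "
        "and not exists (select 1 from event_subscriber where event_up_name = ?)",
        (event_up_name, event_up_name),
    )
    return cur.rowcount
```

Guarding the delete with `not exists` keeps the cleanup safe even if someone did subscribe to the misspelled name by editing the DB directly.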

Comment 21 Sandro Bonazzola 2014-03-31 15:05:42 UTC

This is an automated message: moving to Closed CURRENT_RELEASE since oVirt 3.4.0 has been released.
