Bug 1015171 - engine report task failure after "after_nic_hotplug" hook fails, although HW plug completed successfully
Summary: engine report task failure after "after_nic_hotplug" hook fails, although HW plug completed successfully
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Yaniv Bronhaim
QA Contact: Raz Tamir
URL:
Whiteboard: infra
Depends On:
Blocks:
 
Reported: 2013-10-03 14:45 UTC by Aharon Canan
Modified: 2016-02-10 19:19 UTC (History)
12 users

Fixed In Version: is22
Doc Type: Bug Fix
Doc Text:
The engine reported a task failure when an "after_*" hook failed, even though the task itself had completed successfully. This was because a HookError was raised, throwing an exception back to the engine. The exception is now suppressed on after_* hook failures, and the task completes.
Clone Of:
Environment:
Last Closed: 2014-01-21 16:17:18 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs (1.58 MB, application/x-gzip)
2013-10-03 14:45 UTC, Aharon Canan
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0040 0 normal SHIPPED_LIVE vdsm bug fix and enhancement update 2014-01-21 20:26:21 UTC
oVirt gerrit 20876 0 None None None Never
oVirt gerrit 20901 0 None None None Never
oVirt gerrit 21464 0 None None None Never
oVirt gerrit 21767 0 None None None Never

Description Aharon Canan 2013-10-03 14:45:19 UTC
Created attachment 807112 [details]
logs

Description of problem:
if an after hook fails, the engine reports the task as failed although the plug completed successfully.
The engine is not aware of the newly plugged HW.

Version-Release number of selected component (if applicable):
is17

How reproducible:
100%

Steps to Reproduce:
1. add JPG file to /usr/libexec/vdsm/hooks/after_nic_hotplug/
2. chmod 775 the JPG
3. hotplug NIC/DISK

Actual results:
plug completed successfully, the task failed, and the engine is not aware of the new HW

Expected results:
the task should complete, the engine should show the new HW and alert about the hook failure

Additional info:
==========
Thread-86162::DEBUG::2013-10-03 17:04:00,583::vm::3028::vm.Vm::(hotplugNic) vmId=`efaf5dd0-3c42-4369-9359-9d7895925ff8`::Hotplug NIC xml: <interface type="bridge">
        <mac address="00:1a:4a:a4:72:01"/>
        <model type="virtio"/>
        <source bridge="rhevm"/>
        <filterref filter="vdsm-no-mac-spoofing"/>
        <link state="up"/>
</interface>

Dummy-33329::DEBUG::2013-10-03 17:04:00,590::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-33329::DEBUG::2013-10-03 17:04:00,676::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0362193 s, 28.3 MB/s\n'; <rc> = 0
Thread-86162::DEBUG::2013-10-03 17:04:00,920::utils::489::root::(execCmd) '/usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg' (cwd None)
Thread-86162::DEBUG::2013-10-03 17:04:00,950::utils::509::root::(execCmd) FAILED: <err> = '/usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: cannot execute binary file\n'; <rc> = 126
Thread-86162::INFO::2013-10-03 17:04:00,951::hooks::88::root::(_runHooksDir) /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: cannot execute binary file

Thread-86162::WARNING::2013-10-03 17:04:00,951::hooks::95::root::(_runHooksDir) hook returned unexpected return code 126
Thread-86162::ERROR::2013-10-03 17:04:00,952::BindingXMLRPC::993::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 979, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 276, in vmHotplugNic
    return vm.hotplugNic(params)
  File "/usr/share/vdsm/API.py", line 400, in hotplugNic
    return curVm.hotplugNic(params)
  File "/usr/share/vdsm/vm.py", line 3051, in hotplugNic
    params=customProps)
  File "/usr/share/vdsm/hooks.py", line 254, in after_nic_hotplug
    params=params)
  File "/usr/share/vdsm/hooks.py", line 98, in _runHooksDir
    raise HookError()
HookError
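
For illustration only, the call pattern behind this traceback looks roughly like the sketch below; the names hooks.after_nic_hotplug and HookError come from the traceback itself, while the method body and return value are simplified and partly hypothetical:

    # Simplified, partly hypothetical sketch of the failing path in vm.py.
    # The after_nic_hotplug() hook runs after libvirt has already attached
    # the NIC; if it raises HookError, the exception propagates up through
    # BindingXMLRPC and the engine reports the whole hotplug as failed.
    def hotplugNic(self, params):
        # ... build the <interface> XML and attach the device via libvirt ...
        hooks.after_nic_hotplug(nicXml, self.conf, params=customProps)  # may raise HookError
        return {'status': {'code': 0, 'message': 'Done'}}  # never reached when the hook fails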

Comment 1 Barak 2013-10-07 12:44:15 UTC
What was the flow in which you tested it?
Which after hook? (after what?)

Comment 2 Aharon Canan 2013-10-07 15:41:54 UTC
(In reply to Barak from comment #1)
> What was the flow in which you have tested it ?
> What after hook ? (after what ?)

as you can see in the description,
add a JPG file to the after_disk_plug hooks directory,
change it to executable,
hot plug the disk -
the engine reports a failure but the disk is plugged.

Comment 3 Barak 2013-10-14 13:14:32 UTC
Arthur,

Please define the desired behavior;
it should be generic for all after hooks.

Comment 4 Barak 2013-10-28 10:20:04 UTC
After speaking about it with Danken,

it looks like the behaviour should be success for any action even if the after hook failed.

Hence this is an issue for this specific flow.

Dan, please verify?

Comment 5 Dan Kenigsberg 2013-10-28 14:46:34 UTC
Indeed. An "after" hook cannot fail the operation, as the operation has already been done. A failure in such a hook should be recorded in the log, and the original return value should be passed to Engine.

Comment 6 Yaniv Bronhaim 2013-11-03 16:17:24 UTC
All hooks should not fail operations. Apparently, everywhere we fail the operation when the hook fails to run, we don't catch the HookError that can be raised from _runHooksDir, which means we throw the exception to the engine.

I can fix this specific location by catching the exception, reporting it to the log and returning success to the engine, but we might bump into more cases like this. Alternatively, we can avoid raising the HookError exception, return False instead and report when False is returned; I'm not sure about the implications - we have 76 locations that call _runHooksDir.

Dan, what do you say? Should we change all locations to catch the exception, or return False instead?
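
As a rough, hypothetical illustration of the second option (not the actual vdsm patch), _runHooksDir could take a flag that suppresses the exception and report failure through its return value instead:

    # Hypothetical variant of _runHooksDir: callers of after_* hooks could pass
    # raiseError=False so a failing hook is only logged, not raised.
    def _runHooksDir(data, dir, params=None, raiseError=True):
        errorSeen = False
        # ... run every executable in the hook dir, set errorSeen on rc != 0 ...
        if errorSeen and raiseError:
            raise HookError()
        return not errorSeen  # True if all hooks succeeded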

Comment 7 Dan Kenigsberg 2013-11-03 22:18:42 UTC
As documented in the man page, before_* hooks SHOULD fail the following operation. after_* hooks cannot fail the operation (since it's already done), and there, the exception should be swallowed (and logged).
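
In code terms, a minimal sketch of that policy for the hotplugNic flow (assuming HookError can be caught from hooks.py, as the traceback suggests) would be:

    # Sketch only: before_* hooks keep raising and abort the operation,
    # but a failure in an after_* hook is logged and swallowed so the
    # original success result still reaches the engine.
    try:
        hooks.after_nic_hotplug(nicXml, self.conf, params=customProps)
    except hooks.HookError as e:
        self.log.warning("after_nic_hotplug hook failed: %s", e)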

Comment 8 Yaniv Bronhaim 2013-11-04 15:50:26 UTC
I would have preferred something more general that checks for the after_ prefix everywhere.

But this is targeted to 3.3, so for now I am fixing only the specific hotplugNic after hook.

Comment 9 Tareq Alayan 2013-11-10 12:52:12 UTC
Verified.
Tested on vdsm-4.13.0-0.7.beta1.el6ev.x86_64.

Disks are plugged into the running VM even when the hook fails.

Comment 10 Raz Tamir 2013-11-20 11:28:28 UTC
As stated in the bug description, if "after_*" hook fails, engine reports an entire hotplug/hotunplug operation as failed, even though it was already done.

When putting a jpg file in the after_* hook directory and plugging a disk, for example, webadmin GUI and engine logs show the hotplug operation as failed and the disk status after the hotplug is "inactive" even though it is already activated on the host. 

Please apply the fix for disk hotplug/hotunplug

(probably all "after_*" hooks need to be fixed)

Comment 11 Yaniv Bronhaim 2013-11-20 12:11:59 UTC
The bug is about:
"""
Steps to Reproduce:
1. add JPG file to /usr/libexec/vdsm/hooks/after_nic_hotplug/
2. chmod 775 the JPG
3. hotplug NIC/DISK
"""

meaning after_nic_hotplug. In comment #8 I asked whether a general solution for all after hooks is required and didn't get a reply.

Please open a separate BZ for that case, as the patch for this specific fix (nic hotplug) is already merged and backported.

Within the scope of this BZ, please verify the stated scenario. Moving back to ON_QA and waiting for the BZ number.

Comment 12 Aharon Canan 2013-11-20 12:23:39 UTC
Yaniv, 

as you wrote in your last comment, in step 3 you can see NIC/DISK - which means disk as well;
as the storage team we opened it for DISK, and after consulting with Barak he asked to check whether it happens with DISK only or with all HW changes (as stated in the summary).

as you can see, it happens with all of them - meaning we need to fix it for all HW, as stated in the summary.

I do not see the problem with reopening, and I do not mind opening a separate bug for DISK, but in this specific case it is the same issue, so it is the same bug.

let's try to avoid ping pong and talk to me directly if needed.

Aharon.

Comment 13 Yaniv Bronhaim 2013-11-20 14:58:10 UTC
http://gerrit.ovirt.org/#/c/21464/

Comment 14 Charlie 2013-11-28 00:28:50 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 15 Raz Tamir 2013-12-15 15:40:38 UTC
verified in is27

Comment 16 Dan Kenigsberg 2014-01-09 08:24:30 UTC
This bug appeared and was fixed during the rhev-3.3 development cycle. No customer saw it, and there is no need to document it.

Comment 17 errata-xmlrpc 2014-01-21 16:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html

