Created attachment 807112 [details]
logs

Description of problem:
If an after hook fails, the engine reports the task as failed even though the plug completed successfully. The engine is not aware of the newly plugged HW.

Version-Release number of selected component (if applicable):
is17

How reproducible:
100%

Steps to Reproduce:
1. Add a JPG file to /usr/libexec/vdsm/hooks/after_nic_hotplug/
2. chmod 775 the JPG
3. Hotplug a NIC/DISK

Actual results:
The plug completes successfully, the task fails, and the engine is not aware of the new HW.

Expected results:
The task should complete, the engine should show the new HW, and an alert should be raised about the hook failure.

Additional info:
==========
Thread-86162::DEBUG::2013-10-03 17:04:00,583::vm::3028::vm.Vm::(hotplugNic) vmId=`efaf5dd0-3c42-4369-9359-9d7895925ff8`::Hotplug NIC xml: <interface type="bridge">
 <mac address="00:1a:4a:a4:72:01"/>
 <model type="virtio"/>
 <source bridge="rhevm"/>
 <filterref filter="vdsm-no-mac-spoofing"/>
 <link state="up"/>
</interface>

Dummy-33329::DEBUG::2013-10-03 17:04:00,590::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) 'dd if=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-33329::DEBUG::2013-10-03 17:04:00,676::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0362193 s, 28.3 MB/s\n'; <rc> = 0
Thread-86162::DEBUG::2013-10-03 17:04:00,920::utils::489::root::(execCmd) '/usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg' (cwd None)
Thread-86162::DEBUG::2013-10-03 17:04:00,950::utils::509::root::(execCmd) FAILED: <err> = '/usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: cannot execute binary file\n'; <rc> = 126
Thread-86162::INFO::2013-10-03 17:04:00,951::hooks::88::root::(_runHooksDir) /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: /usr/libexec/vdsm/hooks/after_nic_hotplug/1e4f4c7.jpg: cannot execute binary file
Thread-86162::WARNING::2013-10-03 17:04:00,951::hooks::95::root::(_runHooksDir) hook returned unexpected return code 126
Thread-86162::ERROR::2013-10-03 17:04:00,952::BindingXMLRPC::993::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 979, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 276, in vmHotplugNic
    return vm.hotplugNic(params)
  File "/usr/share/vdsm/API.py", line 400, in hotplugNic
    return curVm.hotplugNic(params)
  File "/usr/share/vdsm/vm.py", line 3051, in hotplugNic
    params=customProps)
  File "/usr/share/vdsm/hooks.py", line 254, in after_nic_hotplug
    params=params)
  File "/usr/share/vdsm/hooks.py", line 98, in _runHooksDir
    raise HookError()
HookError
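For context, the HookError in the traceback comes from vdsm's hook runner turning a non-zero script exit code into an exception. The following is a minimal sketch of that behaviour, not the actual vdsm code; the function and variable names are assumptions:

    import glob
    import logging
    import os
    import subprocess


    class HookError(Exception):
        """Raised when a hook script exits with a non-zero return code."""


    def run_hooks_dir(hooks_dir):
        """Run every executable file in hooks_dir; raise HookError if any fails."""
        failed = False
        for script in sorted(glob.glob(os.path.join(hooks_dir, '*'))):
            if not os.access(script, os.X_OK):
                continue
            # Running the script through a shell: a JPG that was merely
            # chmod'ed executable makes the shell exit with rc 126
            # ("cannot execute binary file"), as in the log above.
            rc = subprocess.call(script, shell=True)
            if rc != 0:
                logging.warning('hook returned unexpected return code %s', rc)
                failed = True
        if failed:
            # This exception is what propagates to the engine and fails the
            # task, even though the hotplug itself already succeeded.
            raise HookError()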
What was the flow in which you tested this?
Which "after" hook (after what)?
(In reply to Barak from comment #1)
> What was the flow in which you tested this?
> Which "after" hook (after what)?

As you can see in the description:
- add a JPG file to the after_disk_plug hooks directory
- change it to executable
- hot plug the disk
The engine reports a failure, but the disk is plugged.
Arthur, please define the desired behavior. It should be generic for all after hooks.
After speaking about it with Danken, it looks like the desired behaviour is that the action succeeds even though the after hook failed. Hence this is an issue for this specific flow. Dan, please verify?
Indeed. An "after" hook cannot fail the operation, as the operation has already been done. A failure in such hook should be recored to the log, and the original return value should be passed to Engine.
All hooks should not fail operations. Apparently, everywhere a hook fails to run we fail the operation: we don't catch the HookError that can be raised from _runHooksDir, which means we throw the exception to the engine.

I can fix this specific location by catching the exception, reporting it to the log and returning success to the engine, but we might bump into more cases like this. Alternatively, we could avoid raising the HookError exception, return False instead, and report when False is returned. I'm not sure about the implications; we have 76 locations that call _runHooksDir.

Dan, what do you say: change all locations to catch the exception, or return False instead?
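For illustration, catching the error at the single call site could look roughly like the sketch below. The module path and the after_nic_hotplug/HookError names follow the traceback in the description; the function and argument names are illustrative, not the actual patch:

    import logging

    import hooks  # the vdsm hooks module from the traceback (/usr/share/vdsm/hooks.py)


    def run_after_nic_hotplug(nic_xml, vm_conf, custom_props):
        """Run the after_nic_hotplug hook without failing the already-done plug."""
        try:
            hooks.after_nic_hotplug(nic_xml, vm_conf, params=custom_props)
        except hooks.HookError:
            # The NIC is already plugged, so only record the hook failure;
            # the caller keeps returning success to the engine.
            logging.exception('after_nic_hotplug hook failed')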
As documented in the man page, before_* hooks SHOULD fail the following operation. after_* hooks cannot fail the operation (since it's already done), and there the exception should be swallowed (and logged).
I would have preferred something more general that checks for the after_ prefix. But this is targeted to 3.3, so for now I am fixing only the specific hotplugNic after hook.
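A more general approach of the kind mentioned here, swallowing failures only for hooks whose directory name starts with after_, might look like this sketch (names and signature are illustrative, not the merged patch):

    import logging


    class HookError(Exception):
        """Stand-in for vdsm's HookError (raised by _runHooksDir in the traceback)."""


    def run_hooks_dir_tolerant(run_scripts, dir_name):
        """Run one hooks directory; only non-after_* hooks may fail the operation."""
        try:
            return run_scripts(dir_name)
        except HookError:
            if dir_name.startswith('after_'):
                # after_* hooks run once the operation is done, so swallow
                # the error and only log it.
                logging.exception('hook in %s failed; operation already completed',
                                  dir_name)
                return
            # before_* hooks are still allowed to abort the upcoming operation.
            raise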
Verified. Tested on vdsm-4.13.0-0.7.beta1.el6ev.x86_64. Disks are plugged into the running VM even when the hook fails.
As stated in the bug description, if an "after_*" hook fails, the engine reports the entire hotplug/hotunplug operation as failed, even though it was already done. When putting a JPG file in the after_* hook directory and plugging a disk, for example, the webadmin GUI and engine logs show the hotplug operation as failed, and the disk status after the hotplug is "inactive" even though it is already activated on the host.

Please apply the fix for disk hotplug/hotunplug (probably all "after_*" hooks need to be fixed).
The bug is about:
"""
Steps to Reproduce:
1. add JPG file to /usr/libexec/vdsm/hooks/after_nic_hotplug/
2. chmod 775 the JPG
3. hotplug NIC/DISK
"""
i.e. after_nic_hotplug. In comment #8 I asked whether a general solution for all after hooks is required and didn't get a reply. Please open a separate BZ for that case, as the patch for this specific fix (NIC hotplug) is already merged and backported. In the scope of this BZ please verify the stated scenario.

Moving back to ON_QA and waiting for the BZ number.
Yaniv, as you wrote in your last comment, step 3 says NIC/DISK, which means disk as well. As the storage team we opened it for DISK; after consulting with Barak, he asked us to check whether it happens with DISK only or with all HW changes (as stated in the summary). As you can see it happens with all, which means we need to fix it for all HW, as stated in the summary.

I don't see the problem with reopening, and I don't mind opening a separate bug for DISK, but in this specific case it is the same issue, so the same bug. Let's try to avoid ping-pong and talk with me directly if needed. Aharon.
http://gerrit.ovirt.org/#/c/21464/
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Verified in is27.
This bug appeared and was fixed during the rhev-3.3 development cycle. No customer saw it, and there is no need to document it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html