Created attachment 649314 [details]
log

Description of problem:
After live storage migration we see the following error in the vdsm log:

libvirtEventLoop::ERROR::2012-11-21 18:59:21,020::libvirtvm::1986::vm.Vm::(_onBlockJobEvent) vmId=`29d84c1b-666e-4554-8e13-8fccee67d28d`::Live merge completed for an unexpected path: /rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/a5f10bab-bd9d-4834-b1d9-b29d0ec887dc/images/d1ba6b3d-3c2b-4e39-b454-4f88b8b8bef6/79c0ab39-1a48-4324-a4fa-ca1c4e3b5c47

This is an event logged by libvirt that vdsm does not trap. Since it is not actually an error, it would help debugging to remove the event from the log or to log it at a different level.

Version-Release number of selected component (if applicable):
vdsm-4.9.6-44.0.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run live storage migration for several VMs.

Actual results:
We log a libvirt event as ERROR.

Expected results:
This event should not be logged as ERROR in the vdsm log.

Additional info:
Full log attached.
The message shouldn't be propagated at all.
The issue here is that blockJobAbort (at the end of the live storage migration) generates an event that is similar to the one generated at the end of a live merge. This means that vdsm should try to identify why the event was received (whether it is for a live storage migration or a live merge). The pseudo code would be something like:

def eventCallback(...):
    if the event is related to a live storage migration:
        log...
    elif the event is related to a live merge:
        (the current logic)
    else:
        log the error
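Fleshing that out, a minimal Python sketch of the proposed dispatch. This is illustrative only, not vdsm's actual internals: the two path sets stand in for whatever job-tracking state vdsm keeps, and all names here are hypothetical.

import logging

log = logging.getLogger("vm")

# Hypothetical tracking state; in vdsm this would come from the VM's
# own record of in-flight jobs, not module-level sets.
storage_migration_paths = set()
live_merge_paths = set()

def on_block_job_event(path, block_job_type, status):
    """Dispatch a libvirt block job completion event by its origin."""
    if path in storage_migration_paths:
        # blockJobAbort at the end of a live storage migration emits
        # the same completion event as a live merge; not an error.
        log.debug("Block job completed for storage migration: %s", path)
    elif path in live_merge_paths:
        # The existing live-merge completion handling would run here.
        log.info("Live merge completed: %s", path)
    else:
        # Only a path matching no tracked job is truly unexpected.
        log.error("Live merge completed for an unexpected path: %s", path)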
After talking to Dafna: this should be reproduced using a 3.2 DC with a 6.4 host. This concerns storage migration (of disks) and is not related to VM migration.
I get the same error in a scale environment:

Version-Release number of selected component (if applicable):
RHEVM 3.2 - SF17.5

Environment:
RHEVM: rhevm-3.2.0-11.30.el6ev.noarch
VDSM: vdsm-4.10.2-22.0.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64
PythonSDK: rhevm-sdk-3.2.0.11-1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create an FCP data center.
2. Create a VM with 25 disks with the 'VirtIO' interface.
3. Select and move all 25 of the VM's disks from DC-01 to DC-02.

Logs attached.
Created attachment 760650 [details]
## Logs rhevm, vdsm, libvirt
Fede, is this vdsm related or libvirt?
*** Bug 998280 has been marked as a duplicate of this bug. ***
Same error reproduced in a RHEVM 3.3 - IS18 environment:

Host OS: RHEL 6.5
RHEVM: rhevm-3.3.0-0.25.beta1.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.15-1.el6ev.noarch
VDSM: vdsm-4.13.0-0.2.beta1.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-27.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.412.el6.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

VDSM Log:
libvirtEventLoop::DEBUG::2013-10-20 11:21:18,107::vm::4792::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`ce280769-8b99-4810-8f0f-29757ad3abc2`::event Suspended detail 0 opaque None
libvirtEventLoop::ERROR::2013-10-20 11:21:18,128::vm::3837::vm.Vm::(_onBlockJobEvent) vmId=`ce280769-8b99-4810-8f0f-29757ad3abc2`::Live merge completed for an unexpected path: /rhev/data-center/mnt/blockSD/fc786f96-81c2-488f-a5a9-1b2c0a4a0aa2/images/670d77f5-1bf5-4c2d-8b73-8f0a56a6f97a/f9f97f91-17b2-48f4-8f6a-50903ab0ff17
libvirtEventLoop::DEBUG::2013-10-20 11:21:18,184::vm::4792::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`ce280769-8b99-4810-8f0f-29757ad3abc2`::event Resumed detail 0 opaque None
Thread-1872::DEBUG::2013-10-20 11:21:18,185::task::579::TaskManager.Task::(_updateState) Task=`6bd6814f-197c-4ace-93a3-c04a1e5864e3`::moving from state init -> state preparing
Thread-1872::INFO::2013-10-20 11:21:18,185::logUtils::44::dispatcher::(wrapper) Run and protect: teardownImage(sdUUID='561ce535-9830-49a3-975b-ac5fa2915cce', spUUID='1e157529-0537-4696-8a6d-9b4be4680e44', imgUUID='670d77f5-1bf5-4c2d-8b73-8f0a56a6f97a', volUUID=None)

Logs attached.
Created attachment 814178 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm
Adam, with all the changes around how block jobs are handled in 3.5, is this still relevant?
It's changed slightly. We don't handle block job events at all so any block job event that comes through (be it from a live merge or LSM) will trigger a warning in the log. If we want to make any changes at all, we could trap the block job event and print it as a debug message.
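For illustration, a minimal standalone sketch (not vdsm code) of trapping block job events ourselves and logging them at debug level, using the plain libvirt-python bindings; the connection URI and logger name are arbitrary:

import logging
import threading

import libvirt  # libvirt-python bindings

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("blockjobs")

def on_block_job(conn, dom, disk, job_type, status, opaque):
    # Completion events from both live merge and LSM (blockJobAbort at
    # the end of the copy) land here; neither is an error by itself,
    # so log at DEBUG instead of ERROR/WARNING.
    log.debug("Block job event: vm=%s disk=%s type=%s status=%s",
              dom.name(), disk, job_type, status)

def event_loop():
    # The default event loop must run continuously for callbacks to fire.
    while True:
        libvirt.virEventRunDefaultImpl()

# Register the event implementation before opening the connection.
libvirt.virEventRegisterDefaultImpl()
threading.Thread(target=event_loop, daemon=True).start()

conn = libvirt.open("qemu:///system")
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB,
                            on_block_job, None)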
(In reply to Adam Litke from comment #13)
> It's changed slightly. We don't handle block job events at all so any block
> job event that comes through (be it from a live merge or LSM) will trigger a
> warning in the log. If we want to make any changes at all, we could trap
> the block job event and print it as a debug message.

We probably should, but not as a high priority.
We can attempt to handle this post 3.6.0's feature freeze.
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015. Please review this bug and, if it is not a blocker, postpone it to a later release.

All bugs not postponed by GA will be automatically re-targeted to:
- 3.6.1 if severity >= high
- 4.0 if severity < high
(In reply to Allon Mureinik from comment #14) > (In reply to Adam Litke from comment #13) > > It's changed slightly. We don't handle block job events at all so any block > > job event that comes through (be it from a live merge or LSM) will trigger a > > warning in the log. If we want to make any changes at all, we could trap > > the block job event and print it as a debug message. > > We probably should, but not as a high priority. > We can attempt to handle this post 3.6.0's feature freeze. Will we? Apart from being confusing, I'd close this bug as WONTFIX.
(In reply to Yaniv Kaul from comment #16)
> (In reply to Allon Mureinik from comment #14)
> > (In reply to Adam Litke from comment #13)
> > > It's changed slightly. We don't handle block job events at all so any block
> > > job event that comes through (be it from a live merge or LSM) will trigger a
> > > warning in the log. If we want to make any changes at all, we could trap
> > > the block job event and print it as a debug message.
> >
> > We probably should, but not as a high priority.
> > We can attempt to handle this post 3.6.0's feature freeze.
>
> Will we? Apart from being confusing, I'd close this bug as WONTFIX.

Worth leaving open to re-examine during the SDM work.
Can no longer reproduce with the 4.0 codebase; moving to ON_QA for verification.
Verified on ovirt-engine-4.0.0-0.0.master.20160406161747.gita4ecba2.el7.centos.noarch

No error message in the vdsm log.
oVirt 4.0.0 has been released, closing current release.