Bug 1377353 - [3.6.9] [PPC, RHEL 7.3] [regression] Failure to activate (hotplug) multiple disks; VM shuts down.
Keywords:
Status: CLOSED DUPLICATE of bug 1270717
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.7
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Carlos Mestre González
QA Contact: Carlos Mestre González
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2016-09-19 13:22 UTC by Carlos Mestre González
Modified: 2016-09-20 13:19 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-20 13:19:15 UTC
oVirt Team: Storage
gklein: blocker?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
engine, vdsm, libvirt logs (2.09 MB, application/x-gzip)
2016-09-19 13:22 UTC, Carlos Mestre González

Description Carlos Mestre González 2016-09-19 13:22:10 UTC
Created attachment 1202481 [details]
engine, vdsm, libvirt logs

Description of problem:
Hotplugging multiple disks fails and the VM goes down.


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk and install an OS.
2. Attach 9 disks (COW, 1 GB) without activating them.
3. Start the VM.
4. Activate 3 of the disks.
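
For reference, step 4 can be scripted against the engine's REST API. This is only a rough sketch: the endpoint layout is the 3.6-era activate action as I recall it, and the engine URL, credentials and disk IDs are placeholders, not values from this bug (only the VM id below is taken from the logs).

ENGINE=https://engine.example.com/ovirt-engine/api
VM=114666e3-5e5f-4eb5-a7b3-0acac48e4ef2        # VM id seen in the logs below
for DISK in $DISK_ID_1 $DISK_ID_2 $DISK_ID_3; do   # placeholder disk ids
    curl -s -k -u admin@internal:password \
         -H "Content-Type: application/xml" \
         -d "<action/>" \
         "$ENGINE/vms/$VM/disks/$DISK/activate"
done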

Actual results:
2016-09-19 15:31:00,718 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (ajp-/127.0.0.1:8702-9) [disks_syncAction_56ed6e08-4a3f-4557] Unexpected return value: StatusForXmlRpc [code=-32603, message=VM u'114666e3-5e5f-4eb5-a7b3-0acac48e4ef2' was not started yet or was shut down]
2016-09-19 15:31:00,719 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (ajp-/127.0.0.1:8702-9) [disks_syncAction_56ed6e08-4a3f-4557] Failed in 'HotPlugDiskVDS' method
2016-09-19 15:31:00,720 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (ajp-/127.0.0.1:8702-9) [disks_syncAction_56ed6e08-4a3f-4557] Unexpected return value: StatusForXmlRpc [code=-32603, message=VM u'114666e3-5e5f-4eb5-a7b3-0acac48e4ef2' was not started yet or was shut down]
2016-09-19 15:31:00,732 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-9) [disks_syncAction_56ed6e08-4a3f-4557] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: VM u'114666e3-5e5f-4eb5-a7b3-0acac48e4ef2' was not started yet or was shut down

==== vdsm:
Thread-17231::DEBUG::2016-09-19 07:26:56,362::bindingxmlrpc::1273::vds::(wrapper) client [127.0.0.1]::call vmGetIoTunePolicy with ('114666e3-5e5f-4eb5-a7b3-0acac48e4ef2',) {}
Thread-17231::ERROR::2016-09-19 07:26:56,363::bindingxmlrpc::1300::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/bindingxmlrpc.py", line 1276, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/rpc/bindingxmlrpc.py", line 564, in vmGetIoTunePolicy
    return vm.getIoTunePolicy()
  File "/usr/share/vdsm/API.py", line 792, in getIoTunePolicy
    return v.getIoTunePolicy()
  File "/usr/share/vdsm/virt/vm.py", line 2577, in getIoTunePolicy
    qos = self._getVmPolicy()
  File "/usr/share/vdsm/virt/vm.py", line 2555, in _getVmPolicy
    metadata_xml = self._dom.metadata(
  File "/usr/share/vdsm/virt/virdomain.py", line 46, in __getattr__
    % self.vmid)
NotConnectedError: VM u'114666e3-5e5f-4eb5-a7b3-0acac48e4ef2' was not started yet or was shut down
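
The traceback shows vdsm's domain wrapper raising NotConnectedError, i.e. the qemu process behind the VM is already gone when the engine sends the hotplug. A quick way to confirm that on the host is sketched below; the domain name is inferred from the machined machine name that shows up later in this bug, so treat it as illustrative.

virsh -r list --all
virsh -r domstate hotplugfulltestplaniscsivm    # "shut off" would confirm the guest died
grep -iE 'qemu|segfault' /var/log/messages       # any crash trace around the same time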



Expected results:
Disks are activated without a problem (as before, and as on x86).

Additional info:
Attaching logs (you can search for the error by the same correlation ID).

Comment 1 Carlos Mestre González 2016-09-19 13:26:48 UTC
Versions:
libvirt-2.0.0-9.el7.ppc64le
vdsm-4.17.35-1.el7ev.noarch
rhevm-3.6.9.2-0.1.el6.noarch

This is a regression on PPC; it seems to work normally on x86 with the same build.

Comment 2 Yaniv Kaul 2016-09-19 13:39:39 UTC
I'm not sure a 40 MB libvirt log is useful, but checking whether qemu-kvm crashed is. Look for a core dump, check /var/log/messages, etc.
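
A minimal checklist for that, assuming a stock RHEL 7.3 host (the abrt/kdump/systemd-coredump locations only apply where those are actually configured):

grep -iE 'qemu|segfault|oops' /var/log/messages
ls -l /var/spool/abrt/ /var/crash/ 2>/dev/null   # abrt / kdump dump locations, if any
coredumpctl list 2>/dev/null                     # only if systemd-coredump is enabled
journalctl -u libvirtd --since "2016-09-19" | grep -i error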

Comment 3 Carlos Mestre González 2016-09-19 14:13:43 UTC
The libvirtd log is small because it rotated just before the test; the log level should be 2, if I'm not mistaken.

No core dump. I'm providing /var/log/messages:

Sep 19 07:26:56 ibm-p8-rhevm-hv-02 journal: vdsm vds ERROR unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/rpc/bindingxmlrpc.py", line 1276, in wrapper#012    res = f(*args, **kwargs)#012  File "/usr/share/vdsm/rpc/bindingxmlrpc.py", line 564, in vmGetIoTunePolicy#012    return vm.getIoTunePolicy()#012  File "/usr/share/vdsm/API.py", line 792, in getIoTunePolicy#012    return v.getIoTunePolicy()#012  File "/usr/share/vdsm/virt/vm.py", line 2577, in getIoTunePolicy#012    qos = self._getVmPolicy()#012  File "/usr/share/vdsm/virt/vm.py", line 2555, in _getVmPolicy#012    metadata_xml = self._dom.metadata(#012  File "/usr/share/vdsm/virt/virdomain.py", line 46, in __getattr__#012    % self.vmid)#012NotConnectedError: VM u'114666e3-5e5f-4eb5-a7b3-0acac48e4ef2' was not started yet or was shut down

Comment 4 Yaniv Kaul 2016-09-19 21:01:46 UTC
Did you check the libvirtd.log? 

[ykaul@ykaul Downloads]$ grep internal libvirtd.log 
2016-09-19 12:28:23.940+0000: 4461: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk0' not found
2016-09-19 12:36:43.719+0000: 4461: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk2' not found
2016-09-19 12:47:18.449+0000: 4460: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk0' not found
2016-09-19 12:47:38.569+0000: 4461: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk2' not found
2016-09-19 12:47:58.400+0000: 4459: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk3' not found
2016-09-19 12:48:38.779+0000: 4457: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command '__com.redhat_drive_del': Device 'drive-virtio-disk4' not found

And before this:
2016-09-19 12:32:44.681+0000: 23214: info : virDBusCall:1558 : DBUS_METHOD_ERROR: 'org.freedesktop.machine1.Manager.TerminateMachine' on '/org/freedesktop/machine1' at 'org.freedesktop.machine1' error org.freedesktop.machine1.NoSuchMachine: No machine 'qemu-114-hotplugfulltestplaniscsivm' known

which correlates with VDSM:
libvirtEventLoop::DEBUG::2016-09-19 07:32:44,685::vm::4648::virt.vm::(onLibvirtLifecycleEvent) vmId=`114666e3-5e5f-4eb5-a7b3-0acac48e4ef2`::event Stopped detail 2 opaque None
libvirtEventLoop::INFO::2016-09-19 07:32:44,686::vm::791::virt.vm::(_onQemuDeath) vmId=`114666e3-5e5f-4eb5-a7b3-0acac48e4ef2`::underlying process disconnected
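
For anyone re-checking this, the two views can be lined up directly from the collected logs (typical RHEL 7 paths; note that libvirt logs in UTC while vdsm logs in host-local time, hence 12:32 vs 07:32 above):

grep -E 'TerminateMachine|__com.redhat_drive_del' /var/log/libvirt/libvirtd.log
grep -E 'onLibvirtLifecycleEvent|_onQemuDeath' /var/log/vdsm/vdsm.log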


Can we see the host?

Comment 7 Michal Skrivanek 2016-09-20 05:15:19 UTC
This is 7.3 testing, isn't it?

Comment 8 Yaniv Kaul 2016-09-20 07:15:10 UTC
(In reply to Michal Skrivanek from comment #7)
> This is 7.3 testing, isn't it?

Yes.

Comment 9 Michal Skrivanek 2016-09-20 07:23:23 UTC
Bug 1377092 may be related. It would be great if you could confirm https://bugzilla.redhat.com/show_bug.cgi?id=1377092#c10 in this case as well.

Comment 12 Yaniv Kaul 2016-09-20 10:27:26 UTC
Assigning to QE; this looks very much like a platform issue. Please complete the investigation and move the bug over.

Comment 13 Carlos Mestre González 2016-09-20 13:19:15 UTC
Same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1377092. Updating the guest kernel from kernel-3.10.0-306.0.1.el7.ppc64le to kernel-3.10.0-506.el7.ppc64le makes the issue go away.
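
For completeness, the fix on the guest amounts to something like the following (the yum invocation is illustrative; the package NVRs are the ones quoted above):

rpm -q kernel                                   # was kernel-3.10.0-306.0.1.el7.ppc64le
yum install kernel-3.10.0-506.el7.ppc64le
reboot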

*** This bug has been marked as a duplicate of bug 1270717 ***

