Created attachment 802186 [details]
logs

Description of problem:
Cannot run a VM from the 'paused' state after connectivity with storage is resumed.

Version-Release number of selected component (if applicable):
vdsm-4.12.0-138.gitab256be.el6ev.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. have a data center (iSCSI) with 2 storage domains created from 2 different storage servers
2. run a VM (with a disk located on the non-master storage domain)
3. block connectivity from the host to the non-master storage domain using iptables
4. when the VM enters the 'paused' state, resume connectivity to the storage
5. when the host is active again, try to resume the VM

Actual results:
Cannot resume the VM from the paused state. vdsm fails with:

clientIFinit::ERROR::2013-09-23 18:12:26,480::clientIF::465::vds::(_recoverExistingVms) Vm afac6a2c-2210-4f5d-a827-cadb046243d1 recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 462, in _recoverExistingVms
    vmObj.getConfDevices()[vm.DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 1873, in getConfDevices
    self.normalizeDrivesIndices(devices[DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 2058, in normalizeDrivesIndices
    if drv['iface'] not in self._usedIndices:
KeyError: 'iface'

Thread-386::ERROR::2013-09-23 18:27:07,690::BindingXMLRPC::993::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 979, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 227, in vmCont
    return vm.cont()
  File "/usr/share/vdsm/API.py", line 145, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 2396, in cont
    self._underlyingCont()
  File "/usr/share/vdsm/vm.py", line 3440, in _underlyingCont
    hooks.before_vm_cont(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'

Not sure whether it's a storage or a network issue.

Additional info:
logs
You failed to mention that vdsm was restarted:

MainThread::INFO::2013-09-23 18:12:10,362::vdsm::101::vds::(run) (PID: 26042) I am the actual vdsm 4.12.0-138.gitab256be.el6ev nott-vds2.qa.lab.tlv.redhat.com (2.6.32-419.el6.x86_64)

Please attach the sanlock log.

Regardless, the issue is that the devices marshalled to disk do not contain the 'iface' key. That key is added in getConfDrives, which is only called when running a VM. This means you've hotplugged a device, and the persisted device does not contain the key. Simply running the following would reach the same result:
1. hotplug a device
2. restart vdsm

getConfDrive does not always add 'iface' to all devices, and normalizeDrivesIndices should not assume all drives have the 'iface' key.
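A minimal sketch of that defensive handling, using a simplified, self-contained stand-in for normalizeDrivesIndices (the real code in vm.py differs; the 'virtio' fallback and the function name are illustrative assumptions, not vdsm's actual choices):

# Simplified stand-in for vdsm's normalizeDrivesIndices, illustrating the
# suggested defensive handling of drives recovered without an 'iface' key.
# The 'virtio' fallback is an assumption for illustration only.
DEFAULT_IFACE = 'virtio'

def normalize_drive_indices(drives):
    used_indices = {}
    for drv in drives:
        # A drive marshalled after hotplug and recovered after a vdsm restart
        # may lack 'iface'; fall back instead of raising KeyError.
        iface = drv.setdefault('iface', DEFAULT_IFACE)
        indices = used_indices.setdefault(iface, set())
        if drv.get('index') is None:
            # pick the lowest free index for this interface
            idx = 0
            while idx in indices:
                idx += 1
            drv['index'] = idx
        indices.add(drv['index'])
    return drives

if __name__ == '__main__':
    # One drive with 'iface', one without (as in the reported recovery path).
    print(normalize_drive_indices([{'device': 'disk', 'iface': 'virtio'},
                                   {'device': 'disk'}]))

Whether the right fix is a fallback like this in vdsm or making sure 'iface' is always persisted is exactly the open question here.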
Created attachment 802203 [details]
sanlock.log

sanlock.log attached
Elad, do you have the vdsm.log of the vmHotplugDisk() call?

Engine should have sent the 'iface' element there, which should have been either 'ide' or 'pci'. If not, it's an Engine bug (which can still be hacked around from the vdsm side if it is impossible to fix properly on the Engine).
Created attachment 802216 [details]
vdsm.log (hotplug)

(In reply to Dan Kenigsberg from comment #3)
> Elad, do you have the vdsm.log of the vmHotplugDisk() call?
>
> Engine should have sent the 'iface' element there, which should have been
> either 'ide' or 'pci'. If not, it's an Engine bug (which can still be hacked
> around from the vdsm side if it is impossible to fix properly on the Engine).

Thread-7360::DEBUG::2013-09-23 15:11:56,223::BindingXMLRPC::974::vds::(wrapper) client [10.35.161.52]::call vmHotplugDisk with ({'vmId': 'afac6a2c-2210-4f5d-a827-cadb046243d1', 'drive': {'iface': 'virtio', 'format': 'raw', 'optional': 'false', 'volumeID': '3257c0a1-9fd4-4882-ab77-afe3b6b23a2a', 'imageID': '09a8bc04-7fa6-4673-8fac-35926164024e', 'readonly': 'false', 'domainID': 'eff02bb9-cea8-4f89-a077-47f36be46197', 'deviceId': '09a8bc04-7fa6-4673-8fac-35926164024e', 'poolID': 'b7cb43df-2955-47ed-b2a5-07ee6891c2b4', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}},) {} flowID [5042b295]
Hi Elad,

Can you please provide the libvirt logs from the time of the hotplug, so we can see the difference between the information that reached libvirt and the information saved in vdsm for the device?

During recovery, we obtain the VM info from libvirt. If the 'iface' attribute wasn't sent to libvirt, we can't recover it correctly.

Thanks!
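For comparison, a short sketch of how the disk's bus (what vdsm calls 'iface') could be read back from libvirt using the libvirt-python bindings; the connection URI and the VM name 'my-vm' are placeholders, not taken from the attached logs:

# Sketch: list each disk's target dev/bus as libvirt reports it, to compare
# against the device info vdsm saved. The VM name is a placeholder.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('my-vm')
root = ET.fromstring(dom.XMLDesc(0))
for disk in root.findall('./devices/disk'):
    target = disk.find('target')
    if target is not None:
        # 'bus' here (virtio/ide/...) corresponds to vdsm's 'iface'
        print(target.get('dev'), target.get('bus'))
conn.close()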
Hi Elad / Yeela, iiuc you are not able to reproduce this issue at all?
(In reply to Ayal Baron from comment #7)
> Hi Elad / Yeela, iiuc you are not able to reproduce this issue at all?

I tried to reproduce it according to the steps in comment #0, as it originally happened to me, and also according to Ayal's suggestion (including VM migration). Neither attempt reproduces the issue.
Closing according to comment 8; please reopen if this happens again.