Bug 1401523

Summary: Start VM action fails and host becomes non-responsive
Product: [oVirt] ovirt-engine
Reporter: Mor <mkalfon>
Component: BLL.Storage
Assignee: Fred Rolland <frolland>
Status: CLOSED WORKSFORME
QA Contact: Raz Tamir <ratamir>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.1.0
CC: bugs, danken, gklein, mburman, mkalfon, myakove, tnisan
Target Milestone: ovirt-4.1.0-beta
Keywords: Automation, Regression
Target Release: ---
Flags: rule-engine: ovirt-4.1+
       gklein: blocker+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-15 09:07:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---
Attachments: vdsm, supervdsm and engine logs (flags: none)

Description Mor 2016-12-05 14:00:53 UTC
Description of problem:
The Start VM action sometimes fails, and the host it was started on becomes non-responsive. The same issue also occurs with the VM migration action.

Version-Release number of selected component (if applicable):
oVirt engine Version 4.1.0-0.2.master.20161201131309.git6c02a32.el7.centos
Vdsm version: vdsm-4.18.999-1028.gitd5c00b9.el7.centos

How reproducible:
Sometimes.

Steps to Reproduce:
1. Start a VM on a host.

Actual results:
The start fails and the host becomes non-responsive.

Expected results:
The VM should start and the host should remain responsive.

Additional info:
Run VM command started on: 2016-12-05 11:34

vdsm.log:

2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3115, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 326, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)
2016-12-05 11:34:26,929 INFO  (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309 (task:1175)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.Dispatcher] {'status': {'message': "Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)", 'code': 309}} (dispatcher:77)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [virt.vm] (vmId='ccc27397-1d68-4883-a8be-5542ba077b26') The vm start process failed (vm:613)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 549, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1899, in _run
    self._preparePathsForDrives(dev_spec_map[hwclass.DISK])
  File "/usr/share/vdsm/virt/vm.py", line 639, in _preparePathsForDrives
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 320, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '6ac83627-74d3-4647-9f80-9dc7af5b493b', 'apparentsize': '18612224', 'imageID': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'truesize': '18554880', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}

engine.log:

2016-12-05 11:34:24,144+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_2 is down with error. Exit message: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '9b9de755-9280-43ee-8cb5-a0e06d064f8f', 'apparentsize': '18677760', 'imageID': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'truesize': '18620416', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}.
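
The engine-side "Bad volume specification" message is only the surface error; the root cause in vdsm.log is the StoragePoolUnknown raised by getPool before any volume is prepared. As a rough illustration of that failure path (a minimal sketch with made-up names, not vdsm's actual implementation), the pool lookup can be thought of as a dictionary check that aborts prepareImage when the host never connected the pool:

# Minimal sketch (illustrative names, not vdsm source): the host keeps a
# registry of connected storage pools, and image preparation starts by
# looking up the pool UUID there. If the host never connected the pool,
# the lookup fails before any volume is touched, which the engine then
# reports as "Bad volume specification".

class StoragePoolUnknown(Exception):
    """Pool UUID is not in the host's connected-pools registry."""

class HostStorageManager(object):
    def __init__(self):
        self.pools = {}  # spUUID -> connected pool object

    def connect_storage_pool(self, sp_uuid, pool):
        # Normally triggered when the engine activates the host.
        self.pools[sp_uuid] = pool

    def get_pool(self, sp_uuid):
        if sp_uuid not in self.pools:
            raise StoragePoolUnknown(
                "Unknown pool id, pool not connected: (%r,)" % sp_uuid)
        return self.pools[sp_uuid]

    def prepare_image(self, sp_uuid, img_id):
        pool = self.get_pool(sp_uuid)  # fails here in the traceback above
        return pool.prepare(img_id)

hsm = HostStorageManager()
try:
    hsm.prepare_image('a0ff0f8f-3675-4166-85f5-128bc3424160',
                      '43f9a986-0afb-4ca0-8a16-65b704e636ee')
except StoragePoolUnknown as exc:
    print("prepareImage failed: %s" % exc)  # mirrors the vdsm.log error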

Comment 1 Mor 2016-12-05 14:01:30 UTC
Created attachment 1228066 [details]
vdsm, supervdsm and engine logs

Comment 2 Mor 2016-12-05 14:04:53 UTC
*** Bug 1400512 has been marked as a duplicate of this bug. ***

Comment 3 Mor 2016-12-07 15:43:26 UTC
Update on findings: if the problem occurs, putting the host into maintenance mode and then reactivating it allows the VM to start normally.
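
A possible explanation for why this workaround helps (an assumption, not verified in the attached logs): reactivating the host causes the engine to reconnect it to the storage pool, repopulating the host-side registry that the pool lookup consults. Continuing the sketch from the description (reusing the hypothetical HostStorageManager defined there):

# Hypothetical continuation of the earlier sketch: host reactivation is
# assumed to trigger a fresh pool connection, which repopulates the
# registry and lets image preparation (and therefore VM start) succeed.

class FakePool(object):
    def prepare(self, img_id):
        return "/prepared/path/for/%s" % img_id

hsm = HostStorageManager()
hsm.connect_storage_pool('a0ff0f8f-3675-4166-85f5-128bc3424160', FakePool())
print(hsm.prepare_image('a0ff0f8f-3675-4166-85f5-128bc3424160',
                        '6ac83627-74d3-4647-9f80-9dc7af5b493b'))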

Comment 4 Fred Rolland 2016-12-11 10:15:39 UTC
I could not reproduce this on master.
I asked Mor to update the bug if it happens again on his setup.

Comment 5 Fred Rolland 2016-12-15 07:48:36 UTC
Mor, did you encounter this issue again?

Comment 6 Mor 2016-12-15 08:41:40 UTC
(In reply to Fred Rolland from comment #5)
> Mor, did you encounter this issue again?

No, it seems to be resolved.

Comment 7 Fred Rolland 2016-12-15 09:07:12 UTC
Closing; the issue seems to have been resolved between builds.