Description of problem:
The Start VM action sometimes fails and its host becomes non-responsive. This is also relevant for the VM migration action.

Version-Release number of selected component (if applicable):
oVirt engine version: 4.1.0-0.2.master.20161201131309.git6c02a32.el7.centos
Vdsm version: vdsm-4.18.999-1028.gitd5c00b9.el7.centos

How reproducible:
Sometimes.

Steps to Reproduce:
1. Start a VM on a host.

Actual results:
The VM start fails and the host becomes non-responsive.

Expected results:
The VM should start normally and the host should stay responsive.

Additional info:
Run VM command started on: 2016-12-05 11:34

vdsm.log:

2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3115, in prepareImage
    self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 326, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)
2016-12-05 11:34:26,929 INFO (vm/ccc27397) [storage.TaskManager.Task] (Task='435710e8-2bc2-4789-aec2-08e72b4f1b21') aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309 (task:1175)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [storage.Dispatcher] {'status': {'message': "Unknown pool id, pool not connected: ('a0ff0f8f-3675-4166-85f5-128bc3424160',)", 'code': 309}} (dispatcher:77)
2016-12-05 11:34:26,929 ERROR (vm/ccc27397) [virt.vm] (vmId='ccc27397-1d68-4883-a8be-5542ba077b26') The vm start process failed (vm:613)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 549, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1899, in _run
    self._preparePathsForDrives(dev_spec_map[hwclass.DISK])
  File "/usr/share/vdsm/virt/vm.py", line 639, in _preparePathsForDrives
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 320, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '6ac83627-74d3-4647-9f80-9dc7af5b493b', 'apparentsize': '18612224', 'imageID': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': '43f9a986-0afb-4ca0-8a16-65b704e636ee', 'truesize': '18554880', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}

engine.log:

2016-12-05 11:34:24,144+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_2 is down with error.
Exit message: Bad volume specification {'index': 0, 'domainID': '39de3668-5565-4901-888e-5294729ed509', 'reqsize': '0', 'format': 'cow', 'bootOrder': '1', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x07'}, 'volumeID': '9b9de755-9280-43ee-8cb5-a0e06d064f8f', 'apparentsize': '18677760', 'imageID': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'discard': False, 'specParams': {}, 'readonly': 'false', 'iface': 'virtio', 'optional': 'false', 'deviceId': 'd3470036-e997-48e1-bcf2-5624cb5e0a7b', 'truesize': '18620416', 'poolID': 'a0ff0f8f-3675-4166-85f5-128bc3424160', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}.
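For anyone triaging this, here is a minimal, purely illustrative Python sketch of the failure path implied by the two tracebacks above. The class and function names mirror the log, but the dict-based pool cache and the direct exception wrapping are simplifications and assumptions, not the actual vdsm implementation (the real path goes through the task dispatcher and status codes). The key point is that prepareImage requires the storage pool to already be connected on the host; when getPool cannot find the pool it raises StoragePoolUnknown (code 309), and the VM start path surfaces that as VolumeError ("Bad volume specification").

# Illustrative sketch only (not actual vdsm code): the shape of the failure
# shown in the tracebacks above. Pool membership is modeled as a plain dict;
# the real HSM bookkeeping and dispatcher plumbing are elided.

class StoragePoolUnknown(Exception):
    """Raised when the requested storage pool is not connected (code 309)."""

class VolumeError(Exception):
    """Raised by prepareVolumePath when a drive cannot be prepared."""

class HSM(object):
    def __init__(self):
        # Filled by connectStoragePool; empty if the host never connected
        # (or lost) the pool, which is the situation in this bug.
        self.pools = {}  # spUUID -> pool object

    def getPool(self, spUUID):
        # Mirrors the hsm.py getPool frame: unknown pool -> exception.
        if spUUID not in self.pools:
            raise StoragePoolUnknown(spUUID)
        return self.pools[spUUID]

    def prepareImage(self, spUUID, sdUUID, imgUUID, volUUID):
        # Mirrors the prepareImage frame: the pool must already be
        # connected, otherwise StoragePoolUnknown propagates up.
        self.getPool(spUUID)
        # ... activate the volume chain and return its path (omitted) ...

def prepareVolumePath(hsm, drive):
    # In the VM start path, any storage failure for the drive ends up as
    # VolumeError, which the engine reports as "Bad volume specification".
    try:
        return hsm.prepareImage(drive['poolID'], drive['domainID'],
                                drive['imageID'], drive['volumeID'])
    except StoragePoolUnknown:
        raise VolumeError(drive)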
Created attachment 1228066 [details] vdsm, supervdsm and engine logs
*** Bug 1400512 has been marked as a duplicate of this bug. ***
Update on findings: if the problem occurs, the host can be put into maintenance mode and then reactivated; after that the VM starts normally.
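For reference, that maintenance/reactivate workaround can be scripted. Below is a minimal, hypothetical sketch assuming the ovirt-engine-sdk4 Python SDK; the engine URL, credentials, CA file path and host name are placeholders, and putting a non-responsive host into maintenance may still require manual confirmation depending on its state.

# Hypothetical workaround automation using the ovirt-engine-sdk4 Python SDK.
# All connection details and the host name below are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)
try:
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search='name=host_mixed_1')[0]
    host_service = hosts_service.host_service(host.id)

    # Put the affected host into maintenance mode...
    host_service.deactivate()
    while host_service.get().status != types.HostStatus.MAINTENANCE:
        time.sleep(5)

    # ...then reactivate it; after that the VM started normally in my case.
    host_service.activate()
    while host_service.get().status != types.HostStatus.UP:
        time.sleep(5)
finally:
    connection.close()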
I could not reproduce this on master. I asked Mor to update the bug if it happens again on his setup.
Mor, did you encounter this issue again?
(In reply to Fred Rolland from comment #5)
> Mor, did you encounter this issue again?

No, it seems to be resolved.
Closing; this seems to have been resolved between the builds.