Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1625720

Summary: VM not starting with: Multiple 'scsi' controllers with index '0'.
Product: [oVirt] ovirt-engine
Component: General
Status: CLOSED WORKSFORME
Severity: low
Priority: unspecified
Version: 4.2.6
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Reporter: Andreas Elvers <andreas.elvers+redhat.bugzilla>
Assignee: bugs <bugs>
QA Contact: meital avital <mavital>
Docs Contact:
CC: andreas.elvers+redhat.bugzilla, bugs, michal.skrivanek, rbarry, tnisan
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-22 08:15:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Log with multiple scsi controller with id 0 error (flags: none)

Description Andreas Elvers 2018-09-05 16:13:57 UTC
Description of problem:

On an oVirt 4.2.6 cluster (this behaviour is also seen in 4.2.5) that uses a Cinder (Pike on Ubuntu Xenial) / Ceph (Mimic) combination for VM volumes, there is a problem with shutting down a VM. After a shutdown the VM may fail to start; instead the following error is produced:

VM testceph is down with error. Exit message: XML error: Multiple 'scsi' controllers with index '0'.

Waiting for some time (5 to 10 minutes) will usually resolve the problem.
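The error means libvirt was handed a domain XML containing two `<controller type='scsi'>` elements with the same `index`. As a sketch (not part of the report), the condition can be checked in a dumped domain XML, e.g. the output of `virsh dumpxml testceph`:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_scsi_indexes(domain_xml: str) -> list:
    """Return the 'index' values used by more than one scsi controller."""
    root = ET.fromstring(domain_xml)
    indexes = Counter(
        c.get("index")
        for c in root.findall("./devices/controller")
        if c.get("type") == "scsi"
    )
    return [idx for idx, count in indexes.items() if count > 1]

# A minimal domain XML exhibiting the reported failure.
broken = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='0' model='virtio-scsi'/>
  </devices>
</domain>
"""
print(duplicate_scsi_indexes(broken))  # → ['0']
```

An empty result means the domain XML is fine in this respect and libvirt would not raise this particular error.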

I don't see this behaviour on our old NFS-backed cluster, which is as current as the Ceph-backed cluster.

Version-Release number of selected component (if applicable):

oVirt 4.2.6 (node ng install)

How reproducible:


Steps to Reproduce:
1. Stop a VM with a bootable Ceph RBD disk.
2. Try to start it.
3. The above error occurs.


Actual results:

VM does not start

Expected results:

VM should start

Additional info:

To allow oVirt to communicate with Ceph I had to upgrade librbd1 and librados2 to more recent versions. I used this repository, as it seemed suitable:

[ovirt-4.2-centos-ceph-luminous]
name=CentOS-7 - ceph luminous
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/ceph-luminous/
enabled=1
gpgcheck=1
gpgkey=https://raw.githubusercontent.com/CentOS-Storage-SIG/centos-release-storage-common/master/RPM-GPG-KEY-CentOS-SIG-Storage
includepkgs=librados2 librbd1 lttng-ust
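As a sanity check (a sketch, not something from the report), the stanza above is ordinary INI syntax and can be parsed with Python's configparser to confirm which packages the repo is pinned to:

```python
import configparser

# The repo stanza quoted in the report, embedded here for a self-contained check.
repo_text = """
[ovirt-4.2-centos-ceph-luminous]
name=CentOS-7 - ceph luminous
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/ceph-luminous/
enabled=1
gpgcheck=1
gpgkey=https://raw.githubusercontent.com/CentOS-Storage-SIG/centos-release-storage-common/master/RPM-GPG-KEY-CentOS-SIG-Storage
includepkgs=librados2 librbd1 lttng-ust
"""

# Disable interpolation so yum variables like $basearch pass through untouched.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(repo_text)
section = cfg["ovirt-4.2-centos-ceph-luminous"]
print(section["includepkgs"].split())  # → ['librados2', 'librbd1', 'lttng-ust']
```

With `includepkgs` set, yum will only ever install those three packages from this repo, which limits how far the Storage SIG builds can drift from the base system.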

Comment 1 Michal Skrivanek 2018-09-06 04:41:02 UTC
Please attach ovirt-engine logs covering the time of the VM creation, the successful starts, and the failed one.
Thanks

Comment 2 Tal Nisan 2018-09-06 11:14:40 UTC
Ryan, this is not necessarily storage related; it might be domain-XML related, in which case it would be Virt.

Comment 3 Michal Skrivanek 2018-09-06 11:39:39 UTC
Well, this sounds storage related. Producing correct XML for a given kind of storage is still storage.

Comment 4 Andreas Elvers 2018-09-07 07:31:54 UTC
Created attachment 1481507 [details]
Log with multiple scsi controller with id 0 error

This log shows a start of a VM that failed, but oVirt eventually brought it up on subsequent tries without user intervention. I have to correct myself on the node versions, though: the participating nodes are named node01, node02, and node03; node01 is current on 4.2.6, while node02 and node03 are still on 4.2.5.1.

Comment 5 Andreas Elvers 2018-09-07 07:36:01 UTC
> Log with multiple scsi controller with id 0 error

I filtered out the Gluster messages for clarity, since Gluster status messages are logged every few seconds.

Comment 6 Michal Skrivanek 2018-09-07 14:41:20 UTC
When was this VM originally created? And when was it last run successfully before the failure?

Comment 7 Andreas Elvers 2018-09-07 16:31:16 UTC
The VM was created one or two weeks ago. I can always run the VM eventually. The problems arise when I add a new disk. I use this particular VM to move the contents of NFS-backed VMs to our Ceph-backed cluster: I add a new disk and rsync from the NFS side to the Ceph side. After that, the disk is attached to the replacement VM on our Ceph cluster.

Adding a new disk will usually trigger this error. I click Run; it tries node01 and errors, tries node02 and errors, tries node03, and errors out completely. After a few more clicks on Run the VM eventually starts.

So today I moved two VMs to our Ceph-backed oVirt cluster, and every time I removed the finished disk and added a new one, the run problem appeared. But I can always start the VM successfully after a few minutes of trying, and after starting everything is fine.
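The click-Run-until-it-starts workaround described above could be scripted with a simple retry loop. A minimal sketch, where `start_vm` is a placeholder for whatever actually starts the VM (e.g. an oVirt SDK or REST call) and is assumed to raise on failure:

```python
import time

def start_with_retry(start_vm, attempts=10, delay=30):
    """Keep calling start_vm() until it succeeds or attempts are exhausted.

    start_vm is a hypothetical callable standing in for the real start
    operation; delay is the pause in seconds between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start_vm()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)

# Demo with a stub that fails twice, mimicking the behaviour in the report.
failures = iter([True, True, False])
def stub_start():
    if next(failures):
        raise RuntimeError("Multiple 'scsi' controllers with index '0'.")
    return "up"

print(start_with_retry(stub_start, attempts=5, delay=0))  # → up
```

This only papers over the race; the actual fix, per the later comments, was upgrading the engine to 4.2.6.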

Comment 8 Michal Skrivanek 2018-09-14 11:20:37 UTC
The log doesn't contain the first start of the VM. Can you please reproduce and attach a log covering the VM creation, the initial start that fails, and the start that succeeds?
Also, please make sure you have a 4.2.6 engine (the hosts do not matter) and check whether by any chance you have enabled iothreads.

Comment 9 Andreas Elvers 2018-10-08 07:59:02 UTC
I upgraded to 4.2.6 engine. Can no longer reproduce the error.

Comment 10 Michal Skrivanek 2018-10-08 12:03:48 UTC
Good, then we can close this.