Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1625720

Summary: VM not starting with: Multiple 'scsi' controllers with index '0'.
Product: [oVirt] ovirt-engine
Component: General
Status: CLOSED WORKSFORME
Severity: low
Priority: unspecified
Version: 4.2.6
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Reporter: Andreas Elvers <andreas.elvers+redhat.bugzilla>
Assignee: bugs <bugs>
QA Contact: meital avital <mavital>
Docs Contact:
CC: andreas.elvers+redhat.bugzilla, bugs, michal.skrivanek, rbarry, tnisan
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-22 08:15:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Log with multiple scsi controller with id 0 error (flags: none)

Description Andreas Elvers 2018-09-05 16:13:57 UTC
Description of problem:

On an oVirt 4.2.6 cluster (this behaviour is also seen in 4.2.5) that uses a Cinder (Pike on Ubuntu Xenial) / Ceph (Mimic) combination for VM volumes, there is a problem with shutting down a VM. After a shutdown the VM may fail to start; instead the following error is produced:

VM testceph is down with error. Exit message: XML error: Multiple 'scsi' controllers with index '0'.

Waiting for some time (5 to 10 minutes) will usually resolve the problem.
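The error means libvirt was handed a domain XML containing two `<controller type='scsi'>` elements with the same `index`. As a sketch (not part of the report), the condition can be checked in a dumped domain XML, e.g. the output of `virsh dumpxml testceph`:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_scsi_indexes(domain_xml: str) -> list:
    """Return the 'index' values used by more than one scsi controller."""
    root = ET.fromstring(domain_xml)
    indexes = Counter(
        c.get("index")
        for c in root.findall("./devices/controller")
        if c.get("type") == "scsi"
    )
    return [idx for idx, count in indexes.items() if count > 1]

# A minimal domain XML exhibiting the reported failure.
broken = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='0' model='virtio-scsi'/>
  </devices>
</domain>
"""
print(duplicate_scsi_indexes(broken))  # → ['0']
```

An empty result means the domain XML is fine in this respect and libvirt would not raise this particular error.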

I don't see this behaviour on our old NFS-backed cluster, which is as current as the Ceph-backed cluster.

Version-Release number of selected component (if applicable):

oVirt 4.2.6 (node ng install)

How reproducible:


Steps to Reproduce:
1. Stop a VM with a bootable Ceph RBD disk.
2. Try to start it.
3. The above error occurs.


Actual results:

VM does not start

Expected results:

VM should start

Additional info:

To allow oVirt to communicate with Ceph I had to upgrade librbd1 and librados2 to more recent versions. I used this repository, as it seemed suitable:

[ovirt-4.2-centos-ceph-luminous]
name=CentOS-7 - ceph luminous
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/ceph-luminous/
enabled=1
gpgcheck=1
gpgkey=https://raw.githubusercontent.com/CentOS-Storage-SIG/centos-release-storage-common/master/RPM-GPG-KEY-CentOS-SIG-Storage
includepkgs=librados2 librbd1 lttng-ust
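As a sanity check (a sketch, not something from the report), the stanza above is ordinary INI syntax and can be parsed with Python's configparser to confirm which packages the repo is pinned to:

```python
import configparser

# The repo stanza quoted in the report, embedded here for a self-contained check.
repo_text = """
[ovirt-4.2-centos-ceph-luminous]
name=CentOS-7 - ceph luminous
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/ceph-luminous/
enabled=1
gpgcheck=1
gpgkey=https://raw.githubusercontent.com/CentOS-Storage-SIG/centos-release-storage-common/master/RPM-GPG-KEY-CentOS-SIG-Storage
includepkgs=librados2 librbd1 lttng-ust
"""

# Disable interpolation so yum variables like $basearch pass through untouched.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(repo_text)
section = cfg["ovirt-4.2-centos-ceph-luminous"]
print(section["includepkgs"].split())  # → ['librados2', 'librbd1', 'lttng-ust']
```

With `includepkgs` set, yum will only ever install those three packages from this repo, which limits how far the Storage SIG builds can drift from the base system.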

Comment 1 Michal Skrivanek 2018-09-06 04:41:02 UTC
Please attach ovirt-engine logs covering the time of the VM creation, the successful starts, and the failed one.
Thanks

Comment 2 Tal Nisan 2018-09-06 11:14:40 UTC
Ryan, this is not necessarily storage related; it might be domain-XML related, in which case it would be Virt.

Comment 3 Michal Skrivanek 2018-09-06 11:39:39 UTC
Well, this sounds storage related. Producing correct XML for a given kind of storage is still storage.

Comment 4 Andreas Elvers 2018-09-07 07:31:54 UTC
Created attachment 1481507 [details]
Log with multiple scsi controller with id 0 error

This log shows a start of a VM that failed, but oVirt eventually brought it up on subsequent tries without user intervention. I have to correct myself on the node versions, though: the participating nodes are named node01, node02, and node03; node01 is current on 4.2.6, while node02 and node03 are still on 4.2.5.1.

Comment 5 Andreas Elvers 2018-09-07 07:36:01 UTC
> Log with multiple scsi controller with id 0 error

I filtered out the Gluster messages for clarity, since Gluster status messages are logged every few seconds.

Comment 6 Michal Skrivanek 2018-09-07 14:41:20 UTC
When was this VM originally created? And when was it last run successfully before the failure?

Comment 7 Andreas Elvers 2018-09-07 16:31:16 UTC
The VM was created one or two weeks ago. I can always run the VM eventually. The problems arise when I add a new disk. I use this particular VM to move the contents of NFS-backed VMs to our Ceph-backed cluster: I add a new disk and rsync from the NFS side to the Ceph side. After that, the disk is attached to the replacement VM on our Ceph cluster.

Adding a new disk will usually trigger this error. I click Run; it tries node01 and errors, tries node02 and errors, tries node03, and errors out completely. After a few more clicks on Run the VM eventually starts.

So today I moved two VMs to our Ceph-backed oVirt cluster, and every time I removed the finished disk and added a new one, the run problem appeared. But I can always start the VM successfully after a few minutes of trying, and after starting everything is fine.
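The click-Run-until-it-starts workaround described above could be scripted with a simple retry loop. A minimal sketch, where `start_vm` is a placeholder for whatever actually starts the VM (e.g. an oVirt SDK or REST call) and is assumed to raise on failure:

```python
import time

def start_with_retry(start_vm, attempts=10, delay=30):
    """Keep calling start_vm() until it succeeds or attempts are exhausted.

    start_vm is a hypothetical callable standing in for the real start
    operation; delay is the pause in seconds between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start_vm()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)

# Demo with a stub that fails twice, mimicking the behaviour in the report.
failures = iter([True, True, False])
def stub_start():
    if next(failures):
        raise RuntimeError("Multiple 'scsi' controllers with index '0'.")
    return "up"

print(start_with_retry(stub_start, attempts=5, delay=0))  # → up
```

This only papers over the race; the actual fix, per the later comments, was upgrading the engine to 4.2.6.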

Comment 8 Michal Skrivanek 2018-09-14 11:20:37 UTC
The log doesn't contain the first start of the VM. Can you please reproduce and attach a log covering the VM creation, the initial start that fails, and the start that succeeds?
Also, please make sure you have a 4.2.6 engine (the hosts do not matter) and check whether by any chance you have enabled iothreads.

Comment 9 Andreas Elvers 2018-10-08 07:59:02 UTC
I upgraded to 4.2.6 engine. Can no longer reproduce the error.

Comment 10 Michal Skrivanek 2018-10-08 12:03:48 UTC
Good, then we can close this.