Bug 1309834

Summary: New node won't run any VM nor allow migrations
Product: [oVirt] ovirt-engine
Reporter: nicolas
Component: General
Assignee: bugs <bugs>
Status: CLOSED NOTABUG
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.6.2.6
CC: bugs
Target Milestone: ---
Flags: rule-engine: planning_ack?
       rule-engine: devel_ack?
       rule-engine: testing_ack?
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-22 11:31:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Engine log from the moment I click on "migrate" (flags: none)
  VDSM log of ovirtnode5.domain.com from the moment I click on "migrate" (flags: none)
  libvirt log for the VM (flags: none)

Description nicolas 2016-02-18 19:31:25 UTC
Created attachment 1128296 [details]
Engine log from the moment I click on "migrate"

Description of problem:

We had 4 nodes in our infrastructure and added a fifth one (ovirtnode5.domain.com). The new node is from a different manufacturer and has a different chipset than the other 4, but it has the same amount of RAM (128 GB).

Installation went smoothly and the package versions are the same on all hosts. SELinux is enabled.

Running a VM on ovirtnode5.domain.com or migrating a VM to it fails every time.

Steps to Reproduce:
1. Choose any VM
2. Click migrate, choose ovirtnode5.domain.com as destination
3. After a while, the migration fails

The most notable error on the vdsm.log side is:

libvirtEventLoop::INFO::2016-02-18 19:01:07,069::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId=u'86b4bb6c-b262-41c1-b7e0-964057153f59')
periodic/7::DEBUG::2016-02-18 19:01:07,070::executor::178::Executor::(_run) Worker was discarded
Thread-177::ERROR::2016-02-18 19:01:07,070::vm::752::virt.vm::(_startUnderlyingVm) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 721, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 2830, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 2889, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid '86b4bb6c-b262-41c1-b7e0-964057153f59'
Thread-177::INFO::2016-02-18 19:01:07,071::vm::1324::virt.vm::(setDownStatus) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Changed state to Down: VM failed to migrate (code=8)
Thread-177::DEBUG::2016-02-18 19:01:07,071::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"86b4bb6c-b262-41c1-b7e0-964057153f59": {"status": "Down", "timeOffset": "1", "exitReason": 8, "exitMessage": "VM failed to migrate", "exitCode": 1}, "notify_time": 4296088800}, "jsonrpc": "2.0", "method": "|virt|VM_status|86b4bb6c-b262-41c1-b7e0-964057153f59"}
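In other words, after the incoming migration vdsm on the destination tries to attach to the libvirt domain, and libvirt reports that no domain with that UUID exists. Below is a minimal sketch of how one could check this directly on ovirtnode5, assuming the libvirt-python bindings are available and a local read-only connection is allowed (on oVirt hosts libvirt may require SASL credentials); the UUID is the one from the log above:

# Ask libvirt on the destination whether it knows a domain with the VM's UUID.
# Assumption: libvirt-python is installed and qemu:///system is reachable.
import libvirt

VM_UUID = '86b4bb6c-b262-41c1-b7e0-964057153f59'  # UUID from vdsm.log

conn = libvirt.openReadOnly('qemu:///system')
try:
    dom = conn.lookupByUUIDString(VM_UUID)
    print('domain found: %s, state %s' % (dom.name(), dom.state()))
except libvirt.libvirtError as e:
    # Same condition vdsm wraps in MigrationError above.
    print('lookup failed: %s' % e.get_error_message())
finally:
    conn.close()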

I don't know exactly what that means. All storage is mounted, and I can access it and see the UUIDs. We use Gluster 3.7.2 as the client (the same version on all hosts).
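To double-check the storage side, here is a minimal sketch that lists the storage domain UUID directories visible from the host, assuming vdsm's usual mount root /rhev/data-center/mnt (the exact path may differ per setup):

# List storage-domain UUID directories visible under vdsm's mount root.
# Assumption: the default mount root /rhev/data-center/mnt is in use.
import os
import re

MOUNT_ROOT = '/rhev/data-center/mnt'
UUID_RE = re.compile(r'^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$')

for dirpath, dirnames, _ in os.walk(MOUNT_ROOT):
    for name in dirnames:
        if UUID_RE.match(name):
            print('storage domain %s mounted at %s' % (name, dirpath))
    # Do not descend into the domain trees themselves.
    dirnames[:] = [d for d in dirnames if not UUID_RE.match(d)]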

I'm attaching both vdsm.log and engine.log. I wonder whether this might be some hardware incompatibility, although installation and inclusion into the cluster went smoothly.

Some legend to read the log:

vm.domain.com -> VM to migrate
ovirtnode4.domain.com -> Node where vm.domain.com currently runs
ovirtnode5.domain.com -> Destination node of migration, where it actually fails
storage.domain.com -> Gluster server

Comment 1 nicolas 2016-02-18 19:32:21 UTC
Created attachment 1128297 [details]
VDSM log of ovirtnode5.domain.com from the moment I click on "migrate"

Comment 2 nicolas 2016-02-19 18:17:39 UTC
Created attachment 1128621 [details]
libvirt log for the VM

I was able to find something possibly related. During the VM start, the line:

Domain id=3 is tainted: hook-script

shows up. See details in this attachment.
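For context, libvirt marks a domain "tainted: hook-script" whenever a hook script in its hooks directory runs for that domain; the flag is informational and does not indicate a failure by itself. A minimal sketch, assuming the standard hook directory /etc/libvirt/hooks, to see which hooks are installed on the host:

# Show which libvirt hook scripts exist; any of them triggers the
# "tainted: hook-script" marker when it runs for a domain.
# Assumption: the standard hook directory /etc/libvirt/hooks.
import os

HOOK_DIR = '/etc/libvirt/hooks'

if os.path.isdir(HOOK_DIR):
    for name in sorted(os.listdir(HOOK_DIR)):
        path = os.path.join(HOOK_DIR, name)
        print('%s (executable: %s)' % (path, os.access(path, os.X_OK)))
else:
    print('no hook directory at %s' % HOOK_DIR)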

Comment 3 nicolas 2016-02-22 09:03:57 UTC
There's a detail I described incorrectly: directly powering on a VM on the node ovirtnode5.domain.com works; what doesn't work is migration.

Comment 4 nicolas 2016-02-22 11:31:59 UTC
Solved: it seems the issue was related to the cable connected to the migration interface. We replaced the cable and now it works. Sorry for the nuisance.
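For anyone hitting a similar symptom, a quick way to sanity-check the migration network before suspecting software is to test plain TCP reachability of the destination over that interface. A minimal sketch follows; the destination IP is a hypothetical placeholder and the port is vdsm's usual 54321 (both assumptions to adjust). A timeout suggests a link or cabling problem, while "connection refused" still proves basic connectivity:

# Probe TCP reachability of the destination host over the migration network.
# DEST_IP is a hypothetical placeholder; 54321 is vdsm's usual listening port.
import socket

DEST_IP = '192.0.2.15'   # replace with ovirtnode5's migration-network address
PORT = 54321

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(3)
try:
    s.connect((DEST_IP, PORT))
    print('%s:%d reachable' % (DEST_IP, PORT))
except socket.error as e:
    # A timeout here usually means link/cabling trouble; "refused" means
    # the host answers but nothing listens on that port.
    print('%s:%d not reachable: %s' % (DEST_IP, PORT, e))
finally:
    s.close()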