Bug 1309834 - New node won't run any VM nor allow migrations
Product: ovirt-engine
Classification: oVirt
Component: General
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: bugs@ovirt.org
Depends On:
Reported: 2016-02-18 14:31 EST by nicolas
Modified: 2016-02-22 06:31 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-02-22 06:31:59 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?

Attachments
Engine log from the moment I click on "migrate" (46.05 KB, text/plain)
2016-02-18 14:31 EST, nicolas
VDSM log of ovirtnode5.domain.com from the moment I click on "migrate" (186.89 KB, text/plain)
2016-02-18 14:32 EST, nicolas
libvirt log for the VM (3.40 KB, text/plain)
2016-02-19 13:17 EST, nicolas

Description nicolas 2016-02-18 14:31:25 EST
Created attachment 1128296
Engine log from the moment I click on "migrate"

Description of problem:

We had 4 nodes in our infrastructure and added a fifth (ovirtnode5.domain.com). The new node is from a different manufacturer and has a different chipset than the other 4, but has the same amount of RAM (128 GB).

Installation went smoothly and all package versions are the same on all hosts. SELinux is enabled.

When running a VM on ovirtnode5.domain.com or migrating a VM to it, the operation fails every time.

Steps to Reproduce:
1. Choose any VM
2. Click migrate, choose ovirtnode5.domain.com as destination
3. After a while it fails

The most notable error in vdsm.log is:

libvirtEventLoop::INFO::2016-02-18 19:01:07,069::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId=u'86b4bb6c-b262-41c1-b7e0-964057153f59')
periodic/7::DEBUG::2016-02-18 19:01:07,070::executor::178::Executor::(_run) Worker was discarded
Thread-177::ERROR::2016-02-18 19:01:07,070::vm::752::virt.vm::(_startUnderlyingVm) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 721, in _startUnderlyingVm
  File "/usr/share/vdsm/virt/vm.py", line 2830, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 2889, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid '86b4bb6c-b262-41c1-b7e0-964057153f59'
Thread-177::INFO::2016-02-18 19:01:07,071::vm::1324::virt.vm::(setDownStatus) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Changed state to Down: VM failed to migrate (code=8)
Thread-177::DEBUG::2016-02-18 19:01:07,071::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"86b4bb6c-b262-41c1-b7e0-964057153f59": {"status": "Down", "timeOffset": "1", "exitReason": 8, "exitMessage": "VM failed to migrate", "exitCode": 1}, "notify_time": 4296088800}, "jsonrpc": "2.0", "method": "|virt|VM_status|86b4bb6c-b262-41c1-b7e0-964057153f59"}
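
For anyone triaging a similar failure, here is a minimal sketch of filtering the ERROR entries out of a vdsm.log. It assumes the `::`-delimited layout seen in the excerpt above (thread, level, timestamp, module, line number, logger, then function and message). The names `VdsmRecord`, `parse_vdsm_line` and `error_records` are illustrative and not part of VDSM.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class VdsmRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    message: str


def parse_vdsm_line(line: str) -> Optional[VdsmRecord]:
    """Split one vdsm.log line on '::'; return None for lines without a header."""
    # Split at most 6 times so any '::' inside the message itself is preserved.
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) < 7 or not parts[4].isdigit():
        return None  # e.g. traceback continuation lines
    thread, level, timestamp, module, lineno, logger, message = parts
    return VdsmRecord(thread, level, timestamp, module, int(lineno), logger, message)


def error_records(lines: Iterable[str]) -> Iterator[VdsmRecord]:
    """Yield only the records logged at ERROR level."""
    for line in lines:
        record = parse_vdsm_line(line)
        if record is not None and record.level == "ERROR":
            yield record
```

Passing an open log file to `error_records` yields one record per ERROR line. Traceback lines are skipped because they carry no header fields.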

I don't know exactly what that means. All storage is mounted, and I can access it and see the UUIDs. We use Gluster 3.7.2 as the client (same on all hosts).

I'm attaching both vdsm.log and engine.log. I wonder if this might be a hardware incompatibility, although installation and inclusion into the cluster went smoothly.

Legend for reading the logs:

vm.domain.com -> VM to migrate
ovirtnode4.domain.com -> Node where vm.domain.com currently runs
ovirtnode5.domain.com -> Destination node of migration, where it actually fails
storage.domain.com -> Gluster server
Comment 1 nicolas 2016-02-18 14:32 EST
Created attachment 1128297
VDSM log of ovirtnode5.domain.com from the moment I click on "migrate"
Comment 2 nicolas 2016-02-19 13:17 EST
Created attachment 1128621
libvirt log for the VM

I found something that may be related. When the VM starts, the line:

Domain id=3 is tainted: hook-script

shows up. See details in this attachment.
Comment 3 nicolas 2016-02-22 04:03:57 EST
Correction to my original description: powering on a VM directly on ovirtnode5.domain.com works; only migration fails.
Comment 4 nicolas 2016-02-22 06:31:59 EST
Solved. The issue seems to have been the cable connected to the migration interface; after replacing the cable, migration works. Sorry for the nuisance.
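
Since the root cause was a physical link problem, a quick check of the migration interface can rule this out early. The sketch below reads the carrier flag and receive-error counters that Linux exposes under `/sys/class/net/<iface>/`. The function name `link_status` is hypothetical, and you need to supply the name of the interface that carries migration traffic on your host.

```python
from pathlib import Path
from typing import Dict, Optional


def link_status(iface: str, sysfs_root: str = "/sys/class/net") -> Dict[str, Optional[int]]:
    """Read carrier state and receive-error counters for one interface."""
    base = Path(sysfs_root) / iface

    def read_int(relative: str) -> Optional[int]:
        try:
            return int((base / relative).read_text().strip())
        except (OSError, ValueError):
            # File missing, or the kernel refused the read
            # (reading "carrier" fails while the interface is down).
            return None

    return {
        "carrier": read_int("carrier"),  # 1 means link detected
        "rx_errors": read_int("statistics/rx_errors"),
        "rx_crc_errors": read_int("statistics/rx_crc_errors"),
    }
```

A carrier value of 0, or error counters that keep growing between two calls, would point to cabling or NIC trouble rather than an oVirt configuration issue.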
