Bug 1309834 - New node won't run any VM nor allow migrations
Summary: New node won't run any VM nor allow migrations
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.2.6
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: bugs@ovirt.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-18 19:31 UTC by nicolas
Modified: 2016-02-22 11:31 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-22 11:31:59 UTC
oVirt Team: Virt
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
Engine log from the moment I click on "migrate" (46.05 KB, text/plain)
2016-02-18 19:31 UTC, nicolas
VDSM log of ovirtnode5.domain.com from the moment I click on "migrate" (186.89 KB, text/plain)
2016-02-18 19:32 UTC, nicolas
libvirt log for the VM (3.40 KB, text/plain)
2016-02-19 18:17 UTC, nicolas

Description nicolas 2016-02-18 19:31:25 UTC
Created attachment 1128296 [details]
Engine log from the moment I click on "migrate"

Description of problem:

We had 4 nodes in our infrastructure and added a fifth (ovirtnode5.domain.com), which is from a different manufacturer and has a different chipset than the other 4, but has the same amount of RAM (128 GB).

Installation went smoothly, and all hosts run the same package versions. SELinux is enabled.

Running a machine on ovirtnode5.domain.com or migrating a machine to it fails every time.

Steps to Reproduce:
1. Choose any VM
2. Click migrate, choose ovirtnode5.domain.com as destination
3. After a while, the migration fails

The most notable error in vdsm.log is:

libvirtEventLoop::INFO::2016-02-18 19:01:07,069::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId=u'86b4bb6c-b262-41c1-b7e0-964057153f59')
periodic/7::DEBUG::2016-02-18 19:01:07,070::executor::178::Executor::(_run) Worker was discarded
Thread-177::ERROR::2016-02-18 19:01:07,070::vm::752::virt.vm::(_startUnderlyingVm) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 721, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 2830, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 2889, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid '86b4bb6c-b262-41c1-b7e0-964057153f59'
Thread-177::INFO::2016-02-18 19:01:07,071::vm::1324::virt.vm::(setDownStatus) vmId=`86b4bb6c-b262-41c1-b7e0-964057153f59`::Changed state to Down: VM failed to migrate (code=8)
Thread-177::DEBUG::2016-02-18 19:01:07,071::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"86b4bb6c-b262-41c1-b7e0-964057153f59": {"status": "Down", "timeOffset": "1", "exitReason": 8, "exitMessage": "VM failed to migrate", "exitCode": 1}, "notify_time": 4296088800}, "jsonrpc": "2.0", "method": "|virt|VM_status|86b4bb6c-b262-41c1-b7e0-964057153f59"}

I don't know exactly what that means. All storage is mounted, and I can access it and see the UUIDs. We use gluster 3.7.2 as the client (same version on all hosts).

I'm attaching both the vdsm.log and engine.log. I wonder if this might be some hardware incompatibility, although installation and inclusion into the cluster went smoothly.

A legend for reading the logs:

vm.domain.com -> VM to migrate
ovirtnode4.domain.com -> Node where vm.domain.com currently runs
ovirtnode5.domain.com -> Destination node of migration, where it actually fails
storage.domain.com -> Gluster server
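For anyone digging through the attached logs, the vdsm lines quoted above follow a `::`-separated layout (thread, level, timestamp, module, line number, logger, message). Here is a minimal parsing sketch; the field names are my own labels, not anything defined by vdsm:

```python
def parse_vdsm_line(line):
    """Split a vdsm log line of the form
    thread::LEVEL::timestamp::module::lineno::logger::(func) message
    into a dict. Returns None if the line doesn't match that shape."""
    # maxsplit=6 keeps any '::' inside the message (e.g. after vmId=`...`) intact
    parts = line.split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "lineno": lineno,
        "logger": logger,
        "message": message,
    }
```

For example, feeding it the `Failed to start a migration destination vm` line above yields `level == "ERROR"` and `logger == "virt.vm"`, which makes it easy to grep out just the `virt.vm` errors.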

Comment 1 nicolas 2016-02-18 19:32:21 UTC
Created attachment 1128297 [details]
VDSM log of ovirtnode5.domain.com from the moment I click on "migrate"

Comment 2 nicolas 2016-02-19 18:17:39 UTC
Created attachment 1128621 [details]
libvirt log for the VM

I found something possibly related. At VM start, the line:

Domain id=3 is tainted: hook-script

shows up. See details in this attachment.

Comment 3 nicolas 2016-02-22 09:03:57 UTC
There's a detail I described incorrectly: directly powering on a VM on the node ovirtnode5.domain.com works; what fails is migration.

Comment 4 nicolas 2016-02-22 11:31:59 UTC
Solved: the issue turned out to be the cable connected to the migration interface. Replacing the cable fixed it. Sorry for the noise.

