Bug 1299480

Summary: Unhandled exception in <NumaInfoMonitor Error in vdsm log while migrating VM/s
Product: [oVirt] vdsm
Reporter: Michael Burman <mburman>
Component: Core
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: medium
Docs Contact:
Priority: medium
Version: ---
CC: bugs, fromani, mavital, michal.skrivanek
Target Milestone: ovirt-3.6.5
Flags: michal.skrivanek: ovirt-3.6.z?
       mburman: planning_ack?
       michal.skrivanek: devel_ack+
       mavital: testing_ack+
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 14:38:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Michael Burman 2016-01-18 13:34:05 UTC
Description of problem:
Unhandled exception in <NumaInfoMonitor error in the vdsm log while migrating VMs in a 3.6 cluster.

The error is shown in vdsm.log every few migration attempts.
The migration itself finishes successfully.

periodic/1::ERROR::2016-01-18 15:21:13,290::executor::188::Executor::(_execute_task) Unhandled exception in <NumaInfoMonitor vm=7ebb5925-e76f-4d2f-a148-7671c313fe84 at 0x3312610>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 186, in _execute_task
    callable()
  File "/usr/share/vdsm/virt/periodic.py", line 279, in __call__
    self._execute()
  File "/usr/share/vdsm/virt/periodic.py", line 324, in _execute
    self._vm.updateNumaInfo()
  File "/usr/share/vdsm/virt/vm.py", line 5071, in updateNumaInfo
    self._numaInfo = numaUtils.getVmNumaNodeRuntimeInfo(self)
  File "/usr/share/vdsm/numaUtils.py", line 106, in getVmNumaNodeRuntimeInfo
    _get_vcpu_positioning(vm))
  File "/usr/share/vdsm/numaUtils.py", line 129, in _get_vcpu_positioning
    return vm._dom.vcpus()[0]
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2751, in vcpus
    if ret == -1: raise libvirtError ('virDomainGetVcpus() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '7ebb5925-e76f-4d2f-a148-7671c313fe84' (v2)


periodic/2::ERROR::2016-01-18 15:21:45,927::executor::188::Executor::(_execute_task) Unhandled exception in <NumaInfoMonitor vm=404f96db-b224-4163-a21e-eeb8eb084d7b at 0x7f6298353310>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 186, in _execute_task
    callable()
  File "/usr/share/vdsm/virt/periodic.py", line 279, in __call__
    self._execute()
  File "/usr/share/vdsm/virt/periodic.py", line 324, in _execute
    self._vm.updateNumaInfo()
  File "/usr/share/vdsm/virt/vm.py", line 5071, in updateNumaInfo
    self._numaInfo = numaUtils.getVmNumaNodeRuntimeInfo(self)
  File "/usr/share/vdsm/numaUtils.py", line 106, in getVmNumaNodeRuntimeInfo
    _get_vcpu_positioning(vm))
  File "/usr/share/vdsm/numaUtils.py", line 129, in _get_vcpu_positioning
    return vm._dom.vcpus()[0]
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2751, in vcpus
    if ret == -1: raise libvirtError ('virDomainGetVcpus() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '404f96db-b224-4163-a21e-eeb8eb084d7b' (v4)
Version-Release number of selected component (if applicable):
3.6.2.5-0.1.el6
vdsm-4.17.17-0.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64
libvirt-1.2.17-13.el7_2.2.x86_64

How reproducible:
60-85

Steps to Reproduce:
1. Migrate a VM between two servers in a 3.6 cluster

Actual results:
Every few migration attempts, an error appears in the vdsm log.

Expected results:
These errors shouldn't spam the vdsm log.

Comment 1 Francesco Romani 2016-01-29 14:32:56 UTC
It is caused by a benign race between the migration thread and the periodic monitoring thread. It is just noise, but I'm working on a patch.
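The race can be sketched roughly like this (a hypothetical simplification, not vdsm's actual patch): the periodic NumaInfoMonitor task calls into libvirt at the same moment the migration thread tears the domain down, so the libvirt call raises "Domain not found". The names `libvirtError` and `VIR_ERR_NO_DOMAIN` mirror libvirt's Python bindings but are stubbed here so the sketch is self-contained:

```python
# Hypothetical sketch of swallowing the benign "domain vanished" race
# between a periodic monitoring task and migration teardown.

VIR_ERR_NO_DOMAIN = 42  # libvirt's error code for "no domain with matching uuid"


class libvirtError(Exception):
    """Minimal stand-in for libvirt.libvirtError."""
    def __init__(self, msg, code):
        super().__init__(msg)
        self._code = code

    def get_error_code(self):
        return self._code


def run_periodic_task(task):
    """Run a periodic task, treating a vanished domain as noise."""
    try:
        task()
    except libvirtError as exc:
        if exc.get_error_code() != VIR_ERR_NO_DOMAIN:
            raise  # real failures still propagate to the executor
        # Domain disappeared between scheduling and execution
        # (e.g. migration completed): benign, nothing to report.


def vanished_domain():
    # Simulates the traceback above: the VM migrated away mid-call.
    raise libvirtError("Domain not found: no domain with matching uuid",
                       VIR_ERR_NO_DOMAIN)


run_periodic_task(vanished_domain)  # swallowed: no unhandled exception
```

Any other libvirt error code still propagates, so genuine failures keep reaching the executor's error handling.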

Comment 2 Francesco Romani 2016-01-29 15:56:58 UTC
The patch is intended not only to fix the specific error described here, but also to swallow benign errors like this one.

Comment 3 Red Hat Bugzilla Rules Engine 2016-01-31 22:47:34 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags, and only then set the target milestone.

Comment 4 Red Hat Bugzilla Rules Engine 2016-01-31 22:47:34 UTC
A target release should be set only once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target release has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 5 Francesco Romani 2016-02-11 13:10:56 UTC
Not yet MODIFIED; we need more patches, and we need backports to 3.6.

Comment 6 Francesco Romani 2016-03-08 12:48:19 UTC
http://gerrit.ovirt.org/53056 is not critical for fixing this and can actually hide some bugs. Everything else is merged and backported, hence moving to MODIFIED.

Comment 7 Francesco Romani 2016-03-08 12:48:57 UTC
I don't think this BZ requires doc_text. The user should just see less noise in the logs.

Comment 8 Michael Burman 2016-03-31 06:30:37 UTC
Verified on - 3.6.5-0.1.el6 and vdsm-4.17.25-0.el7ev.noarch