Bug 1071640 - Incorrect error message received from engine, insufficient memory instead of incompatible number of cores between hosts.
Summary: Incorrect error message received from engine, insufficient memory instead of incompatible number of cores between hosts.
Keywords:
Status: CLOSED DUPLICATE of bug 1049318
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: bugs@ovirt.org
QA Contact: Nikolai Sednev
URL:
Whiteboard: sla
Depends On:
Blocks:
 
Reported: 2014-03-02 15:18 UTC by Nikolai Sednev
Modified: 2016-02-10 19:41 UTC
CC List: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-08-26 11:30:43 UTC
oVirt Team: SLA
Embargoed:


Attachments
logs2.tar.gz (1.50 MB, application/x-gzip)
2014-03-02 15:18 UTC, Nikolai Sednev

Description Nikolai Sednev 2014-03-02 15:18:05 UTC
Created attachment 869650
logs2.tar.gz

Description of problem:
Incorrect error message received from the engine: insufficient memory is reported instead of an incompatible number of cores between hosts.

An incorrect message is printed when a live VM migration fails. Instead of printing:

"MigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CPUS,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName nsednev_host_b,$filterName CPU,SCHEDULING_HOST_FILTERED_REASON"

the following incorrect message is received:

"MigrateVm failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_MEMORY,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName nsednev_host_b,$filterName Memory,SCHEDULING_HOST_FILTERED_REASON"

Version-Release number of selected component (if applicable):
ovirt-engine-3.4.0-0.7.beta2.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.4.x86_64
libvirt-0.10.2-29.el6_5.3.x86_64
sanlock-2.8-1.el6.x86_64

How reproducible:
Rare

Steps to Reproduce:
1. Build an oVirt environment with a cluster containing 2 hosts that have different numbers of CPU sockets, and one VM with an OS installed on it.
2. Configure the VM's CPUs to match your strongest host, i.e. the maximum number of sockets, cores, and CPUs.
3. Live migrate the VM to the other (weaker) host. A warning about insufficient memory capacity on the target host is received even though there actually is enough memory on it (a sketch of the two filters involved appears at the end of this description).
4. Put the target host into maintenance and then activate it again.
5. Try migrating the VM to the weaker host again. This time the correct message about an incompatible number of cores on the target host is received.

Actual results:
An insufficient-memory message is received instead of one about an incompatible number of cores between the hosts, until the target host is put into maintenance and then activated again.

Expected results:
The error message should refer to the incompatible number of cores from the beginning.

Additional info:
Attached: engine and server logs from the engine, and vdsm logs from both hosts.
Please look in the logs at 2014-03-02 14:06:56 and 2014-03-02 14:09:33.
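
For clarity, the two messages above come from two different scheduling filters. The following is only an illustrative sketch (hypothetical names and fields, not the actual ovirt-engine filter code) of the distinction being reported:

// Hypothetical sketch only -- names are illustrative, not the real
// ovirt-engine scheduling classes.
class HostCandidate {
    int cpuCount;        // logical CPUs available on the host
    long freeMemMb;      // free memory on the host, in MB
    long pendingMemMb;   // memory reserved for VMs still starting or migrating in
}

class VmSpec {
    int vcpus;
    long memMb;
}

class SchedulingFilterSketch {
    // The message expected in this bug: the host has fewer CPUs than the VM
    // needs ("$filterName CPU" / ACTION_TYPE_FAILED_VDS_VM_CPUS).
    static boolean cpuFilter(HostCandidate host, VmSpec vm) {
        return host.cpuCount >= vm.vcpus;
    }

    // The message actually received: the memory filter rejects the host
    // ("$filterName Memory" / ACTION_TYPE_FAILED_VDS_VM_MEMORY). This can
    // happen spuriously if pendingMemMb is never decremented (see comment 5
    // and bug 1049318).
    static boolean memoryFilter(HostCandidate host, VmSpec vm) {
        return host.freeMemMb - host.pendingMemMb >= vm.memMb;
    }
}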

Comment 1 Sandro Bonazzola 2014-03-04 09:28:03 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 2 Nikolai Sednev 2014-04-03 14:20:07 UTC
Reproduced on Red Hat Enterprise Virtualization Manager Version: 3.4.0-0.12.beta2.el6ev as well.

Comment 3 Sandro Bonazzola 2014-05-08 13:55:52 UTC
This is an automated message.

oVirt 3.4.1 has been released.
This issue has been retargeted to 3.5.0 since it has not been marked as a high priority or severity issue; please retarget if needed.

Comment 4 Gilad Chaplik 2014-06-01 07:56:39 UTC
Martin, did you solve it already?

Comment 5 Martin Sivák 2014-07-23 08:48:04 UTC
Is this reproducible using a clean-state cluster? What I mean is that this might have been caused by a broken pending-memory counter.

There was a bug https://bugzilla.redhat.com/show_bug.cgi?id=1049318 that was caused by this, and we fixed a couple more code flows where the pending value was not decremented properly.

Can you please retest whether this happens when it is the first migration in the cluster, and then try migrating repeatedly between the two hosts and wait for this to happen (or not)? It should not take more than 8 migrations if your host has 8 GB of RAM and the VM is using 1 GB.
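
To illustrate what a leaking pending counter would do (a rough sketch with made-up names, assuming the release path is skipped in the broken flow):

// Illustration only: if the pending-memory reservation is added on every
// migration but the matching release is skipped in some code flow, the
// counter grows until the memory filter rejects the host.
class PendingMemorySketch {
    long hostMemMb = 8192;  // e.g. an 8 GB host
    long pendingMb = 0;

    boolean memoryFilterAccepts(long vmMb) {
        return hostMemMb - pendingMb >= vmMb;
    }

    void reservePending(long vmMb) { pendingMb += vmMb; }
    // the matching releasePending(...) is never called in the broken flows

    public static void main(String[] args) {
        PendingMemorySketch host = new PendingMemorySketch();
        long vmMb = 1024;  // a 1 GB VM, as in the example above
        int migrations = 0;
        while (host.memoryFilterAccepts(vmMb)) {
            host.reservePending(vmMb);  // leaked reservation
            migrations++;
        }
        // Prints 8: after about 8 leaked reservations the host looks "full"
        // to the memory filter even though its real memory is free.
        System.out.println(migrations);
    }
}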

Comment 6 Nikolai Sednev 2014-08-11 16:03:33 UTC
I tried several migrations with the two-host topology described below; the bug was not reproduced and may be closed, as I can see that some code changes have been applied.

I see that now, when two hosts are used, the values of the weakest host are taken as the limit, which means that with these 2 hosts:
master-vds10
cores-6
sockets-2
CPUs-24

rose-05
cores-4
sockets-1
CPUs-8

So this constrains me to create the VM with these maximum values:
Total Virtual CPUs-8
Cores per Virtual Socket-4
Virtual Sockets-2

When I tried to run a VM with:
cores-6
sockets-2
CPUs-12

I got these errors on the engine:

2014-Aug-11, 18:43  Failed to run VM VM2 (User: admin).
2014-Aug-11, 18:43  Failed to run VM VM2 on Host master-vds10.qa.lab.tlv.redhat.com.
2014-Aug-11, 18:43  VM VM2 is down with error. Exit message: Maximum CPUs greater than topology limit.
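
For reference, the arithmetic behind that limit as I read the values above (only a hypothetical sketch; the real qemu/libvirt topology check may differ):

// Hypothetical arithmetic only, based on the values above.
class TopologyLimitSketch {
    public static void main(String[] args) {
        int topologyLimit = 8;        // total CPUs reported for the weaker host (rose-05)
        int acceptedVmCpus = 2 * 4;   // Virtual Sockets = 2, Cores per Virtual Socket = 4 -> 8 vCPUs
        int rejectedVmCpus = 2 * 6;   // sockets = 2, cores = 6 -> 12 vCPUs
        System.out.println(acceptedVmCpus <= topologyLimit);  // true  -> VM runs
        System.out.println(rejectedVmCpus <= topologyLimit);  // false -> "Maximum CPUs greater than topology limit"
    }
}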

Components used:
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

Comment 7 Martin Sivák 2014-08-26 11:30:43 UTC
Closing as the cause is related to the referenced bug.

*** This bug has been marked as a duplicate of bug 1049318 ***
