Description of problem:
We have 3 oVirt nodes with about 40 VMs running on each of them. We needed to restart them one by one, so we put one of them into maintenance. Although the cluster policy is vm_evenly_distributed (KSM, memory ballooning enabled), oVirt apparently chose a single host as the destination for all the VMs, which in our case led to an ugly crash with data loss.

Version-Release number of selected component (if applicable):
3.5.3.1

How reproducible:
When the destination node already runs a substantial number of VMs and its resources are not enough to accommodate all the machines from the node being put into maintenance.

Steps to Reproduce:
1. Put node X into maintenance (with Y VMs running on it)
2. oVirt chooses a single node Z as the destination for all Y VMs, even though more hosts are available as migration targets.

Actual results:
A node crash, in our case.

Expected results:
Perhaps the VMs should be evenly distributed among the remaining active nodes?
Hi, what do you mean by "node crash"? Can you please provide the log files (engine, and vdsm from both hosts) from the relevant time?
Created attachment 1087447 [details]
Log file around the time of the crash

The crash time is 08:44:12. I included some time before and after it so you can spot any relevant facts.
By crash I mean the following behavior:
1. Node 1 was put into standby
2. Node 3 was chosen as the destination for *all* the VMs running on node 1
3. About 20 VMs were migrated from node 1 to node 3
4. Migration then stopped, and node 3 became unresponsive
5. Node 3 stayed unresponsive for about 2 minutes, then its status changed back to "Up", but with 0 machines running on it

The machines that were running on node 3 at crash time were powered off, and some of them suffered data loss. If you need any additional information, don't hesitate to ask.
I see that EvenGuestDistribution was in use before the maintenance command, which should have tried to keep the same number of VMs on each host. But it also reports that all hosts are over-utilized. Can you please describe the cluster a bit more? How many VMs were there on the nodes before the maintenance? Was there any fencing configured? I see IPMI commands in the log, and although we report them as unsuccessful, they might have caused a reboot. A vdsm log from the affected host would also help, if you still have it.
At the time, the cluster had 3 hosts, each with 64 GB of RAM and 32 CPUs, and there were about 35-40 VMs running on each host, so the over-utilized report might be correct. I had put one of the nodes into maintenance when I realized all its VMs were being migrated to the same host. Fencing was indeed configured and the node was rebooted. Unfortunately, the log rotated long ago, so I no longer have that file.
What about the engine log you cut that snippet from? It would help us to see what happened during the maintenance command. The log usually contains the reasons why hosts were not selected as migration destinations. My guess right now is that the host got too overloaded with all the VMs (there might be more reasons for that, especially in 3.5) and did not respond in time. The engine then initiated fencing and rebooted it.
I don't have that log either, sorry; we were not sending those logs to a central server at the time. Yes, I concur with your guess, but the strange thing is why all the VMs from the node going into maintenance were distributed to only one node, even though another node was available to balance them onto. I guess there's not much you can investigate without logs, so it's fine if you decide to close this bug. I can try to reproduce it in a test environment, but it may take some time as I need to gather hardware for it.
Meital, can you please help with reproducing this with the vm_evenly_distributed policy and 3 hosts?
Artyom, can you please try to reproduce this?
I checked it on rhevm-3.6.3.4-0.1.el6.noarch:
1) Started with 3 hosts (host_1, host_2, host_3)
2) Started 40 VMs (host_1: 14, host_2: 8, host_3: 18)
3) Put host_2 into maintenance; all VMs migrated to host_1

The vm_evenly_distributed weight module prefers hosts with fewer VMs as the migration destination, so it looks like the scheduler chose host_1 for all the VMs because it had the fewest. If we really want to distribute them evenly between hosts, we need to implement the same mechanism the scheduler uses for memory, with a pending list of VMs. In any case it must not crash the server, because of the memory filter (if a host does not have enough memory, it must be filtered out).
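To illustrate the pending-list idea, here is a minimal sketch (hypothetical code, not the actual oVirt scheduler; function and host names are made up) using the counts from this test, where 8 VMs leave host_2 while host_1 runs 14 VMs and host_3 runs 18. Counting migrations already scheduled to a host makes each subsequent decision see the updated load, so the VMs spread out instead of all landing on the least-loaded host:

```python
def pick_destinations(vm_ids, host_vm_counts):
    """Assign each VM to the candidate host with the fewest
    (running + already pending) VMs.

    host_vm_counts maps host name -> current VM count.
    """
    pending = {host: 0 for host in host_vm_counts}
    assignments = {}
    for vm in vm_ids:
        # Weight = current load + migrations already scheduled there,
        # so each decision sees the effect of the previous ones.
        dest = min(host_vm_counts,
                   key=lambda h: host_vm_counts[h] + pending[h])
        pending[dest] += 1
        assignments[vm] = dest
    return assignments

# Scenario from the test above: host_2 in maintenance with 8 VMs,
# host_1 running 14 VMs, host_3 running 18.
result = pick_destinations([f"vm{i}" for i in range(8)],
                           {"host_1": 14, "host_3": 18})
# Both hosts end up with 20 VMs (host_1 receives 6, host_3 receives 2)
# instead of all 8 going to host_1.
```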
This needs to be tested with a larger difference between the hosts, because we ended up with H1=22 vs H3=18. Let's test with H1=16, H2=8, H3=8, put H1 into maintenance, and see where the 8 VMs migrate to.
I believe you mean putting H2 or H3 into maintenance.

Start with:
H1: 20
H2: 10
H3: 10

Put host H2 into maintenance. All VMs migrated to H3:
H1: 20
H2: maintenance
H3: 20
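For contrast, here is a minimal sketch (hypothetical code, not the oVirt scheduler) of placement without pending accounting: every per-VM decision is weighed against the same static snapshot of per-host VM counts, so the least-loaded host wins every time. With the counts above (H1: 20, H3: 10, and H2's 10 VMs to place), all 10 VMs are assigned to H3, matching the observed H1: 20 / H3: 20 end state:

```python
def pick_destinations_snapshot(vm_ids, host_vm_counts):
    """Greedy placement with no pending accounting: every VM is
    weighed against the same snapshot of counts, so the single
    least-loaded host is chosen for all of them."""
    return {vm: min(host_vm_counts, key=host_vm_counts.get)
            for vm in vm_ids}

# H2 in maintenance with 10 VMs; remaining hosts H1: 20, H3: 10.
result = pick_destinations_snapshot([f"vm{i}" for i in range(10)],
                                    {"H1": 20, "H3": 10})
# All 10 VMs are assigned to H3.
```

Here the outcome happens to be even (20/20), which is why this particular test passes; the comment-11 test (host_1: 14, host_3: 18) shows the same snapshot logic producing an uneven 22/18 split.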
Created attachment 1134810 [details]
engine log
Created attachment 1139623 [details]
another test

Another test, starting with cluster policy "none":
H1: 11
H2: 39
H3: 10

Changed the cluster policy to even VM distribution with the default parameters and put H2 into maintenance:
H1: 30
H2: 0
H3: 30

So it looks like everything works fine.
Based on comment 15, we're unable to reproduce this on version 3.6.3.4-0.1.el6.noarch. If you can reproduce it on this version or above, please re-open with all the relevant information.