Bug 1541777 - PowerSaving policy does not balance VMs from a host with over-utilized memory
Summary: PowerSaving policy does not balance VMs from a host with over-utilized memory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.1.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.3
Assignee: Andrej Krejcir
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-04 16:09 UTC by Artyom
Modified: 2018-05-11 07:47 UTC
6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-05-11 07:47:48 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+


Attachments
engine log (5.12 MB, text/plain)
2018-02-04 16:12 UTC, Artyom
no flags
new_engine.log (7.81 MB, text/plain)
2018-02-11 13:26 UTC, Artyom
no flags
engine.log for rhv-release-4.1.10-6-001.noarch (1.76 MB, text/plain)
2018-03-19 10:08 UTC, Polina
no flags
test.txt (14.92 KB, text/plain)
2018-04-29 15:15 UTC, Polina
no flags


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87683 0 master MERGED core: Make a few methods in SlaValidator static 2020-02-26 12:19:59 UTC
oVirt gerrit 87684 0 master MERGED core: Remove base class EvenDistributionWeightPolicyUnit 2020-02-26 12:20:01 UTC
oVirt gerrit 87685 0 master MERGED core: power saving weights gives bad scores to hosts below and above utilization thresholds 2020-02-26 12:20:01 UTC
oVirt gerrit 88335 0 master ABANDONED core: PowerSavingCPUWeightPolicyUnit gives bad scores to overutilized hosts 2020-02-26 12:20:01 UTC
oVirt gerrit 90334 0 ovirt-engine-4.2 MERGED core: Make a few methods in SlaValidator static 2020-02-26 12:20:01 UTC
oVirt gerrit 90335 0 ovirt-engine-4.2 MERGED core: Remove base class EvenDistributionWeightPolicyUnit 2020-02-26 12:20:01 UTC
oVirt gerrit 90336 0 ovirt-engine-4.2 MERGED core: power saving weights gives bad scores to hosts below and above utilization thresholds 2020-02-26 12:19:58 UTC

Description Artyom 2018-02-04 16:09:03 UTC
Description of problem:
PowerSaving policy does not balance VMs from a host with over-utilized memory

Version-Release number of selected component (if applicable):
rhvm-4.2.1.3-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. See the system overview below (Additional info)

Actual results:
The VM golden_env_mixed_virtio_0 does not migrate to host host_mixed_2

Expected results:
The VM golden_env_mixed_virtio_0 must migrate to host host_mixed_2, because of the memory balancing condition

Additional info:
System overview
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3177MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "1024MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10655MB", 
                        "id": "8499f5a5-4493-4543-8943-59a600ee68c9", 
                        "memory": "10655MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "258MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "258MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6559MB", 
                        "id": "5d193f25-41bb-49e1-9509-d57234f02de8", 
                        "memory": "6559MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15087MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "239MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "239MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "6405fe75-b642-4494-924c-c418ebe6a39c", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5535", 
            "MinFreeMemoryForUnderUtilized": "9631"
        }
    }
}
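
For clarity, a minimal sketch of how I read the memory thresholds above (illustrative only, not the engine's actual code; the classify() helper is hypothetical). A host whose free scheduling memory is below MaxFreeMemoryForOverUtilized counts as over-utilized, and a host whose free memory is above MinFreeMemoryForUnderUtilized counts as under-utilized:

    // Hypothetical helper, Java-style; all values in MB as in the policy params.
    static String classify(long freeMemMb, long maxFreeForOverUtilized, long minFreeForUnderUtilized) {
        if (freeMemMb < maxFreeForOverUtilized) {
            return "over-utilized";    // too little free memory; VMs should move away
        }
        if (freeMemMb > minFreeForUnderUtilized) {
            return "under-utilized";   // candidate for evacuation under power saving
        }
        return "normally utilized";    // preferred migration destination
    }

    // Applied to the overview above (thresholds 5535 / 9631):
    //   host_mixed_1: classify(3177, 5535, 9631)  -> "over-utilized"
    //   host_mixed_2: classify(7202, 5535, 9631)  -> "normally utilized"
    //   host_mixed_3: classify(15087, 5535, 9631) -> "under-utilized"
    // So the balancer should move a VM off host_mixed_1, with host_mixed_2
    // as the expected destination.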

Comment 1 Artyom 2018-02-04 16:10:10 UTC
You can start looking at the log from this line:
2018-02-04 17:49:31,277+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [clusters_update_bc4d2fd2-8ac5-4cb7] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 2 Artyom 2018-02-04 16:12:13 UTC
Created attachment 1391022 [details]
engine log

Comment 3 Martin Sivák 2018-02-06 16:47:15 UTC
Artyom, can you make the golden_env_mixed_virtio_0 VM a bit smaller and try again? We won't migrate when the destination host becomes overloaded itself. And it is not just the 1GB we use in the equation, but also the static and dynamic overhead.
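
(For rough orientation, a back-of-the-envelope check of that equation with an illustrative overhead figure, not the engine's exact numbers:

    host_mixed_2 free:                     7202MB
    golden_env_mixed_virtio_0:            -1024MB
    static + dynamic overhead (assumed):   -300MB
    ----------------------------------------------
    free after migration:                 ~5878MB  > MaxFreeMemoryForOverUtilized (5535MB)

The destination would only become over-utilized if the combined overhead exceeded roughly 7202 - 1024 - 5535 = 643MB.)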

Comment 4 Artyom 2018-02-11 13:26:36 UTC
Created attachment 1394588 [details]
new_engine.log

Hi Martin, I reduced the memory of the VM golden_env_mixed_virtio_0 to 512MB, but I can still see the same issue.

System Overview:
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3690MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "512MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "512MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10654MB", 
                        "id": "140f7b8f-7c3a-4a0d-8e2c-cb2c31efe8d1", 
                        "memory": "10654MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "259MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "259MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6558MB", 
                        "id": "e4e8d537-bd28-44d8-8093-925bc192a880", 
                        "memory": "6558MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15086MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "240MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "240MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "f07cacab-e3cc-4c8e-ba40-514ce0132b40", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5534", 
            "MinFreeMemoryForUnderUtilized": "9630"
        }
    }
}
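
Running the same threshold arithmetic over these numbers (illustrative, ignoring overhead):

    host_mixed_1: 3690MB free < 5534MB (MaxFreeMemoryForOverUtilized) -> over-utilized
    host_mixed_2: 7202MB free, between 5534MB and 9630MB              -> normally utilized
    host_mixed_2 after taking the 512MB VM: 7202 - 512 = 6690MB, still well above 5534MB

So even with a sizeable overhead allowance host_mixed_2 would not become over-utilized, yet the VM is not migrated.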

You can start looking at the log from this line:
2018-02-11 15:22:33,521+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [clusters_update_fedaac2b-9540-4176] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 5 Polina 2018-03-19 10:07:11 UTC
The same problem reproduces on a 4.1 build (tested with rhv-release-4.1.10-6-001.noarch).
The engine.log with scheduler debug enabled is attached; please start looking from 2018-03-19 11:43:28.

Comment 6 Polina 2018-03-19 10:08:04 UTC
Created attachment 1409769 [details]
engine.log for rhv-release-4.1.10-6-001.noarch

Comment 7 Polina 2018-04-26 11:40:05 UTC
The problem still happens on rhv-release-4.2.3-2-001.noarch.

Comment 10 Polina 2018-04-29 15:15:00 UTC
I've tested it on the latest build, rhv-release-4.2.3-4-001.noarch, and the problem is solved.

The test steps are in the attached test.txt.

Comment 11 Polina 2018-04-29 15:15:40 UTC
Created attachment 1428415 [details]
test.txt

Comment 12 Sandro Bonazzola 2018-04-30 07:59:05 UTC
This bug is verified in 4.2.3 but is targeted to 4.2.4.
Can you please check and, if appropriate, move the target milestone to 4.2.3?

Comment 13 Sandro Bonazzola 2018-05-11 07:47:48 UTC
This bugzilla is included in the oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
