Bug 1541777 - PowerSaving policy does not balance VMs from a host with over-utilized memory
Summary: PowerSaving policy does not balance VMs from a host with over-utilized memory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.1.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.3
Assignee: Andrej Krejcir
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-04 16:09 UTC by Artyom
Modified: 2018-05-11 07:47 UTC
6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-05-11 07:47:48 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+


Attachments
engine log (5.12 MB, text/plain)
2018-02-04 16:12 UTC, Artyom
no flags
new_engine.log (7.81 MB, text/plain)
2018-02-11 13:26 UTC, Artyom
no flags
engine.log for rhv-release-4.1.10-6-001.noarch (1.76 MB, text/plain)
2018-03-19 10:08 UTC, Polina
no flags
test.txt (14.92 KB, text/plain)
2018-04-29 15:15 UTC, Polina
no flags


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87683 0 master MERGED core: Make a few methods in SlaValidator static 2020-02-26 12:19:59 UTC
oVirt gerrit 87684 0 master MERGED core: Remove base class EvenDistributionWeightPolicyUnit 2020-02-26 12:20:01 UTC
oVirt gerrit 87685 0 master MERGED core: power saving weights gives bad scores to hosts below and above utilization thresholds 2020-02-26 12:20:01 UTC
oVirt gerrit 88335 0 master ABANDONED core: PowerSavingCPUWeightPolicyUnit gives bad scores to overutilized hosts 2020-02-26 12:20:01 UTC
oVirt gerrit 90334 0 ovirt-engine-4.2 MERGED core: Make a few methods in SlaValidator static 2020-02-26 12:20:01 UTC
oVirt gerrit 90335 0 ovirt-engine-4.2 MERGED core: Remove base class EvenDistributionWeightPolicyUnit 2020-02-26 12:20:01 UTC
oVirt gerrit 90336 0 ovirt-engine-4.2 MERGED core: power saving weights gives bad scores to hosts below and above utilization thresholds 2020-02-26 12:19:58 UTC

Description Artyom 2018-02-04 16:09:03 UTC
Description of problem:
PowerSaving policy does not balance VMs from a host with over-utilized memory

Version-Release number of selected component (if applicable):
rhvm-4.2.1.3-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. See the system overview below (Additional info)

Actual results:
The VM golden_env_mixed_virtio_0 does not migrate to host host_mixed_2

Expected results:
The VM golden_env_mixed_virtio_0 must migrate to host host_mixed_2, because of the memory balancing condition

Additional info:
System overview
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3177MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "1024MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10655MB", 
                        "id": "8499f5a5-4493-4543-8943-59a600ee68c9", 
                        "memory": "10655MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "258MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "258MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6559MB", 
                        "id": "5d193f25-41bb-49e1-9509-d57234f02de8", 
                        "memory": "6559MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15087MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "239MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "239MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "6405fe75-b642-4494-924c-c418ebe6a39c", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5535", 
            "MinFreeMemoryForUnderUtilized": "9631"
        }
    }
}
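
For clarity, a minimal sketch of how I read the memory thresholds above (illustrative only, not the engine's actual code; the classify() helper is hypothetical). A host whose free scheduling memory is below MaxFreeMemoryForOverUtilized counts as over-utilized, and a host whose free memory is above MinFreeMemoryForUnderUtilized counts as under-utilized:

    // Hypothetical helper, Java-style; all values in MB as in the policy params.
    static String classify(long freeMemMb, long maxFreeForOverUtilized, long minFreeForUnderUtilized) {
        if (freeMemMb < maxFreeForOverUtilized) {
            return "over-utilized";    // too little free memory; VMs should move away
        }
        if (freeMemMb > minFreeForUnderUtilized) {
            return "under-utilized";   // candidate for evacuation under power saving
        }
        return "normally utilized";    // preferred migration destination
    }

    // Applied to the overview above (thresholds 5535 / 9631):
    //   host_mixed_1: classify(3177, 5535, 9631)  -> "over-utilized"
    //   host_mixed_2: classify(7202, 5535, 9631)  -> "normally utilized"
    //   host_mixed_3: classify(15087, 5535, 9631) -> "under-utilized"
    // So the balancer should move a VM off host_mixed_1, with host_mixed_2
    // as the expected destination.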

Comment 1 Artyom 2018-02-04 16:10:10 UTC
You can start looking at the log from this line:
2018-02-04 17:49:31,277+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [clusters_update_bc4d2fd2-8ac5-4cb7] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 2 Artyom 2018-02-04 16:12:13 UTC
Created attachment 1391022 [details]
engine log

Comment 3 Martin Sivák 2018-02-06 16:47:15 UTC
Artyom, can you make the golden_env_mixed_virtio_0 VM a bit smaller and try again? We won't migrate when the destination host becomes overloaded itself. And it is not just the 1GB we use in the equation, but also the static and dynamic overhead.
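
(For rough orientation, a back-of-the-envelope check of that equation with an illustrative overhead figure, not the engine's exact numbers:

    host_mixed_2 free:                     7202MB
    golden_env_mixed_virtio_0:            -1024MB
    static + dynamic overhead (assumed):   -300MB
    ----------------------------------------------
    free after migration:                 ~5878MB  > MaxFreeMemoryForOverUtilized (5535MB)

The destination would only become over-utilized if the combined overhead exceeded roughly 7202 - 1024 - 5535 = 643MB.)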

Comment 4 Artyom 2018-02-11 13:26:36 UTC
Created attachment 1394588 [details]
new_engine.log

Hi Martin, I reduced the memory of the VM golden_env_mixed_virtio_0 to 512MB, but I can still see the same issue.

System Overview:
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3690MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "512MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "512MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10654MB", 
                        "id": "140f7b8f-7c3a-4a0d-8e2c-cb2c31efe8d1", 
                        "memory": "10654MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "259MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "259MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6558MB", 
                        "id": "e4e8d537-bd28-44d8-8093-925bc192a880", 
                        "memory": "6558MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15086MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "240MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "240MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "f07cacab-e3cc-4c8e-ba40-514ce0132b40", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5534", 
            "MinFreeMemoryForUnderUtilized": "9630"
        }
    }
}
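
Running the same threshold arithmetic over these numbers (illustrative, ignoring overhead):

    host_mixed_1: 3690MB free < 5534MB (MaxFreeMemoryForOverUtilized) -> over-utilized
    host_mixed_2: 7202MB free, between 5534MB and 9630MB              -> normally utilized
    host_mixed_2 after taking the 512MB VM: 7202 - 512 = 6690MB, still well above 5534MB

So even with a sizeable overhead allowance host_mixed_2 would not become over-utilized, yet the VM is not migrated.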

You can start looking at the log from this line:
2018-02-11 15:22:33,521+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [clusters_update_fedaac2b-9540-4176] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 5 Polina 2018-03-19 10:07:11 UTC
The same problem reproduces on a 4.1 build (tested with rhv-release-4.1.10-6-001.noarch).
The engine.log with scheduler debug enabled is attached; please start looking from 2018-03-19 11:43:28.

Comment 6 Polina 2018-03-19 10:08:04 UTC
Created attachment 1409769 [details]
engine.log for rhv-release-4.1.10-6-001.noarch

Comment 7 Polina 2018-04-26 11:40:05 UTC
The problem still happens on rhv-release-4.2.3-2-001.noarch.

Comment 10 Polina 2018-04-29 15:15:00 UTC
I've tested it on the latest build, rhv-release-4.2.3-4-001.noarch, and the problem is solved.

The test steps are in the attached test.txt.

Comment 11 Polina 2018-04-29 15:15:40 UTC
Created attachment 1428415 [details]
test.txt

Comment 12 Sandro Bonazzola 2018-04-30 07:59:05 UTC
This bug is verified in 4.2.3 but is targeted to 4.2.4.
Can you please check and, if appropriate, move the target milestone to 4.2.3?

Comment 13 Sandro Bonazzola 2018-05-11 07:47:48 UTC
This bugzilla is included in the oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
