Bug 1541777

Summary: PowerSaving policy does not balance VMs from a host with over-utilized memory
Product: [oVirt] ovirt-engine
Reporter: Artyom <alukiano>
Component: Backend.Core
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED CURRENTRELEASE
QA Contact: Polina <pagranat>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2.1.3
CC: akrejcir, alukiano, bperkins, bugs, pagranat, ykaul
Target Milestone: ovirt-4.2.3
Keywords: Automation
Target Release: ---
Flags: rule-engine: ovirt-4.2+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-11 07:47:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                       Flags
engine log                                        none
new_engine.log                                    none
engine.log for rhv-release-4.1.10-6-001.noarch    none
test.txt                                          none

Description Artyom 2018-02-04 16:09:03 UTC
Description of problem:
PowerSaving policy does not balance VMs from a host with over-utilized memory

Version-Release number of selected component (if applicable):
rhvm-4.2.1.3-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. See the system overview under "Additional info" below.

Actual results:
VM golden_env_mixed_virtio_0 does not migrate to host host_mixed_2.

Expected results:
VM golden_env_mixed_virtio_0 should migrate to host host_mixed_2 because of the memory balancing condition.

Additional info:
System overview
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3177MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "1024MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10655MB", 
                        "id": "8499f5a5-4493-4543-8943-59a600ee68c9", 
                        "memory": "10655MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "258MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "258MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6559MB", 
                        "id": "5d193f25-41bb-49e1-9509-d57234f02de8", 
                        "memory": "6559MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15087MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "239MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "239MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "6405fe75-b642-4494-924c-c418ebe6a39c", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5535", 
            "MinFreeMemoryForUnderUtilized": "9631"
        }
    }
}
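
To make the expected behaviour easier to check, the sketch below classifies each host from the overview against the two memory thresholds in policy_params. It is only an illustration under assumptions: it treats max_scheduling_memory as a stand-in for the free memory that the balancer compares against MaxFreeMemoryForOverUtilized and MinFreeMemoryForUnderUtilized (the engine may use a different metric), and the function name `classify` and the strict comparisons are hypothetical.

```python
# Hypothetical helper, not engine code: classify hosts by free memory (MB).
MAX_FREE_FOR_OVER_UTILIZED = 5535   # policy_params: MaxFreeMemoryForOverUtilized
MIN_FREE_FOR_UNDER_UTILIZED = 9631  # policy_params: MinFreeMemoryForUnderUtilized

# Assumption: max_scheduling_memory approximates the free memory the policy checks.
HOSTS_FREE_MB = {
    "host_mixed_1": 3177,
    "host_mixed_2": 7202,
    "host_mixed_3": 15087,
}


def classify(free_mb):
    """Return the utilization class of a host for the given free memory."""
    if free_mb < MAX_FREE_FOR_OVER_UTILIZED:
        return "over-utilized"
    if free_mb > MIN_FREE_FOR_UNDER_UTILIZED:
        return "under-utilized"
    return "normal"


classification = {host: classify(free) for host, free in HOSTS_FREE_MB.items()}
for host, state in classification.items():
    print(host, state)
```

Under these assumptions host_mixed_1 is the only over-utilized host and host_mixed_2 is the only host in the normal range, which matches the expectation that a VM should move from host_mixed_1 to host_mixed_2.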

Comment 1 Artyom 2018-02-04 16:10:10 UTC
You can start looking at the log from this line:
2018-02-04 17:49:31,277+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-20) [clusters_update_bc4d2fd2-8ac5-4cb7] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 2 Artyom 2018-02-04 16:12:13 UTC
Created attachment 1391022 [details]
engine log

Comment 3 Martin Sivák 2018-02-06 16:47:15 UTC
Artyom, can you make the golden_env_mixed_virtio_0 VM a bit smaller and try again? We won't migrate when the destination host becomes overloaded itself. And it is not just the 1GB we use in the equation, but also the static and dynamic overhead.
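
The check Martin describes can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: max_scheduling_memory of host_mixed_2 stands in for the destination's free memory, `overhead_mb` is a hypothetical placeholder for the static and dynamic overhead (the report does not give its value), and the `>=` boundary is a guess.

```python
def destination_stays_ok(dest_free_mb, vm_mb, overhead_mb, max_free_for_over_utilized):
    """Return True if the destination would not become over-utilized after migration."""
    free_after = dest_free_mb - vm_mb - overhead_mb
    return free_after >= max_free_for_over_utilized


# Values taken from the system overview in the description.
DEST_FREE_MB = 7202   # host_mixed_2 max_scheduling_memory (assumed ~ free memory)
VM_MB = 1024          # golden_env_mixed_virtio_0 memory
THRESHOLD_MB = 5535   # MaxFreeMemoryForOverUtilized

# Largest overhead (MB) that still keeps the destination at or above the threshold.
headroom_mb = DEST_FREE_MB - VM_MB - THRESHOLD_MB

print(headroom_mb)
print(destination_stays_ok(DEST_FREE_MB, VM_MB, 0, THRESHOLD_MB))
```

With these numbers the headroom is 643 MB, so in this simplified model the overhead alone would have to exceed 643 MB for the 1024 MB VM to push host_mixed_2 over the threshold.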

Comment 4 Artyom 2018-02-11 13:26:36 UTC
Created attachment 1394588 [details]
new_engine.log

Hi Martin, I reduced the memory of the VM golden_env_mixed_virtio_0 to 512 MB, but I still see the same issue.

System Overview:
{
    "golden_env_mixed_1": {
        "hosts": {
            "host_mixed_1": {
                "id": "cabaf6e9-b730-4b95-adfe-8b8a8e3fd8c9", 
                "max_scheduling_memory": "3690MB", 
                "status": "up", 
                "vms": {
                    "HostedEngine": {
                        "guaranteed_memory": "8192MB", 
                        "id": "0deec603-834a-42ad-aa77-9271b0297d4a", 
                        "memory": "8192MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_0": {
                        "guaranteed_memory": "512MB", 
                        "id": "32c5cab7-4398-4b4a-84d3-c5a4f79d8ff7", 
                        "memory": "512MB", 
                        "status": "up"
                    }, 
                    "vm_overutilized_0": {
                        "guaranteed_memory": "10654MB", 
                        "id": "140f7b8f-7c3a-4a0d-8e2c-cb2c31efe8d1", 
                        "memory": "10654MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_2": {
                "id": "850b9a24-0807-47ef-a763-33864e8dbb48", 
                "max_scheduling_memory": "7202MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_1": {
                        "guaranteed_memory": "1024MB", 
                        "id": "5e277e7c-9ec9-4c7d-a08e-1c70e051d368", 
                        "memory": "1024MB", 
                        "status": "up"
                    }, 
                    "golden_env_mixed_virtio_5": {
                        "guaranteed_memory": "259MB", 
                        "id": "14c3fdd0-3038-4511-af4f-e9edbe1eb3c1", 
                        "memory": "259MB", 
                        "status": "up"
                    }, 
                    "vm_normalutilized_1": {
                        "guaranteed_memory": "6558MB", 
                        "id": "e4e8d537-bd28-44d8-8093-925bc192a880", 
                        "memory": "6558MB", 
                        "status": "up"
                    }
                }
            }, 
            "host_mixed_3": {
                "id": "a6d3d3e8-9465-47bd-bca0-03e93bdf17c8", 
                "max_scheduling_memory": "15086MB", 
                "status": "up", 
                "vms": {
                    "golden_env_mixed_virtio_4": {
                        "guaranteed_memory": "240MB", 
                        "id": "9e801f02-ff77-410a-a0e2-667a2865ee1f", 
                        "memory": "240MB", 
                        "status": "up"
                    }
                }
            }
        }, 
        "id": "77cb9110-0734-11e8-aac6-001a4a16109f", 
        "policy": {
            "custom_power_saving_memory": {
                "balances": {
                    "OptimalForPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e151"
                    }
                }, 
                "filters": {
                    "CPUOverloaded": {
                        "id": "98842bc5-4094-4b83-8224-7b50f86a94c9"
                    }, 
                    "CpuPinning": {
                        "id": "6d636bf6-a35c-4f9d-b68d-0731f731cddc"
                    }, 
                    "HostDevice": {
                        "id": "728a21f1-f97e-4d32-bc3e-b3cc49756abb"
                    }, 
                    "Memory": {
                        "id": "c9ddbb34-0e1d-4061-a8d7-b0893fa80932"
                    }, 
                    "Migration": {
                        "id": "e659c871-0bf1-4ccc-b748-f28f5d08ddda"
                    }, 
                    "Network": {
                        "id": "72163d1c-9468-4480-99d9-0888664eb143"
                    }, 
                    "PinToHost": {
                        "id": "12262ab6-9690-4bc3-a2b3-35573b172d54"
                    }, 
                    "VmAffinityGroups": {
                        "id": "84e6ddee-ab0d-42dd-82f0-c297779db566"
                    }, 
                    "VmToHostsAffinityGroups": {
                        "id": "e69808a9-8a41-40f1-94ba-dd5d385d82d8"
                    }
                }, 
                "id": "f07cacab-e3cc-4c8e-ba40-514ce0132b40", 
                "weights": {
                    "OptimalForCpuPowerSaving": {
                        "id": "736999d0-1023-46a4-9a75-1316ed50e15b"
                    }, 
                    "OptimalForMemoryPowerSaving": {
                        "id": "9dfe6086-646d-43b8-8eef-4d94de8472c8"
                    }, 
                    "PreferredHosts": {
                        "id": "591cdb81-ba67-45b4-9642-e28f61a97d57"
                    }
                }
            }
        }, 
        "policy_params": {
            "CpuOverCommitDurationMinutes": "1", 
            "HighUtilization": "75", 
            "LowUtilization": "35", 
            "MaxFreeMemoryForOverUtilized": "5534", 
            "MinFreeMemoryForUnderUtilized": "9630"
        }
    }
}

You can start looking at the log from this line:
2018-02-11 15:22:33,521+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [clusters_update_fedaac2b-9540-4176] EVENT_ID: USER_UPDATE_CLUSTER(811), Host cluster golden_env_mixed_1 was updated by admin@internal-authz

Comment 5 Polina 2018-03-19 10:07:11 UTC
The same problem reproduces on a 4.1 build (tested on rhv-release-4.1.10-6-001.noarch).
The engine.log with scheduler debug logging is attached. Please start looking from 2018-03-19 11:43:28.

Comment 6 Polina 2018-03-19 10:08:04 UTC
Created attachment 1409769 [details]
engine.log for rhv-release-4.1.10-6-001.noarch

Comment 7 Polina 2018-04-26 11:40:05 UTC
The problem still happens on rhv-release-4.2.3-2-001.noarch.

Comment 10 Polina 2018-04-29 15:15:00 UTC
I've tested it on the latest build, rhv-release-4.2.3-4-001.noarch, and the problem is solved.

The test steps are in the attached test.txt.

Comment 11 Polina 2018-04-29 15:15:40 UTC
Created attachment 1428415 [details]
test.txt

Comment 12 Sandro Bonazzola 2018-04-30 07:59:05 UTC
This bug is verified in 4.2.3, but it is targeted to 4.2.4.
Can you please check and, if appropriate, move the target milestone to 4.2.3?

Comment 13 Sandro Bonazzola 2018-05-11 07:47:48 UTC
This bugzilla is included in the oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.