Bug 1692355 - Memory overcommitted VMs are not scheduled on different hosts
Summary: Memory overcommitted VMs are not scheduled on different hosts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2.1
Assignee: Liran Rotenberg
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-25 12:43 UTC by Polina
Modified: 2020-09-18 07:12 UTC (History)
8 users

Fixed In Version: ovirt-engine-4.4.2.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-18 07:12:51 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
mtessun: planning_ack+
ahadas: devel_ack+
mavital: testing_ack+


Attachments
engine with debug, vdsm, qemu (1.85 MB, application/gzip)
2019-03-25 12:43 UTC, Polina
no flags Details
engine & vdsm logs (3.75 MB, application/gzip)
2020-07-09 11:58 UTC, Polina
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 110262 0 master MERGED core: fix race on VdsManager::lastVmsList 2020-08-18 15:34:32 UTC

Description Polina 2019-03-25 12:43:09 UTC
Created attachment 1547683 [details]
engine with debug, vdsm, qemu

Description of problem: Two VMs are configured with memory equal to the free memory on the hosts, so that they cannot both start on the same host. There is also a positive affinity rule for these VMs. The VMs are started asynchronously via the REST API, and sometimes they start on the same host.
The test is part of regression: the goal of the test is to change the VMs' memory to prevent the possibility of starting two VMs on the same host and to check that soft positive affinity does not prevent this (polarion link and test log will be added in a private attachment).

Version-Release number of selected component (if applicable): ovirt-engine-4.3.2.1-0.1.el7.noarch


How reproducible: Happened in the last automation run, and I succeeded in reproducing it intermittently (~40% of attempts).


Steps to Reproduce:
1. Configure two VMs with the following memory sizes (XML output: http://pastebin.test.redhat.com/743629):
VM 1: 30359 MB
VM 2: 30248 MB

while the hosts report:
host1 Max free Memory for scheduling new VMs: 31624 MB
host2 Max free Memory for scheduling new VMs: 23216 MB (HE VM on it)
host3 Max free Memory for scheduling new VMs: 31509 MB
2. Create a positive affinity group for these VMs (http://pastebin.test.redhat.com/743635).
3. Start both VMs asynchronously via the REST API (see the sketch at the end of this comment).

Actual results: The VMs sometimes start on the same host.

Expected results: The VMs must be scheduled on different hosts.


Additional info (for QE): reproduced with the test TestSoftPositiveAffinityVsMemoryFilter in rhevmtests/compute/sla/scheduler_tests/affinity/affinity_test.py on the 5.yaml environment.
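
For illustration, step 3 boils down to issuing the two start actions back to back without waiting for the first one to be scheduled. A minimal sketch using the REST API's /vms/{id}/start action; the engine URL, credentials and VM ids below are placeholders, and the actual test uses the Python-based rhevmtests suite rather than this Java client:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class StartVmsConcurrently {

    // Placeholder engine URL, credentials and VM ids -- not values from this bug.
    private static final String API = "https://engine.example.com/ovirt-engine/api";
    private static final String AUTH = "Basic " + Base64.getEncoder()
            .encodeToString("admin@internal:password".getBytes(StandardCharsets.UTF_8));

    public static void main(String[] args) {
        // Certificate trust setup omitted for brevity; a real run against a
        // self-signed engine CA needs a custom SSLContext.
        HttpClient client = HttpClient.newHttpClient();

        List<String> vmIds = List.of("VM_1_ID", "VM_2_ID");

        // Fire both start actions without waiting for the first to complete,
        // mimicking the asynchronous REST calls made by the test.
        List<CompletableFuture<HttpResponse<String>>> futures = vmIds.stream()
                .map(id -> HttpRequest.newBuilder()
                        .uri(URI.create(API + "/vms/" + id + "/start"))
                        .header("Authorization", AUTH)
                        .header("Content-Type", "application/xml")
                        .header("Accept", "application/xml")
                        .POST(HttpRequest.BodyPublishers.ofString("<action/>"))
                        .build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());

        futures.forEach(f -> System.out.println("start -> HTTP " + f.join().statusCode()));
    }
}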

Comment 2 RHEL Program Management 2019-03-26 00:06:04 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Polina 2019-05-20 10:15:01 UTC
This also occurred in the last automation run for rhv-4.2.10-1.

Comment 4 Michal Skrivanek 2020-03-17 12:17:05 UTC
deprecating SLA team usage, moving to Virt

Comment 5 Michal Skrivanek 2020-03-19 13:11:34 UTC
Doesn't look like a recent regression, removing the blocker flag.

Still need to check what's wrong with affinity... moving to 4.3.3 for investigation just in case it's important...

Comment 6 RHEL Program Management 2020-03-19 13:11:43 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 7 Liran Rotenberg 2020-03-22 15:12:21 UTC
After looking into the logs, here is what can be seen:
First VM:
2019-03-25 12:39:55,299+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-74) [vms_syncAction_69c91583-3621-4342] Scheduling started, correlation Id: vms_syncAction_69c91583-3621-4342
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasPhysMemoryToRunVM: host 'host_mixed_3'; free memory is : 25951.0 MB (+ 0 MB pending); free swap is: 16063 MB, required memory is 30482.0 MB; Guest overhead 123 MB
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30482.0 <= ???  42014.0
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasPhysMemoryToRunVM: host 'host_mixed_2'; free memory is : 18036.0 MB (+ 0 MB pending); free swap is: 16127 MB, required memory is 30482.0 MB; Guest overhead 123 MB
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30482.0 <= ???  34163.0
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasPhysMemoryToRunVM: host 'host_mixed_1'; free memory is : 25913.0 MB (+ 0 MB pending); free swap is: 16127 MB, required memory is 30482.0 MB; Guest overhead 123 MB
2019-03-25 12:39:55,313+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30482.0 <= ???  42040.0
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasOvercommitMemoryToRunVM: host 'host_mixed_3'; max scheduling memory : 31509.0 MB; required memory is 30562.0 MB; Guest overhead 203 MB
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30562.0 <= ???  31509.0
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasOvercommitMemoryToRunVM: host 'host_mixed_2'; max scheduling memory : 23216.0 MB; required memory is 30562.0 MB; Guest overhead 203 MB
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30562.0 <= ???  23216.0
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] hasOvercommitMemoryToRunVM: host 'host_mixed_1'; max scheduling memory : 31624.0 MB; required memory is 30562.0 MB; Guest overhead 203 MB
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_69c91583-3621-4342] 30562.0 <= ???  31624.0
2019-03-25 12:39:55,314+02 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.MemoryPolicyUnit] (default task-74) [vms_syncAction_69c91583-3621-4342] Host 'host_mixed_2' is already too close to the memory overcommitment limit. It can only accept 23216.0 MB of additional memory load.
2019-03-25 12:39:55,314+02 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-74) [vms_syncAction_69c91583-3621-4342] Candidate host 'host_mixed_2' ('7ef3d318-90fe-4487-a570-c0c184e7abb2') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: vms_syncAction_69c91583-3621-4342)

This is correct, and the VM started on 34f317a3-17b1-48ee-a8bb-710093cddf1d (host_mixed_3).

As for the second VM:
2019-03-25 12:40:06,549+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-74) [vms_syncAction_161039e1-fd13-43bc] Scheduling started, correlation Id: vms_syncAction_161039e1-fd13-43bc
2019-03-25 12:40:06,561+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasPhysMemoryToRunVM: host 'host_mixed_3'; free memory is : 25753.0 MB (+ 0 MB pending); free swap is: 16063 MB, required memory is 30371.0 MB; Guest overhead 123 MB
2019-03-25 12:40:06,561+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30371.0 <= ???  41816.0
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasPhysMemoryToRunVM: host 'host_mixed_2'; free memory is : 18036.0 MB (+ 0 MB pending); free swap is: 16127 MB, required memory is 30371.0 MB; Guest overhead 123 MB
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30371.0 <= ???  34163.0
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasPhysMemoryToRunVM: host 'host_mixed_1'; free memory is : 25924.0 MB (+ 0 MB pending); free swap is: 16127 MB, required memory is 30371.0 MB; Guest overhead 123 MB
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30371.0 <= ???  42051.0
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasOvercommitMemoryToRunVM: host 'host_mixed_3'; max scheduling memory : 31509.0 MB; required memory is 30451.0 MB; Guest overhead 203 MB
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30451.0 <= ???  31509.0
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasOvercommitMemoryToRunVM: host 'host_mixed_2'; max scheduling memory : 23216.0 MB; required memory is 30451.0 MB; Guest overhead 203 MB
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30451.0 <= ???  23216.0
2019-03-25 12:40:06,563+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasOvercommitMemoryToRunVM: host 'host_mixed_1'; max scheduling memory : 31624.0 MB; required memory is 30451.0 MB; Guest overhead 203 MB
2019-03-25 12:40:06,563+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30451.0 <= ???  31624.0
2019-03-25 12:40:06,563+02 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.MemoryPolicyUnit] (default task-74) [vms_syncAction_161039e1-fd13-43bc] Host 'host_mixed_2' is already too close to the memory overcommitment limit. It can only accept 23216.0 MB of additional memory load.
2019-03-25 12:40:06,563+02 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-74) [vms_syncAction_161039e1-fd13-43bc] Candidate host 'host_mixed_2' ('7ef3d318-90fe-4487-a570-c0c184e7abb2') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: vms_syncAction_161039e1-fd13-43bc)

The second VM started on 34f317a3-17b1-48ee-a8bb-710093cddf1d, host_mixed_3 as well.

We can see that the first VM didn't change the memory reported for host_mixed_3:
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] hasOvercommitMemoryToRunVM: host 'host_mixed_3'; max scheduling memory : 31509.0 MB; required memory is 30451.0 MB; Guest overhead 203 MB
2019-03-25 12:40:06,562+02 DEBUG [org.ovirt.engine.core.bll.scheduling.SlaValidator] (default task-74) [vms_syncAction_161039e1-fd13-43bc] 30451.0 <= ???  31509.0

I couldn't reproduce it in my environment and asked Polina for help with it. At first glance, it looks like the host statistics were not updated when we expected them to be.
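
To make the failure mode concrete: the 'Memory' filter effectively compares the VM's required memory (VM memory plus guest overhead) against the host's max scheduling memory, and if that host value is not reduced after the first VM is placed, the second VM passes the same check. A simplified illustration using the numbers from the log above; this is not the engine's actual code, and the class and method names are invented for the sketch:

// Simplified illustration of the overcommit check seen in the SlaValidator debug lines.
class OvercommitCheckExample {

    // Mirrors the "required <= ???  maxSchedulingMemory" debug output.
    static boolean hasOvercommitMemoryToRunVm(double requiredMemMb, double maxSchedulingMemMb) {
        return requiredMemMb <= maxSchedulingMemMb;
    }

    public static void main(String[] args) {
        double hostMaxSchedulingMem = 31509.0;   // host_mixed_3 before any placement

        double vm1Required = 30562.0;            // VM 1: memory + 203 MB guest overhead
        System.out.println("VM1 fits: "
                + hasOvercommitMemoryToRunVm(vm1Required, hostMaxSchedulingMem));          // true

        // Expected behaviour: once VM1 is placed (or pending), the host's scheduling
        // memory should drop by roughly VM1's size, which would reject VM2.
        double updatedMax = hostMaxSchedulingMem - vm1Required;   // ~947 MB left
        double vm2Required = 30451.0;            // VM 2: memory + 203 MB guest overhead
        System.out.println("VM2 fits after update: "
                + hasOvercommitMemoryToRunVm(vm2Required, updatedMax));                     // false

        // Observed behaviour in the log: host_mixed_3 still reports 31509.0 MB,
        // so VM2 passes the filter and lands on the same host.
        System.out.println("VM2 fits with stale value: "
                + hasOvercommitMemoryToRunVm(vm2Required, hostMaxSchedulingMem));           // true
    }
}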

Comment 8 Liran Rotenberg 2020-03-22 15:57:17 UTC
Might be solved in BZ 1651406 (https://gerrit.ovirt.org/c/98270 and https://gerrit.ovirt.org/c/98275/).

Comment 10 Lukas Svaty 2020-04-01 09:35:15 UTC
moving to 4.4.0, due to regression keyword

Comment 12 Lukas Svaty 2020-04-01 11:35:30 UTC
low reproducibility -> removing Regression

Comment 14 Polina 2020-07-09 11:58:20 UTC
Created attachment 1700426 [details]
engine & vdsm logs

Logs from the reproduction with Liran's jar.

Comment 15 Liran Rotenberg 2020-07-09 12:07:41 UTC
From the log we can see that the value is being overridden in:
2020-07-09 14:15:29,129+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [] freeMemory: 7649.0 getMemCommited: 23978 getResMem: 321
2020-07-09 14:15:29,129+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [] freeMemory: 7649.0 getMemCommited: 23978 getResMem: 321
2020-07-09 14:15:29,129+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [] maxSchedulingMemory:7649.0
getFreeVirtualMemory:7649.0
getPendingVmemSize:0
[org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.VDS.calculateFreeSchedulingMemoryCache(VDS.java:1327)
 org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.VDS.setMaxVdsMemoryOverCommit(VDS.java:971)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl.lambda$new$0(VdsDaoImpl.java:335)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:93)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:667)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:605)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:657)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:688)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:741)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:152)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.doExecute(PostgresDbEngineDialect.java:118)
 org.springframework.4.RELEASE//org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeReadList(SimpleJdbcCallsHandler.java:105)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:56)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl$Proxy$_$$_WeldSubclass.get(Unknown Source)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:51)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl$Proxy$_$$_WeldSubclass.get$$super(Unknown Source)
 jdk.internal.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
 java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 java.base/java.lang.reflect.Method.invoke(Method.java:566)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:51)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:78)
 org.ovirt.engine.core.common//org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12)
 jdk.internal.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
 java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 java.base/java.lang.reflect.Method.invoke(Method.java:566)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeAroundInvoke(InterceptorMethodHandler.java:84)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeInterception(InterceptorMethodHandler.java:72)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.invoke(InterceptorMethodHandler.java:56)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79)
 org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68)
 org.ovirt.engine.core.dal//org.ovirt.engine.core.dao.VdsDaoImpl$Proxy$_$$_WeldSubclass.get(Unknown Source)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.VdsManager.refreshCachedVds(VdsManager.java:398)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.VdsManager.refreshImpl(VdsManager.java:279)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.VdsManager.refresh(VdsManager.java:265)
 java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
 java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 java.base/java.lang.Thread.run(Thread.java:834)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)]
2020-07-09 14:15:29,207+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] freeMemory: 31562.0 getMemCommited: 65 getResMem: 321
2020-07-09 14:15:29,207+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] freeMemory: 31562.0 getMemCommited: 65 getResMem: 321
2020-07-09 14:15:29,207+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] maxSchedulingMemory:31562.0
getFreeVirtualMemory:31562.0
getPendingVmemSize:0
[org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.VDS.calculateFreeSchedulingMemoryCache(VDS.java:1327)
 org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.VDS.setMemCommited(VDS.java:650)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshCommitedMemory(HostMonitoring.java:891)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:165)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring$GetStatsAsyncCallback.onResponse(HostMonitoring.java:521)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand$GetStatsVDSCommandCallback.onResponse(GetStatsAsyncVDSCommand.java:50)
 org.ovirt.vdsm-jsonrpc-java//org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.lambda$processResponse$1(JsonRpcClient.java:190)
 java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedFutureTask.run(ManagedFutureTask.java:141)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$101(ManagedScheduledThreadPoolExecutor.java:383)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:532)
 java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 java.base/java.lang.Thread.run(Thread.java:834)
 org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)]
2020-07-09 14:15:29,506+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-97) [] freeMemory: 31562.0 getMemCommited: 65 getResMem: 321
2020-07-09 14:15:29,506+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-97) [] freeMemory: 31562.0 getMemCommited: 65 getResMem: 321
2020-07-09 14:15:29,506+03 ERROR [org.ovirt.engine.core.common.businessentities.VDS] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-97) [] maxSchedulingMemory:31562.0
getFreeVirtualMemory:31562.0
getPendingVmemSize:0

...

This suggests that these lines are problematic:
 org.ovirt.engine.core.common//org.ovirt.engine.core.common.businessentities.VDS.setMemCommited(VDS.java:650)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshCommitedMemory(HostMonitoring.java:891)
 deployment.engine.ear//org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:165)

In those methods, the relevant line is:
saveVdsDynamic |= refreshCommitedMemory(vds, vdsManager.getLastVmsList(), resourceManager);
Here, vdsManager.getLastVmsList() looks suspicious, as we "lose" the VM in the calculation.
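
In other words, if the committed-memory recalculation reads a VM list that is concurrently being replaced or mutated by another monitoring cycle, a just-started VM can be missing from the sum, so getMemCommited drops back (65 MB in the log above) and maxSchedulingMemory jumps back to the unloaded value. The following is only an illustration of that kind of race and one conventional way to avoid it (publishing an immutable snapshot instead of mutating a shared list); the class below is a toy model, not engine code, and the actual fix is the gerrit 110262 patch on VdsManager::lastVmsList:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of the race: one thread recomputes committed memory from the
// "last VMs list" while another thread is still rebuilding that list.
class LastVmsListRaceSketch {

    // Unsafe variant: a shared mutable list that monitoring clears and refills.
    private final List<Integer> lastVmsMemMb = new ArrayList<>();

    int committedMemUnsafe() {
        // If this runs between clear() and the re-add of a running VM,
        // that VM is "lost" and the host looks almost empty.
        return lastVmsMemMb.stream().mapToInt(Integer::intValue).sum();
    }

    void refreshUnsafe(List<Integer> currentVms) {
        lastVmsMemMb.clear();            // window where readers see no VMs
        lastVmsMemMb.addAll(currentVms);
    }

    // Safer variant: build the new list aside and publish it atomically,
    // so readers always see either the old or the new complete snapshot.
    private volatile List<Integer> lastVmsSnapshot = List.of();

    int committedMemSafe() {
        return lastVmsSnapshot.stream().mapToInt(Integer::intValue).sum();
    }

    void refreshSafe(List<Integer> currentVms) {
        lastVmsSnapshot = Collections.unmodifiableList(new ArrayList<>(currentVms));
    }
}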

Comment 16 Polina 2020-08-12 15:47:37 UTC
Verified on ovirt-engine-4.4.2.1-0.15.el8ev.noarch by running the automation test rhevmtests.compute.sla.scheduler_tests.affinity.affinity_test.TestSoftPositiveAffinityVsMemoryFilter.

Comment 17 Sandro Bonazzola 2020-09-18 07:12:51 UTC
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

