Bug 1398953

Summary: [scale] getVcpuNumaMemoryMapping get called 4 time per VM per 1 minute - flooding supervdsm.log when many VMs (100) are running
Product: [oVirt] vdsm Reporter: guy chen <guchen>
Component: SuperVDSMAssignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED DUPLICATE QA Contact: guy chen <guchen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.18.15.2CC: bugs, eberman, guchen, msivak
Target Milestone: ---Keywords: Performance
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-28 12:39:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description guy chen 2016-11-27 15:29:06 UTC
Description of problem:
I have a rhv 4.1 system with 1 host and 100 VMS up and running.
when i grep getVcpuNumaMemoryMapping on /var/log/vdsm/supervdsm.log i see 400 calls per minute.

How reproducible:

All the time

Steps to Reproduce:
1.build rhv 4.1 system
2.start 100 VMS
3.choose a minute and run on 1 minute, for example run  : grep "16:06" /var/log/vdsm/supervdsm.log |grep -c call

Actual results:

400 calls requests per minute

Expected results:

Should be less, on large systems the logs get filled with the requests.

Comment 1 Yaniv Kaul 2016-11-27 15:59:44 UTC
Guy - please attach supervdsm.log to show the issue.
There are two issues here:
1. The log flood - supervdsm.log is quite flooded with the logs. Short sample below.
2. Is there a performance overhead to this? I did not see it on supervdsm (the real work is a simple read from /proc per vCPU) + simple map operation.

The log snippet (for just 0.1 second):

MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,000::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,001::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,014::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,014::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,018::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,019::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,029::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,030::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,034::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,034::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,046::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,046::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,049::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,049::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,061::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,061::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,065::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,065::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {}
MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,077::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,077::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}

Comment 2 Yaniv Kaul 2016-11-27 16:06:45 UTC
Looking at getVcpuNumaMemoryMapping() it is actually calling getVmPid() - which is a very lightweight function, but it does open a file on the local disk, which may be slow. I assume it is cached, but who knows.
There's also a call to numa.getVcpuPid() - who knows how much it costs.

All in all, though, doesn't look too bad. The log flood is a bit worse, especially as it provides very little value for debugging.

Comment 4 Yaniv Bronhaim 2016-11-28 10:55:37 UTC
It's DEBUG log. change severity if needed, but for debugging I prefer to leave this wrap print to see what calls we get in supervdsm. The issue is the burst call to getVcpuNumaMemoryMapping. I'm not sure if its engine side issue or vdsm's. please consider to share engine and vdsm logs and assign to sla to check deeper

Comment 5 Martin Sivák 2016-11-28 12:39:01 UTC

*** This bug has been marked as a duplicate of bug 1396910 ***

Comment 6 Yaniv Kaul 2016-12-05 13:21:10 UTC
(In reply to Martin Sivák from comment #5)
> 
> *** This bug has been marked as a duplicate of bug 1396910 ***

Martin, should getVcpuNumaMemoryMapping be called 4 times per VM?
This is the current situation in 4.1, therefore I'm not sure how this bug is a duplicate of a 3.6.x bug... Please explain. In a high scale-up situation (in this scenario - 100 VMs in a single host), you'd get hundreds of calls per minute - probably not very efficient?

Comment 7 Martin Sivák 2016-12-05 13:32:59 UTC
I believe the NUMA sampling hasn't been touched since 3.6 and the issue in 4.1 has the same root cause as the one in the 3.6 bug.

It is not efficient, but we need to check whether the sampling can get bulk data or not and how often it should be executed. I asked Michal Skrivanek to help us with that as they already handled the original report and we are very short on people atm.