Hide Forgot
Description of problem: I have a rhv 4.1 system with 1 host and 100 VMS up and running. when i grep getVcpuNumaMemoryMapping on /var/log/vdsm/supervdsm.log i see 400 calls per minute. How reproducible: All the time Steps to Reproduce: 1.build rhv 4.1 system 2.start 100 VMS 3.choose a minute and run on 1 minute, for example run : grep "16:06" /var/log/vdsm/supervdsm.log |grep -c call Actual results: 400 calls requests per minute Expected results: Should be less, on large systems the logs get filled with the requests.
Guy - please attach supervdsm.log to show the issue. There are two issues here: 1. The log flood - supervdsm.log is quite flooded with the logs. Short sample below. 2. Is there a performance overhead to this? I did not see it on supervdsm (the real work is a simple read from /proc per vCPU) + simple map operation. The log snippet (for just 0.1 second): MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,000::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,001::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,014::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,014::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,018::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,019::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,029::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,030::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,034::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,034::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,046::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,046::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,049::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,049::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,061::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,061::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,065::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,065::supervdsmServer::92::SuperVdsm.ServerCallback::(wrapper) call getVcpuNumaMemoryMapping with () {} MainProcess|periodic/2::DEBUG::2016-11-27 17:53:06,077::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]} MainProcess|periodic/3::DEBUG::2016-11-27 17:53:06,077::supervdsmServer::99::SuperVdsm.ServerCallback::(wrapper) return getVcpuNumaMemoryMapping with {0: [0, 1, 2, 3]}
Looking at getVcpuNumaMemoryMapping() it is actually calling getVmPid() - which is a very lightweight function, but it does open a file on the local disk, which may be slow. I assume it is cached, but who knows. There's also a call to numa.getVcpuPid() - who knows how much it costs. All in all, though, doesn't look too bad. The log flood is a bit worse, especially as it provides very little value for debugging.
It's DEBUG log. change severity if needed, but for debugging I prefer to leave this wrap print to see what calls we get in supervdsm. The issue is the burst call to getVcpuNumaMemoryMapping. I'm not sure if its engine side issue or vdsm's. please consider to share engine and vdsm logs and assign to sla to check deeper
*** This bug has been marked as a duplicate of bug 1396910 ***
(In reply to Martin Sivák from comment #5) > > *** This bug has been marked as a duplicate of bug 1396910 *** Martin, should getVcpuNumaMemoryMapping be called 4 times per VM? This is the current situation in 4.1, therefore I'm not sure how this bug is a duplicate of a 3.6.x bug... Please explain. In a high scale-up situation (in this scenario - 100 VMs in a single host), you'd get hundreds of calls per minute - probably not very efficient?
I believe the NUMA sampling hasn't been touched since 3.6 and the issue in 4.1 has the same root cause as the one in the 3.6 bug. It is not efficient, but we need to check whether the sampling can get bulk data or not and how often it should be executed. I asked Michal Skrivanek to help us with that as they already handled the original report and we are very short on people atm.