Description of problem: MoM 0.5.5 uses an excessive amountof calls to 'vmGetIoTunePolicy' to Vdsm, causing high load on the Vdsm server. Considering one box with 110 active VMs, we see 6329 calls in 15m19s (roughly 6.88 calls per second!) Vdsm load with MOM running, reported by the Vdsm health monitor: Thread-129::DEBUG::2016-08-12 07:01:25,708::health::122::health::(_check_resources) user=27.78%, sys=9.55%, rss=412200 kB (+272), threads=47 Thread-129::DEBUG::2016-08-12 07:02:25,885::health::122::health::(_check_resources) user=27.97%, sys=9.00%, rss=412360 kB (+160), threads=50 Thread-129::DEBUG::2016-08-12 07:03:26,065::health::122::health::(_check_resources) user=25.28%, sys=8.43%, rss=410700 kB (-1660), threads=47 Thread-129::DEBUG::2016-08-12 07:04:26,238::health::122::health::(_check_resources) user=24.88%, sys=8.08%, rss=410456 kB (-244), threads=47 Thread-129::DEBUG::2016-08-12 07:05:26,413::health::122::health::(_check_resources) user=25.60%, sys=8.47%, rss=410116 kB (-340), threads=47 Thread-129::DEBUG::2016-08-12 07:06:26,591::health::122::health::(_check_resources) user=34.38%, sys=11.73%, rss=410764 kB (+648), threads=47 Thread-129::DEBUG::2016-08-12 07:07:26,766::health::122::health::(_check_resources) user=27.57%, sys=8.82%, rss=410748 kB (-16), threads=47 Thread-129::DEBUG::2016-08-12 07:08:26,947::health::122::health::(_check_resources) user=28.97%, sys=9.28%, rss=411192 kB (+444), threads=48 Thread-129::DEBUG::2016-08-12 07:09:27,134::health::122::health::(_check_resources) user=28.60%, sys=9.37%, rss=411864 kB (+672), threads=47 Thread-129::DEBUG::2016-08-12 07:10:27,312::health::122::health::(_check_resources) user=30.82%, sys=10.32%, rss=411840 kB (-24), threads=50 Vdsm load with MOM disabled: Thread-129::DEBUG::2016-08-12 07:20:29,174::health::122::health::(_check_resources) user=6.30%, sys=2.82%, rss=410392 kB (+40), threads=47 Thread-129::DEBUG::2016-08-12 07:21:29,364::health::122::health::(_check_resources) user=6.28%, sys=2.77%, rss=411708 kB (+1316), threads=47 Thread-129::DEBUG::2016-08-12 07:22:29,542::health::122::health::(_check_resources) user=6.12%, sys=2.63%, rss=410372 kB (-1336), threads=47 Thread-129::DEBUG::2016-08-12 07:23:29,743::health::122::health::(_check_resources) user=6.47%, sys=2.90%, rss=410372 kB (+0), threads=47 Thread-129::DEBUG::2016-08-12 07:24:29,925::health::122::health::(_check_resources) user=6.23%, sys=2.78%, rss=411692 kB (+1320), threads=47 Thread-129::DEBUG::2016-08-12 07:25:30,131::health::122::health::(_check_resources) user=6.05%, sys=2.80%, rss=411804 kB (+112), threads=47 Thread-129::DEBUG::2016-08-12 07:26:30,315::health::122::health::(_check_resources) user=6.20%, sys=2.88%, rss=411812 kB (+8), threads=47 Thread-129::DEBUG::2016-08-12 07:27:30,493::health::122::health::(_check_resources) user=6.07%, sys=2.68%, rss=411796 kB (-16), threads=47 Thread-129::DEBUG::2016-08-12 07:28:30,671::health::122::health::(_check_resources) user=6.08%, sys=2.92%, rss=410584 kB (-1212), threads=47 Thread-129::DEBUG::2016-08-12 07:29:30,849::health::122::health::(_check_resources) user=6.33%, sys=2.88%, rss=412068 kB (+1484), threads=47 Version-Release number of selected component (if applicable): MOM 0.5.5 Vdsm 4.18.11 How reproducible: 100% Steps to Reproduce: 1. Run 110 VMs to have comparable numbers has above Actual results: high load on Vdsm Expected results: less load on Vdsm Additional info: Will probably requires new API to Vdsm
Happens also with vdsm 4.18.10 and 4.18.6
(In reply to Francesco Romani from comment #1) > Happens also with vdsm 4.18.10 and 4.18.6 Can you elaborate please what in your box? CPU's Memory etc.. Can we estimate the ratio ? like 110vms running x amount of 'vmGetIoTunePolicy'
(In reply to Eldad Marciano from comment #2) > (In reply to Francesco Romani from comment #1) > > Happens also with vdsm 4.18.10 and 4.18.6 > > Can you elaborate please what in your box? > CPU's Memory etc.. The data I mentioned in description is a QE box with RHEL 7.2. It's running 32 Xeon(R) CPU E5-2630 v3 @ 2.40GHz with 64 GiB of RAM I'll attach one example xml > Can we estimate the ratio ? like 110vms running x amount of > 'vmGetIoTunePolicy' IIRC MOM calls vmGetIoTunePolicy every 15s, for each vm.
(In reply to Francesco Romani from comment #0) > Description of problem: This is way less clear than it should. Let me clarify a bit. > MoM 0.5.5 uses an excessive amountof calls to 'vmGetIoTunePolicy' to Vdsm, > causing high load on the Vdsm server. Considering one box with 110 active > VMs, we see > > 6329 calls in 15m19s (roughly 6.88 calls per second!) To get this number, this is what I did: on the testing host, count the number of calls, using # xzgrep 'RPC call vmGetIoTune' /var/log/vdsm/vdsm.log.3.xz > calls.txt excerpt of 'calls.txt' Thread-2936835::INFO::2016-08-17 11:01:05,885::bindingxmlrpc::1282::vds::(wrapper) RPC call vmGetIoTunePolicy finished (code=0) in 0.82 seconds Thread-2936836::INFO::2016-08-17 11:01:05,886::bindingxmlrpc::1282::vds::(wrapper) RPC call vmGetIoTunePolicy finished (code=0) in 0.82 seconds Thread-2936837::INFO::2016-08-17 11:01:05,890::bindingxmlrpc::1282::vds::(wrapper) RPC call vmGetIoTunePolicy finished (code=0) in 0.00 seconds This is logged by the RPC subsystem once every call ends. We don't care now about how a single call took, we just need the numbers of them. So we do: # wc -l calls.txt 12757 calls.txt Now in this run I have less calls. Let's see the time interval. The first call in this set # head -n 1 calls.txt Thread-2936835::INFO::2016-08-17 11:01:05,885::bindingxmlrpc::1282::vds::(wrapper) RPC call vmGetIoTunePolicy finished (code=0) in 0.82 seconds while the last call in the set # tail -n 1 calls.txt Thread-2950401::INFO::2016-08-17 12:01:00,921::bindingxmlrpc::1282::vds::(wrapper) RPC call vmGetIoTunePolicy finished (code=0) in 0.00 seconds We can now compute the timespan, considering begin time = 11:01:06 and end time = 12:01:01 -> this gives us 59m55s -> 3595s so 12757 calls in 3595s is 3.55calls/s assuming uniform rate, which is debatable, but I want to highlight the high number of calls to vmGetIoTunePolicy, which is easily understandable grepping the vdsm log. Furthermore, the following is a proof of the impact on Vdsm of those calls, but it is unrelated to the number of calls > Vdsm load with MOM running, reported by the Vdsm health monitor: > > Thread-129::DEBUG::2016-08-12 > 07:01:25,708::health::122::health::(_check_resources) user=27.78%, > sys=9.55%, rss=412200 kB (+272), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:02:25,885::health::122::health::(_check_resources) user=27.97%, > sys=9.00%, rss=412360 kB (+160), threads=50 > Thread-129::DEBUG::2016-08-12 > 07:03:26,065::health::122::health::(_check_resources) user=25.28%, > sys=8.43%, rss=410700 kB (-1660), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:04:26,238::health::122::health::(_check_resources) user=24.88%, > sys=8.08%, rss=410456 kB (-244), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:05:26,413::health::122::health::(_check_resources) user=25.60%, > sys=8.47%, rss=410116 kB (-340), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:06:26,591::health::122::health::(_check_resources) user=34.38%, > sys=11.73%, rss=410764 kB (+648), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:07:26,766::health::122::health::(_check_resources) user=27.57%, > sys=8.82%, rss=410748 kB (-16), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:08:26,947::health::122::health::(_check_resources) user=28.97%, > sys=9.28%, rss=411192 kB (+444), threads=48 > Thread-129::DEBUG::2016-08-12 > 07:09:27,134::health::122::health::(_check_resources) user=28.60%, > sys=9.37%, rss=411864 kB (+672), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:10:27,312::health::122::health::(_check_resources) user=30.82%, > sys=10.32%, rss=411840 kB (-24), threads=50 > > Vdsm load with MOM disabled: > Thread-129::DEBUG::2016-08-12 > 07:20:29,174::health::122::health::(_check_resources) user=6.30%, sys=2.82%, > rss=410392 kB (+40), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:21:29,364::health::122::health::(_check_resources) user=6.28%, sys=2.77%, > rss=411708 kB (+1316), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:22:29,542::health::122::health::(_check_resources) user=6.12%, sys=2.63%, > rss=410372 kB (-1336), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:23:29,743::health::122::health::(_check_resources) user=6.47%, sys=2.90%, > rss=410372 kB (+0), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:24:29,925::health::122::health::(_check_resources) user=6.23%, sys=2.78%, > rss=411692 kB (+1320), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:25:30,131::health::122::health::(_check_resources) user=6.05%, sys=2.80%, > rss=411804 kB (+112), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:26:30,315::health::122::health::(_check_resources) user=6.20%, sys=2.88%, > rss=411812 kB (+8), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:27:30,493::health::122::health::(_check_resources) user=6.07%, sys=2.68%, > rss=411796 kB (-16), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:28:30,671::health::122::health::(_check_resources) user=6.08%, sys=2.92%, > rss=410584 kB (-1212), threads=47 > Thread-129::DEBUG::2016-08-12 > 07:29:30,849::health::122::health::(_check_resources) user=6.33%, sys=2.88%, > rss=412068 kB (+1484), threads=47 I think it is better if I repharase this bug to highlight that MOM is having high performance impact on Vdsm (demonstrated by the health check above), most likely related to vmGetIoTunePolicy.
*** Bug 1337229 has been marked as a duplicate of this bug. ***
The MOM side is fixed, but there is no VDSM counterpart ready in 4.0.5. You need master branch vdsm and a config file change to test this or just sanity for 4.0.5 as the defaults there won't use anything from this new functionality by default.
verifying on top of 4.0.5.8 with 110 vms stil we have the same amount of calls -=>>cat /var/log/vdsm/vdsm.log |grep '2016-11-03 17:04' |grep 'vmGetIoTunePolicy finished' |wc -l |awk '{print $1 / 60}' 6.4
(In reply to Eldad Marciano from comment #8) > verifying on top of 4.0.5.8 > with 110 vms > stil we have the same amount of calls > > -=>>cat /var/log/vdsm/vdsm.log |grep '2016-11-03 17:04' |grep > 'vmGetIoTunePolicy finished' |wc -l |awk '{print $1 / 60}' > 6.4 also we have the same load Thread-13::DEBUG::2016-11-03 17:06:11,008::health::122::health::(_check_resources) user=38.03%, sys=10.05%, rss=359656 kB (+26744), threads=63
That sounds weird. Are you using the new interface: vdsmjsonrpcbulk?
(In reply to Jenny Tokar from comment #10) > That sounds weird. > Are you using the new interface: vdsmjsonrpcbulk? Yep, mom version mom-0.5.8-1.el7ev.noarch
(In reply to Eldad Marciano from comment #11) > (In reply to Jenny Tokar from comment #10) > > That sounds weird. > > Are you using the new interface: vdsmjsonrpcbulk? > > Yep, mom version mom-0.5.8-1.el7ev.noarch any further steps required ? we using default installation setup
Eldad: did you read comment #7 that explicitly states that the 4.0.5 VDSM does NOT use the new functionality?
(In reply to Martin Sivák from comment #13) > Eldad: did you read comment #7 that explicitly states that the 4.0.5 VDSM > does NOT use the new functionality? So could you change the target release ? since I verified i have the same version as the target release.
verified on : mom-0.5.8-1.el7ev.noarch vdsm-4.18.15-23.git9656d58.el7.centos.x86_64 load : Thread-15::DEBUG::2016-11-13 09:48:37,366::health::122::health::(_check_resources) user=6.92%, sys=3.65%, rss=347040 kB (-6340), threads=63 Thread-15::DEBUG::2016-11-13 09:49:37,367::health::90::health::(_check) Checking health Thread-15::DEBUG::2016-11-13 09:49:37,660::health::97::health::(_check_garbage) Collected 424 objects Thread-15::DEBUG::2016-11-13 09:49:37,663::health::122::health::(_check_resources) user=6.98%, sys=3.62%, rss=347096 kB (+56), threads=63 policies Calls: -=>>cat /var/log/vdsm/vdsm.log | grep 'getAllVmIoTunePolicies succeeded' | grep "2016-11-13 10:21" | wc -l 4