Created attachment 933262
Screenshoot

Description of problem:
Incorrect warning messages appear in the event log every 10-15 minutes about the available swap memory of the host [1023MB], when the actual swap memory size of the host is [1024MB].

cat /proc/meminfo
SwapTotal: 1048568 kB (1023.99219 MB)
SwapFree: 1048568 kB (1023.99219 MB)

These warnings started to appear in the last build.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.5.0-0.0.master.20140821064931.gitb794d66.el6

How reproducible:
always

Steps to Reproduce:
1. Working setup with host installed
2.
3.

Actual results:
Incorrect warnings appear in the event log every 10-15 minutes about the available swap memory of the host.

Expected results:
No such warning messages in the event log.

Additional info:
Eli - we should just round up values from VDSM before doing the comparison.
Seems like the truncation logic resides in VDSM. It should round it up instead of truncating. Moving it to VDSM.
Can you check, on the host side, the output of
    vdsClient -s 0 getVdsStats | grep -i swap
I'm quite sure this is an engine-side issue. Nothing has changed in VDSM in that area (utils.py: readMemInfo) for a really long time, and it seems to work fine, whereas the engine side was modified not so long ago: http://gerrit.ovirt.org/#/c/10865/6/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/VdsUpdateRunTimeInfo.java,cm
Apart from that, you can see that the message in the audit log is:
    VDS_LOW_SWAP=Available swap memory of host ${HostName} [${AvailableSwapMemory} MB] is under defined threshold [${Threshold} MB].
and the values you attached are right (free 1023, available 1024), so the problem must be on the engine side. Please also attach the values you have for LogPhysicalMemoryThresholdInMB and LogMaxPhysicalMemoryUsedThresholdInPercentage in your vdc_options table.
Do you still want me to handle it?
I tried this on one of my hosts:
    vdsClient -s 0 getVdsStats | grep -i swap
    swapFree = 15999
    swapTotal = 15999
    [root@pluto-vdsa ~]# cat /proc/meminfo |grep -i swap
    SwapTotal: 16383992 kB
    SwapFree: 16383992 kB
The value is in kB, so let's convert it to MB: 16383992/1024 = 15999.9921875. So IMO VDSM should report 16000 if it rounded the value, but since it reports 15999 it is clear that the value is truncated by VDSM.
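To make the comparison above reproducible, here is a minimal sketch that parses `/proc/meminfo`-style text and applies a truncating kB-to-MB division. The function name `parse_meminfo` is hypothetical and this is not VDSM's actual `readMemInfo`; it only assumes the `Name: value kB` line layout shown in the comment.

```python
def parse_meminfo(text):
    """Return {field: value_in_kB} for lines such as 'SwapTotal: 16383992 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


# Sample taken from the host output quoted in this comment.
sample = "SwapTotal: 16383992 kB\nSwapFree: 16383992 kB"
swap = parse_meminfo(sample)

# Integer division drops the fractional part, matching the reported 15999.
print(swap["SwapFree"] // 1024)  # prints 15999
```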
vdsClient -s 0 getVdsStats | grep -i swap
    swapFree = 1023
    swapTotal = 1023
cat /proc/meminfo |grep -i swap
    SwapCached: 0 kB
    SwapTotal: 1048572 kB (1023.99609 MB)
    SwapFree: 1048572 kB (1023.99609 MB)
It should report 1024MB if the value were rounded. It reports 1023, so the value is truncated.
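The difference between truncating and rounding can be checked with a short sketch. The helper name `kb_to_mb` and its `mode` parameter are hypothetical, chosen only for illustration; the input values are the `SwapFree` figures quoted in the comments above.

```python
KB_PER_MB = 1024


def kb_to_mb(kb, mode="truncate"):
    """Convert a kB count to whole MB using the given rounding mode."""
    if mode == "truncate":
        return kb // KB_PER_MB                     # drop the fraction
    if mode == "round":
        return (kb + KB_PER_MB // 2) // KB_PER_MB  # round half up
    if mode == "ceil":
        return -(-kb // KB_PER_MB)                 # round up
    raise ValueError("unknown mode: %s" % mode)


for kb in (1048572, 16383992):
    print(kb, kb_to_mb(kb, "truncate"), kb_to_mb(kb, "round"), kb_to_mb(kb, "ceil"))
# prints:
# 1048572 1023 1024 1024
# 16383992 15999 16000 16000
```

Both hosts are only a few kB short of a whole MB, so truncation reports one MB less than either rounding mode.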
Oh, that I can check and fix. But according to the description, this is not the bug. The bug is that the engine's report of low swap space is wrong, IIUC, no?
Hi, I'm not sure where the value is truncated, whether on the engine side or in VDSM, but the bug is that the event log shows incorrect warning messages about available swap memory every 10-15 minutes.
I'm not sure that rounding the value up is the right behavior. Still, I'll post a patch and let's see what others say.
Why is this a bug? The host has less than 1024MiB free memory, so you get a warning. The only question is why this behavior is new - Vdsm did not change anything there.
(In reply to Dan Kenigsberg from comment #9)
> Why is this a bug? The host has less than 1024MiB free memory, so you get a
> warning.
>
> The only question is why this behavior is new - Vdsm did not change anything
> there.
IMO we should be a bit more permissive in such cases. It is true that we can change the test on the engine side to follow some threshold; however, I still think that rounding up sounds right in this case.
Hi Dan,
1. This is new behavior in the last builds, upstream and downstream, so it means that something has changed.
2. The host has:
    SwapTotal: 1048572 kB (1023.99609 MB)
    SwapFree: 1048572 kB (1023.99609 MB)
   That is closer to 1024MB than to 1023MB. The available swap memory on the hosts hasn't changed.
3. These warnings are displayed in the event log every 10-15 minutes, and I don't think that, as an administrator, you would love to see the event log full of these warnings every 10 minutes.
Barak, Dan - please give your final thoughts on whether it is a bug or not. If not, is the event logging behavior reasonable or not? (You can always set a different threshold.)
Michael, if you install an older Vdsm on your host, what would it report? I'm sure that, just like in 3.5.0, it would report 1023MiB too. The new behavior is not on the Vdsm side. Could it be that the recent change is in the guest? Which kernel is running there? Did it change its swap accounting?
Oved, if you manage 3.4 hosts with rhevm-3.5.0, you would still see this annoying error.
Bottom line: you should either use hosts with bigger swap space, or lower the Engine threshold to 1023. It should be fixed on Engine or not fixed at all.
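The two options in the bottom line can be illustrated with a small sketch. It assumes the warning fires when the reported free swap is strictly below the threshold, as the wording of the VDS_LOW_SWAP message ("is under defined threshold") suggests. The function `is_swap_low` is hypothetical and is not the engine's actual code.

```python
def is_swap_low(swap_free_mb, threshold_mb):
    """Assumed check: warn when free swap is strictly under the threshold."""
    return swap_free_mb < threshold_mb


# Truncated value as reported by the host, against a 1024 MB threshold.
print(is_swap_low(1023, 1024))  # prints True  (warning is logged)

# Same reported value with the threshold lowered to 1023.
print(is_swap_low(1023, 1023))  # prints False (no warning)

# A value rounded to 1024 against the original threshold.
print(is_swap_low(1024, 1024))  # prints False (no warning)
```

Under this assumption, either lowering the threshold by one or rounding the reported value silences the warning for a host whose swap is a few kB short of 1024 MB.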
- With an older vdsm it's the same; it reports the same in the event log. vdsm-4.14.13-2.el6ev.x86_64
- Like I said, it started in the last 2 builds (upstream + downstream)
- kernel 2.6.32-431.el6.x86_64
Started in the last 2 builds; moving back to regression.
Could you see if an older guest kernel changes things?
2.6.32-431.el6.x86_64 2.6.32-431.23.3.el6.x86_64
I'll try to rephrase my question: do much older guest kernels report different swap usage? I'm trying to understand which change triggered the annoying reports that you see.
A much older kernel is kernel-2.6.32-358.el6.x86_64, which is for rhel6.4. Since you asked, I ran a test and installed rhel6.4 with this kernel in my setup (vt3.1). As it seems for now, with this host I don't get these annoying messages. But with kernel 2.6.32-431.el6.x86_64 and above (rhel6.5) I get these messages, and I also get them with kernel 3.10.0-123.el7 for rhel7.
we'll handle that on the engine side.
oVirt 3.5.1 has been released, and since this bug is targeted to 3.5.1 and is in modified state, it should be included in this release. Please re-target and move back to modified if this assumption is not valid for this bug.