Description of problem:
getVdsStats crashes if the host has a NUMA node with 0 memory; as a result, the host ends up in the 'Non Operational' state.
Traceback from vdsm log:
Thread-12::ERROR::2015-07-27 04:38:06,448::sampling::599::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 585, in run
    sample = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 288, in __init__
    self.numaNodeMem = NumaNodeMemorySample()
  File "/usr/share/vdsm/virt/sampling.py", line 183, in __init__
    int(100.0 * int(memInfo['free']) / int(memInfo['total']))
ZeroDivisionError: float division by zero
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Add a host with a NUMA node that has 0 memory to the engine
2. Wait a few minutes
Actual results:
Host drops to the 'Non Operational' state
Expected results:
Host must stay in the Up state
Output of numactl -H
# numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17
  0:  10  20  40  40
  1:  20  10  40  40
 16:  40  40  10  20
 17:  40  40  20  10
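To check quickly whether a host is affected, the per-node size lines in the output above can be scanned for nodes that report 0 MB. This is only a rough sketch: the helper name memoryless_nodes is made up for illustration and is not part of vdsm, and the sample text is trimmed from the numactl -H output in this report.

```python
import re


def memoryless_nodes(numactl_output):
    """Return the ids of NUMA nodes whose reported size is 0 MB.

    Hypothetical helper for illustration; it only understands the
    'node <id> size: <n> MB' lines printed by 'numactl -H'.
    """
    nodes = []
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) size: (\d+) MB", line.strip())
        if match and int(match.group(2)) == 0:
            nodes.append(int(match.group(1)))
    return sorted(nodes)


# Trimmed copy of the output attached to this report.
SAMPLE = """\
available: 4 nodes (0-1,16-17)
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 size: 0 MB
node 1 free: 0 MB
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 size: 0 MB
node 17 free: 0 MB
"""

print(memoryless_nodes(SAMPLE))
```

On the sample above this reports nodes 1 and 17, the two nodes with CPUs but no memory.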
I believe it is not a common case to have a NUMA node with zero memory (it looks like we have CPUs on sockets 1 and 17, but the memory slots for these sockets are empty), so I did not mark this bug as urgent. It is still a possible situation on a NUMA architecture, so we need to check that int(memInfo['total']) != 0 before dividing by it.
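A minimal sketch of such a check, assuming memInfo is a dict with 'free' and 'total' values as in the traceback. The function name mem_free_percent is hypothetical, and reporting 0 percent for a node without memory is an assumption; the actual vdsm fix may be shaped differently.

```python
def mem_free_percent(mem_info):
    """Return free memory as an integer percentage of total memory.

    Hypothetical helper, not the actual vdsm code. A node with no
    memory is reported as 0 percent free instead of raising
    ZeroDivisionError (this choice of value is an assumption).
    """
    total = int(mem_info['total'])
    if total == 0:
        return 0
    return int(100.0 * int(mem_info['free']) / total)


# Values taken from the numactl -H output in this report.
print(mem_free_percent({'free': '58477', 'total': '65536'}))
print(mem_free_percent({'free': '0', 'total': '0'}))
```

With this guard the sampling thread keeps running on hosts like the one above, and the memoryless nodes show up with 0 percent free.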
Another question is what we are going to do in the engine when we have such a node. In my opinion we should show the node, but if the user tries to pin a VNUMA node to such a PNUMA node in strict mode, we must block it with an appropriate error message.
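The proposed engine behaviour could look roughly like the sketch below. It is written in Python only for illustration and does not reflect the real engine code; the names PinningError and validate_pinning, the dict layout, and the error text are all assumptions.

```python
class PinningError(Exception):
    """Raised when a requested VNUMA to PNUMA pinning is not allowed."""


def validate_pinning(vnuma_pins, pnuma_total_mb, strict=True):
    """Reject strict-mode pinning to host NUMA nodes without memory.

    vnuma_pins:     dict of virtual node index -> host node id
    pnuma_total_mb: dict of host node id -> total memory in MB
    Both structures are hypothetical, chosen for this sketch.
    """
    for vnode, pnode in sorted(vnuma_pins.items()):
        if pnode not in pnuma_total_mb:
            raise PinningError(
                "Host NUMA node %d does not exist" % pnode)
        if strict and pnuma_total_mb[pnode] == 0:
            raise PinningError(
                "Cannot pin virtual NUMA node %d to host NUMA node %d: "
                "the host node has no memory" % (vnode, pnode))


# Host layout from the numactl -H output in this report.
host_nodes = {0: 65536, 1: 0, 16: 65536, 17: 0}

validate_pinning({0: 0, 1: 16}, host_nodes)  # allowed, no exception
try:
    validate_pinning({0: 1}, host_nodes)
except PinningError as err:
    print(err)
```

The memoryless nodes stay visible in host_nodes, so they can still be shown to the user; only strict pinning to them is refused.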
I guess this issue could be serious if a memory card needs maintenance and is pulled out (or was not installed beforehand).
No one has hit this yet, but when they do, we have no workaround.
The engine should cope with 0 memory and prevent pinning VMs to this node.
The fix is rather small, so we should consider it for 3.5.z.
Moran,
what's your view on adding this to 3.5.z?
(In reply to Doron Fediuck from comment #2)
> what's your view on adding this to 3.5.z?
A small fix that adds to stability is a valid z-stream candidate.
But what would be the functional implications of using a NUMA node without memory?
(In reply to Moran Goldboim from comment #3)
> (In reply to Doron Fediuck from comment #2)
> > Moran,
> > what's your view on adding this to 3.5.z?
> small fix, which adds to stability is a valid zstream.
> but what would be the functional implications of using numa node without
> memory?
This is rare and probably the result of a malfunction.
This is why I'm not keen to add it to .z unless there's a specific
request for it.
Verified on vdsm-4.17.8-1.el7ev.noarch
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.