Description of problem:
getVdsStats crashes if the host has a NUMA node with 0 memory; as a result the host ends up in the 'Non Operational' state.

Traceback from the vdsm log:
Thread-12::ERROR::2015-07-27 04:38:06,448::sampling::599::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 585, in run
    sample = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 288, in __init__
    self.numaNodeMem = NumaNodeMemorySample()
  File "/usr/share/vdsm/virt/sampling.py", line 183, in __init__
    int(100.0 * int(memInfo['free']) / int(memInfo['total']))
ZeroDivisionError: float division by zero

Version-Release number of selected component (if applicable):
vdsm-4.17.0.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add a host that has a NUMA node with 0 memory to the engine
2. Wait a few minutes

Actual results:
Host drops to the 'Non Operational' state

Expected results:
Host must stay in state UP

Additional info:
Output of numactl -H:
# numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17
  0:  10  20  40  40
  1:  20  10  40  40
 16:  40  40  10  20
 17:  40  40  20  10

I believe a NUMA node with zero memory is not a common case (it looks like we have CPUs on sockets 1 and 17 but the memory slots for these sockets are empty), so I did not mark this bug as urgent. It is still a possible situation on a NUMA architecture, so we need to check that int(memInfo['total']) != 0 before dividing by it.

Another question is what we are going to do on the engine side when we have such a node. In my opinion we should show the node, but if a user tries to pin a VNUMA node to such a PNUMA node in strict mode, we must block it with an appropriate error message.
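For illustration, a minimal sketch of the zero-total check suggested above. This is not the actual vdsm patch: the helper name mem_free_percent is hypothetical, and the dict layout ('free' and 'total' keys) is assumed from the expression shown in the traceback. Reporting 0 percent free for a memory-less node is also an assumption about the desired behavior.

```python
def mem_free_percent(mem_info):
    """Return free memory as an integer percentage of total memory.

    mem_info is assumed to be a dict with 'free' and 'total' values
    (ints or numeric strings in the same unit), matching the expression
    in the traceback. The function name itself is hypothetical.
    """
    total = int(mem_info['total'])
    if total == 0:
        # Memory-less NUMA node: report 0 instead of dividing by zero.
        return 0
    return int(100.0 * int(mem_info['free']) / total)


# Sample values (in MB) taken from the numactl -H output in this report.
nodes = {
    '0': {'total': '65536', 'free': '58477'},
    '1': {'total': '0', 'free': '0'},
    '16': {'total': '65536', 'free': '64237'},
    '17': {'total': '0', 'free': '0'},
}

for index in sorted(nodes, key=int):
    print("node %s: %d%% free" % (index, mem_free_percent(nodes[index])))
```

With a guard like this the sampling thread would keep running on hosts such as the one above; whether 0 is the right value to report for an empty node is a separate decision for the engine side.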
I guess this issue could be serious if a memory card needs maintenance and is pulled out (or was not installed beforehand). No one has hit this yet, but when they do we have no workaround. Engine should cope with 0 and prevent pinning VMs to this node. The fix is rather small, so we should consider it for 3.5.z.
Moran, what's your view on adding this to 3.5.z?
(In reply to Doron Fediuck from comment #2)
> Moran,
> what's your view on adding this to 3.5.z?

A small fix that adds to stability is a valid z-stream candidate. But what would be the functional implications of using a NUMA node without memory?
(In reply to Moran Goldboim from comment #3)
> (In reply to Doron Fediuck from comment #2)
> > Moran,
> > what's your view on adding this to 3.5.z?
>
> small fix, which adds to stability is a valid zstream.
> but what would be the functional implications of using numa node without
> memory?

This is rare and probably a result of malfunction. This is why I'm not keen to add it to .z unless there's a specific request for it.
Verified on vdsm-4.17.8-1.el7ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html