Bug 1247058

Summary: getVdsStats crashes if the host has a NUMA node with 0 memory
Product: Red Hat Enterprise Virtualization Manager Reporter: Artyom <alukiano>
Component: vdsm Assignee: Roman Mohr <rmohr>
Status: CLOSED ERRATA QA Contact: Artyom <alukiano>
Severity: high Docs Contact:
Priority: high    
Version: 3.6.0 CC: bazulay, dfediuck, gklein, lpeer, lsurette, mavital, mgoldboi, rgolan, rmohr, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc Keywords: Triaged
Target Release: 3.6.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
NUMA nodes can exist without memory (for example, when hotswapping memory modules). This was not considered in VDSM, causing the statistics reporting mechanism (getVdsStats) to break. Now, this error has been fixed by explicitly checking for NUMA nodes with zero memory, and returning a memory usage of 100%.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-09 19:42:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Artyom 2015-07-27 09:00:56 UTC
Description of problem:
getVdsStats crashes if the host has a NUMA node with 0 memory; as a result, the host ends up in the 'Non Operational' state.
Traceback from vdsm log:
Thread-12::ERROR::2015-07-27 04:38:06,448::sampling::599::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 585, in run
    sample = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 288, in __init__
    self.numaNodeMem = NumaNodeMemorySample()
  File "/usr/share/vdsm/virt/sampling.py", line 183, in __init__
    int(100.0 * int(memInfo['free']) / int(memInfo['total']))
ZeroDivisionError: float division by zero


Version-Release number of selected component (if applicable):
vdsm-4.17.0.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add a host that has a NUMA node with 0 memory to the engine
2. Wait a few minutes
3.

Actual results:
The host drops to the 'Non Operational' state

Expected results:
The host must stay in the UP state

Additional info:
Output of numactl -H
# numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17 
  0:  10  20  40  40 
  1:  20  10  40  40 
 16:  40  40  10  20 
 17:  40  40  20  10 
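For anyone checking whether a host is affected, here is a minimal sketch that picks out the memoryless nodes from `numactl -H` output. The function name `memoryless_nodes` and the regular expression are illustrative assumptions, not part of vdsm; the sample lines are copied from the output above.

```python
import re

# Size lines copied from the `numactl -H` output in this report.
SAMPLE = """\
node 0 size: 65536 MB
node 1 size: 0 MB
node 16 size: 65536 MB
node 17 size: 0 MB
"""

# Matches lines such as "node 1 size: 0 MB"; other "node ..." lines
# (cpus, free, distances) do not contain "size:" and are ignored.
_SIZE_RE = re.compile(r"^node\s+(\d+)\s+size:\s+(\d+)\s+MB", re.MULTILINE)


def memoryless_nodes(numactl_output):
    """Return the ids of NUMA nodes whose reported size is 0 MB."""
    return [int(node)
            for node, size in _SIZE_RE.findall(numactl_output)
            if int(size) == 0]


print(memoryless_nodes(SAMPLE))  # prints [1, 17]
```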
I believe this is not a common case (it looks like we have CPUs on sockets 1 and 17, but the memory slots for these sockets are empty), so I did not set this bug to urgent. It is still a possible situation on a NUMA architecture, so we need to add a check that int(memInfo['total']) != 0 before dividing by it.
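The check suggested above could look roughly like the following sketch. It is not the actual vdsm patch: the helper name `used_mem_percent` is hypothetical, the `memInfo` keys are taken from the traceback, and both the "100 minus free percentage" formula and the choice to report a memoryless node as 100% used are assumptions.

```python
def used_mem_percent(mem_info):
    """Return the used-memory percentage of a NUMA node.

    `mem_info` is assumed to be a dict with 'total' and 'free' values
    (strings or ints), as in the traceback above.
    """
    total = int(mem_info['total'])
    if total == 0:
        # Assumption: a node without memory is reported as fully used,
        # which avoids the ZeroDivisionError.
        return 100
    return 100 - int(100.0 * int(mem_info['free']) / total)


# Node 1 from the numactl output above (0 MB total, 0 MB free).
print(used_mem_percent({'total': '0', 'free': '0'}))          # prints 100
# Node 0 from the numactl output above (65536 MB total, 58477 MB free).
print(used_mem_percent({'total': '65536', 'free': '58477'}))  # prints 11
```

Reporting 100% used keeps schedulers that look at free memory from placing anything on such a node, but returning a separate marker value would also be a reasonable design.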
Another question is what the engine should do when a host has such a node. In my opinion, we should show this node, but if a user tries to pin a VNUMA node to such a PNUMA node in strict mode, we must block it with an appropriate error message.

Comment 1 Roy Golan 2015-07-30 06:33:08 UTC
I guess this issue could be serious if a memory card needs maintenance and is pulled out (or was not installed beforehand).

No one has hit this yet, but when they do, we have no workaround.

The engine should cope with 0 memory and prevent pinning VMs to this node.



The fix is rather small so we should consider 3.5.z

Comment 2 Doron Fediuck 2015-07-30 07:28:53 UTC
Moran,
what's your view on adding this to 3.5.z?

Comment 3 Moran Goldboim 2015-08-13 06:09:59 UTC
(In reply to Doron Fediuck from comment #2)
> Moran,
> what's your view on adding this to 3.5.z?

A small fix that adds stability is valid for z-stream.
But what would be the functional implications of using a NUMA node without memory?

Comment 4 Doron Fediuck 2015-08-16 11:56:36 UTC
(In reply to Moran Goldboim from comment #3)
> (In reply to Doron Fediuck from comment #2)
> > Moran,
> > what's your view on adding this to 3.5.z?
> 
> small fix, which adds to stability is a valid zstream.
> but what would be the functional implications of using numa node without
> memory?

This is rare and probably a result of malfunction.
This is why I'm not keen to add it to .z unless there's a specific
request for it.

Comment 5 Artyom 2015-09-24 08:25:59 UTC
Verified on vdsm-4.17.8-1.el7ev.noarch

Comment 8 errata-xmlrpc 2016-03-09 19:42:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html