Bug 1247058 - getVdsStats crashed if a host has a NUMA node with 0 memory
Summary: getVdsStats crashed if a host has a NUMA node with 0 memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Roman Mohr
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-27 09:00 UTC by Artyom
Modified: 2016-03-09 19:42 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
NUMA nodes can exist without memory (for example, when hot-swapping memory modules). VDSM did not account for this, which caused the statistics reporting mechanism (getVdsStats) to break. VDSM now explicitly checks for NUMA nodes with zero memory and reports a memory usage of 100% for them.
Clone Of:
Environment:
Last Closed: 2016-03-09 19:42:57 UTC
oVirt Team: SLA
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0362 | normal | SHIPPED_LIVE | vdsm 3.6.0 bug fix and enhancement update | 2016-03-09 23:49:32 UTC
oVirt gerrit | 44121 | master | MERGED | sampling: Handle numa nodes with zero memory assigned | Never
oVirt gerrit | 44550 | ovirt-3.6 | MERGED | sampling: Handle numa nodes with zero memory assigned | Never

Description Artyom 2015-07-27 09:00:56 UTC
Description of problem:
getVdsStats crashes if the host has a NUMA node with 0 memory; as a result, the host ends up in the 'Non Operational' state.
Traceback from the vdsm log:
Thread-12::ERROR::2015-07-27 04:38:06,448::sampling::599::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 585, in run
    sample = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 288, in __init__
    self.numaNodeMem = NumaNodeMemorySample()
  File "/usr/share/vdsm/virt/sampling.py", line 183, in __init__
    int(100.0 * int(memInfo['free']) / int(memInfo['total']))
ZeroDivisionError: float division by zero
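The failing expression from line 183 of sampling.py can be reproduced in isolation. The sketch below assumes a memInfo-like dictionary with 'free' and 'total' keys, as used in the traceback, filled with the values numactl reports for node 1 (size 0 MB, free 0 MB). The dictionary name, the string value types, and the wrapper function are illustrative assumptions, not vdsm code.

```python
# Hypothetical stand-in for the per-node memInfo dict used in sampling.py;
# values match node 1 in the numactl output (size 0 MB, free 0 MB).
mem_info = {'total': '0', 'free': '0'}


def free_percent(mem_info):
    """The unguarded expression from sampling.py line 183."""
    return int(100.0 * int(mem_info['free']) / int(mem_info['total']))


try:
    free_percent(mem_info)
except ZeroDivisionError as exc:
    # 0.0 / 0 raises, and in vdsm this aborted the whole host sample.
    print("ZeroDivisionError:", exc)
```

Because the exception escapes the NUMA memory sample, the entire HostSample fails, not just the statistics for the memoryless node.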


Version-Release number of selected component (if applicable):
vdsm-4.17.0.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add a host with a NUMA node that has 0 memory to the engine
2. Wait a few minutes

Actual results:
The host dropped to the 'Non Operational' state

Expected results:
The host must stay in the UP state

Additional info:
Output of numactl -H
# numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17 
  0:  10  20  40  40 
  1:  20  10  40  40 
 16:  40  40  10  20 
 17:  40  40  20  10 
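Memoryless nodes like 1 and 17 can be spotted directly in this output. The following is a minimal sketch, assuming the `node N size: X MB` line format shown above (other numactl versions may format it differently); the helper name is hypothetical and not part of vdsm.

```python
import re

# Matches lines such as "node 1 size: 0 MB" in `numactl -H` output.
SIZE_RE = re.compile(r'^node (\d+) size: (\d+) MB$')


def zero_memory_nodes(numactl_output):
    """Return the IDs of NUMA nodes whose reported size is 0 MB."""
    zero = []
    for line in numactl_output.splitlines():
        match = SIZE_RE.match(line.strip())
        if match and int(match.group(2)) == 0:
            zero.append(int(match.group(1)))
    return zero


# Size lines taken from the output above.
SAMPLE = """available: 4 nodes (0-1,16-17)
node 0 size: 65536 MB
node 1 size: 0 MB
node 16 size: 65536 MB
node 17 size: 0 MB
"""

print(zero_memory_nodes(SAMPLE))  # [1, 17]
```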
I believe it is not a common case to have a NUMA node with zero memory (it looks like we have CPUs on sockets 1 and 17, but the memory slots for those sockets are empty), so I did not mark this bug as urgent. It is, however, a possible situation on NUMA architectures, so we need to check that int(memInfo['total']) != 0 before dividing by it.
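A minimal sketch of the check suggested above follows. It is not the code merged in gerrit 44121/44550. The 100%-used convention for memoryless nodes comes from the Doc Text; deriving usage as 100 minus the free percentage, the helper name, and the dictionary shape are assumptions for illustration.

```python
def numa_node_mem_percent(mem_info):
    """Hypothetical helper: percentage of a NUMA node's memory in use.

    A node with no memory is reported as 100% used instead of raising
    ZeroDivisionError; otherwise usage is 100 minus the free percentage.
    """
    total = int(mem_info['total'])
    if total == 0:
        # Guard before dividing: a memoryless node is treated as full.
        return 100
    return 100 - int(100.0 * int(mem_info['free']) / total)
```

Reporting a memoryless node as fully used, rather than 0% used, keeps it from looking like an attractive placement target to anything that consumes these statistics.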
Another question is what we are going to do in the engine when we have such a node. In my opinion, we should show this node, but if a user tries to pin a vNUMA node to such a pNUMA node in strict mode, we must block it with an appropriate error message.
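The proposed engine behavior can be sketched as a validation step. The engine itself is not written in Python, and the function, exception, and parameter names below are hypothetical; the sketch assumes host NUMA memory is available as a mapping from node ID to total MB, and it blocks only strict-mode pinning, as proposed above.

```python
class PinningError(ValueError):
    """Raised when a vNUMA-to-pNUMA pinning request cannot be honored."""


def validate_vnuma_pinning(pnuma_mem_mb, target_node, strict):
    """Hypothetical check: reject strict pinning onto a memoryless pNUMA node.

    pnuma_mem_mb maps host NUMA node IDs to their total memory in MB.
    """
    if target_node not in pnuma_mem_mb:
        raise PinningError("unknown host NUMA node %d" % target_node)
    if strict and pnuma_mem_mb[target_node] == 0:
        raise PinningError(
            "host NUMA node %d has no memory; strict pinning is not possible"
            % target_node)


# Host layout from the numactl output above (sizes in MB).
host = {0: 65536, 1: 0, 16: 65536, 17: 0}
validate_vnuma_pinning(host, 0, strict=True)   # accepted
validate_vnuma_pinning(host, 1, strict=False)  # accepted: not strict
```

The node stays visible in the host layout; only the strict pinning request is refused, with an error naming the node.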

Comment 1 Roy Golan 2015-07-30 06:33:08 UTC
I guess this issue could be serious if a memory card needs maintenance and is pulled out (or was not inserted beforehand).

No one has hit this yet, but when they do, we will have no workaround.

The engine should cope with 0 and prevent pinning VMs on this node.



The fix is rather small, so we should consider it for 3.5.z.

Comment 2 Doron Fediuck 2015-07-30 07:28:53 UTC
Moran,
what's your view on adding this to 3.5.z?

Comment 3 Moran Goldboim 2015-08-13 06:09:59 UTC
(In reply to Doron Fediuck from comment #2)
> Moran,
> what's your view on adding this to 3.5.z?

A small fix that adds to stability is a valid z-stream candidate.
But what would be the functional implications of using a NUMA node without memory?

Comment 4 Doron Fediuck 2015-08-16 11:56:36 UTC
(In reply to Moran Goldboim from comment #3)
> (In reply to Doron Fediuck from comment #2)
> > Moran,
> > what's your view on adding this to 3.5.z?
> 
> small fix, which adds to stability is a valid zstream.
> but what would be the functional implications of using numa node without
> memory?

This is rare and probably the result of a malfunction.
This is why I'm not keen to add it to .z unless there's a specific
request for it.

Comment 5 Artyom 2015-09-24 08:25:59 UTC
Verified on vdsm-4.17.8-1.el7ev.noarch

Comment 8 errata-xmlrpc 2016-03-09 19:42:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

