Bug 1247058 - getVdsStats crashes if the host has a NUMA node with 0 memory
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assigned To: Roman Mohr
QA Contact: Artyom
Keywords: Triaged
Depends On:
Blocks:
Reported: 2015-07-27 05:00 EDT by Artyom
Modified: 2016-03-09 14:42 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
NUMA nodes can exist without memory (for example, when hotswapping memory modules). This was not considered in VDSM, causing the statistics reporting mechanism (getVdsStats) to break. Now, this error has been fixed by explicitly checking for NUMA nodes with zero memory, and returning a memory usage of 100%.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 14:42:57 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker       ID     Priority   Status  Summary                                                Last Updated
oVirt gerrit  44121  master     MERGED  sampling: Handle numa nodes with zero memory assigned  Never
oVirt gerrit  44550  ovirt-3.6  MERGED  sampling: Handle numa nodes with zero memory assigned  Never

Description Artyom 2015-07-27 05:00:56 EDT
Description of problem:
getVdsStats crashes if the host has a NUMA node with 0 memory; as a result, the host ends up in the 'Non Operational' state.
Traceback from vdsm log:
Thread-12::ERROR::2015-07-27 04:38:06,448::sampling::599::vds::(run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 585, in run
    sample = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 288, in __init__
    self.numaNodeMem = NumaNodeMemorySample()
  File "/usr/share/vdsm/virt/sampling.py", line 183, in __init__
    int(100.0 * int(memInfo['free']) / int(memInfo['total']))
ZeroDivisionError: float division by zero


Version-Release number of selected component (if applicable):
vdsm-4.17.0.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Add a host that has a NUMA node with 0 memory to the engine
2. Wait a few minutes

Actual results:
Host drops to the 'Non Operational' state

Expected results:
Host must stay in the 'Up' state

Additional info:
Output of numactl -H
# numactl -H
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 65536 MB
node 0 free: 58477 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 65536 MB
node 16 free: 64237 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17 
  0:  10  20  40  40 
  1:  20  10  40  40 
 16:  40  40  10  20 
 17:  40  40  20  10 
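
To spot such nodes on a host before adding it to the engine, the "node N size" lines of the output above can be scanned for a size of 0 MB. The following is only a minimal sketch; the helper name zero_memory_nodes is hypothetical and is not part of vdsm or numactl.

```python
import re


def zero_memory_nodes(numactl_output):
    """Return the sorted IDs of NUMA nodes whose reported size is 0 MB.

    Hypothetical helper: expects text in the format printed by `numactl -H`.
    """
    sizes = {}
    for line in numactl_output.splitlines():
        # Matches lines such as "node 1 size: 0 MB"
        match = re.match(r"node (\d+) size: (\d+) MB", line.strip())
        if match:
            sizes[int(match.group(1))] = int(match.group(2))
    return sorted(node for node, size in sizes.items() if size == 0)


# Size lines copied from the numactl -H output in this report
sample = """\
available: 4 nodes (0-1,16-17)
node 0 size: 65536 MB
node 1 size: 0 MB
node 16 size: 65536 MB
node 17 size: 0 MB
"""

print(zero_memory_nodes(sample))  # [1, 17]
```
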
I believe this is not a common case (it looks like we have CPUs on sockets 1 and 17, but the memory slots for these sockets are empty), so I did not mark this bug as urgent. It is still a possible situation on NUMA architecture, so we need to add a check that int(memInfo['total']) != 0 before dividing by it.
Another question is what we are going to do in the engine when we have such a node. In my opinion we should show this node, but if the user tries to pin a VNUMA node to such a PNUMA node in strict mode, we must block it with an appropriate error message.
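
The zero check proposed above could look roughly like the sketch below. It is not the merged vdsm patch: the function name numa_node_mem_stats and the returned keys are hypothetical, and reporting 100% usage for an empty node is taken from the Doc Text of this bug. The mem_info argument is assumed to be a dict with 'free' and 'total' values in MB, like memInfo in the traceback.

```python
def numa_node_mem_stats(mem_info):
    """Return free memory and used-memory percentage for one NUMA node.

    Hypothetical sketch: mem_info is assumed to hold 'free' and 'total'
    in MB (as strings or ints), like memInfo in the traceback.
    """
    total = int(mem_info['total'])
    free = int(mem_info['free'])
    if total == 0:
        # A node with CPUs but no memory: nothing can be allocated from it,
        # so report it as fully used instead of dividing by zero.
        used_percent = 100
    else:
        used_percent = 100 - int(100.0 * free / total)
    return {'free_mb': free, 'used_percent': used_percent}


# Values taken from the numactl -H output in this report
print(numa_node_mem_stats({'total': '65536', 'free': '58477'}))
print(numa_node_mem_stats({'total': '0', 'free': '0'}))
```

With this guard, node 0 reports 11% used and the empty nodes 1 and 17 report 100% used, so sampling no longer raises ZeroDivisionError.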
Comment 1 Roy Golan 2015-07-30 02:33:08 EDT
I guess this issue could be serious if a memory card needs maintenance and is pulled out (or was not installed beforehand).

No one has hit this yet, but when they do we will have no workaround.

The engine should cope with 0 memory and will prevent pinning VMs to this node.

The fix is rather small, so we should consider 3.5.z.
Comment 2 Doron Fediuck 2015-07-30 03:28:53 EDT
Moran,
what's your view on adding this to 3.5.z?
Comment 3 Moran Goldboim 2015-08-13 02:09:59 EDT
(In reply to Doron Fediuck from comment #2)
> Moran,
> what's your view on adding this to 3.5.z?

A small fix that adds to stability is valid for z-stream.
But what would be the functional implications of using a NUMA node without memory?
Comment 4 Doron Fediuck 2015-08-16 07:56:36 EDT
(In reply to Moran Goldboim from comment #3)
> (In reply to Doron Fediuck from comment #2)
> > Moran,
> > what's your view on adding this to 3.5.z?
> 
> small fix, which adds to stability is a valid zstream.
> but what would be the functional implications of using numa node without
> memory?

This is rare and probably a result of malfunction.
This is why I'm not keen to add it to .z unless there's a specific
request for it.
Comment 5 Artyom 2015-09-24 04:25:59 EDT
Verified on vdsm-4.17.8-1.el7ev.noarch
Comment 8 errata-xmlrpc 2016-03-09 14:42:57 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
