Bug 1252055
Summary: | Host installation fails or host activation fails with NPE if numaNodeDistance is null | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Gordon Watson <gwatson> | ||||
Component: | ovirt-engine | Assignee: | Martin Sivák <msivak> | ||||
Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 3.5.0 | CC: | alukiano, amureini, bcholler, dfediuck, eedri, gklein, gwatson, istein, lpeer, lsurette, mavital, melewis, mgoldboi, mwest, rbalakri, Rhev-m-bugs, yeylon, ykaul | ||||
Target Milestone: | ovirt-3.6.2 | Keywords: | Triaged, ZStream | ||||
Target Release: | 3.6.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Previously, NUMA distances were not always reported properly, and this caused the engine to fail with a NullPointerException (NPE) during host installation or activation. Now, the engine invents virtual NUMA distances when none are provided by the hardware, so that the engine does not crash and the host can be activated.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1284245 (view as bug list) | Environment: | |||||
Last Closed: | 2016-03-09 21:11:30 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1284245 | ||||||
Attachments: |
|
Description
Gordon Watson
2015-08-10 14:54:51 UTC
Just out of curiosity, can you please provide the output of numactl -H? I really do not understand how the distance can be equal to zero. Thanks

Artyom,
Here's the 'numactl' info from the customer's system.
Regards, GFW.
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32759 MB
node 0 free: 29840 MB
No distance information available.

cat /sys/devices/system/node/node0/distance
I assume the info is extracted from there, so it's probably missing.

A missing distance seems like a problem in its own right, and I wonder whether we should handle it and thereby hide the problem (maybe the OS isn't supported in that case?).
Gordon - Is this OS certified/supported under those conditions? Is this machine production ready?

Marking as an exception; it would be nice to have in the beta if possible.

Didn't make it, moving to 3.6.1.

Created attachment 1100173
engine log
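The numactl output above shows a host with no distance information, and the thread assumes the values come from /sys/devices/system/node/node0/distance. As a minimal sketch of that situation (not the vdsm implementation), the hypothetical helper below reads one node's distance file and returns None when the file is absent, empty, or unparsable. The function name and the `sysfs_root` parameter are assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Optional


def read_node_distances(
    node_index: int,
    sysfs_root: str = "/sys/devices/system/node",
) -> Optional[List[int]]:
    """Return the distance row for one NUMA node, or None if unavailable.

    Hypothetical helper: the name and signature are illustrative only.
    """
    path = Path(sysfs_root) / f"node{node_index}" / "distance"
    try:
        text = path.read_text()
    except OSError:
        # File missing or unreadable: the "No distance information" case.
        return None

    fields = text.split()
    if not fields:
        # File exists but holds no values.
        return None

    try:
        return [int(field) for field in fields]
    except ValueError:
        # Unexpected content; treat it the same as missing data.
        return None
```

A caller would need to treat a None result as "distance unknown" rather than dereferencing it, which is the condition this bug is about.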
Checked on rhevm-3.6.1-0.2.el6.noarch
1) Edit /usr/share/vdsm/caps.py
caps['numaNodeDistance'] = {}
2) vdsClient -s 0 getVdsCaps | grep numaNodeDistance
numaNodeDistance = {}
3) Add host to engine
Action failed with error message:
Failed to configure management network: Failed to configure management network on host alma05.qa.lab.tlv.redhat.com due to setup networks failure.
Artyom, this is a different issue. Why did you fail it?

OK, the source of the confusion is that we fixed the case where the distances value was null, not {}. One of the comments above stated that. So we just need an update to handle an empty dictionary.

RHEV 3.6.0 BETA2 is out; any open bugs are moved to the BETA3 milestone.

Verified on rhevm-3.6.2-0.1.el6.noarch
1) Change caps['numaNodeDistance'] = {}
2) Restart vdsmd
3) Install host to engine
The engine deployed the host without any errors, and the NUMA distance is equal to 0 in the engine.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-0376.html

This is a corner case that happened because of a hardware error, so I prefer not to add it to QE coverage.
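The distinction discussed above, a null numaNodeDistance versus an empty {} one, can be illustrated with a small sketch. This is not the ovirt-engine fix (the engine is a separate code base); it is a hypothetical Python helper that treats both cases the same way and fills in a placeholder distance. The function name, the assumed map shape (node id to a list of distances), and the default value of 0 (chosen only because the verification comment reports a distance of 0) are all assumptions.

```python
from typing import Dict, List, Optional


def normalize_numa_distances(
    node_ids: List[str],
    reported: Optional[Dict[str, List[int]]],
    default_distance: int = 0,  # assumption: mirrors the 0 seen in verification
) -> Dict[str, List[int]]:
    """Return a distance row for every node, inventing rows that are missing.

    Hypothetical helper: handles both None and an empty dict as "no data".
    """
    # `or {}` collapses None and {} into the same empty mapping.
    reported = reported or {}

    normalized: Dict[str, List[int]] = {}
    for node_id in node_ids:
        row = reported.get(node_id)
        if row:
            # Keep whatever the host actually reported.
            normalized[node_id] = list(row)
        else:
            # Nothing reported for this node: invent a placeholder row.
            normalized[node_id] = [default_distance] * len(node_ids)
    return normalized
```

Handling only the None case would leave the empty-dictionary case failing, which matches the sequence of events in this report.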