Bug 1834873
Summary: | cpu_topology.online_cpus returns integers instead of string causing host failures | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Sandro Bonazzola <sbonazzo> | |
Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> | |
Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.4.0 | CC: | bugs, lsurette, michal.skrivanek, mzamazal, pelauter, rdlugyhe, srevivo, ycui | |
Target Milestone: | ovirt-4.4.1 | Keywords: | ZStream | |
Target Release: | 4.3.11 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | vdsm-4.30.47 | Doc Type: | Bug Fix | |
Doc Text: | Previously, retrieving host capabilities failed for specific non-NUMA CPU topologies. The current release fixes this issue and correctly reports the host capabilities for those topologies. |
Story Points: | --- | |
Clone Of: | ||||
: | 1852315 (view as bug list) | Environment: | ||
Last Closed: | 2020-08-04 13:27:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1852315 |
Description
Sandro Bonazzola
2020-05-12 15:01:15 UTC
IRC transcription:

(14:59:44) IEF: I'm wondering if anyone can help me out
(15:00:09) IEF: I'm attempting to install oVirt 4.4 RC. natively installing ovirt-engine
(15:00:38) IEF: engine-setup ran, trying to add the host to the engine now
(15:00:44) IEF: gets stuck on 'Unassigned'
(15:02:49) IEF: VDSM Soleus03 command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': 'Attempt to call function: <bound method Global.getCapabilities of <vdsm.API.Global object at 0x7f203473ee80>> with arguments: () error: sequence item 0: expected str instance, int found'}
(15:02:49) IEF: 5/12/20 3:01:50 PM
(15:03:05) IEF: doesn't seem right :)
(15:21:39) sbonazzo: IEF hi, thanks for testing 4.4 RC
(15:22:02) sbonazzo: asocha: ^^^ can you have a look at above message?
(15:22:22) IEF`: it looks like a bug when attempting to enumerate the number of online CPUs.
(15:22:49) sbonazzo: IEF can you share vdsm.log?
(15:25:33) IEF: need the full vdsm.log?
(15:25:36) IEF: https://pastebin.com/tsrcat8x
(15:25:49) IEF: it keeps giving out the same exceptions
(15:30:18) IEF: this is the output from virsh capabilities in case you're wondering:
(15:30:20) IEF: <cpu>
(15:30:20) IEF: <arch>x86_64</arch>
(15:30:20) IEF: <model>EPYC-IBPB</model>
(15:30:20) IEF: <vendor>AMD</vendor>
(15:30:20) IEF: <microcode version='137367604'/>
(15:30:22) IEF: <counter name='tsc' frequency='1996250000'/>
(15:30:24) IEF: <topology sockets='1' cores='64' threads='1'/>
(15:47:58) asocha: sbonazzo, IEF I don't even pretend to know anything about this code but I found some quite old comments about numa.cpu_topology() [1] that is used by the line throwing an error [2]
(15:48:00) asocha: [1] https://github.com/oVirt/vdsm/blob/2f56f70105ff4188d39ea75e6995a17ab7e4a054/lib/vdsm/numa.py#L73
(15:48:13) asocha: [2] https://github.com/oVirt/vdsm/blob/00a7be2419be231082eb78b2bea482e1c5971d0d/lib/vdsm/host/caps.py#L91
(16:40:01) IEF: in any case. I literally worked around it by patching caps.py
(16:40:16) IEF: hardcoded my online CPU string
(16:40:18) IEF: caps['onlineCpus'] = '0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63'
(16:40:27) IEF: Host actually comes up now.

pastebin link from comment #1 added in comment #0

works with "caps['onlineCpus'] = ','.join(list(map(str,cpu_topology.online_cpus)))". Milan?

There are different types returned by different CPU info retrieval functions, depending on whether libvirt reports NUMA cells or not. This is not a new bug in 4.4. All my machines report NUMA cells, whether they have NUMA or not, so the non-NUMA case with the type error is perhaps not that frequent and was missed in previous testing.

Fix posted.

If not new, any chance this is relevant on RHEL 7 as well? If yes, please backport to 4.3 as well, just in case.

Yes, it fails on 4.3 too. I'll backport the patch.

verified on ovirt-engine-4.4.1.2-0.10.el8ev.noarch
libvirt-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
/usr/lib/python3.6/site-packages/vdsm/host/caps.py on host:
caps['onlineCpus'] = ','.join([str(cpu_id) for cpu_id in cpu_topology.online_cpus])
The current libvirt version always synthesizes a single NUMA cell even if the host doesn't report any NUMA topology.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:3246
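
For readers who want to reproduce the type error outside of vdsm, the following is a minimal standalone sketch rather than the actual vdsm code. The two input lists and the helper name format_online_cpus are hypothetical. The sketch assumes, based on the comments above, that one retrieval path yields CPU ids as integers and another yields them as strings. It shows that a plain ','.join() fails on integers with the message seen in the JSON-RPC error, and that converting each id with str() handles both cases.

```python
# Hypothetical stand-ins for the two kinds of values discussed in this bug;
# these lists are assumptions for illustration, not data taken from vdsm.
online_cpus_as_ints = [0, 1, 2, 3]          # assumed shape of the failing case
online_cpus_as_strs = ['0', '1', '2', '3']  # assumed shape of the working case


def format_online_cpus(online_cpus):
    """Return a comma-separated CPU list, accepting ints or strings.

    Hypothetical helper mirroring the str() conversion used in the fix.
    """
    return ','.join(str(cpu_id) for cpu_id in online_cpus)


# str.join() only accepts strings, so joining integers raises TypeError.
try:
    ','.join(online_cpus_as_ints)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(format_online_cpus(online_cpus_as_ints))
print(format_online_cpus(online_cpus_as_strs))
```

Converting at the point of joining keeps the output format identical for both input types, which is why the same one-line change could be backported to 4.3 without touching the retrieval functions.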