Bug 1852315
| Summary: | cpu_topology.online_cpus returns integers instead of string causing host failures [RHV clone - 4.3.11] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> |
| Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.4.0 | CC: | ahadas, bugs, lsurette, michal.skrivanek, mzamazal, pelauter, rdlugyhe, srevivo, ycui |
| Target Milestone: | ovirt-4.3.11 | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | rhv-4.3.11-2 | Doc Type: | Bug Fix |
| Doc Text: | Host capabilities retrieval was failing for certain non-NUMA CPU topologies. That has been fixed and host capabilities should be correctly reported now for those topologies. | Story Points: | --- |
| Clone Of: | 1834873 | Environment: | |
| Last Closed: | 2020-09-30 10:09:52 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1834873 | ||
| Bug Blocks: | |||
|
Description
RHV bug bot
2020-06-30 06:23:12 UTC
IRC transcription:
(14:59:44) IEF: I'm wondering if anyone can help me out
(15:00:09) IEF: I'm attempting to install oVirt 4.4 RC. natively installing ovirt-engine
(15:00:38) IEF: engine-setup ran, trying to add the host to the engine now
(15:00:44) IEF: gets stuck on 'Unassigned'
(15:02:49) IEF: VDSM Soleus03 command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': 'Attempt to call function: <bound method Global.getCapabilities of <vdsm.API.Global object at 0x7f203473ee80>> with arguments: () error: sequence item 0: expected str instance, int found'}
(15:02:49) IEF: 5/12/20 3:01:50 PM
(15:03:05) IEF: doesn't seem right :)
(15:21:39) sbonazzo: IEF hi, thanks for testing 4.4 RC
(15:22:02) sbonazzo: asocha: ^^^ can you have a look at above message?
(15:22:22) IEF`: it looks like a bug when attempting to enumerate the number of online CPUs.
(15:22:49) sbonazzo: IEF can you share vdsm.log?
(15:25:33) IEF: need the full vdsm.log?
(15:25:36) IEF: https://pastebin.com/tsrcat8x
(15:25:49) IEF: it keeps giving out the same exceptions
(15:30:18) IEF: this is the output from virsh capabilities in case you're wondering:
(15:30:20) IEF: <cpu>
(15:30:20) IEF: <arch>x86_64</arch>
(15:30:20) IEF: <model>EPYC-IBPB</model>
(15:30:20) IEF: <vendor>AMD</vendor>
(15:30:20) IEF: <microcode version='137367604'/>
(15:30:22) IEF: <counter name='tsc' frequency='1996250000'/>
(15:30:24) IEF: <topology sockets='1' cores='64' threads='1'/>
(15:47:58) asocha: sbonazzo, IEF I don't even pretend to know anythin about this code but I found some quite old comments about numa.cpu_topology() [1] that is used by the line throwing an error [2]
(15:48:00) asocha: [1] https://github.com/oVirt/vdsm/blob/2f56f70105ff4188d39ea75e6995a17ab7e4a054/lib/vdsm/numa.py#L73
(15:48:13) asocha: [2] https://github.com/oVirt/vdsm/blob/00a7be2419be231082eb78b2bea482e1c5971d0d/lib/vdsm/host/caps.py#L91
(16:40:01) IEF: in any case. I literally worked around it by patching caps.py
(16:40:16) IEF: hardcoded my online CPU string
(16:40:18) IEF: caps['onlineCpus'] = '0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63'
(16:40:27) IEF: Host actually comes up now.
(Originally by Sandro Bonazzola)
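
The error in the transcript can be reproduced outside VDSM, because `str.join` rejects non-string items. The following is a minimal sketch under Python 3, not VDSM code: `online_cpus` is an illustrative stand-in for `cpu_topology.online_cpus`, assumed to hold integers as it apparently did on the reporter's 64-core host.

```python
# Illustrative stand-in for cpu_topology.online_cpus in the failing case,
# assumed here to be a list of integers (64 cores, as on the reporter's host).
online_cpus = list(range(64))

try:
    ','.join(online_cpus)          # str.join accepts only str items
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Converting each CPU ID to str first avoids the error and yields the same
# kind of string the reporter hardcoded as a workaround.
online_cpus_str = ','.join(str(cpu_id) for cpu_id in online_cpus)
print(online_cpus_str)
```

The first `print` shows the same "expected str instance, int found" text that appears in the JSON-RPC error.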
pastebin link from comment #1 added in comment #0
(Originally by Sandro Bonazzola)

It works with "caps['onlineCpus'] = ','.join(list(map(str,cpu_topology.online_cpus)))". Milan?
(Originally by michal.skrivanek)

There are different types returned by different CPU info retrieval functions, depending on whether libvirt reports NUMA cells or not. This is not a new bug in 4.4. All my machines report NUMA cells, whether they have NUMA or not, so the non-NUMA case with the type error is perhaps not that frequent and was missed in previous testing. Fix posted.
(Originally by Milan Zamazal)

If not new, any chance this is relevant on RHEL 7 as well? If yes, please backport to 4.3 as well, just in case.
(Originally by michal.skrivanek)

Yes, it fails on 4.3 too. I'll backport the patch.
(Originally by Milan Zamazal)

Verified on:
ovirt-engine-4.3.11-0.1.el7.noarch
libvirt-4.5.0-36.el7.x86_64

/usr/lib/python2.7/site-packages/vdsm/host/caps.py:
caps['onlineCpus'] = ','.join([str(cpu_id) for cpu_id in cpu_topology.online_cpus])

The current libvirt version always synthesizes a single NUMA cell even if the host doesn't report any NUMA topology.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Virtualization RHEL Host (ovirt-host) 4.3.11), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4113
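
The comments describe CPU IDs arriving as different types depending on whether libvirt reports NUMA cells. The sketch below illustrates why converting every ID with `str()` before joining handles both cases. The helper name `format_online_cpus` and the sample lists are hypothetical and not taken from the VDSM source, and which path yields strings versus integers is an assumption made for illustration.

```python
def format_online_cpus(online_cpus):
    """Return CPU IDs as a comma-separated string, whatever their item type."""
    return ','.join(str(cpu_id) for cpu_id in online_cpus)


# Assumed shapes of the two inputs (illustrative only):
ids_as_strings = ['0', '1', '2', '3']
ids_as_integers = [0, 1, 2, 3]

result_from_strings = format_online_cpus(ids_as_strings)
result_from_integers = format_online_cpus(ids_as_integers)

# The map-based variant suggested in the comments gives the same result.
result_from_map = ','.join(list(map(str, ids_as_integers)))

print(result_from_strings, result_from_integers, result_from_map)
```

Because `str()` leaves existing strings unchanged, the conversion is harmless on the path that already worked.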