Bug 1852315 - cpu_topology.online_cpus returns integers instead of string causing host failures [RHV clone - 4.3.11]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.3.11
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Polina
URL:
Whiteboard:
Depends On: 1834873
Blocks:
 
Reported: 2020-06-30 06:23 UTC by RHV bug bot
Modified: 2020-09-30 10:09 UTC (History)

Fixed In Version: rhv-4.3.11-2
Doc Type: Bug Fix
Doc Text:
Host capabilities retrieval was failing for certain non-NUMA CPU topologies. That has been fixed and host capabilities should be correctly reported now for those topologies.
Clone Of: 1834873
Environment:
Last Closed: 2020-09-30 10:09:52 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System        ID      Private  Priority   Status  Summary                                         Last Updated
oVirt gerrit  108993  0        master     MERGED  numa: Always use int values for online cpu ids  2021-01-14 11:22:02 UTC
oVirt gerrit  109011  0        ovirt-4.3  MERGED  numa: Always use int values for online cpu ids  2021-01-14 11:22:02 UTC

Description RHV bug bot 2020-06-30 06:23:12 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1834873 +++
======================================================================

Initially reported on #ovirt IRC channel by IEF user

2020-05-12 15:24:06,551+0200 ERROR (jsonrpc/6) [jsonrpc.JsonRpcServer] Internal server error (__init__:350)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/rpc/Bridge.py", line 198, in _dynamicMethod
    result = fn(*methodArgs)
  File "<decorator-gen-430>", line 2, in getCapabilities
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 1341, in getCapabilities
    c = caps.get()
  File "/usr/lib/python3.6/site-packages/vdsm/host/caps.py", line 91, in get
    caps['onlineCpus'] = ','.join(cpu_topology.online_cpus)
TypeError: sequence item 0: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
    res = method(**params)
  File "/usr/lib/python3.6/site-packages/vdsm/rpc/Bridge.py", line 201, in _dynamicMethod
    raise InvalidCall(fn, methodArgs, e)
vdsm.rpc.Bridge.InvalidCall: Attempt to call function: <bound method Global.getCapabilities of <vdsm.API.Global object at 0x7f20346aa8d0>> with arguments: () error: sequence item 0: expected str instance, int found
2020-05-12 15:24:06,551+0200 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getCapabilities failed (error -32603) in 0.00 seconds (__init__:312)
2020-05-12 15:24:09,552+0200 INFO  (jsonrpc/7) [api.host] START getCapabilities() from=::ffff:94.142.246.3,48852 (api:48)
2020-05-12 15:24:09,552+0200 INFO  (jsonrpc/7) [api.host] FINISH getCapabilities error=sequence item 0: expected str instance, int found from=::ffff:94.142.246.3,48852 (api:52)
2020-05-12 15:24:09,552+0200 ERROR (jsonrpc/7) [DynamicBridge] TypeError raised by dispatched function (Bridge:200)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/rpc/Bridge.py", line 198, in _dynamicMethod
    result = fn(*methodArgs)
  File "<decorator-gen-430>", line 2, in getCapabilities
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 1341, in getCapabilities
    c = caps.get()
  File "/usr/lib/python3.6/site-packages/vdsm/host/caps.py", line 91, in get
    caps['onlineCpus'] = ','.join(cpu_topology.online_cpus)
TypeError: sequence item 0: expected str instance, int found

(Originally by Sandro Bonazzola)
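
The TypeError in the traceback above can be reproduced outside VDSM. The following is a minimal sketch, assuming only that the failing code path hands a list of integer CPU ids to str.join(); the list contents are made up for illustration and are not taken from the reporter's host.

```python
# Hypothetical stand-in for cpu_topology.online_cpus on the failing path:
# the ids are plain ints rather than strings.
online_cpus = [0, 1, 2, 3]

error = None
try:
    # str.join() accepts only str items, so the first int raises TypeError.
    joined = ','.join(online_cpus)
except TypeError as exc:
    error = str(exc)

print(error)
```

On Python 3 the printed message should match the "expected str instance, int found" text seen in vdsm.log.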

Comment 1 RHV bug bot 2020-06-30 06:23:15 UTC
IRC transcription:

(14:59:44) IEF: I'm wondering if anyone can help me out
(15:00:09) IEF: I'm attempting to install oVirt 4.4 RC. natively installing ovirt-engine
(15:00:38) IEF: engine-setup ran, trying to add the host to the engine now
(15:00:44) IEF: gets stuck on 'Unassigned'
(15:02:49) IEF: VDSM Soleus03 command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': 'Attempt to call function: <bound method Global.getCapabilities of <vdsm.API.Global object at 0x7f203473ee80>> with arguments: () error: sequence item 0: expected str instance, int found'}
(15:02:49) IEF: 5/12/20 3:01:50 PM
(15:03:05) IEF: doesn't seem right :)
(15:21:39) sbonazzo: IEF hi, thanks for testing 4.4 RC
(15:22:02) sbonazzo: asocha: ^^^ can you have a look at above message?
(15:22:22) IEF`: it looks like a bug when attempting to enumerate the number of online CPUs.
(15:22:49) sbonazzo: IEF can you share vdsm.log?
(15:25:33) IEF: need the full vdsm.log?
(15:25:36) IEF: https://pastebin.com/tsrcat8x
(15:25:49) IEF: it keeps giving out the same exceptions
(15:30:18) IEF: this is the output from virsh capabilities in case you're wondering:
(15:30:20) IEF:     <cpu>
(15:30:20) IEF:       <arch>x86_64</arch>
(15:30:20) IEF:       <model>EPYC-IBPB</model>
(15:30:20) IEF:       <vendor>AMD</vendor>
(15:30:20) IEF:       <microcode version='137367604'/>
(15:30:22) IEF:       <counter name='tsc' frequency='1996250000'/>
(15:30:24) IEF:       <topology sockets='1' cores='64' threads='1'/>
(15:47:58) asocha: sbonazzo, IEF I don't even pretend to know anything about this code but I found some quite old comments about numa.cpu_topology() [1] that is used by the line throwing an error [2]
(15:48:00) asocha: [1] https://github.com/oVirt/vdsm/blob/2f56f70105ff4188d39ea75e6995a17ab7e4a054/lib/vdsm/numa.py#L73
(15:48:13) asocha: [2] https://github.com/oVirt/vdsm/blob/00a7be2419be231082eb78b2bea482e1c5971d0d/lib/vdsm/host/caps.py#L91
(16:40:01) IEF: in any case. I literally worked around it by patching caps.py
(16:40:16) IEF: hardcoded my online CPU string 
(16:40:18) IEF: caps['onlineCpus'] = '0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63'
(16:40:27) IEF: Host actually comes up now.

(Originally by Sandro Bonazzola)

Comment 2 RHV bug bot 2020-06-30 06:23:17 UTC
pastebin link from comment #1 added in comment #0

(Originally by Sandro Bonazzola)

Comment 3 RHV bug bot 2020-06-30 06:23:19 UTC
works with "caps['onlineCpus'] =  ','.join(list(map(str,cpu_topology.online_cpus))) ". Milan?

(Originally by michal.skrivanek)
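
The expression in comment 3 converts each id to a string before joining. The sketch below shows that conversion on its own, next to an equivalent list-comprehension spelling. The variable online_cpus is an illustrative stand-in for cpu_topology.online_cpus on a 64-CPU host; it is an assumption for the example, not VDSM code.

```python
# Illustrative stand-in for cpu_topology.online_cpus (ids 0 through 63).
online_cpus = list(range(64))

# Conversion as written in comment 3.
via_map = ','.join(list(map(str, online_cpus)))

# Equivalent spelling with a list comprehension.
via_comprehension = ','.join([str(cpu_id) for cpu_id in online_cpus])

# Both forms produce the same comma-separated string.
same_result = via_map == via_comprehension
print(same_result)
```

Unlike the hardcoded string from the IRC workaround, either form adapts to however many CPUs the host reports, and both also work when the ids are already strings.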

Comment 4 RHV bug bot 2020-06-30 06:23:21 UTC
There are different types returned by different CPU info retrieval functions, depending on whether libvirt reports NUMA cells or not. This is not a new bug in 4.4. All my machines report NUMA cells, whether they have NUMA or not, so the non-NUMA case with the type error is perhaps not that frequent and was missed in previous testing. Fix posted.

(Originally by Milan Zamazal)
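
The type mismatch described in comment 4 can be sketched as follows. This is not the merged patch; it only illustrates the idea in the patch title ("Always use int values for online cpu ids"). The function names and both input lists are hypothetical.

```python
def normalize_cpu_ids(raw_ids):
    """Return CPU ids as ints, whether the source gave ints or numeric strings.

    Hypothetical helper, not part of VDSM.
    """
    return [int(cpu_id) for cpu_id in raw_ids]


def format_online_cpus(cpu_ids):
    """Render CPU ids as the comma-separated string used for 'onlineCpus'."""
    return ','.join(str(cpu_id) for cpu_id in cpu_ids)


# Two made-up inputs standing in for the two retrieval paths:
# one yields strings, the other yields ints.
ids_as_strings = ['0', '1', '2', '3']
ids_as_ints = [0, 1, 2, 3]

normalized_a = normalize_cpu_ids(ids_as_strings)
normalized_b = normalize_cpu_ids(ids_as_ints)

# After normalization both paths agree on type and value,
# and formatting no longer depends on which path was taken.
print(normalized_a == normalized_b)
print(format_online_cpus(normalized_a))
```

Normalizing where the ids are produced gives every consumer one type to deal with, and the string conversion happens only at the point where the capability value is built.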

Comment 5 RHV bug bot 2020-06-30 06:23:23 UTC
if not new - any chance this is relevant on RHEL 7 as well? If yes please backport to 4.3 as well, just in case.

(Originally by michal.skrivanek)

Comment 6 RHV bug bot 2020-06-30 06:23:25 UTC
Yes, it fails on 4.3 too. I'll backport the patch.

(Originally by Milan Zamazal)

Comment 12 Polina 2020-07-09 08:01:40 UTC
Verified on - 
ovirt-engine-4.3.11-0.1.el7.noarch
libvirt-4.5.0-36.el7.x86_64

/usr/lib/python2.7/site-packages/vdsm/host/caps.py
caps['onlineCpus'] = ','.join([str(cpu_id) for cpu_id in cpu_topology.online_cpus])

The current libvirt version always synthesizes a single NUMA cell even if the host doesn't report any NUMA topology.

Comment 18 errata-xmlrpc 2020-09-30 10:09:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Virtualization RHEL Host (ovirt-host) 4.3.11), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4113

