Bug 1876742

Summary: libvirtd[32318]: internal error: Invalid unsigned integer value '-1' in file '/sys/devices/system/cpu/cpu0/topology/die_id'
Product: Red Hat Enterprise Linux 8 Reporter: Frank Liang <xiliang>
Component: libvirt Assignee: Daniel Henrique Barboza (IBM) <dbarboza>
Status: CLOSED ERRATA QA Contact: Frank Liang <xiliang>
Severity: low Docs Contact:
Priority: low    
Version: 8.3 CC: abologna, dbarboza, drjones, jdenemar, jsuchane, lcapitulino, leiwang, linl, ribarry, virt-maint, vkuznets, ymao
Target Milestone: rc   
Target Release: 8.3   
Hardware: aarch64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-6.0.0-29.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-18 15:21:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1885655

Description Frank Liang 2020-09-08 06:17:47 UTC
The following internal error is logged by libvirtd on a RHEL-8.3 aarch64 KVM host:
Sep 07 21:52:08 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com libvirtd[32318]: hostname: hp-moonshot-03-c03.lab.eng.rdu2.redhat.com
Sep 07 21:52:08 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com libvirtd[32318]: this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Sep 07 21:52:08 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com libvirtd[32318]: this function is not supported by the connection driver: cannot detect host CPU model for aarch64 architecture
Sep 07 21:52:08 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com libvirtd[32318]: internal error: Invalid unsigned integer value '-1' in file '/sys/devices/system/cpu/cpu0/topology/die_id'
Sep 07 21:52:08 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com libvirtd[32318]: Failed to query host NUMA topology, faking single NUMA node
Sep 07 21:52:15 hp-moonshot-03-c03.lab.eng.rdu2.redhat.com systemd[1]: Listening on Virtual machine log manager socket.

[root@hp-moonshot-03-c03 os_tests_result]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           4
NUMA node(s):        1
Vendor ID:           APM
Model:               1
Model name:          X-Gene
Stepping:            0x0
BogoMIPS:            100.00
NUMA node0 CPU(s):   0-7
Flags:               fp asimd evtstrm cpuid

[root@hp-moonshot-03-c03 os_tests_result]# rpm -qa|grep libvirt-6
python3-libvirt-6.0.0-1.module+el8.3.0+6423+e4cb6418.aarch64
libvirt-6.0.0-27.module+el8.3.0+7602+4b93512e.aarch64

The patch below may be required to fix it on the aarch64 platform.
commit 0137bf0dab2738d5443e2f407239856e2aa25bb3
Author: Daniel Henrique Barboza <danielhb413>
Date:   Mon Mar 16 21:01:34 2020 -0300

    virhostcpu.c: fix 'die_id' parsing for Power hosts
   
    Commit 7b79ee2f78 makes assumptions about die_id parsing in
    the sysfs that aren't true for Power hosts. In both Power8
    and Power9, running 5.6 and 4.18 kernel respectively,
    'die_id' is set to -1:
   
    $ cat /sys/devices/system/cpu/cpu0/topology/die_id
    -1
   
    This breaks virHostCPUGetDie() parsing because it is trying to
    retrieve an unsigned integer, causing problems during VM start:
   
    virFileReadValueUint:4128 : internal error: Invalid unsigned integer
    value '-1' in file '/sys/devices/system/cpu/cpu0/topology/die_id'
   
    This isn't necessarily a PowerPC only behavior. Linux kernel commit
    0e344d8c70 added in the former Documentation/cputopology.txt, now
    Documentation/admin-guide/cputopology.rst, that:
   
      To be consistent on all architectures, include/linux/topology.h
      provides default definitions for any of the above macros that are
      not defined by include/asm-XXX/topology.h:
   
      1) topology_physical_package_id: -1
      2) topology_die_id: -1
      (...)
   
    This means that it might be expected that an architecture that
    does not implement the die_id element will mark it as -1 in
    sysfs.
   
    It is not required to change die_id implementation from uInt to
    Int because of that. Instead, let's change the parsing of the
    die_id in virHostCPUGetDie() to read an integer value and, in
    case it's -1, default it to zero like in case of file not found.
    This is enough to solve the issue Power hosts are experiencing.
   
    Fixes: 7b79ee2f78bbf2af76df2f6466919e19ae05aeeb
    Signed-off-by: Daniel Henrique Barboza <danielhb413>
    Reviewed-by: Michal Privoznik <mprivozn>
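
The parsing change described in the commit message above can be illustrated with a minimal sketch. This is not libvirt's code: the real fix is in C, in virHostCPUGetDie(), and the function name `read_die_id` and the `sysfs_root` parameter here are hypothetical, chosen only for illustration. The sketch assumes that any negative value should be treated like the -1 default.

```python
from pathlib import Path


def read_die_id(cpu: int, sysfs_root: str = "/sys/devices/system/cpu") -> int:
    """Return the die ID of `cpu`, falling back to 0 when it is absent or negative.

    Hypothetical helper that mirrors the idea in the commit message; it is
    not the libvirt implementation.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "die_id"
    try:
        text = path.read_text()
    except FileNotFoundError:
        # Kernels that do not expose die_id at all: default to die 0.
        return 0

    # Parse as a signed integer so that "-1" is accepted instead of rejected.
    value = int(text.strip())
    if value < 0:
        # Assumption: every negative value is handled like the kernel's
        # default of -1 for architectures without a die ID.
        return 0
    return value
```

Reading the value as a signed integer and normalizing it afterwards keeps the stored die ID unsigned, which matches the commit's point that the die_id type itself does not need to change.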

Version-Release number of selected components (if applicable):

RHEL Version:
RHEL-8.3(4.18.0-235.el8.aarch64)

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL-8.3 aarch64 host
2. Install and enable libvirtd
3. Install a RHEL guest and check journal log.

Actual results:
Found internal error from libvirtd.

Expected results:
No such error from libvirtd.
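
A small sketch of how the journal check in step 3 could be automated. The helper name `find_die_id_errors` is hypothetical, and the pattern is derived only from the log line quoted in this report, so it may need adjusting for other message formats.

```python
import re

# Pattern based on the libvirtd message quoted in the description above.
DIE_ID_ERROR = re.compile(
    r"libvirtd\[\d+\]: internal error: Invalid unsigned integer value '-1' "
    r"in file '(/sys/devices/system/cpu/cpu\d+/topology/die_id)'"
)


def find_die_id_errors(log_lines):
    """Return the sysfs paths named in die_id parse errors, in order of appearance."""
    hits = []
    for line in log_lines:
        match = DIE_ID_ERROR.search(line)
        if match:
            hits.append(match.group(1))
    return hits
```

The function takes any iterable of log lines, for example the captured output of `journalctl -u libvirtd` split into lines. An empty result corresponds to the expected outcome above.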

Additional info:
- N/A

Comment 1 Andrew Jones 2020-09-08 07:09:31 UTC
We don't have a die ID on AArch64, so I agree with the proposed solution in comment 0.

Comment 3 Jaroslav Suchanek 2020-10-02 18:10:31 UTC
Daniel, would you please backport your upstream patch mentioned in comment 0? This seems to be a non-x86_64 problem.

Thanks.

Comment 4 Daniel Henrique Barboza (IBM) 2020-10-05 14:30:44 UTC
Hi,

(In reply to Jaroslav Suchanek from comment #3)
> Daniel, would you please backport your upstream patch mentioned in comment 0?
> This seems to be a non-x86_64 problem.
> 
> Thanks.


Backport of upstream 0137bf0dab2738d544 to RHEL-8.3.0 was posted
downstream.

Comment 8 Luiz Capitulino 2020-10-15 03:40:11 UTC
Xiao,

Would you take this as the QA contact and help verify it?

Actually, I just realized that you reproduced this issue on
RHEL-8.3 non-AV. For aarch64, only AV is supported for a single
customer.

Maybe it's good to have the fix anyways, but would you help
checking if AV has this issue (probably not, due to the date
of the upstream commit...)

Thanks!

Comment 9 Daniel Henrique Barboza (IBM) 2020-10-15 11:13:46 UTC
(In reply to Luiz Capitulino from comment #8)
> Maybe it's good to have the fix anyways, but would you help
> checking if AV has this issue (probably not, due to the date
> of the upstream commit...)
> 

The commit is already present in the rhel-av-8.3.0 tree. The issue
shouldn't be reproduced with AV.


Thanks,


DHB

Comment 10 Frank Liang 2020-10-16 02:20:58 UTC
(In reply to Luiz Capitulino from comment #8)
> Xiao,
> 
> Would you take this as the QA contact and help verify it?
yes.
> 
> Actually, I just realized that you reproduced this issue on
> RHEL-8.3 non-AV. For aarch64, only AV is supported for a single
> customer.
> 
> Maybe it's good to have the fix anyways, but would you help
> checking if AV has this issue (probably not, due to the date
> of the upstream commit...)
> 
AV does not have this issue.
# rpm -qa|grep libvirt-6
python3-libvirt-6.0.0-1.module+el8.3.0+6423+e4cb6418.aarch64
libvirt-6.6.0-6.module+el8.3.0+8125+aefcf088.aarch64

# journalctl -u libvirtd
-- Logs begin at Thu 2020-10-15 22:10:05 EDT, end at Fri 2020-10-16 02:10:24 EDT. --
Oct 16 02:10:19 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com systemd[1]: Starting Virtualization daemon...
Oct 16 02:10:19 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com systemd[1]: Started Virtualization daemon.
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1965]: listening on virbr0(#3): 192.168.122.1
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: started, version 2.79 cachesize 150
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipse>
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq-dhcp[1984]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq-dhcp[1984]: DHCP, sockets bound exclusively to interface virbr0
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: reading /etc/resolv.conf
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: using nameserver 10.19.42.41#53
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: using nameserver 10.11.5.19#53
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: using nameserver 10.5.30.160#53
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: read /etc/hosts - 2 addresses
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq[1984]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Oct 16 02:10:20 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com dnsmasq-dhcp[1984]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Oct 15 22:12:26 ampere-hr330a-05.khw4.lab.eng.bos.redhat.com systemd[1]: libvirtd.service: Succeeded.

Comment 11 Frank Liang 2020-12-23 08:29:02 UTC
The test passed on RHEL-8.4.0-20201222.n.0, so I am moving this bug to 'VERIFIED'.

# rpm -qa |grep libvirt-6
python3-libvirt-6.0.0-1.module+el8.3.0+6423+e4cb6418.aarch64
libvirt-6.0.0-32.module+el8.4.0+9172+b707c649.aarch64
# uname -r
4.18.0-265.el8.aarch64

# cat /sys/devices/system/cpu/cpu0/topology/die_id
-1

Dec 23 03:19:25 ampere-hr330a-04.khw4.lab.eng.bos.redhat.com libvirtd[22765]: libvirt version: 6.0.0, package: 32.module+el8.4.0+9172+b707c649 (Red Hat, Inc. <http://bugz>
Dec 23 03:19:25 ampere-hr330a-04.khw4.lab.eng.bos.redhat.com libvirtd[22765]: hostname: ampere-hr330a-04.khw4.lab.eng.bos.redhat.com
Dec 23 03:19:25 ampere-hr330a-04.khw4.lab.eng.bos.redhat.com libvirtd[22765]: this function is not supported by the connection driver: cannot detect host CPU model for aa>
Dec 23 03:19:25 ampere-hr330a-04.khw4.lab.eng.bos.redhat.com libvirtd[22765]: this function is not supported by the connection driver: cannot detect host CPU model for aa>
Dec 23 03:19:26 ampere-hr330a-04.khw4.lab.eng.bos.redhat.com systemd[1]: Listening on Virtual machine log manager socket.

Comment 13 errata-xmlrpc 2021-05-18 15:21:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1762