Bug 1882157
Summary: | [Azure][RHEL-7]lshw command showing wrong memory information in azure m or mv2 series type of instances.
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | rcheerla
Component: | lshw
Assignee: | ltao
Status: | CLOSED WONTFIX
QA Contact: | Jeff Bastian <jbastian>
Severity: | medium
Priority: | unspecified
Version: | 7.8
CC: | hhei, klaas, ltao, mheslin, ribarry, ruyang, rvr, xiliang, xuli, yacao, yuxisun
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
: | 1882619
Last Closed: | 2021-03-17 06:20:54 UTC
Type: | Bug
Bug Blocks: | 1882619
Description
rcheerla
2020-09-23 22:51:40 UTC
Created attachment 1716171
The command that shows the issue.
Created attachment 1716172
The command output in JSON format.
Created attachment 1716173
Correct result shown when DMI is disabled via an lshw option.
Reproducible on RHEL-7.9 and 8.3 when VM memory is greater than 32G, e.g. D16s_v3 (64G), E8s_v3 (64G), NV6 (56G), M32ts (192G)... Not reproducible on smaller sizes, e.g. E4_v3 (32G), D8s_v3 (32G), F1 (2G)...
Also tested on Hyper-V: the issue reproduces when VM memory is >= 36G.

**Hyper-V**:
If memory >= 36G, "lshw -short -C memory" shows 1780TiB system memory (e.g. 40G VM output):
# lshw -short -C memory
H/W path      Device     Class          Description
===================================================
/0/0                     memory         64KiB BIOS
/0/51                    memory         1780TiB System Memory
/0/51/0                  memory         3968MiB
/0/51/1                  memory         1780TiB
/0/51/2                  memory         4225MiB

If 72G, it shows 3561TiB ~= 2*1780TiB:
# lshw -short -C memory
H/W path      Device     Class          Description
===================================================
/0/0                     memory         64KiB BIOS
/0/51                    memory         3561TiB System Memory
/0/51/0                  memory         3968MiB
/0/51/1                  memory         1780TiB
/0/51/2                  memory         1780TiB
/0/51/3                  memory         4226MiB

**Azure**:
If 128G (E16s_v3), it shows 3*1780TiB:
# lshw -short -C memory
H/W path           Device     Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         5342TiB System Memory
/0/51/0                       memory         1GiB
/0/51/1                       memory         1780TiB
/0/51/2                       memory         1780TiB
/0/51/3                       memory         1780TiB
/0/51/4                       memory         31GiB

If 192G (M32ts), it shows 8904TiB ~= 5*1780TiB:
H/W path           Device     Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         8904TiB System Memory
/0/51/0                       memory         1GiB
/0/51/1                       memory         1780TiB
/0/51/2                       memory         1780TiB
/0/51/3                       memory         1780TiB
/0/51/4                       memory         1780TiB
/0/51/5                       memory         1780TiB
/0/51/6                       memory         31GiB

lshw packages:
RHEL-8.3: lshw-B.02.19.2-2.el8.x86_64
RHEL-7.9: lshw-B.02.18-17.el7.x86_64
RHEL-7.8: lshw-B.02.18-14.el7.x86_64

Xiliang helped to test it on AWS VMs and did not see this issue.
m5.12xlarge: 72G
kernel-3.10.0-1158.el7.x86_64
# lshw -C memory|more
  *-firmware
       description: BIOS
       vendor: Amazon EC2
       physical id: 0
       version: 1.0
       date: 10/16/2017
       size: 64KiB
       capacity: 64KiB
       capabilities: pci edd acpi virtualmachine
  *-memory
       description: System memory
       physical id: 1
       size: 69GiB

m5.12xlarge: 192G
kernel-3.10.0-1160.2.1.el7.x86_64
[root@ip-10-116-2-133 ec2-user]# lshw -C memory
  *-firmware
       description: BIOS
       vendor: Amazon EC2
       physical id: 0
       version: 1.0
       date: 10/16/2017
       size: 64KiB
       capacity: 64KiB
       capabilities: pci edd acpi virtualmachine
  *-memory
       description: System memory
       physical id: 1
       size: 189GiB

I think I see the problem: the SMBIOS tables provided by Azure do not follow the spec [0], and this confuses lshw. Specifically, the SMBIOS identifies itself as following spec version 2.3, but it uses the "Extended Size" feature from spec version 2.7 to describe the size of the virtual DIMM in slot 1. As a result, lshw uses the ASCII strings appended to the Type 17 record as the Extended Size value and computes a garbage value for the DIMM size.

[0] https://www.dmtf.org/standards/smbios

The raw SMBIOS data:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@wala79e8sv301081516-vm1 ~]# xxd /sys/firmware/dmi/entries/17-1/raw
0000000: 111b 5700 5100 5000 ffff ffff ff7f 0200  ..W.Q.P.........
0000010: 0102 0104 0000 0003 0202 024d 3100 4e6f  ...........M1.No
0000020: 6e65 004d 6963 726f 736f 6674 0000       ne.Microsoft..

[root@wala79e8sv301081516-vm1 ~]# dmidecode -H 0x57 -u
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.3 present.
338 structures occupying 17307 bytes.
Table at 0x000F93D0.
Handle 0x0057, DMI type 17, 27 bytes
        Header and Data:
                11 1B 57 00 51 00 50 00 FF FF FF FF FF 7F 02 00
                01 02 01 04 00 00 00 03 02 02 02
        Strings:
                4D 31 00
                "M1"
                4E 6F 6E 65 00
                "None"
                4D 69 63 72 6F 73 6F 66 74 00
                "Microsoft"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Looking at the first row of raw bytes:

11 1B 57 00 51 00 50 00 FF FF FF FF FF 7F 02 00
   ^^                               ^^^^^
   |                                |
   |                                + size of memory device
   |
   +-- length of the structure, 0x1B is used for SMBIOS spec 2.3

On the Size field, the spec says:

  Size of the memory device
  If the value is 0, no memory device is installed in the socket; if the size is unknown, the field value is FFFFh. If the size is 32 GB-1 MB or greater, the field value is 7FFFh and the actual size is stored in the Extended Size field.

As you can see, the size here is 0x7FFF, which means "refer to the Extended Size field", which starts at offset 0x1C. However, the table is only 0x1B bytes long, so offset 0x1C is the second byte of the appended ASCII strings. lshw is interpreting the bytes 31 00 4E 6F (the ASCII chars "1", NUL, "N", and "o" from the "M1" and "None" strings) as the Extended Size.

Looking at the lshw source code [1], the size is calculated in this scenario in src/core/dmi.cc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ nl -ba src/core/dmi.cc
...
  1568        // size
  1569        u = data[0x0D] << 8 | data[0x0C];
  1570        if(u == 0x7FFF) {
  1571          unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
  1572          extendsize &= 0x7FFFFFFFUL;
  1573          size = extendsize * 1024ULL * 1024ULL;
  1574        }
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[1] https://github.com/lyonel/lshw/blob/master/src/core/dmi.cc#L1568

Simple test program with the raw data combined with the above chunk of code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <stdint.h>

/* Raw Type 17 record from the Azure VM, including the appended strings */
uint8_t raw[] = {
    0x11, 0x1B, 0x57, 0x00, 0x51, 0x00, 0x50, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F, 0x02, 0x00,
    0x01, 0x02, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03,
    0x02, 0x02, 0x02, 0x4D, 0x31, 0x00, 0x4E, 0x6F,
    0x6E, 0x65, 0x00, 0x4D, 0x69, 0x63, 0x72, 0x6F,
    0x73, 0x6F, 0x66, 0x74, 0x00
};

int main(void)
{
    uint32_t u = 0;
    unsigned long long size = 0;
    uint8_t *data;

    data = raw;

    // size (same logic as src/core/dmi.cc)
    u = data[0x0D] << 8 | data[0x0C];
    if(u == 0x7FFF) {
        // offsets 0x1C..0x1F fall inside the strings here, not the record
        unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
        extendsize &= 0x7FFFFFFFUL;
        size = extendsize * 1024ULL * 1024ULL;
    }

    printf("size = %llu bytes\n", size);

    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running it indeed reports a gigantic size:

$ ./a.out
size = 1958092821495808 bytes

It seems this is not a bug in lshw, but rather Microsoft needs to update the SMBIOS table for Azure VMs to version 2.7 or newer of the spec and provide a proper Extended Size field.

I retract my comment 10: this is not a bug in the Azure SMBIOS tables after all. Upon further reflection, I realized this chunk of code in lshw is not checking the SMBIOS version number. If the version is less than 2.7, then it should treat the size value of 0x7FFF as a raw value and not a special code.
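The rule stated above can be sketched in isolation. This is an illustrative sketch and not lshw code: `type17_size_bytes`, `sample_record`, and the parameter names are hypothetical. The KiB/MiB handling of bit 15 follows the SMBIOS definition of the Size field rather than anything quoted in this report. `sample_record` holds the first 32 bytes of the raw dump from the Azure VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 32 bytes of the raw Type 17 dump from this report: 27 bytes of
 * formatted area followed by the start of the string area ("M1\0No"). */
const uint8_t sample_record[32] = {
    0x11, 0x1B, 0x57, 0x00, 0x51, 0x00, 0x50, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F, 0x02, 0x00,
    0x01, 0x02, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03,
    0x02, 0x02, 0x02, 0x4D, 0x31, 0x00, 0x4E, 0x6F
};

/* Hypothetical helper: decode the Type 17 Size field into bytes.
 * Returns 0 when the size is unknown (0xFFFF) or no device is installed.
 * The caller must pass at least 32 readable bytes. */
uint64_t type17_size_bytes(const uint8_t *rec, int ver_major, int ver_minor)
{
    assert(rec != NULL);

    uint16_t raw = (uint16_t)(rec[0x0C] | (rec[0x0D] << 8));
    /* Extended Size only exists from SMBIOS 2.7 onward. */
    int has_ext = ver_major > 2 || (ver_major == 2 && ver_minor >= 7);

    if (raw == 0xFFFF)
        return 0;

    if (has_ext && raw == 0x7FFF) {
        uint32_t ext = (uint32_t)rec[0x1C]
                     | (uint32_t)rec[0x1D] << 8
                     | (uint32_t)rec[0x1E] << 16
                     | (uint32_t)rec[0x1F] << 24;
        return (uint64_t)(ext & 0x7FFFFFFFu) * 1024u * 1024u;
    }

    /* Plain Size field: bit 15 set means KiB units, clear means MiB. */
    uint64_t unit = (raw & 0x8000) ? 1024u : 1024u * 1024u;
    return (uint64_t)(raw & 0x7FFF) * unit;
}
```

Called as `type17_size_bytes(sample_record, 2, 3)`, the sketch yields 32767 MiB, which is consistent with the 31GiB entries lshw prints once the version is taken into account. Called with version 2.7 on the same bytes, it reproduces the 1958092821495808-byte value from the test program. Folding the version test into the same condition as the 0x7FFF test, rather than nesting one inside the other, keeps ordinary sizes on 2.7+ tables flowing into the plain-size branch.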
Here is a simple patch to add a version check:

diff --git a/src/core/dmi.cc b/src/core/dmi.cc
index 30b3ab3b995c..d33d4879bdca 100644
--- a/src/core/dmi.cc
+++ b/src/core/dmi.cc
@@ -1567,10 +1567,13 @@ int dmiversionrev)
 
       // size
       u = data[0x0D] << 8 | data[0x0C];
-      if(u == 0x7FFF) {
-        unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
-        extendsize &= 0x7FFFFFFFUL;
-        size = extendsize * 1024ULL * 1024ULL;
+      if ((dmiversionmaj > 2)
+        || ((dmiversionmaj == 2) && (dmiversionmin >= 7))) {
+        if(u == 0x7FFF) {
+          unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
+          extendsize &= 0x7FFFFFFFUL;
+          size = extendsize * 1024ULL * 1024ULL;
+        }
       }
       else
       if (u != 0xFFFF)

With this patch in place, lshw reports correct values:

[root@wala79e8sv301081516-vm1 ~]# rpm -q lshw
lshw-B.02.18-17.bz1882157.el7.x86_64

[root@wala79e8sv301081516-vm1 ~]# lshw -short -C memory
H/W path           Device     Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         64GiB System Memory
/0/51/0                       memory         1GiB
/0/51/1                       memory         31GiB
/0/51/2                       memory         31GiB
/0/51/3                       memory         [empty]
/0/51/4                       memory         [empty]
/0/51/5                       memory         [empty]
/0/51/6                       memory         [empty]
...

Upstream pull request: https://github.com/lyonel/lshw/pull/60
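The raw dump earlier in this report shows a formatted length of 0x1B, which ends before the Extended Size field at offsets 0x1C..0x1F. As a complementary check to the version test (an assumption of this sketch, not part of the patch above or of lshw), a record can be flagged when Size is 0x7FFF but the formatted area is too short to hold Extended Size. All function and variable names below are hypothetical; the sysfs path layout is taken from the xxd command used in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { TYPE17_EXT_SIZE_END = 0x20 }; /* Extended Size occupies 0x1C..0x1F */

/* First 16 bytes of the Type 17 record dumped from the Azure VM. */
const uint8_t azure_header[16] = {
    0x11, 0x1B, 0x57, 0x00, 0x51, 0x00, 0x50, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F, 0x02, 0x00
};

/* 1 if the formatted area is long enough to contain Extended Size. */
int type17_has_ext_size(const uint8_t *rec)
{
    assert(rec != NULL);
    return rec[0] == 17 && rec[1] >= TYPE17_EXT_SIZE_END;
}

/* 1 if Size is 0x7FFF while the formatted area ends before offset 0x20,
 * which is the pattern described in this bug. Needs 14 readable bytes. */
int type17_size_is_ambiguous(const uint8_t *rec)
{
    assert(rec != NULL);
    uint16_t raw = (uint16_t)(rec[0x0C] | (rec[0x0D] << 8));
    return rec[0] == 17 && raw == 0x7FFF && rec[1] < TYPE17_EXT_SIZE_END;
}

/* Read one raw entry, e.g. /sys/firmware/dmi/entries/17-1/raw, and apply
 * the check. Returns -1 if the file cannot be read or is too short. */
int check_entry_file(const char *path)
{
    uint8_t buf[256];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (n < 0x0E)
        return -1;
    return type17_size_is_ambiguous(buf);
}
```

On the record from this report, `type17_size_is_ambiguous(azure_header)` returns 1, since the length byte is 0x1B. Reading the sysfs entries typically requires root, so `check_entry_file` returns -1 for an unprivileged caller.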