Bug 1882157

Summary: [Azure][RHEL-7]lshw command showing wrong memory information in azure m or mv2 series type of instances.
Product: Red Hat Enterprise Linux 7 Reporter: rcheerla
Component: lshw Assignee: ltao
Status: CLOSED WONTFIX QA Contact: Jeff Bastian <jbastian>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.8 CC: hhei, klaas, ltao, mheslin, ribarry, ruyang, rvr, xiliang, xuli, yacao, yuxisun
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1882619 (view as bug list) Environment:
Last Closed: 2021-03-17 06:20:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1882619    
Attachments:
Description | Flags
The command which indicate the issue. | none
the command output in json format. | none
Showing correct result by disabling DMI as an option to the lshw. | none

Description rcheerla 2020-09-23 22:51:40 UTC
Description of problem: The lshw command shows wrong memory information on Azure M- or Mv2-series instances.


Version-Release number of selected component (if applicable): lshw-B.02.18-14.el7.x86_64


How reproducible: Always.


Steps to Reproduce:
1. Install RHEL 7.8 on an M- or Mv2-series instance in Azure (e.g. Standard_M32ts).
2. Install the lshw package if it is not already installed.
3. Run # lshw -short -C memory

Actual results:

less 0070-lshw_json_C_memory | grep size
    "size" : 65536,
    "size" : 9790498482946048, < bytes >   <<<-----  ~8.7 PiB (8904 TiB)
        "size" : 1073741824
        "size" : 1958092821495808
        "size" : 1958092822544384
        "size" : 1958092823592960
        "size" : 1958092824641536
        "size" : 1958092825690112
        "size" : 33291239424



Expected results:

cat 0060-lshw_disableDMI_C_memory
  *-memory
       description: System memory
       physical id: 0
       size: 192GiB  <<--

Additional info:

Comment 3 rcheerla 2020-09-23 23:17:05 UTC
Created attachment 1716171 [details]
The command which indicate the issue.

Comment 4 rcheerla 2020-09-23 23:18:30 UTC
Created attachment 1716172 [details]
the command output in json format.

Comment 5 rcheerla 2020-09-23 23:20:02 UTC
Created attachment 1716173 [details]
Showing correct result by disabling DMI as an option to the lshw.

Comment 6 Yuxin Sun 2020-09-25 07:31:43 UTC
This reproduces on RHEL-7.9 and RHEL-8.3 when VM memory is > 32G, e.g. D16s_v3 (64G), E8s_v3 (64G), NV6 (56G), M32ts (192G). It does not reproduce on smaller sizes, e.g. E4_v3 (32G), D8s_v3 (32G), F1 (2G).
I also tested on Hyper-V, where the issue reproduces when VM memory is >= 36G.

**Hyper-V**:
If memory >= 36G, "lshw -short -C memory" shows 1780TiB system memory (e.g. output from a 40G VM):
# lshw -short -C memory
H/W path      Device     Class          Description
===================================================
/0/0                     memory         64KiB BIOS
/0/51                    memory         1780TiB System Memory
/0/51/0                  memory         3968MiB 
/0/51/1                  memory         1780TiB 
/0/51/2                  memory         4225MiB

If 72G, it shows 3561TiB ~= 2*1780TiB:
# lshw -short -C memory
H/W path      Device     Class          Description
===================================================
/0/0                     memory         64KiB BIOS
/0/51                    memory         3561TiB System Memory
/0/51/0                  memory         3968MiB 
/0/51/1                  memory         1780TiB 
/0/51/2                  memory         1780TiB 
/0/51/3                  memory         4226MiB 


**Azure**:
If 128G (E16s_v3), it shows 5342TiB ~= 3*1780TiB:
# lshw -short -C memory
H/W path          Device      Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         5342TiB System Memory
/0/51/0                       memory         1GiB 
/0/51/1                       memory         1780TiB 
/0/51/2                       memory         1780TiB 
/0/51/3                       memory         1780TiB 
/0/51/4                       memory         31GiB 

If 192G (M32ts), it shows 8904TiB ~= 5*1780TiB:
H/W path          Device      Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         8904TiB System Memory
/0/51/0                       memory         1GiB 
/0/51/1                       memory         1780TiB 
/0/51/2                       memory         1780TiB 
/0/51/3                       memory         1780TiB 
/0/51/4                       memory         1780TiB 
/0/51/5                       memory         1780TiB 
/0/51/6                       memory         31GiB 

lshw packages:
RHEL-8.3: lshw-B.02.19.2-2.el8.x86_64
RHEL-7.9: lshw-B.02.18-17.el7.x86_64
RHEL-7.8: lshw-B.02.18-14.el7.x86_64

Comment 7 Yuxin Sun 2020-09-25 08:30:13 UTC
Xiliang helped test this on AWS VMs and did not see the issue.

m5.12xlarge: 72G
kernel-3.10.0-1158.el7.x86_64
# lshw -C memory|more
  *-firmware                          
       description: BIOS
       vendor: Amazon EC2
       physical id: 0
       version: 1.0
       date: 10/16/2017
       size: 64KiB
       capacity: 64KiB
       capabilities: pci edd acpi virtualmachine
  *-memory  
       description: System memory
       physical id: 1
       size: 69GiB

m5.12xlarge: 192G
kernel-3.10.0-1160.2.1.el7.x86_64
[root@ip-10-116-2-133 ec2-user]# lshw -C memory 
  *-firmware                          
       description: BIOS
       vendor: Amazon EC2
       physical id: 0
       version: 1.0
       date: 10/16/2017
       size: 64KiB
       capacity: 64KiB
       capabilities: pci edd acpi virtualmachine
  *-memory  
       description: System memory
       physical id: 1
       size: 189GiB

Comment 10 Jeff Bastian 2021-01-08 21:37:10 UTC
I think I see the problem: the SMBIOS tables provided by Azure do not follow the spec [0], and that confuses lshw.  Specifically, the SMBIOS identifies itself as following spec version 2.3, but it uses the "Extended Size" feature from spec version 2.7 to describe the size of the virtual DIMM in slot 1.  As a result, lshw reads the ASCII strings appended to the Type 17 record as the Extended Size value and computes a garbage DIMM size.

[0] https://www.dmtf.org/standards/smbios


The raw SMBIOS data:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@wala79e8sv301081516-vm1 ~]# xxd /sys/firmware/dmi/entries/17-1/raw
0000000: 111b 5700 5100 5000 ffff ffff ff7f 0200  ..W.Q.P.........
0000010: 0102 0104 0000 0003 0202 024d 3100 4e6f  ...........M1.No
0000020: 6e65 004d 6963 726f 736f 6674 0000       ne.Microsoft..

[root@wala79e8sv301081516-vm1 ~]# dmidecode -H 0x57 -u
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.3 present.
338 structures occupying 17307 bytes.
Table at 0x000F93D0.

Handle 0x0057, DMI type 17, 27 bytes
	Header and Data:
		11 1B 57 00 51 00 50 00 FF FF FF FF FF 7F 02 00
		01 02 01 04 00 00 00 03 02 02 02
	Strings:
		4D 31 00
		"M1"
		4E 6F 6E 65 00
		"None"
		4D 69 63 72 6F 73 6F 66 74 00
		"Microsoft"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Looking at the first row of raw bytes:

		11 1B 57 00 51 00 50 00 FF FF FF FF FF 7F 02 00
                   ^^                               ^^^^^
                   |                                |
                   |                                + size of memory device
                   |
                   +-- length of the structure, 0x1B is used for SMBIOS spec 2.3


On the Size field, the spec says:

  Size of the memory device

  If the value is 0, no memory device is installed in the
  socket; if the size is unknown, the field value is
  FFFFh. If the size is 32 GB-1 MB or greater, the
  field value is 7FFFh and the actual size is stored in
  the Extended Size field.

As you can see, the size here is 0x7FFF, which means "refer to the Extended Size field", and that field starts at offset 0x1C.  However, the structure is only 0x1B bytes long (offsets 0x00 through 0x1A), so offset 0x1C is the second byte of the appended ASCII strings.  lshw interprets the bytes 31 00 4E 6F (the ASCII chars "1", NUL, "N", and "o" from the "M1" and "None" strings) as the Extended Size.

Looking at the lshw source code [1], the size is calculated in this scenario in src/core/dmi.cc:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ nl -ba src/core/dmi.cc
...
  1568  // size
  1569            u = data[0x0D] << 8 | data[0x0C];
  1570            if(u == 0x7FFF) {
  1571               unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
  1572               extendsize &= 0x7FFFFFFFUL;
  1573               size = extendsize * 1024ULL * 1024ULL;
  1574            }
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[1] https://github.com/lyonel/lshw/blob/master/src/core/dmi.cc#L1568


Simple test program with the raw data combined with the above chunk of code:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <stdint.h>

uint8_t raw[] = {
    0x11, 0x1B, 0x57, 0x00, 0x51, 0x00, 0x50, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x7F, 0x02, 0x00,
    0x01, 0x02, 0x01, 0x04, 0x00, 0x00, 0x00, 0x03, 0x02, 0x02, 0x02,
    0x4D, 0x31, 0x00,
    0x4E, 0x6F, 0x6E, 0x65, 0x00,
    0x4D, 0x69, 0x63, 0x72, 0x6F, 0x73, 0x6F, 0x66, 0x74, 0x00
};

int main(void)
{
    uint32_t u = 0;
    unsigned long long size = 0;
    uint8_t *data;

    data = raw;

// size
    u = data[0x0D] << 8 | data[0x0C];
    if(u == 0x7FFF) {
        unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
        extendsize &= 0x7FFFFFFFUL;
        size = extendsize * 1024ULL * 1024ULL;
    }

    printf("size = %llu bytes\n", size);

    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running it indeed reports a gigantic size:

$ ./a.out
size = 1958092821495808 bytes


It seems this is not a bug in lshw, but rather Microsoft needs to update the SMBIOS table for Azure VMs to version 2.7 or newer of the spec and provide a proper Extended Size field.

Comment 13 Jeff Bastian 2021-01-08 23:16:20 UTC
I retract my comment 10: this is not a bug in the Azure SMBIOS tables after all.  Upon further reflection, I realized this chunk of code in lshw is not checking the SMBIOS version number.  If the version is less than 2.7, then it should treat the size value of 0x7FFF as a raw value and not a special code.

Here is a simple patch to add a version check:

diff --git a/src/core/dmi.cc b/src/core/dmi.cc
index 30b3ab3b995c..d33d4879bdca 100644
--- a/src/core/dmi.cc
+++ b/src/core/dmi.cc
@@ -1567,10 +1567,13 @@ int dmiversionrev)
 
 // size
           u = data[0x0D] << 8 | data[0x0C];
-          if(u == 0x7FFF) {
-             unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
-             extendsize &= 0x7FFFFFFFUL;
-             size = extendsize * 1024ULL * 1024ULL;
+          if ((dmiversionmaj > 2)
+            || ((dmiversionmaj == 2) && (dmiversionmin >= 7))) {
+             if(u == 0x7FFF) {
+                unsigned long long extendsize = (data[0x1F] << 24) | (data[0x1E] << 16) | (data[0x1D] << 8) | data[0x1C];
+                extendsize &= 0x7FFFFFFFUL;
+                size = extendsize * 1024ULL * 1024ULL;
+             }
           }
 	  else
           if (u != 0xFFFF)



With this patch in place, lshw reports correct values:

[root@wala79e8sv301081516-vm1 ~]# rpm -q lshw
lshw-B.02.18-17.bz1882157.el7.x86_64

[root@wala79e8sv301081516-vm1 ~]# lshw -short -C memory
H/W path          Device      Class          Description
========================================================
/0/0                          memory         64KiB BIOS
/0/51                         memory         64GiB System Memory
/0/51/0                       memory         1GiB 
/0/51/1                       memory         31GiB 
/0/51/2                       memory         31GiB 
/0/51/3                       memory         [empty]
/0/51/4                       memory         [empty]
/0/51/5                       memory         [empty]
/0/51/6                       memory         [empty]
...

Comment 15 Jeff Bastian 2021-01-08 23:26:26 UTC
Upstream pull request:

https://github.com/lyonel/lshw/pull/60