Installing on either Azure or AWS I see that worker nodes have different amounts of RAM as reported by node 'capacity'.

If I scale up the available machinesets:

  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2a -p '{"spec":{"replicas":10}}' --type=merge;
  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2b -p '{"spec":{"replicas":10}}' --type=merge;
  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2c -p '{"spec":{"replicas":10}}' --type=merge;

and wait for the nodes to become "Ready", then looking at the reported memory capacity for each node I see different amounts. I was expecting the worker nodes to report identical values.

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16420436Ki"   # Masters
       16 "8162892Ki"
       14 "8162900Ki"

Doing the same on Azure I see:

  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus1 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus1 patched

  aim@spicy:~/go-projects/openshift-installer/src/github.com/openshift/installer
  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus2 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus2 patched

  aim@spicy:~/go-projects/openshift-installer/src/github.com/openshift/installer
  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus3 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus3 patched

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16399092Ki"   # Masters
        2 "8141552Ki"
       27 "8141560Ki"
        1 "8141752Ki"

Sometimes I see identical values reported after installation completes, but that tends to happen far more often on Azure than AWS right now.

I tripped over the discrepancy, post initial installation, whilst investigating:
https://bugzilla.redhat.com/show_bug.cgi?id=1731011

  $ ./bin/openshift-install version
  ./bin/openshift-install unreleased-master-1404-ge090d19dda2ba9d80dba652e168fcc8ed54f1b55
  built from commit e090d19dda2ba9d80dba652e168fcc8ed54f1b55
  release image registry.svc.ci.openshift.org/origin/release:4.2

(Apologies if this proves to be filed against the wrong domain/component.)
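For reference, rather than patching each machineset by name, the same scale-up can be done generically; a minimal sketch, assuming every machineset in openshift-machine-api should be scaled (MachineSets expose a scale subresource, so 'oc scale' works on them):

  # scale every machineset in the namespace to 10 replicas
  $ for ms in $(oc get machinesets -n openshift-machine-api -o name); do
        oc scale "$ms" -n openshift-machine-api --replicas=10
    done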
Note: when a node/machine reports a differing amount of memory, the actual amount reported is derived from /proc/meminfo. If you ssh to each worker node you can see that machines in the same machineset report different values in /proc/meminfo.
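A quick way to confirm that from a bastion is to read MemTotal on each worker directly; a minimal sketch, assuming a $workers variable listing the node names (as used in the later comments) and working ssh to each node:

  # print MemTotal from /proc/meminfo for every worker
  $ for i in $workers; do
        echo -n "$i: "
        ssh -o StrictHostKeyChecking=no $i grep MemTotal /proc/meminfo
    done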
Andrew, if you do 'oc adm node-logs' (or dmesg on the node), take a look for the BIOS-provided memory map. Is the memory map from the BIOS identical across nodes reporting different amounts of memory?
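(For anyone following along: the map can also be pulled without direct ssh. A sketch, assuming 'oc adm node-logs <node>' reads the node's journal, which carries the kernel boot messages, and that running a debug pod is acceptable:)

  # kernel boot messages via the node journal
  $ oc adm node-logs ip-10-0-143-189.us-east-2.compute.internal | grep 'BIOS-e820'
  # or via a host shell in a debug pod
  $ oc debug node/ip-10-0-143-189.us-east-2.compute.internal -- chroot /host dmesg | grep 'BIOS-e820'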
(In reply to Robert Krawitz from comment #2)
> Andrew, if you do 'oc adm node-logs' (or dmesg on the node), take a look for
> the BIOS-provided memory map. Is the memory map from the BIOS identical
> across nodes reporting different amounts of memory?

Right now I have:

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16420436Ki"
        1 "8162892Ki"
        2 "8162900Ki"

  $ oc get nodes
  NAME                                         STATUS   ROLES    AGE     VERSION
  ip-10-0-139-28.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7
  ip-10-0-143-189.us-east-2.compute.internal   Ready    worker   109m    v1.14.0+bd34733a7
  ip-10-0-155-36.us-east-2.compute.internal    Ready    worker   109m    v1.14.0+bd34733a7
  ip-10-0-157-15.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7
  ip-10-0-167-148.us-east-2.compute.internal   Ready    worker   7m2s    v1.14.0+bd34733a7
  ip-10-0-170-40.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7

  $ ssh ip-10-0-143-189.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3937920K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289296K reserved, 0K cma-reserved)
  [    0.201067] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-143-189.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162900 kB
  MemFree:         4470312 kB
  MemAvailable:    7003856 kB

  $ ssh ip-10-0-155-36.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3937920K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289296K reserved, 0K cma-reserved)
  [    0.237069] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-155-36.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162900 kB
  MemFree:         3576548 kB
  MemAvailable:    6200228 kB

  $ ssh ip-10-0-167-148.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3908240K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289304K reserved, 0K cma-reserved)
  [    0.206061] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-167-148.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162892 kB
  MemFree:         4883292 kB
  MemAvailable:    6526712 kB

Note the last node has 8162892 whereas the other two worker nodes both have 8162900. And the output from dmesg shows:

  - Memory: 3937920K/8388212K
  - Memory: 3937920K/8388212K
  - Memory: 3908240K/8388212K

Also, if you look at https://bugzilla.redhat.com/show_bug.cgi?id=1731011#c3 you can get to a point where they are all equal, irrespective (it seems) of the availability zone.
I want more detailed data than that, the entire memory map. Something like this:

  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
  [    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000030f42fff] usable
  [    0.000000] BIOS-e820: [mem 0x0000000030f43000-0x0000000043de3fff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000043de4000-0x0000000043de4fff] ACPI NVS
  [    0.000000] BIOS-e820: [mem 0x0000000043de5000-0x000000004ff29fff] reserved
  [    0.000000] BIOS-e820: [mem 0x000000004ff2a000-0x000000004ffbefff] ACPI NVS
  [    0.000000] BIOS-e820: [mem 0x000000004ffbf000-0x000000004fffefff] ACPI data
  [    0.000000] BIOS-e820: [mem 0x000000004ffff000-0x0000000057ffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000058600000-0x000000005e7fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000109f7fffff] usable

If they're not identical, there's probably not much you can do.
Gathering via:

  $ echo $workers
  ip-10-0-143-189.us-east-2.compute.internal ip-10-0-155-36.us-east-2.compute.internal ip-10-0-167-148.us-east-2.compute.internal

  $ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto 'BIOS' | tee /tmp/$i.meminfo;
          echo;
      done
  }

  $ diff3 ip-10-0-143-189.us-east-2.compute.internal ip-10-0-155-36.us-east-2.compute.internal ip-10-0-167-148.us-east-2.compute.internal
  $ echo $?
  0

And the output:

  Memory map for: ip-10-0-143-189.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

  Memory map for: ip-10-0-155-36.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

  Memory map for: ip-10-0-167-148.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
(In reply to Andrew McDermott from comment #5)
> Gathering via:
>
> $ type g
> g is a function
> g ()
> {
>     for i in $workers;
>     do
>         echo "Memory map for: $i";
>         ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto 'BIOS' | tee /tmp/$i.meminfo;
>         echo;
>     done
> }
> [...output snipped, identical to comment #5...]

Just correcting/confirming: the function used for gathering was actually:

  [core@ssh-bastion-65fd55cb7f-bff4x tmp]$ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo;
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto --color=auto 'BIOS' | tee /tmp/$i;
          echo;
      done
  }
Trying to narrow this down a bit more, grepping for either 'BIOS' or 'mem' gives:

  [core@ssh-bastion-65fd55cb7f-bff4x ~]$ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo;
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto --color=auto 'BIOS|mem' | tee /tmp/$i;
          echo;
      done
  }

  [core@ssh-bastion-65fd55cb7f-bff4x tmp]$ diff3 $workers
  ====3
  1:16c
  2:16c
    NODE_DATA(0) allocated [mem 0x20ffd6000-0x20fffffff]
  3:16c
    NODE_DATA(0) allocated [mem 0x20ffd5000-0x20fffefff]
  ====3
  1:57c
  2:57c
    [TTM] Zone kernel: Available graphics memory: 4081450 kiB
  3:57c
    [TTM] Zone kernel: Available graphics memory: 4081446 kiB

So perhaps this ^^ is the difference. Hmm.
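Worth noting how small these deltas are: the MemTotal gap between workers is 8 kB, i.e. two 4 KiB pages; the NODE_DATA(0) span above is shifted down by exactly one page; and the TTM zone differs by one page. A quick check with shell arithmetic against the values already reported:

  $ echo $(( 8162900 - 8162892 ))                    # MemTotal delta, kB
  8
  $ printf '%d\n' $(( 0x20ffd6000 - 0x20ffd5000 ))   # NODE_DATA start offset, bytes
  4096
  $ echo $(( 4081450 - 4081446 ))                    # TTM zone delta, kiB
  4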
Note that mcelog can offline individual pages based on soft error rates: http://www.mcelog.org/badpageofflining.html
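If offlined pages were the cause here, the kernel normally accounts for them: /proc/meminfo carries a HardwareCorrupted field on kernels built with memory-failure handling. A sketch reusing the $workers loop from above (I'm not certain soft-offlined pages are counted there on every kernel, so treat a zero as suggestive rather than conclusive):

  # check for poisoned/offlined page accounting on each worker
  $ for i in $workers; do
        echo -n "$i: "
        ssh -o StrictHostKeyChecking=no $i grep HardwareCorrupted /proc/meminfo
    done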
I also see differences on GCP instances:

  $ kubectl get nodes
  NAME                                                    STATUS   ROLES    AGE     VERSION
  amcder-9s2hb-m-0.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-m-1.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-m-2.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-w-a-g5cb7.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df
  amcder-9s2hb-w-b-7sr2x.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df
  amcder-9s2hb-w-c-s4xsq.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df

  aim@spicy:~/go-projects/openshift-cluster-api/src/sigs.k8s.io/cluster-api
  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
       11 "15389264Ki"
       21 "15389280Ki"
        1 "15389988Ki"

  aim@spicy:~/go-projects/openshift-cluster-api/src/sigs.k8s.io/cluster-api
  $ oc get machinesets --all-namespaces
  NAMESPACE               NAME               DESIRED   CURRENT   READY   AVAILABLE   AGE
  openshift-machine-api   amcder-9s2hb-w-a   10        10        10      10          5h32m
  openshift-machine-api   amcder-9s2hb-w-b   10        10        10      10          5h32m
  openshift-machine-api   amcder-9s2hb-w-c   10        10        10      10          5h32m
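To rule zone placement in or out, the reported capacity can be grouped by zone label; a sketch, assuming the zone label on clusters of this vintage is failure-domain.beta.kubernetes.io/zone (my assumption, adjust the key if your nodes use a different label):

  # count distinct memory capacities per availability zone
  $ oc get nodes -o json | jq -r '.items[]
      | [.metadata.labels["failure-domain.beta.kubernetes.io/zone"], .status.capacity.memory]
      | @tsv' | sort | uniq -c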
This memory discrepancy was affecting the autoscaler, which was fixed by https://github.com/kubernetes/autoscaler/commit/e8b3c2a111eb9b3b4fe15b4d4081175267ff6d76#diff-dfab69cae1cc7f7d024593ed57d7371a. I think we can treat the deviation as inherent to the cloud provider and close this; please reopen if relevant.
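For anyone hitting the same symptom with the autoscaler: my reading of the linked commit is that the node-group similarity check now tolerates small memory-capacity differences instead of requiring exact equality. The idea, sketched in shell against the values above (the 256000 KiB tolerance is my recollection of the autoscaler constant, so treat it as an assumption):

  # two capacities count as "similar" if they differ by less than the tolerance
  $ a=8162900; b=8162892; tol=256000
  $ delta=$(( a > b ? a - b : b - a ))
  $ [ "$delta" -le "$tol" ] && echo "similar enough" || echo "different"
  similar enough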