Installing on either Azure or AWS I see that worker nodes have different amounts of RAM as reported by node 'capacity'.

If I scale up the available machinesets:

  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2a -p '{"spec":{"replicas":10}}' --type=merge;
  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2b -p '{"spec":{"replicas":10}}' --type=merge;
  oc patch -n openshift-machine-api machineset/amcdermo-st695-worker-us-east-2c -p '{"spec":{"replicas":10}}' --type=merge;

and wait for the nodes to become "Ready", then looking at the reported memory capacity for each node I see different amounts. I was expecting the worker nodes to report identical values.

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16420436Ki"   # Masters
       16 "8162892Ki"
       14 "8162900Ki"

Doing the same on Azure I see:

  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus1 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus1 patched

  aim@spicy:~/go-projects/openshift-installer/src/github.com/openshift/installer
  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus2 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus2 patched

  aim@spicy:~/go-projects/openshift-installer/src/github.com/openshift/installer
  $ oc patch -n openshift-machine-api machineset/amcdermo-190725-t9l84-worker-centralus3 -p '{"spec":{"replicas":10}}' --type=merge;
  machineset.machine.openshift.io/amcdermo-190725-t9l84-worker-centralus3 patched

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16399092Ki"   # Masters
        2 "8141552Ki"
       27 "8141560Ki"
        1 "8141752Ki"

Sometimes I see identical values reported after installation completes, but that tends to happen far more often on Azure than AWS right now.

I tripped over the discrepancy, post initial installation, whilst investigating:
https://bugzilla.redhat.com/show_bug.cgi?id=1731011

  $ ./bin/openshift-install version
  ./bin/openshift-install unreleased-master-1404-ge090d19dda2ba9d80dba652e168fcc8ed54f1b55
  built from commit e090d19dda2ba9d80dba652e168fcc8ed54f1b55
  release image registry.svc.ci.openshift.org/origin/release:4.2

(Apologies if this proves to be filed against the wrong domain/component.)
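For reference, rather than patching each machineset by name, the same scale-up can be done generically; a minimal sketch, assuming every machineset in openshift-machine-api should be scaled (MachineSets expose a scale subresource, so 'oc scale' works on them):

  # scale every machineset in the namespace to 10 replicas
  $ for ms in $(oc get machinesets -n openshift-machine-api -o name); do
        oc scale "$ms" -n openshift-machine-api --replicas=10
    done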
Note: when a node/machine reports a differing amount of memory, the actual amount reported is derived from /proc/meminfo. If you ssh to each worker node you can see that machines in the same machineset report different values in /proc/meminfo.
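A quick way to confirm that from a bastion is to read MemTotal on each worker directly; a minimal sketch, assuming a $workers variable listing the node names (as used in the later comments) and working ssh to each node:

  # print MemTotal from /proc/meminfo for every worker
  $ for i in $workers; do
        echo -n "$i: "
        ssh -o StrictHostKeyChecking=no $i grep MemTotal /proc/meminfo
    done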
Andrew, if you do 'oc adm node-logs' (or dmesg on the node), take a look for the BIOS-provided memory map. Is the memory map from the BIOS identical across nodes reporting different amounts of memory?
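(For anyone following along: the map can also be pulled without direct ssh. A sketch, assuming 'oc adm node-logs <node>' reads the node's journal, which carries the kernel boot messages, and that running a debug pod is acceptable:)

  # kernel boot messages via the node journal
  $ oc adm node-logs ip-10-0-143-189.us-east-2.compute.internal | grep 'BIOS-e820'
  # or via a host shell in a debug pod
  $ oc debug node/ip-10-0-143-189.us-east-2.compute.internal -- chroot /host dmesg | grep 'BIOS-e820'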
(In reply to Robert Krawitz from comment #2)
> Andrew, if you do 'oc adm node-logs' (or dmesg on the node), take a look for
> the BIOS-provided memory map. Is the memory map from the BIOS identical
> across nodes reporting different amounts of memory?

Right now I have:

  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
        3 "16420436Ki"
        1 "8162892Ki"
        2 "8162900Ki"

  $ oc get nodes
  NAME                                         STATUS   ROLES    AGE     VERSION
  ip-10-0-139-28.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7
  ip-10-0-143-189.us-east-2.compute.internal   Ready    worker   109m    v1.14.0+bd34733a7
  ip-10-0-155-36.us-east-2.compute.internal    Ready    worker   109m    v1.14.0+bd34733a7
  ip-10-0-157-15.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7
  ip-10-0-167-148.us-east-2.compute.internal   Ready    worker   7m2s    v1.14.0+bd34733a7
  ip-10-0-170-40.us-east-2.compute.internal    Ready    master   6h26m   v1.14.0+bd34733a7

  $ ssh ip-10-0-143-189.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3937920K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289296K reserved, 0K cma-reserved)
  [    0.201067] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-143-189.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162900 kB
  MemFree:         4470312 kB
  MemAvailable:    7003856 kB

  $ ssh ip-10-0-155-36.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3937920K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289296K reserved, 0K cma-reserved)
  [    0.237069] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-155-36.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162900 kB
  MemFree:         3576548 kB
  MemAvailable:    6200228 kB

  $ ssh ip-10-0-167-148.us-east-2.compute.internal dmesg | grep Mem
  [    0.000000] Memory: 3908240K/8388212K available (12292K kernel code, 2101K rwdata, 3816K rodata, 2356K init, 3320K bss, 289304K reserved, 0K cma-reserved)
  [    0.206061] x86/mm: Memory block size: 128MB

  $ ssh ip-10-0-167-148.us-east-2.compute.internal cat /proc/meminfo | egrep 'Mem[Total|Free|Available]'
  MemTotal:        8162892 kB
  MemFree:         4883292 kB
  MemAvailable:    6526712 kB

Note the last node has 8162892 whereas the other two worker nodes both have 8162900. And the output from dmesg shows:

  - Memory: 3937920K/8388212K
  - Memory: 3937920K/8388212K
  - Memory: 3908240K/8388212K

Also, if you look at https://bugzilla.redhat.com/show_bug.cgi?id=1731011#c3 you can get to a point where they are all equal, irrespective (it seems) of the availability zone.
I want more detailed data than that, the entire memory map. Something like this:

  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
  [    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000030f42fff] usable
  [    0.000000] BIOS-e820: [mem 0x0000000030f43000-0x0000000043de3fff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000043de4000-0x0000000043de4fff] ACPI NVS
  [    0.000000] BIOS-e820: [mem 0x0000000043de5000-0x000000004ff29fff] reserved
  [    0.000000] BIOS-e820: [mem 0x000000004ff2a000-0x000000004ffbefff] ACPI NVS
  [    0.000000] BIOS-e820: [mem 0x000000004ffbf000-0x000000004fffefff] ACPI data
  [    0.000000] BIOS-e820: [mem 0x000000004ffff000-0x0000000057ffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000058600000-0x000000005e7fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000109f7fffff] usable

If they're not identical, there's probably not much you can do.
Gathering via:

  $ echo $workers
  ip-10-0-143-189.us-east-2.compute.internal ip-10-0-155-36.us-east-2.compute.internal ip-10-0-167-148.us-east-2.compute.internal

  $ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto 'BIOS' | tee /tmp/$i.meminfo;
          echo;
      done
  }

  $ diff3 ip-10-0-143-189.us-east-2.compute.internal ip-10-0-155-36.us-east-2.compute.internal ip-10-0-167-148.us-east-2.compute.internal
  $ echo $?
  0

And the output:

  Memory map for: ip-10-0-143-189.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

  Memory map for: ip-10-0-155-36.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

  Memory map for: ip-10-0-167-148.us-east-2.compute.internal
  BIOS-provided physical RAM map:
  BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
  BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
  BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
  BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
  BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
  SMBIOS 2.7 present.
  DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
  intel_idle: Please enable MWAIT in BIOS SETUP
  piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
(In reply to Andrew McDermott from comment #5)
> Gathering via:
>
> $ type g
> g is a function
> g ()
> {
>     for i in $workers;
>     do
>         echo "Memory map for: $i";
>         ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto 'BIOS' | tee /tmp/$i.meminfo;
>         echo;
>     done
> }
> [...output snipped, identical to comment #5...]

Just correcting/confirming: the function used for gathering was actually:

  [core@ssh-bastion-65fd55cb7f-bff4x tmp]$ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo;
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto --color=auto 'BIOS' | tee /tmp/$i;
          echo;
      done
  }
Trying to narrow this down a bit more, grepping for either 'BIOS' or 'mem' gives:

  [core@ssh-bastion-65fd55cb7f-bff4x ~]$ type g
  g is a function
  g ()
  {
      for i in $workers;
      do
          echo;
          echo "Memory map for: $i";
          ssh -o StrictHostKeyChecking=no $i dmesg -t | egrep --color=auto --color=auto 'BIOS|mem' | tee /tmp/$i;
          echo;
      done
  }

  [core@ssh-bastion-65fd55cb7f-bff4x tmp]$ diff3 $workers
  ====3
  1:16c
  2:16c
    NODE_DATA(0) allocated [mem 0x20ffd6000-0x20fffffff]
  3:16c
    NODE_DATA(0) allocated [mem 0x20ffd5000-0x20fffefff]
  ====3
  1:57c
  2:57c
    [TTM] Zone kernel: Available graphics memory: 4081450 kiB
  3:57c
    [TTM] Zone kernel: Available graphics memory: 4081446 kiB

So perhaps this ^^ is the difference. Hmm.
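Worth noting how small these deltas are: the MemTotal gap between workers is 8 kB, i.e. two 4 KiB pages; the NODE_DATA(0) span above is shifted down by exactly one page; and the TTM zone differs by one page. A quick check with shell arithmetic against the values already reported:

  $ echo $(( 8162900 - 8162892 ))                    # MemTotal delta, kB
  8
  $ printf '%d\n' $(( 0x20ffd6000 - 0x20ffd5000 ))   # NODE_DATA start offset, bytes
  4096
  $ echo $(( 4081450 - 4081446 ))                    # TTM zone delta, kiB
  4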
Note that mcelog can offline individual pages based on soft error rates: http://www.mcelog.org/badpageofflining.html
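If offlined pages were the cause here, the kernel normally accounts for them: /proc/meminfo carries a HardwareCorrupted field on kernels built with memory-failure handling. A sketch reusing the $workers loop from above (I'm not certain soft-offlined pages are counted there on every kernel, so treat a zero as suggestive rather than conclusive):

  # check for poisoned/offlined page accounting on each worker
  $ for i in $workers; do
        echo -n "$i: "
        ssh -o StrictHostKeyChecking=no $i grep HardwareCorrupted /proc/meminfo
    done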
I also see differences on GCP instances:

  $ kubectl get nodes
  NAME                                                    STATUS   ROLES    AGE     VERSION
  amcder-9s2hb-m-0.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-m-1.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-m-2.c.openshift-gce-devel.internal         Ready    master   5h10m   v1.14.0+0261aa0df
  amcder-9s2hb-w-a-g5cb7.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df
  amcder-9s2hb-w-b-7sr2x.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df
  amcder-9s2hb-w-c-s4xsq.c.openshift-gce-devel.internal   Ready    worker   4h56m   v1.14.0+0261aa0df

  aim@spicy:~/go-projects/openshift-cluster-api/src/sigs.k8s.io/cluster-api
  $ oc get nodes -o json | jq '.items[].status.capacity["memory"]' | sort | uniq -c
       11 "15389264Ki"
       21 "15389280Ki"
        1 "15389988Ki"

  aim@spicy:~/go-projects/openshift-cluster-api/src/sigs.k8s.io/cluster-api
  $ oc get machinesets --all-namespaces
  NAMESPACE               NAME               DESIRED   CURRENT   READY   AVAILABLE   AGE
  openshift-machine-api   amcder-9s2hb-w-a   10        10        10      10          5h32m
  openshift-machine-api   amcder-9s2hb-w-b   10        10        10      10          5h32m
  openshift-machine-api   amcder-9s2hb-w-c   10        10        10      10          5h32m
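To rule zone placement in or out, the reported capacity can be grouped by zone label; a sketch, assuming the zone label on clusters of this vintage is failure-domain.beta.kubernetes.io/zone (my assumption, adjust the key if your nodes use a different label):

  # count distinct memory capacities per availability zone
  $ oc get nodes -o json | jq -r '.items[]
      | [.metadata.labels["failure-domain.beta.kubernetes.io/zone"], .status.capacity.memory]
      | @tsv' | sort | uniq -c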
This memory discrepancy was affecting the autoscaler, which was fixed by https://github.com/kubernetes/autoscaler/commit/e8b3c2a111eb9b3b4fe15b4d4081175267ff6d76#diff-dfab69cae1cc7f7d024593ed57d7371a. I think we can treat the deviation as inherent to the cloud provider and close this; please reopen if relevant.
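For anyone hitting the same symptom with the autoscaler: my reading of the linked commit is that the node-group similarity check now tolerates small memory-capacity differences instead of requiring exact equality. The idea, sketched in shell against the values above (the 256000 KiB tolerance is my recollection of the autoscaler constant, so treat it as an assumption):

  # two capacities count as "similar" if they differ by less than the tolerance
  $ a=8162900; b=8162892; tol=256000
  $ delta=$(( a > b ? a - b : b - a ))
  $ [ "$delta" -le "$tol" ] && echo "similar enough" || echo "different"
  similar enough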