Bug 1956502 - hugepages not being allocated on one specific node
Summary: hugepages not being allocated on one specific node
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-03 19:36 UTC by Daniel Del Ciancio
Modified: 2021-05-06 08:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Daniel Del Ciancio 2021-05-03 19:36:44 UTC
Description of problem:



Version-Release number of selected component (if applicable):
4.5

How reproducible:

Hugepages are configured and applied at boot time via a MachineConfig, and this works well on all of the other nodes:


apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-hp
  name: 05-worker-hp-kernelarg-hugepage-1g
spec:
  config:
    ignition:
      version: 2.2.0
  kernelArguments:
    - default_hugepagesz=1G
    - hugepagesz=1G
    - hugepages=40


But for some reason, one specific node (infra3) always reports an error and its kubelet is unable to register. This is the error it generates while registering:


Apr 19 15:24:52 infra3 hyperkube[704805]: E0419 15:24:52.920376  704805 kubelet_node_status.go:402] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"NetworkUnavailable\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"allocatable\":{\"hugepages-1Gi\":\"40Gi\",\"memory\":\"549858388Ki\"},\"capacity\":{\"hugepages-1Gi\":\"40Gi\"},\"conditions\":[{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"Ready\"}]}}" for node "infra3": Node "infra3" is invalid: [status.capacity.hugepages-1Gi: Invalid value: resource.Quantity{i:resource.int64Amount{value:42949672960, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.openshift.io/mellnic1: Invalid value: resource.Quantity{i:resource.int64Amount{value:8, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"8", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:608340889600, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"594082900Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-1Gi: Invalid value: resource.Quantity{i:resource.int64Amount{value:42949672960, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.ephemeral-storage: Invalid value: resource.Quantity{i:resource.int64Amount{value:214433168642, scale:0}, 
d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"214433168642", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/mellnic0: Invalid value: resource.Quantity{i:resource.int64Amount{value:16, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"16", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.pods: Invalid value: resource.Quantity{i:resource.int64Amount{value:250, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"250", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.devices.kubevirt.io/tun: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/mellnic2: Invalid value: resource.Quantity{i:resource.int64Amount{value:8, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"8", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/mellnic1: Invalid value: resource.Quantity{i:resource.int64Amount{value:8, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"8", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:563054989312, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"549858388Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.cpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:68, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"68", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.devices.kubevirt.io/kvm: Invalid value: 
resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.devices.kubevirt.io/vhost-net: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]
Apr 19 15:24:52 infra3 hyperkube[704805]: E0419 15:24:52.943169  704805 kubelet_node_status.go:402] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"NetworkUnavailable\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"allocatable\":{\"hugepages-1Gi\":\"40Gi\",\"memory\":\"549858388Ki\"},\"capacity\":{\"hugepages-1Gi\":\"40Gi\"},\"conditions\":[{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2021-04-19T19:24:52Z\",\"type\":\"Ready\"}]}}" for node "infra3": Node "infra3" is invalid: [status.capacity.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:83886080, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.openshift.io/workerens3f1: Invalid value: resource.Quantity{i:resource.int64Amount{value:0, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"0", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.ephemeral-storage: Invalid value: resource.Quantity{i:resource.int64Amount{value:239452123136, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"233839964Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.devices.kubevirt.io/kvm: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.devices.kubevirt.io/tun: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, 
d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.memory: Invalid value: resource.Quantity{i:resource.int64Amount{value:608340889600, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"594082900Ki", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.cpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:72, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"72", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.openshift.io/mellnic1: Invalid value: resource.Quantity{i:resource.int64Amount{value:8, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"8", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:83886080, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/mellnic1: Invalid value: resource.Quantity{i:resource.int64Amount{value:8, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"8", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/workerens3f0: Invalid value: resource.Quantity{i:resource.int64Amount{value:0, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"0", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.openshift.io/workerens3f1: Invalid value: resource.Quantity{i:resource.int64Amount{value:0, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"0", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]
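The two rejected patches above carry byte quantities for two different page sizes (hugepages-1Gi and hugepages-2Mi). As a rough illustration only, the following Python sketch converts those byte values into page counts. The helper name `pages` and the two dictionaries are hypothetical and are not part of any kubelet API.

```python
# Byte quantities copied from the two kubelet error messages above.
REPORTED_BYTES = {
    "hugepages-1Gi": 42949672960,
    "hugepages-2Mi": 83886080,
}

# Page size in bytes for each resource name.
PAGE_SIZE_BYTES = {
    "hugepages-1Gi": 1024 ** 3,
    "hugepages-2Mi": 2 * 1024 ** 2,
}


def pages(resource: str) -> int:
    """Return how many whole pages the reported byte quantity represents."""
    quantity = REPORTED_BYTES[resource]
    size = PAGE_SIZE_BYTES[resource]
    if quantity % size != 0:
        raise ValueError(f"{resource}: {quantity} is not a multiple of {size}")
    return quantity // size


if __name__ == "__main__":
    for name in REPORTED_BYTES:
        print(name, pages(name))
```

Both quantities work out to 40 pages, so the node status appears to have carried a non-zero count for each of the two page sizes, which is consistent with the "multiple page sizes" validation message.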


After investigating, I can see that the hugepages are not actually being allocated on the node:


# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
0
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 
0
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 
0
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 
0


Hugepage status:


[root@infra3 ~]# cat /proc/meminfo | grep Huge
AnonHugePages:     63488 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

[root@infra3 ~]# cat /sys/devices/system/node/node*/meminfo | fgrep Huge
Node 0 AnonHugePages:      6144 kB
Node 0 ShmemHugePages:        0 kB
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 AnonHugePages:         0 kB
Node 1 ShmemHugePages:        0 kB
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 AnonHugePages:     14336 kB
Node 2 ShmemHugePages:        0 kB
Node 2 HugePages_Total:     0
Node 2 HugePages_Free:      0
Node 2 HugePages_Surp:      0
Node 3 AnonHugePages:     43008 kB
Node 3 ShmemHugePages:        0 kB
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0
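The same check can be scripted. Below is a minimal Python sketch that parses /proc/meminfo-style text and reports whether any hugepages are allocated. The function names `parse_meminfo` and `hugepages_allocated` are made up for this illustration, and `SAMPLE` is the infra3 output pasted above.

```python
import re

# Output of `cat /proc/meminfo | grep Huge` on infra3, as pasted above.
SAMPLE = """\
AnonHugePages:     63488 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB
"""

LINE_RE = re.compile(r"^([^:]+):\s+(\d+)(?:\s+kB)?\s*$")


def parse_meminfo(text: str) -> dict:
    """Map each field name to its integer value (kB or a plain count)."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def hugepages_allocated(fields: dict) -> bool:
    """True when the default-size hugepage pool holds at least one page."""
    return fields.get("HugePages_Total", 0) > 0


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("default page size (kB):", info["Hugepagesize"])
    print("allocated:", hugepages_allocated(info))
```

On a live node, the text could instead be read from /proc/meminfo; the sample is embedded here only so the sketch runs anywhere.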


These are the current kernel arguments:


$ignition_firstboot rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal rd.luks.options=discard selinux=0 default_hugepagesz=1G hugepages=40
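For comparison, here is a small Python sketch that checks which of the kernel arguments listed in the MachineConfig actually appear on the command line shown above. The function name `missing_args` is hypothetical, and whether any difference it reports is related to the failure is not established in this report.

```python
# kernelArguments from the MachineConfig in this report.
MACHINECONFIG_ARGS = ["default_hugepagesz=1G", "hugepagesz=1G", "hugepages=40"]

# Kernel command line pasted above for infra3.
CMDLINE = (
    "$ignition_firstboot rhcos.root=crypt_rootfs console=tty0 "
    "console=ttyS0,115200n8 ignition.platform.id=metal "
    "rd.luks.options=discard selinux=0 default_hugepagesz=1G hugepages=40"
)


def missing_args(expected: list, cmdline: str) -> list:
    """Return the expected arguments that are not whole tokens of cmdline."""
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


if __name__ == "__main__":
    print("missing from command line:", missing_args(MACHINECONFIG_ARGS, CMDLINE))
```

Matching is done on whole whitespace-separated tokens, so `default_hugepagesz=1G` does not count as a match for `hugepagesz=1G`.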


However, if I manually set the following parameters and restart the kubelet, it reports the correct hugepages and the node registers properly:


echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 10 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
echo 10 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages
echo 10 > /sys/devices/system/node/node3/hugepages/hugepages-1048576kB/nr_hugepages
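The manual workaround above spreads the 40 pages evenly over the four NUMA nodes. A minimal Python sketch of that split follows. It only prints the equivalent echo commands and does not write to sysfs. The names `split_pages`, `nr_hugepages_path`, and `plan` are hypothetical helpers for illustration.

```python
def split_pages(total: int, node_count: int) -> list:
    """Split total pages across nodes as evenly as possible."""
    base, extra = divmod(total, node_count)
    # The first `extra` nodes each take one additional page.
    return [base + (1 if i < extra else 0) for i in range(node_count)]


def nr_hugepages_path(node: int, page_size_kb: int = 1048576) -> str:
    """Build the per-node sysfs path used in the commands above."""
    return (
        f"/sys/devices/system/node/node{node}"
        f"/hugepages/hugepages-{page_size_kb}kB/nr_hugepages"
    )


def plan(total: int, node_count: int) -> list:
    """Return (path, page_count) pairs, one per NUMA node."""
    counts = split_pages(total, node_count)
    return [(nr_hugepages_path(i), counts[i]) for i in range(node_count)]


if __name__ == "__main__":
    for path, count in plan(40, 4):
        print(f"echo {count} > {path}")
```

With 40 pages and 4 nodes this prints the same four commands as above. Values set this way are runtime-only, so they would presumably need to be reapplied after a reboot.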



Steps to Reproduce:
1. See the steps described above.

Actual results:

Hugepages not allocated for the node.


Expected results:

Hugepages should be allocated for the node.


Additional info:

Comment 1 Yu Qi Zhang 2021-05-04 00:31:53 UTC
Moving this over to the Node team to take a look at the hugepages error.

