Bug 1960014 - 429 Too Many Requests from meta-data/local-ipv4
Summary: 429 Too Many Requests from meta-data/local-ipv4
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Matthew Booth
QA Contact: Jon Uriarte
URL:
Whiteboard:
: 1925214 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-05-12 19:33 UTC by Dan Seals
Modified: 2022-07-25 09:08 UTC (History)
15 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-25 09:08:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker PLTSRHOS-853 0 None None None 2021-11-15 15:04:28 UTC

Description Dan Seals 2021-05-12 19:33:14 UTC
Description of problem:
Master nodes continuously trying to retrieve from the http://169.254.169.254/2009-04-04/meta-data/local-ipv4 resulting in "429 Too Many Requests"

Counted over 78k requests to that endpoint from the must-gather

Results in nodes unable to obtain IP from local-ipv4 endpoint


DESCRIPTION FROM CASE:
From master-0
etcd logs
raft2021/05/04 14:54:12 INFO: bb291d9b944ac65e received MsgVoteResp from bb291d9b944ac65e at term 23656
raft2021/05/04 14:54:12 INFO: bb291d9b944ac65e [logterm: 19767, index: 483489970] sent MsgVote request to a09f1ec97a6209a0 at term 23656
raft2021/05/04 14:54:12 INFO: bb291d9b944ac65e [logterm: 19767, index: 483489970] sent MsgVote request to f7e02227b720de5b at term 23656
2021-05-04 14:54:13.818569 W | rafthttp: health check for peer f7e02227b720de5b could not connect: dial tcp 10.4.0.16:2380: connect: connection refused
2021-05-04 14:54:13.818624 W | rafthttp: health check for peer a09f1ec97a6209a0 could not connect: dial tcp 10.4.0.22:2380: connect: connection refused
2021-05-04 14:54:13.818628 W | rafthttp: health check for peer f7e02227b720de5b could not connect: dial tcp 10.4.0.16:2380: connect: connection refused
2021-05-04 14:54:13.818657 W | rafthttp: health check for peer a09f1ec97a6209a0 could not connect: dial tcp 10.4.0.22:2380: connect: connection refused
raft2021/05/04 14:54:13 INFO: bb291d9b944ac65e is starting a new election at term 23656


From master-1 or master-2
etcd pod information/logs
[root@ocp-kqhbj-master-1 manifests]# crictl ps -a | grep etcd
506b2841f6b2a       a0124870edb66a4628e0d755c3b5c9d5a4a28f77049e709403471c73d89de6ae   44 seconds ago       Exited              etcd-ensure-env-vars                          44                  5878b14785de0

[root@ocp-kqhbj-master-1 manifests]# crictl logs 506b2841f6b2a
Expected node IP to be  got 10.4.0.16

[root@ocp-kqhbj-master-1 manifests]# crictl inspect f4290c4e32c3c | grep "NODE_IP="
          "NODE_IP=",


Seems that {"name":"NODE_IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}} in /etc/kubernetes/manifests/etcd-pod.yaml is not returning any value.  Hard-coding the correct IP brings etcd back to life.



Version-Release number of selected component (if applicable):
OPenshift 4.7 on OpenStack



Additional info:
master-1 and -2 return the following when attempting to connect to this endpoint.

$ curl http://169.254.169.254/2009-04-04/meta-data/local-ipv4
<html><body><h1>429 Too Many Requests</h1>
You have sent too many requests in a given amount of time.
</body></html>


master-0 returns its IP as expected.


The kubelet on the 2 of the masters contain the errors:
May 05 10:45:22.986601 ocp-kqhbj-master-2 hyperkube[2296]: I0505 10:45:22.986548    2296 cloud_request_manager.go:115] Node addresses from cloud provider for node "ocp-kqhbj-master-2" not collected: unexpected status 
code when reading instance address from http://169.254.169.254/2009-04-04/meta-data/local-ipv4: 429 Too Many Requests

May 05 10:45:31.444336 ocp-kqhbj-master-1 hyperkube[181517]: I0505 10:45:31.444302  181517 cloud_request_manager.go:115] Node addresses from cloud provider for node "ocp-kqhbj-master-1" not collected: unexpected status code when reading instance address from http://169.254.169.254/2009-04-04/meta-data/local-ipv4: 429 Too Many Requests

May 05 10:45:28.684861 ocp-kqhbj-master-1 hyperkube[181517]: E0505 10:45:28.684825  181517 kubelet_node_status.go:586] Failed to set some node status fields: failed to get node address from cloud provider: unexpected status code when reading instance address from http://169.254.169.254/2009-04-04/meta-data/local-ipv4: 429 Too Many Requests

Comment 5 Mike Fedosin 2021-05-18 12:35:44 UTC
I suggest to use this PR as a fix: https://github.com/openshift/installer/pull/2734. When we enable config drive, it will read data from there avoiding metadata service

Comment 6 Matthew Booth 2021-05-18 12:51:29 UTC
(In reply to Mike Fedosin from comment #5)
> I suggest to use this PR as a fix:
> https://github.com/openshift/installer/pull/2734. When we enable config
> drive, it will read data from there avoiding metadata service

Unfortunately this won't work, as we don't use config drive to fetch node addresses anyway. We would fix this in some other way, likely involving caching.

Comment 7 Matthew Booth 2021-05-18 15:58:25 UTC
This is almost certainly related to https://bugzilla.redhat.com/show_bug.cgi?id=1952645, although I'm yet to prove this definitively.

In this change we rate limited nova metadata requests by servers to 2 specific urls to 20 requests every 60 minutes. Requests above that limit will return 429. This will result in a lot of noise in the logs, but as long as kubelet has successfully fetched this data at least once since starting it's just noise.

The immediate cause here seems to be that the etcd static pod starts, whatever mechanism resolves `status.podIP` for it using the downward api does not get a value. Unfortunately I don't yet fully understand what that mechanism is, but I am trying to find out.

I believe the issue will most likely relates to restarting a node. Presumably at the point that a node restarts it has already consumed its allocation of 20 metadata requests for the hour, and then has to wait up to an hour before there is another successful request. This means that kubelet won't populate node addresses for up to an hour after restart. I suspect therefore that status.podIP for a static pod using host networking is resolved by kubelet using node addresses, and that at the time etcd is started it has no value. I also suspect that kubelet doesn't have any mechanism to notice that it now has node addresses and restart any affected pods. This is supposition on my part, though.

Bug 1952645 is an important fix to reduce load on PSI's nova metadata service. I think we need to understand 2 things better before we can come up with a fix:

* A more precise understanding of the metadata request rate of kubelet during boot (i.e. do we need to account for bursting at all?).
* A precise understanding of exactly how NODE_ID comes to be empty in the static etcd pod so we can ensure we provide node addresses before it fails.

Comment 8 Matthew Booth 2021-05-18 16:04:37 UTC
Etcd static pod is defined here: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml

Comment 9 Joel Speed 2021-06-09 10:30:10 UTC
Have discussed with Matt, not planning to get anything into 4.8 as a direct result of this bug

Comment 11 Yaniv Kaul 2021-07-20 14:54:27 UTC
What's the next step here with this BZ?

Comment 21 Matthew Booth 2021-11-24 16:53:22 UTC
*** Bug 1925214 has been marked as a duplicate of this bug. ***

Comment 22 ShiftStack Bugwatcher 2021-11-25 16:11:43 UTC
Removing the Triaged keyword because:
* the priority assessment is missing
* the target release value is missing

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 24 Pierre Prinetti 2022-01-10 15:25:59 UTC
FYI as a response to Bug 2033953, we are instructing machine-config-operator to read metadata from the config-drive and only fetch the metadata server as a fallback. This change may or may not interfere with this very bug.

Comment 25 Matthew Booth 2022-01-11 21:52:42 UTC
(In reply to Pierre Prinetti from comment #24)
> FYI as a response to Bug 2033953, we are instructing machine-config-operator
> to read metadata from the config-drive and only fetch the metadata server as
> a fallback. This change may or may not interfere with this very bug.

It won't unfortunately. The offending node addresses check is hard-coded to use the metadata service.

Comment 30 Noam Manos 2022-04-27 09:59:50 UTC
Seeing similar issue ("failed to get node address from cloud provider") on OCP version 4.9.0 as well.

Though I'm not sure if that could be the root cause for one of the master nodes (master-1) in our PSI project, to become "NotReady":

$ oc get nodes
NAME                               STATUS     ROLES    AGE   VERSION
default-cl2-dblg8-master-0         Ready      master   21d   v1.22.0-rc.0+894a78b
default-cl2-dblg8-master-1         NotReady   master   21d   v1.22.0-rc.0+894a78b
default-cl2-dblg8-master-2         Ready      master   21d   v1.22.0-rc.0+894a78b
default-cl2-dblg8-worker-0-jslbw   Ready      worker   21d   v1.22.0-rc.0+894a78b
default-cl2-dblg8-worker-0-lb96m   Ready      worker   21d   v1.22.0-rc.0+894a78b

# In OCP events I see that "openshift-kube-scheduler" pod failed on CreateContainerError:

openshift-cluster-kube-scheduler-operator-status-controller-statussyncer_kube-scheduler   Normal    OperatorStatusChanged                         Status for clusteroperator/kube-scheduler changed: Degraded message changed from "NodeControllerDegraded: The master nodes not ready: node \"default-cl2-dblg8-master-1\" not ready since 2022-04-26 11:53:03 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)\nStaticPodsDegraded: pod/openshift-kube-scheduler-default-cl2-dblg8-master-2 container \"kube-scheduler\" is waiting: CreateContainerError: Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving ctr name k8s_kube-scheduler_openshift-kube-scheduler-default-cl2-dblg8-master-2_openshift-kube-scheduler_872725b2-fd09-4cad-bcdd-670f52f74a84_10 for id 33c634a391d51c51dbed7f3d29157d4e502f7428f371ce07eb95adbb82c05ff8: name is reserved" to "NodeControllerDegraded: The master nodes not ready: node \"default-cl2-dblg8-master-1\" not ready since 2022-04-26 11:53:03 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)\nStaticPodsDegraded: pod/openshift-kube-scheduler-default-cl2-dblg8-master-2 container \"kube-scheduler\" is waiting: CreateContainerError: context deadline exceeded"


# Looking at the journal logs inside the "master-2" node (where the kube-scheduler failed):

default-cl2-dblg8-master-2 hyperkube[1890]: E0426 11:52:09.524563    1890 pod_workers.go:820] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CreateContainerError: \"context deadline exceeded\"" pod="openshift-kube-scheduler/openshift-kube-scheduler-default-cl2-dblg8-master-2" podUID=872725b2-fd09-4cad-bcdd-670f52f74a84

default-cl2-dblg8-master-2 hyperkube[1890]: E0426 11:52:11.419265    1890 kubelet_node_status.go:613] "Failed to set some node status fields" err="failed to get node address from cloud provider: unexpected status code when reading instance address from http://169.254.169.254/2009-04-04/meta-data/local-ipv4: 429 Too Many Requests" node="default-cl2-dblg8-master-2"


# Attaching Journal log of "master-2" node.

# Note:
A soft reboot to the PSI instance of "master-1" node fixes the status, but only for a while (after few days of usages, it becomes unready again).

Comment 33 Matthew Booth 2022-07-25 09:08:50 UTC
This problem finally goes away in 4.12. As nobody is currently complaining about it actively I'm going to close this.


Note You need to log in before you can comment on or make changes to this bug.