Description of problem (please be as detailed as possible and provide log snippets):
==================================================================
In OCS 4.8 (internal, internal-attached and external modes), the rook operator logs contain a continuous flow of the following messages. Although harmless, they may give the incorrect impression that the server is busy (since we saw them on all setups, it is unlikely that the server was really busy). The messages also appear for CephClient and CephNFS, among others. Full log shared in [1].

>> "Failed to watch" messages

2021-05-24T10:17:50.913472560Z E0524 10:17:50.913389 8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephObjectStore: the server has received too many requests and has asked us to try again later (get cephobjectstores.ceph.rook.io)
2021-05-24T10:17:51.824687354Z E0524 10:17:51.824612 8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephFilesystem: the server has received too many requests and has asked us to try again later (get cephfilesystems.ceph.rook.io)
2021-05-24T10:17:52.183190146Z E0524 10:17:52.183135 8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephClient: the server has received too many requests and has asked us to try again later (get cephclients.ceph.rook.io)
2021-05-24T10:17:52.263015707Z E0524 10:17:52.262953 8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephBlockPool: the server has received too many requests and has asked us to try again later (get cephblockpools.ceph.rook.io)

>> PodDisruptionBudget is deprecated in v1.21+

2021-05-24T10:25:17.888693188Z W0524 10:25:17.888539 8 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget

We didn't see similar messages in the 4.7 logs [2]. What has changed in 4.8 to bring out these log messages?

Version of all relevant components (if applicable):
=======================================================
OCS = ocs-operator.v4.8.0-402.ci
OCP = 4.8.0-0.nightly-2021-05-21-233425
ceph version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
===========================================================================
No

Is there any workaround available to the best of your knowledge?
==================================================================
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
===============================================================================
2

Can this issue be reproduced?
==========================
Yes

Can this issue be reproduced from the UI?
==========================================
N/A

If this is a regression, please provide more details to justify this:
======================================================================
Not sure if this is a code change for a feature of 4.8 or a regression.

Steps to Reproduce:
====================
1. Install OCS 4.8 in either internal or external mode
2. Check the rook operator logs

Actual results:
==================
Logs flooded with "Failed to watch" messages, e.g.
2021-05-21T11:35:24.987653122Z E0521 11:35:24.982904 7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephObjectZoneGroup: the server has received too many requests and has asked us to try again later (get cephobjectzonegroups.ceph.rook.io)
...
and

2021-05-24T10:25:17.888693188Z W0524 10:25:17.888539 8 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget

Expected results:
========================
These messages can be misleading and need to be either hidden or accompanied by a proper reason for showing up.

Additional info:
====================
4.8 internal mode logs:
[1] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-may24/jijoy-may24_20210524T090226/logs/testcases_1621854889/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-8ac2107296badc8a75cbb0d357470419528088c39474df858f62daaa920f0279/namespaces/openshift-storage/pods/rook-ceph-operator-77bd5678b9-9qkpp/rook-ceph-operator/rook-ceph-operator/logs/current.log

4.8 external mode logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j014vu1ce33-t4an/j014vu1ce33-t4an_20210521T103434/logs/failed_testcase_ocs_logs_1621594049/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-8ac2107296badc8a75cbb0d357470419528088c39474df858f62daaa920f0279/namespaces/openshift-storage/pods/rook-ceph-operator-77bd5678b9-h7kz6/rook-ceph-operator/rook-ceph-operator/logs/current.log

4.7 logs to compare:
[2] - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz1952848/ocs-mustgather/must-gather.local.4626932914359587070/quay-io-rhceph-dev-ocs-must-gather-sha256-50bf6568d9e2fb4c16c616505e025b63a6ee49f401d57552e7dbe6db541a4e4e/namespaces/openshift-storage/pods/rook-ceph-operator-589cb4c84b-cl8rh/rook-ceph-operator/rook-ceph-operator/logs/current.log
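Note: "the server has received too many requests and has asked us to try again later" is the generic client-go wording for an HTTP 429 from the apiserver, which on OCP 4.8 (k8s 1.21) typically corresponds to API Priority and Fairness or max-requests-inflight throttling rather than the operator itself misbehaving. A rough diagnostic sketch in Go (not part of Rook or the must-gather tooling; it assumes a local kubeconfig with permission to read the apiserver /metrics endpoint) to check whether the apiserver really rejected requests:

// Hypothetical diagnostic helper: dump the API Priority and Fairness
// rejection counters from the kube-apiserver /metrics endpoint.
package main

import (
	"context"
	"fmt"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes ~/.kube/config points at the cluster under investigation.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Fetch the raw Prometheus metrics exposed by the apiserver.
	raw, err := cs.CoreV1().RESTClient().Get().AbsPath("/metrics").DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	// Print only the flow-control rejection counters; non-zero values would
	// confirm the 429s seen by the rook reflectors came from server-side
	// throttling rather than being spurious.
	for _, line := range strings.Split(string(raw), "\n") {
		if strings.Contains(line, "apiserver_flowcontrol_rejected_requests_total") {
			fmt.Println(line)
		}
	}
}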
The PDB deprecation message is tracked upstream in https://github.com/rook/rook/issues/7917. Not sure what we can do about the other messages ("the server has received too many requests and has asked us to try again later"); those come from the client-go library.
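For the PodDisruptionBudget deprecation line specifically, client-go does let the caller override its default warning handler, so the repeated warning could be deduplicated or dropped until the move to policy/v1 lands. A minimal sketch, assuming plain client-go and not necessarily how Rook wires its clients (whether to do this at all belongs to the upstream issue above):

// Sketch of the two knobs client-go exposes for apiserver-sent warnings.
package main

import (
	"os"

	"k8s.io/client-go/rest"
)

func main() {
	// Option 1: drop all apiserver warnings, including the policy/v1beta1
	// PodDisruptionBudget deprecation notice seen in the logs above.
	rest.SetDefaultWarningHandler(rest.NoWarnings{})

	// Option 2: keep warnings but deduplicate identical ones, so the log is
	// not flooded with the same line on every request.
	rest.SetDefaultWarningHandler(rest.NewWarningWriter(os.Stderr, rest.WarningWriterOptions{
		Deduplicate: true,
	}))
}

NoWarnings{} hides every apiserver warning, including ones we might actually want to see, so the deduplicating writer is probably the safer of the two.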
Moving this to 4.9 as the Upstream PR is still in draft, and this does not look critical enough to me. Please retarget if someone thinks otherwise.