Bug 1964590

Summary: rook operator pod log flooded with "the server has received too many requests and has asked us to try again later"
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Neha Berry <nberry>
Component: rook
Assignee: Santosh Pillai <sapillai>
Status: CLOSED WORKSFORME
QA Contact: Elad <ebenahar>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.8
CC: madam, muagarwa, ocs-bugs, odf-bz-bot
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-15 18:24:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Neha Berry 2021-05-25 18:42:13 UTC
Description of problem (please be as detailed as possible and provide log snippets):
==================================================================
In OCS 4.8 (internal, internal-attached, and external modes), the rook operator logs contain a continuous flow of the following messages. Although they appear harmless, they may give the incorrect impression that the API server is busy (since we saw them on all setups, it is unlikely that the server was really busy).

Also, the messages appear for CephClient and CephNFS, among other resources. Full log shared in [1].

>> Failed to watch messages

2021-05-24T10:17:50.913472560Z E0524 10:17:50.913389       8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephObjectStore: the server has received too many requests and has asked us to try again later (get cephobjectstores.ceph.rook.io)

2021-05-24T10:17:51.824687354Z E0524 10:17:51.824612       8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephFilesystem: the server has received too many requests and has asked us to try again later (get cephfilesystems.ceph.rook.io)

2021-05-24T10:17:52.183190146Z E0524 10:17:52.183135       8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephClient: the server has received too many requests and has asked us to try again later (get cephclients.ceph.rook.io)

2021-05-24T10:17:52.263015707Z E0524 10:17:52.262953       8 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephBlockPool: the server has received too many requests and has asked us to try again later (get cephblockpools.ceph.rook.io)
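
For context, here is a minimal, hypothetical sketch (not Rook's actual code) of a client-go dynamic informer watching one of the Rook CRDs. The "the server has received too many requests and has asked us to try again later" text is the standard client-go wording for an HTTP 429 (TooManyRequests) response from the API server; when a watch attempt fails with it, the reflector inside the informer logs a "Failed to watch ..." line like the ones above and retries with backoff, which is why the messages are noisy but not fatal.

// Illustrative only: a minimal dynamic informer watching the
// cephobjectstores.ceph.rook.io CRD, similar in spirit to the operator's caches.
package main

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes an out-of-cluster kubeconfig for the sketch; inside the
	// operator pod the in-cluster config would be used instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{Group: "ceph.rook.io", Version: "v1", Resource: "cephobjectstores"}
	factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(client, 10*time.Minute, "openshift-storage", nil)
	informer := factory.ForResource(gvr).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { /* reconcile */ },
	})

	ctx := context.Background()
	// Starting the factory starts the reflector; when a watch request is
	// rejected with 429, it logs "Failed to watch ..." and retries with backoff.
	factory.Start(ctx.Done())
	cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
	<-ctx.Done()
}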

>> PodDisruptionBudget is deprecated in v1.21+,

2021-05-24T10:25:17.888693188Z W0524 10:25:17.888539       8 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
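
For reference, a minimal, hypothetical sketch (not the actual Rook fix) of what this warning asks for: reading PodDisruptionBudgets through the policy/v1 client, available on Kubernetes 1.21+, instead of the deprecated policy/v1beta1 client.

package pdbexample

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getPDB reads a PodDisruptionBudget via the policy/v1 API. Using
// clientset.PolicyV1beta1() instead is what triggers the deprecation
// warning quoted above on Kubernetes 1.21+ API servers.
func getPDB(ctx context.Context, clientset kubernetes.Interface, namespace, name string) (*policyv1.PodDisruptionBudget, error) {
	return clientset.PolicyV1().PodDisruptionBudgets(namespace).Get(ctx, name, metav1.GetOptions{})
}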

We didn't see similar messages in the 4.7 logs [2]. What has changed in 4.8 to introduce these log messages?




Version of all relevant components (if applicable):
=======================================================
OCS = ocs-operator.v4.8.0-402.ci
OCP  = 4.8.0-0.nightly-2021-05-21-233425

ceph version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
===========================================================================
No

Is there any workaround available to the best of your knowledge?
==================================================================
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
===============================================================================
2

Is this issue reproducible?
==========================
Yes

Can this issue be reproduced from the UI?
==========================================
N/A

If this is a regression, please provide more details to justify this:
======================================================================
Not sure if this is due to a code change for a 4.8 feature or a regression.

Steps to Reproduce:
====================
1. Install OCS 4.8 in either internal or external mode
2. Check the rook operator logs


Actual results:
==================
Logs are flooded with "Failed to watch" messages, e.g.

2021-05-21T11:35:24.987653122Z E0521 11:35:24.982904       7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephObjectZoneGroup: the server has received too many requests and has asked us to try again later (get cephobjectzonegroups.ceph.rook.io)
...
...

and

2021-05-24T10:25:17.888693188Z W0524 10:25:17.888539       8 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget


Expected results:
========================
These messages can be misleading and need to be either suppressed or accompanied by a proper explanation of why they show up.

Additional info:
====================

4.8 internal mode logs: [1] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-may24/jijoy-may24_20210524T090226/logs/testcases_1621854889/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-8ac2107296badc8a75cbb0d357470419528088c39474df858f62daaa920f0279/namespaces/openshift-storage/pods/rook-ceph-operator-77bd5678b9-9qkpp/rook-ceph-operator/rook-ceph-operator/logs/current.log

4.8 external mode logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j014vu1ce33-t4an/j014vu1ce33-t4an_20210521T103434/logs/failed_testcase_ocs_logs_1621594049/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-8ac2107296badc8a75cbb0d357470419528088c39474df858f62daaa920f0279/namespaces/openshift-storage/pods/rook-ceph-operator-77bd5678b9-h7kz6/rook-ceph-operator/rook-ceph-operator/logs/current.log



4.7 logs to compare: [2] - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz1952848/ocs-mustgather/must-gather.local.4626932914359587070/quay-io-rhceph-dev-ocs-must-gather-sha256-50bf6568d9e2fb4c16c616505e025b63a6ee49f401d57552e7dbe6db541a4e4e/namespaces/openshift-storage/pods/rook-ceph-operator-589cb4c84b-cl8rh/rook-ceph-operator/rook-ceph-operator/logs/current.log

Comment 2 Travis Nielsen 2021-05-25 19:11:03 UTC
The PDB deprecation message is tracked upstream with https://github.com/rook/rook/issues/7917.

Not sure what we can do about the other messages, "the server has received too many requests and has asked us to try again later"; these come from the client-go library.
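
If hiding them ever becomes desirable, one possibility (hypothetical, and not necessarily applicable to how Rook wires up its informers through controller-runtime) would be to register a custom watch error handler on the informers and downgrade the 429/TooManyRequests case, since client-go exposes SetWatchErrorHandler for exactly this kind of filtering:

package watchfilter

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// quietWatchErrorHandler logs 429 ("too many requests") watch failures at low
// verbosity instead of as errors, and defers to the default handler otherwise.
func quietWatchErrorHandler(r *cache.Reflector, err error) {
	if apierrors.IsTooManyRequests(err) {
		klog.V(4).Infof("watch throttled by the API server, will retry: %v", err)
		return
	}
	cache.DefaultWatchErrorHandler(r, err)
}

// install must be called before the informer is started.
func install(informer cache.SharedInformer) error {
	return informer.SetWatchErrorHandler(quietWatchErrorHandler)
}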

Comment 3 Mudit Agarwal 2021-06-03 13:05:21 UTC
Moving this to 4.9 as the Upstream PR is still in draft, and this does not look critical enough to me.
Please retarget if someone thinks otherwise.