Description of problem:

cluster-monitoring-operator is issuing very short-lived watch requests on single-node clusters. This inflates the watch request count, which in turn causes single-node CI test failures when the count goes over the allowed test threshold.

From audit logs:

22:04:21 [ WATCH][ 737µs] [200] /api/v1/namespaces/openshift-config/configmaps?allowWatchBookmarks=true&resourceVersion=11283&timeoutSeconds=306&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:04:22 [ WATCH][ 4.727ms] [200] /api/v1/namespaces/openshift-config-managed/configmaps?allowWatchBookmarks=true&resourceVersion=11283&timeoutSeconds=446&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:04:22 [ WATCH][ 1.859ms] [200] /api/v1/namespaces/openshift-monitoring/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=11260&timeoutSeconds=501&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:04:22 [ WATCH][ 4.286ms] [200] /apis/certificates.k8s.io/v1/certificatesigningrequests?allowWatchBookmarks=true&resourceVersion=11280&timeout=6m35s&timeoutSeconds=395&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:04:24 [ WATCH][ 770µs] [200] /api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&resourceVersion=11416&timeoutSeconds=487&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 180.855ms] [200] /api/v1/namespaces/openshift-monitoring/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=13067&timeoutSeconds=378&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 160.594ms] [403] /api/v1/namespaces/openshift-config/configmaps?allowWatchBookmarks=true&resourceVersion=13073&timeoutSeconds=429&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 161.931ms] [200] /api/v1/namespaces/openshift-monitoring/configmaps?allowWatchBookmarks=true&resourceVersion=13263&timeoutSeconds=556&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 143.324ms] [403] /api/v1/namespaces/openshift-monitoring/secrets?allowWatchBookmarks=true&resourceVersion=13266&timeoutSeconds=571&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 165.666ms] [403] /apis/config.openshift.io/v1/apiservers?fieldSelector=metadata.name%3Dcluster&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 160.978ms] [403] /api/v1/namespaces/openshift-monitoring/secrets?allowWatchBookmarks=true&resourceVersion=13266&timeout=8m17s&timeoutSeconds=497&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 166.247ms] [403] /apis/certificates.k8s.io/v1/certificatesigningrequests?allowWatchBookmarks=true&resourceVersion=13054&timeout=7m31s&timeoutSeconds=451&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 154.448ms] [403] /api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&resourceVersion=13075&timeoutSeconds=404&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 210.732ms] [200] /api/v1/namespaces/openshift-user-workload-monitoring/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=12994&timeoutSeconds=306&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
22:07:45 [ WATCH][ 214.544ms] [200] /api/v1/namespaces/openshift-user-workload-monitoring/configmaps?allowWatchBookmarks=true&resourceVersion=13073&timeoutSeconds=551&watch=true [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]

Version-Release number of selected component (if applicable):
4.11

How reproducible:
Happens frequently in single-node CI:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node/1490861976048373760
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node/1489537207965323264
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node/1489476692840812544
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node/1488157843289804800
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node/1488097891372240896

Steps to Reproduce:
1. Run conformance tests on a single-node cluster.
2. Observe the watch count going over the threshold (not always).

Actual results:
A large number of very short watch requests.

Expected results:
No short watch requests.

Additional info:
PR to increase the threshold until this is resolved: https://github.com/openshift/origin/pull/26685
JIRA ticket requesting review of that PR, with some discussion of the issue: https://issues.redhat.com/browse/MON-2239
This causes about 44% of CI runs to fail.
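For illustration only, here is a minimal Go sketch (not the actual openshift/origin test code) of the kind of check that trips in CI: tally WATCH requests per service account from summarized audit-log lines like the ones quoted above and flag any operator whose count exceeds a per-operator limit. The line format handled by the regexp and the threshold value of 100 are assumptions made for this sketch, not the real CI values.

// watchcount.go: count WATCH audit entries per service account (sketch only).
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Matches summarized audit lines like:
//   22:04:21 [ WATCH][ 737µs] [200] /api/v1/... [system:serviceaccount:openshift-monitoring:cluster-monitoring-operator]
// Assumed format; the real audit log is structured JSON.
var watchLine = regexp.MustCompile(`\[ *WATCH\].*\[(system:serviceaccount:[^\]]+)\]$`)

func main() {
	const threshold = 100 // assumed per-operator upper bound, not the real CI value

	counts := map[string]int{}
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if m := watchLine.FindStringSubmatch(scanner.Text()); m != nil {
			counts[m[1]]++ // one more watch request attributed to this service account
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}

	for sa, n := range counts {
		if n > threshold {
			fmt.Printf("FAIL: %s made %d watch requests (threshold %d)\n", sa, n, threshold)
		}
	}
}

Fed a summarized audit log on stdin (for example: go run watchcount.go < audit-summary.txt, file name hypothetical), this would flag cluster-monitoring-operator whenever the short-lived watches push its total over the limit, which is the failure mode described above.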
Once this bz gets resolved, you may want to revisit the threshold increase and lower it back to stricter values - https://github.com/openshift/origin/pull/26685
Following up on the first comment [1]: you may also want to fix the threshold in the backport [2] once the solution to this gets backported to 4.10.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2056450#c1
[2] https://github.com/openshift/origin/pull/26872
Dear reporter, we greatly appreciate the bug you have reported here. Unfortunately, due to the migration to a new issue-tracking system (https://issues.redhat.com/), we cannot continue triaging bugs reported in Bugzilla. Since this bug has been stale for multiple days, we have decided to close it. If you think this is a mistake, or this bug deserves a higher priority or severity than set today, please feel free to reopen it and tell us why. We are going to move every reopened bug to https://issues.redhat.com. Thank you for your patience and understanding.