Bug 2002776
Summary: | Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured - fail due to ErrImagePull | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eran Cohen <ercohen> |
Component: | Monitoring | Assignee: | Sunil Thaha <sthaha> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, erooth, spasquie, sthaha, wking |
Target Milestone: | --- | | |
Target Release: | 4.10.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-11-11 02:13:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Eran Cohen
2021-09-09 17:05:13 UTC
Eran, this looks like an issue with cri-o or with the image-registry service. Can you check with these teams and get their opinion? From a monitoring standpoint, there's not much we can do, unfortunately.

Looking at the image-registry logs [0] from a failed run [1]:

** the first cri-o request to the image-registry is rejected because it requires authentication **
time="2021-09-05T20:09:20.174939122Z" level=warning msg="error authorizing context: authorization header required" go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=eff50b10-9940-44c9-af85-38b2910b57d6 http.request.method=GET http.request.remoteaddr="10.128.0.1:47562" http.request.uri=/v2/ http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64"
time="2021-09-05T20:09:20.175016628Z" level=info msg=response go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=5448d595-1a35-44ce-94f9-501ed60c6b60 http.request.method=GET http.request.remoteaddr="10.128.0.1:47562" http.request.uri=/v2/ http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" http.response.contenttype="application/json; charset=utf-8" http.response.duration=1.3ms http.response.status=401 http.response.written=87

** then cri-o authenticates successfully **
time="2021-09-05T20:09:20.196286265Z" level=info msg="response completed" go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=67359636-e1a2-47d4-8dee-2a0970a5ad89 http.request.method=GET http.request.remoteaddr="10.128.0.1:47564" http.request.uri="/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Ftools%3Apull" http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" http.response.contenttype=application/json http.response.duration=16.265351ms http.response.status=200 http.response.written=2609
time="2021-09-05T20:09:20.196349469Z" level=info msg=response go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=11d09ec7-4170-47e3-9153-0dcf67aff829 http.request.method=GET http.request.remoteaddr="10.128.0.1:47564" http.request.uri="/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Ftools%3Apull" http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" http.response.contenttype=application/json http.response.duration=16.368559ms http.response.status=200 http.response.written=2609

** but the request from cri-o to pull the image manifest gets a 404 response **
time="2021-09-05T20:09:20.207282411Z" level=info msg="authorized request" go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=860e25a8-f3ea-4ec1-b49b-fad4237dfbd6 http.request.method=GET http.request.remoteaddr="10.128.0.1:47566" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" openshift.auth.user="system:serviceaccount:e2e-test-prometheus-pvjr9:default" vars.name=openshift/tools vars.reference=latest
time="2021-09-05T20:09:20.215149716Z" level=error msg="response completed with error" err.code="manifest unknown" err.detail="unknown tag=latest" err.message="manifest unknown" go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=860e25a8-f3ea-4ec1-b49b-fad4237dfbd6 http.request.method=GET http.request.remoteaddr="10.128.0.1:47566" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" http.response.contenttype="application/json; charset=utf-8" http.response.duration=14.446912ms http.response.status=404 http.response.written=96 openshift.auth.user="system:serviceaccount:e2e-test-prometheus-pvjr9:default" vars.name=openshift/tools vars.reference=latest
time="2021-09-05T20:09:20.215231422Z" level=info msg=response go.version=go1.16.6 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=6d16989c-3257-474d-8141-a806ef63cae2 http.request.method=GET http.request.remoteaddr="10.128.0.1:47566" http.request.uri=/v2/openshift/tools/manifests/latest http.request.useragent="cri-o/1.22.0-68.rhaos4.9.git011c10a.el8 go/go1.16.6 os/linux arch/amd64" http.response.contenttype="application/json; charset=utf-8" http.response.duration=14.56172ms http.response.status=404 http.response.written=96

[0] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node/1434598504415629312/artifacts/e2e-azure-upgrade-single-node/gather-extra/artifacts/pods/openshift-image-registry_image-registry-6d4df58c7-9d5nq_registry.log
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-upgrade-single-node/1434598504415629312

@spasquie from the imagestreams.yaml (in the must-gather logs) it seems that the image wasn't there at the time:

cat openshift/image.openshift.io/imagestreams.yaml | egrep "dockerImageRepository|created" | tail -11
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/sso74-openshift-rhel8
- created: "2021-09-05T19:59:33Z"
- created: "2021-09-05T19:59:33Z"
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/tests
- created: "2021-09-05T20:13:15Z"
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/tools
- created: "2021-09-05T20:20:45Z"
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/ubi8-openjdk-11
- created: "2021-09-05T19:59:33Z"
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/openshift/ubi8-openjdk-8
- created: "2021-09-05T19:59:32Z"

For some reason, the image the test is trying to pull (image-registry.openshift-image-registry.svc:5000/openshift/tools) was created 20 minutes after all the other images (8 minutes after the test pull attempt). Any idea what might cause this?

As you can see in comment #3, the image gets created 20 minutes later. I'm a bit lost here about what creates the image and why it shows up 20 minutes after all the other images.

@wking perhaps you have an idea what's going wrong here?
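The ordering argument above can be checked mechanically. The following is a minimal sketch, not part of the original investigation: it takes the `created` timestamps from the imagestreams.yaml excerpt and the timestamp of the 404 from the registry log, and reports which image streams were tagged only after that failed manifest request. The helper names (`parse_utc`, `lag_after_404`, `created_after`) are hypothetical, each repository is paired with the `created` line that follows it in the egrep output (an assumption about the YAML layout), and the reference point is the 404 in the log excerpt, which may not be the same pull attempt the comment refers to.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, dropping any fractional seconds."""
    whole = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


# Timestamp of the "manifest unknown" 404 in the registry log excerpt.
MANIFEST_404 = parse_utc("2021-09-05T20:09:20.215149716Z")

# "created" values copied from the imagestreams.yaml excerpt.
# Assumption: each "created" line belongs to the repository listed just above it.
CREATED = {
    "sso74-openshift-rhel8": "2021-09-05T19:59:33Z",
    "tests": "2021-09-05T20:13:15Z",
    "tools": "2021-09-05T20:20:45Z",
    "ubi8-openjdk-11": "2021-09-05T19:59:33Z",
    "ubi8-openjdk-8": "2021-09-05T19:59:32Z",
}


def lag_after_404(created: dict) -> dict:
    """Return, per image stream, creation time minus the time of the 404."""
    return {name: parse_utc(ts) - MANIFEST_404 for name, ts in created.items()}


def created_after(created: dict, moment: datetime) -> list:
    """Return the sorted names of image streams created after `moment`."""
    return sorted(name for name, ts in created.items() if parse_utc(ts) > moment)


if __name__ == "__main__":
    lags = lag_after_404(CREATED)
    for name, lag in sorted(lags.items(), key=lambda kv: kv[1]):
        print(f"{name:24s} {lag.total_seconds():+8.0f}s relative to the 404")
    print("created after the 404:", created_after(CREATED, MANIFEST_404))
```

A positive lag means the image stream tag did not exist yet when cri-o asked for the manifest, which would be consistent with the "unknown tag=latest" error in the log.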