Bug 2001856
| Summary: | Repeating event: MissingVersion no image found for operand pod | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> |
| Component: | kube-apiserver | Assignee: | Antonio Ojea <aojeagar> |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | akashem, aojeagar, aos-bugs, ercohen, kewang, mfojtik, rfreiman, wlewis, xxia |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2003538 | Environment: | |
| Last Closed: | 2022-03-10 16:08:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2003538, 2003540 | | |
Description
Stephen Benjamin
2021-09-07 10:59:43 UTC
this seems related https://github.com/openshift/library-go/pull/1049, doesn't it?

Looks related, but that fix is already vendored in cluster-kube-apiserver-operator. In the job I linked in comment #0, bootstrap finished by 6:23, but we still had missing image events until 07:16:32.

Based on a conversation with sttts, my suggestion that it was https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199 is incorrect, since that's a different controller.

Other controllers also had missing image events, including during the bootstrap. kube-apiserver just had the most. Search the messages for "operand" in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280/artifacts/e2e-aws-single-node-serial/gather-must-gather/artifacts/event-filter.html.

Happens in the SNO upgrade test as well: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node/1435687668171149312

still missing the revendoring dance. Checking one occurrence, I can see events from:
kube-controller-manager-operator
kube-scheduler-operator-container
kube-apiserver-operator
etcd-operator

Verification as below.

To check when the PR fix landed in the payload:

$ git clone https://github.com/openshift/cluster-kube-apiserver-operator
$ cd cluster-kube-apiserver-operator
$ git pull

$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.10.0-0.ci-2021-09-16-194109 | grep cluster-kube-apiserver-operator
cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator 405ff13f18da49548dd409a0faba992cb4782961

$ git log --date local --pretty="%h %an %cd - %s" 405ff13 | grep '#1228 '
17d0234d OpenShift Merge Robot Mon Sep 13 22:49:26 2021 - Merge pull request #1228 from aojea/librarygo_bump

We can see the PR fix landed in the OCP payload on Sep 13, which means we shouldn't see repeating events from CI tests after Sep 13th.

For kube-apiserver-operator, I still found repeating events in the past two days; they were tested with 4.10.0-0.ci-2021-09-16-194109.

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=48h&context=1&type=junit&name=4%5C.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
event happened 21 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod
event happened 29 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod

So I think this bug was not fixed, assigning back.

(In reply to Ke Wang from comment #7)
> Verification as below,
>
> To check when does the PR fix land to the payload,
> $ git clone https://github.com/openshift/cluster-kube-apiserver-operator
> $ cd cluster-kube-apiserver-operator
> $ git pull
>
> $ oc adm release info --commits
> registry.ci.openshift.org/ocp/release:4.10.0-0.ci-2021-09-16-194109 | grep
> cluster-kube-apiserver-operator
> cluster-kube-apiserver-operator
> https://github.com/openshift/cluster-kube-apiserver-operator
> 405ff13f18da49548dd409a0faba992cb4782961
>
> $ git log --date local --pretty="%h %an %cd - %s" 405ff13 | grep '#1228 '
> 17d0234d OpenShift Merge Robot Mon Sep 13 22:49:26 2021 - Merge pull request
> #1228 from aojea/librarygo_bump
>
> We can see the PR fix was landed to the OCP payload on Sep 13, that means we
> shouldn't see repeating events from CI tests after Sep 13th.
>
> For kube-apiserver-operator, still found repeating events in the past two
> days, they were tested with 4.10.0-0.ci-2021-09-16-194109.
>
> $ w3m -dump -cols 200
> 'https://search.ci.openshift.org/
> ?search=no+image+found+for+operand+pod&maxAge=48h&context=1&type=junit&name=4
> %5C.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep
> 'openshift-kube-apiserver-operator'
> event happened 21 times, something is wrong:
> ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator -
> reason/MissingVersion no image found for operand pod
> event happened 29 times, something is wrong:
> ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator -
> reason/MissingVersion no image found for operand pod
>
> So I think this bug was not fixed, assign back.

The jobs that you link are upgrade jobs from 4.9 :), the fix wasn't backported yet

Hi aojeagar, thank you for your reply. That means I should only check the 4.10-relevant CI jobs, right? If so, I'll leave it for a few days and then check again, excluding upgrade CI jobs; the upgrade jobs from 4.9 will be checked with the 4.9 bug.

right, we can see here that in non-upgrade jobs it stopped 7 days ago. You can see that it is still happening in 4.9, but not in 4.10
https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job

The backport is here https://bugzilla.redhat.com/show_bug.cgi?id=2003540

@kewang it seems there were no errors in "non-upgrade" 4.10 jobs in the last 10 days
https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job

aojeagar, I also checked again; just as you said, the bug is fixed on 4.10. Please change the bug status to ON_QA, I will add some comments and move it to VERIFIED.
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
No results found.

So the bug is fixed, moving it to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
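The verification in this bug relies on reading the "event happened N times" lines by eye after a grep. The following is a minimal sketch of how that check could be summarized, assuming the search output keeps the line wording shown in the comments above. The function name `summarize_repeats` is hypothetical and not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: summarize repeating-event lines for one namespace.
# Reads text on stdin (for example the output of `w3m -dump` on a CI search URL)
# and prints "<number of matching lines> <highest repeat count>".
summarize_repeats() {
  ns="$1"
  # Trailing space after the namespace avoids matching longer namespace names.
  grep -F "ns/${ns} " |
    grep -F 'no image found for operand pod' |
    awk '
      # Lines look like: "event happened 21 times, something is wrong: ..."
      $1 == "event" && $2 == "happened" {
        n = $3 + 0
        lines++
        if (n > max) max = n
      }
      # With no matching input both values print as 0.
      END { printf "%d %d\n", lines, max }
    '
}
```

A possible use is to pipe the same `w3m -dump` command from the comments into `summarize_repeats openshift-kube-apiserver-operator`; an output of `0 0` would correspond to the "No results found." case.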