Bug 2001856 - Repeating event: MissingVersion no image found for operand pod
Summary: Repeating event: MissingVersion no image found for operand pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Antonio Ojea
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 2003538 2003540
 
Reported: 2021-09-07 10:59 UTC by Stephen Benjamin
Modified: 2022-03-10 16:08 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2003538
Environment:
Last Closed: 2022-03-10 16:08:18 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 661 0 None Merged Bug 2001856: bump library-go and dependencies 2021-09-21 08:35:06 UTC
Github openshift cluster-kube-apiserver-operator pull 1228 0 None Merged Bug 2001856: bump library-go to latest 2021-09-21 08:35:07 UTC
Github openshift cluster-kube-controller-manager-operator pull 562 0 None Merged Bug 2001856: bump library-go and dependencies 2021-09-21 08:35:08 UTC
Github openshift cluster-kube-scheduler-operator pull 368 0 None Merged Bug 2001856: bump libgo and related deps 2021-09-21 08:35:09 UTC
Github openshift library-go pull 1203 0 None Merged Bug 2001856: Avoid spurious MissingVersion events on the static pod controller 2021-09-21 08:35:10 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:08:42 UTC

Description Stephen Benjamin 2021-09-07 10:59:43 UTC
[sig-arch] events should not repeat pathologically

Since the merge of https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199, we've seen the following repeated events in CI:


1 events happened too frequently

event happened 25 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod
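
For context, here is a minimal sketch of the kind of check that produces the message above: count identical event messages and flag any that occur more than a threshold number of times. The function name `flag_repeated_events` and the threshold argument are assumptions for illustration; this is not the code of the actual CI test.

```shell
# Read one event message per line on stdin and report every message
# that occurs more than $1 times (hypothetical threshold argument).
flag_repeated_events() {
  sort | uniq -c | awk -v t="$1" '
    $1 > t {
      n = $1                      # occurrence count written by uniq -c
      sub(/^ *[0-9]+ /, "")       # strip the count, keep the message
      printf "event happened %d times, something is wrong: %s\n", n, $0
    }'
}
```

Feeding it a file with one event message per line would print one line per message that repeats more often than the chosen threshold.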


I think it's coming from this: https://github.com/openshift/cluster-kube-apiserver-operator/commit/ea2ec3bb5a8a36b98c987901a12822c34451354f#diff-22001281e3b968448f2558fd87069f7dbe886ce349047d0270433e17ece4372aR56

In the end, it looks like it got the info it wanted:

{
  "lastTransitionTime": "2021-09-07T06:38:10Z",
  "message": "KubeletMinorVersionUpgradeable: Kubelet and API server minor versions are synced.",
  "reason": "AsExpected",
  "status": "True",
  "type": "Upgradeable"
}

Example job: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280



Is this a bug, or should we add an exception for these events? Thanks!

Comment 1 Antonio Ojea 2021-09-07 15:14:29 UTC
This seems related to https://github.com/openshift/library-go/pull/1049, doesn't it?

Comment 2 Stephen Benjamin 2021-09-07 15:45:12 UTC
Looks related, but that fix is already vendored in cluster-kube-apiserver-operator.  In the job I linked in comment #0, bootstrap finished by 6:23, but we still had missing image events until 07:16:32. 

Based on a conversation with sttts, my suggestion that the cause was https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199 is incorrect, since that's a different controller.

Other controllers also had missing image events, including during the bootstrap. kube-apiserver just had the most.

Search the event messages for "operand" in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280/artifacts/e2e-aws-single-node-serial/gather-must-gather/artifacts/event-filter.html.

Comment 4 Antonio Ojea 2021-09-09 14:13:25 UTC
The revendoring is still missing. Checking one occurrence, I can see events from:

kube-controller-manager-operator
kube-scheduler-operator-container
kube-apiserver-operator
etcd-operator

Comment 7 Ke Wang 2021-09-17 11:22:56 UTC
Verification is as follows.

To check when the PR fix landed in the payload:
$ git clone https://github.com/openshift/cluster-kube-apiserver-operator
$ cd cluster-kube-apiserver-operator
$ git pull

$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.10.0-0.ci-2021-09-16-194109 | grep cluster-kube-apiserver-operator
  cluster-kube-apiserver-operator                https://github.com/openshift/cluster-kube-apiserver-operator                405ff13f18da49548dd409a0faba992cb4782961

$ git log --date local --pretty="%h %an %cd - %s" 405ff13 | grep '#1228 '
17d0234d OpenShift Merge Robot Mon Sep 13 22:49:26 2021 - Merge pull request #1228 from aojea/librarygo_bump

We can see that the PR fix landed in the OCP payload on Sep 13, which means we shouldn't see repeating events in CI tests after Sep 13th.
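
The lookup above can be sketched as two small filters. The helper names `payload_commit` and `find_pr_merge` are hypothetical; they only parse the text produced by the `oc adm release info --commits` and `git log` commands shown above, and they assume the column layout seen in that output.

```shell
# Print the commit hash (last column) for component $1 from
# `oc adm release info --commits IMAGE` output read on stdin.
payload_commit() {
  awk -v c="$1" '$1 == c { print $NF }'
}

# Print the merge-commit line for PR number $1 from `git log` output
# read on stdin; returns non-zero when the PR is not in the history.
find_pr_merge() {
  grep -F -- "Merge pull request #$1 "
}

# Intended use (not executed here):
#   oc adm release info --commits IMAGE | payload_commit cluster-kube-apiserver-operator
#   git log --date local --pretty="%h %an %cd - %s" COMMIT | find_pr_merge 1228
```

The trailing space in the `grep` pattern plays the same role as in the original `grep '#1228 '`: it keeps a search for one PR number from matching a longer number that starts with the same digits.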

For kube-apiserver-operator, I still found repeating events in the past two days; they were tested with 4.10.0-0.ci-2021-09-16-194109.

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=48h&context=1&type=junit&name=4%5C.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
event happened 21 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod
event happened 29 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod

So I think this bug was not fixed; assigning it back.

Comment 8 Antonio Ojea 2021-09-17 13:00:17 UTC
(In reply to Ke Wang from comment #7)
The jobs that you linked are upgrade jobs from 4.9 :), and the fix hasn't been backported yet.

Comment 9 Ke Wang 2021-09-18 07:11:51 UTC
Hi aojeagar, thank you for your reply. That means I should only check the 4.10-relevant CI jobs, right? If so, I'll leave it for a few days and then check again, excluding upgrade CI jobs; the upgrade jobs from 4.9 will be checked with the 4.9 bug.

Comment 10 Antonio Ojea 2021-09-20 08:18:27 UTC
Right, we can see here that in non-upgrade jobs it stopped 7 days ago.
You can see that it is still happening in 4.9, but not in 4.10: https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job

The backport is here https://bugzilla.redhat.com/show_bug.cgi?id=2003540

Comment 11 Antonio Ojea 2021-09-22 13:08:12 UTC
@kewang it seems there were no errors in "non-upgrade" 4.10 jobs in the last 10 days
https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job
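
The search queries in this thread differ only in a few URL parameters (`name`, `excludeName`, `maxAge`). Below is a small sketch of a helper that builds them, assuming the remaining parameters stay as in the URLs above. The function name `ci_search_url` is hypothetical, and the arguments must already be URL-encoded.

```shell
# Build a search.ci.openshift.org query for the MissingVersion message.
#   $1 = job name filter (may be empty), $2 = excludeName filter, $3 = maxAge
ci_search_url() {
  printf 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=%s&context=1&type=junit&name=%s&excludeName=%s&maxMatches=5&maxBytes=20971520&groupBy=job\n' "$3" "$1" "$2"
}
```

For example, `ci_search_url 4.10 upgrade 336h` reproduces the non-upgrade 4.10 query above, which can then be passed to `w3m -dump` as in comment 7.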

Comment 12 Ke Wang 2021-09-28 01:49:35 UTC
aojeagar, I also checked again; as you said, the bug is fixed on 4.10. Please change the bug status to ON_QA, and I will add some comments and move it to VERIFIED.

Comment 13 Ke Wang 2021-09-29 09:34:35 UTC
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
No results found. 

So the bug was fixed; moving it to VERIFIED.

Comment 16 errata-xmlrpc 2022-03-10 16:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

