[sig-arch] events should not repeat pathologically

Since the merge of https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199, we've seen the following repeated event in CI:

1 events happened too frequently

event happened 25 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod

I think it's coming from this: https://github.com/openshift/cluster-kube-apiserver-operator/commit/ea2ec3bb5a8a36b98c987901a12822c34451354f#diff-22001281e3b968448f2558fd87069f7dbe886ce349047d0270433e17ece4372aR56

In the end, it looks like the operator got the information it wanted:

{
  "lastTransitionTime": "2021-09-07T06:38:10Z",
  "message": "KubeletMinorVersionUpgradeable: Kubelet and API server minor versions are synced.",
  "reason": "AsExpected",
  "status": "True",
  "type": "Upgradeable"
}

Example job: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280

Is this a bug, or should we add an exception for these events? Thanks!
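For reference, a quick way to see how often the event is firing on a live cluster (a hypothetical check of my own, not part of the CI test; it relies on the count field that event deduplication maintains):

$ oc get events -n openshift-kube-apiserver-operator \
    --field-selector reason=MissingVersion \
    -o custom-columns=COUNT:.count,LAST:.lastTimestamp,MESSAGE:.message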
This seems related to https://github.com/openshift/library-go/pull/1049, doesn't it?
Looks related, but that fix is already vendored in cluster-kube-apiserver-operator. In the job I linked in comment #0, bootstrap finished by 06:23, but we still had missing-image events until 07:16:32. Based on a conversation with sttts, my suggestion that it was https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199 is incorrect, since that's a different controller. Other controllers also had missing-image events, including during the bootstrap; kube-apiserver just had the most. Search the message field for "operand" in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280/artifacts/e2e-aws-single-node-serial/gather-must-gather/artifacts/event-filter.html.
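For a rough count without opening the page (a hypothetical one-liner; the HTML layout of event-filter.html may not map one-to-one to events, so treat the number as an approximation):

$ curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-single-node-serial/1435120717145313280/artifacts/e2e-aws-single-node-serial/gather-must-gather/artifacts/event-filter.html' \
    | grep -o 'no image found for operand pod' | wc -l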
Happens in the SNO upgrade test as well: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-upgrade-single-node/1435687668171149312
The fix is still missing the re-vendoring dance in the other operators. Checking one occurrence, I can see events from:

kube-controller-manager-operator
kube-scheduler-operator-container
kube-apiserver-operator
etcd-operator
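Roughly, the bump each affected operator needs looks like this (a sketch only, assuming the usual Go modules + vendor/ layout these repos use; the actual bump PRs may differ):

$ cd cluster-kube-apiserver-operator
$ go get github.com/openshift/library-go@<commit-with-the-fix>
$ go mod tidy && go mod vendor
$ git add go.mod go.sum vendor/ && git commit -m "bump(library-go)"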
Verification as below.

To check when the PR fix landed in the payload:

$ git clone https://github.com/openshift/cluster-kube-apiserver-operator
$ cd cluster-kube-apiserver-operator
$ git pull

$ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.10.0-0.ci-2021-09-16-194109 | grep cluster-kube-apiserver-operator
  cluster-kube-apiserver-operator  https://github.com/openshift/cluster-kube-apiserver-operator  405ff13f18da49548dd409a0faba992cb4782961

$ git log --date local --pretty="%h %an %cd - %s" 405ff13 | grep '#1228 '
17d0234d OpenShift Merge Robot Mon Sep 13 22:49:26 2021 - Merge pull request #1228 from aojea/librarygo_bump

We can see the PR fix landed in the OCP payload on Sep 13, which means we shouldn't see the repeating events in CI tests after Sep 13th.

For kube-apiserver-operator, I still found repeating events in the past two days, in jobs tested with 4.10.0-0.ci-2021-09-16-194109:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=48h&context=1&type=junit&name=4%5C.10&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
event happened 21 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod
event happened 29 times, something is wrong: ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator - reason/MissingVersion no image found for operand pod

So I think this bug was not fixed, assigning back.
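Side note: the same "is the fix in the payload" question can be answered with a single ancestry check (a sketch, using the two commits found above; it exits 0 only if the merge commit of PR #1228 is reachable from the payload commit):

$ git merge-base --is-ancestor 17d0234d 405ff13 && echo "fix is contained in the payload commit"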
(In reply to Ke Wang from comment #7)
> For kube-apiserver-operator, I still found repeating events in the past two
> days, in jobs tested with 4.10.0-0.ci-2021-09-16-194109.
> [...]
> So I think this bug was not fixed, assigning back.

The jobs that you linked are upgrade jobs from 4.9 :), the fix hasn't been backported there yet.
Hi aojeagar, thank you for your reply. That means I should just check the 4.10-relevant CI jobs, right? If so, I'll leave it for a few days and then check again, excluding the upgrade CI jobs; the upgrades from 4.9 will be checked against the 4.9 bug.
Right, we can see here that in non-upgrade jobs it stopped 7 days ago. It is still happening in 4.9, but not in 4.10: https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job

The backport is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=2003540
@kewang it seems there were no errors in "non-upgrade" 4.10 jobs in the last 10 days: https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job
aojeagar, I also checked again; as you said, the bug is fixed in 4.10. Please change the bug status to ON_QA, and I will add some comments and move it to VERIFIED.
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=no+image+found+for+operand+pod&maxAge=336h&context=1&type=junit&name=4.10&excludeName=upgrade&maxMatches=5&maxBytes=20971520&groupBy=job' | grep 'openshift-kube-apiserver-operator'
No results found.

So the bug was fixed, moving it to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056