1923811 – Registry claims Available=True despite .status.readyReplicas == 0 while .spec.replicas == 2

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Image Registry
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Oleg Bulatov
QA Contact:	Wenjing Zheng
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-02-02 00:48 UTC by W. Trevor King
Modified:	2021-10-18 01:55 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The image registry operator did not update .status.readyReplicas of the config resource. Consequence: Its value was always zero. Fix: The operator now writes the number of ready image registry replicas from the deployment into the config. Result: This field shows how many image registry replicas are ready.
Clone Of:
Environment:
Last Closed:	2021-07-27 22:37:35 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-image-registry-operator pull 669	0	None	Merged	Bug 1923811: Report ready replicas	2021-10-29 16:19:04 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:38:06 UTC

Description W. Trevor King 2021-02-02 00:48:32 UTC

Seen in a 4.5.11 cluster's Insights tarball:

$ tar -xOz config/imageregistry.json < "$(ls | tail -n1)" | jq '{spec: (.spec | {managementState, replicas}), status: (.status | {readyReplicas})}'
{
  "spec": {
    "managementState": "Managed",
    "replicas": 2
  },
  "status": {
    "readyReplicas": 0
  }
}

But despite having no ready replicas, the registry is claiming Available=True:

$ tar -xOz config/imageregistry.json < "$(ls | tail -n1)" | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' | sort
2020-09-30T03:51:17Z ImageConfigControllerDegraded=False AsExpected: -
2020-09-30T03:51:17Z NodeCADaemonControllerDegraded=False AsExpected: -
2020-09-30T03:51:23Z Degraded=False -: -
2020-09-30T03:51:23Z Removed=False -: -
2020-09-30T04:26:53Z Available=True Ready: The registry is ready
2020-12-09T13:46:05Z ImageRegistryCertificatesControllerDegraded=False AsExpected: -
2020-12-14T12:32:26Z Progressing=False Ready: The registry is ready
2020-12-14T12:32:26Z StorageExists=True GCS Bucket Exists: -

I would have expected Available=False.
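
The same mismatch can be checked programmatically. The following is a minimal Python sketch, assuming the config resource has already been loaded into a dict. The function name is hypothetical and not part of any OpenShift tooling. It only flags the inconsistency reported here and says nothing about how the operator actually computes Available.

```python
def available_without_ready_replicas(config: dict) -> bool:
    """Return True when Available=True is reported although replicas are
    requested and none of them are ready (hypothetical helper)."""
    spec = config.get("spec", {})
    status = config.get("status", {})
    wanted = spec.get("replicas", 0)
    ready = status.get("readyReplicas", 0)
    # Look for an Available condition whose status is the string "True".
    available = any(
        c.get("type") == "Available" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    return available and wanted > 0 and ready == 0


# Values copied from the report above; only the fields the check reads.
reported = {
    "spec": {"managementState": "Managed", "replicas": 2},
    "status": {
        "readyReplicas": 0,
        "conditions": [
            {"type": "Available", "status": "True", "reason": "Ready",
             "message": "The registry is ready"},
        ],
    },
}

print(available_without_ready_replicas(reported))  # True
```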

Comment 2 Oleg Bulatov 2021-02-02 15:56:57 UTC

readyReplicas is always 0, the operator is not aware of this field.

As we cannot remove this field from the API, perhaps the operator should report the number of registry replicas, not counting node-ca and cron jobs.

Comment 3 W. Trevor King 2021-02-02 21:11:09 UTC

readyReplicas looks like a non-deprecated v1 property [1].  Can't you pass through the status value from the Deployment?  Picking a random CI job:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1356101157344251904/artifacts/e2e-aws-upgrade/deployments.json | jq '.items[] | select(.metadata.name == "image-registry").status | {replicas, availableReplicas, updatedReplicas, readyReplicas}'
{
  "replicas": 2,
  "availableReplicas": 2,
  "updatedReplicas": 2,
  "readyReplicas": 2
}

[1]: https://github.com/openshift/api/blob/a9e731090f5ed361e5ab887d0ccd55c1db7fc633/imageregistry/v1/00-crd.yaml#L1111-L1113
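
The pass-through suggested above could look roughly like the following. This is a sketch in Python over plain dicts, not the operator's actual Go code. The function name is hypothetical, and the sample values are taken from the CI job output and the original report.

```python
import copy


def pass_through_ready_replicas(config: dict, deployment: dict) -> dict:
    """Return a copy of the config whose status.readyReplicas mirrors the
    Deployment's status.readyReplicas (hypothetical helper)."""
    updated = copy.deepcopy(config)
    # A Deployment omits readyReplicas from its JSON when the value is zero,
    # so a missing field is treated as 0.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    updated.setdefault("status", {})["readyReplicas"] = ready
    return updated


# Deployment status as printed by the jq command above.
registry_deployment = {
    "status": {
        "replicas": 2,
        "availableReplicas": 2,
        "updatedReplicas": 2,
        "readyReplicas": 2,
    }
}
# Config as seen in the original report.
registry_config = {"spec": {"replicas": 2}, "status": {"readyReplicas": 0}}

synced = pass_through_ready_replicas(registry_config, registry_deployment)
print(synced["status"]["readyReplicas"])  # 2
```

Returning a copy rather than mutating the input keeps the sketch easy to test. A real controller would write the value back through the API instead.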

Comment 4 Oleg Bulatov 2021-02-03 23:40:21 UTC

It came from OperatorStatus [1]. Yes, most likely we'll just pass through the value from the Deployment, as nobody cares about the node-ca daemonset. Hopefully one day we find a new home for node-ca.

[1]: https://github.com/openshift/api/blob/a9e731090f5ed361e5ab887d0ccd55c1db7fc633/operator/v1/types.go#L120

Comment 7 Wenjing Zheng 2021-04-07 08:22:48 UTC

When replicas is set to 0, Available=False as shown below:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
image-registry 4.8.0-0.ci.test-2021-04-07-073739-ci-ln-g8qkn42 False False True 5m6s

Comment 8 Oleg Bulatov 2021-04-07 09:13:39 UTC

Wenjing, this is a bit different problem.

The registry with alive replicas used to report .status.readyReplicas == 0 on config.imageregistry.operator.openshift.io/cluster. This should be fixed and now it should be equal to .spec.replicas when everything works fine.
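
One way to express that expectation as a check is sketched below in Python, assuming the config resource is available as a dict. The helper name is hypothetical. It compares only the two fields mentioned above and does not consult the conditions.

```python
from typing import Optional


def replica_mismatch(config: dict) -> Optional[str]:
    """Return a description of the mismatch, or None when
    .status.readyReplicas equals .spec.replicas (hypothetical helper)."""
    wanted = config.get("spec", {}).get("replicas", 0)
    ready = config.get("status", {}).get("readyReplicas", 0)
    if ready == wanted:
        return None
    return f".status.readyReplicas is {ready}, .spec.replicas is {wanted}"


# Before the fix: two replicas requested, zero reported as ready.
before_fix = {"spec": {"replicas": 2}, "status": {"readyReplicas": 0}}
# After the fix, with everything working: both values agree.
after_fix = {"spec": {"replicas": 2}, "status": {"readyReplicas": 2}}

print(replica_mismatch(before_fix))
print(replica_mismatch(after_fix))
```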

Comment 9 Wenjing Zheng 2021-04-08 08:39:18 UTC

(In reply to Oleg Bulatov from comment #8)
> Wenjing, this is a bit different problem.
> 
> The registry with alive replicas used to report .status.readyReplicas == 0
> on config.imageregistry.operator.openshift.io/cluster. This should be fixed
> and now it should be equal to .spec.replicas when everything works fine.

Thanks for the reminder! Yes, I can see that .status.readyReplicas now has the same value as .spec.replicas.

Comment 11 Wenjing Zheng 2021-04-13 03:05:34 UTC

Verified on 4.8.0-0.nightly-2021-04-09-222447.

Comment 14 errata-xmlrpc 2021-07-27 22:37:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
