Bug 2081552 - [bz-Image Registry] clusteroperator/image-registry should not change condition/Available
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Credential Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Nobody
QA Contact: Shivanthi
Reported: 2022-05-03 23:50 UTC by W. Trevor King
Modified: 2023-03-09 01:18 UTC

Last Closed: 2023-03-09 01:18:29 UTC



Description W. Trevor King 2022-05-03 23:50:46 UTC
[bz-Image Registry] clusteroperator/image-registry should not change condition/Available

is failing frequently in CI, see [1] and:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=image-registry+should+not+change+condition/Available' | grep 'failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-ppc64le (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.11-ocp-e2e-aws-arm64-techpreview-serial (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.11-upgrade-from-nightly-4.10-ocp-remote-libvirt-ppc64le (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.8-upgrade-from-nightly-4.7-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.9-upgrade-from-nightly-4.8-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-ovirt-upgrade (all) - 4 runs, 50% failed, 200% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-azure-upgrade-single-node (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-ovn (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-techpreview-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade (all) - 60 runs, 25% failed, 53% of failures match = 13% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-rt-upgrade (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-upgrade (all) - 21 runs, 100% failed, 43% of failures match = 43% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-ovirt-upgrade (all) - 4 runs, 75% failed, 133% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-ovirt-upgrade (all) - 4 runs, 75% failed, 100% of failures match = 75% impact
periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-vsphere-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-upi-serial (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-upgrade-from-stable-4.9-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-gcp (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-gcp-rt (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-serial-ovn-dualstack (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-upgrade (all) - 3 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.11-upgrade-from-stable-4.10-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.9-e2e-vsphere-upi-serial (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere (all) - 5 runs, 80% failed, 25% of failures match = 20% impact
pull-ci-openshift-machine-config-operator-release-4.10-e2e-aws-upgrade-single-node (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
pull-ci-openshift-machine-config-operator-release-4.10-e2e-vsphere-upgrade (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
pull-ci-openshift-origin-master-e2e-aws-single-node-upgrade (all) - 9 runs, 78% failed, 100% of failures match = 78% impact
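The "impact" column in the search output above appears to be the failure rate multiplied by the match rate, i.e. the share of all runs in which this test failed. Match rates above 100% (200%, 133%) presumably arise because a flake-only test can also fail in runs that otherwise passed. The following is a minimal awk sketch under that assumption; `check_impact` is a hypothetical helper name, not part of any CI tooling.

```shell
# Hypothetical helper: recompute the "impact" column of search.ci.openshift.org
# summary lines on stdin, ASSUMING impact = failed% x match% / 100.
check_impact() {
  awk '{
    failed = m = reported = ""
    for (i = 1; i < NF; i++) {
      # "25%" + 0 -> 25: awk keeps the leading numeric prefix of a string.
      if ($(i + 1) == "failed,") failed = $i + 0
      if ($(i + 1) == "of" && $(i + 2) == "failures") m = $i + 0
      if ($(i + 1) == "impact") reported = $i
    }
    if (reported == "") next        # skip anything that is not a summary line
    printf "%s computed=%.0f%% reported=%s\n", $1, failed * m / 100, reported
  }'
}
```

Piping the `w3m ... | grep 'failures match'` output into `check_impact` should reproduce each reported impact to within rounding, e.g. 75% failed times 133% matching is 99.75%, reported as 100%.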

For example, [2] has:

  : [bz-Image Registry] clusteroperator/image-registry should not change condition/Available
    Run #0: Failed	2h23m7s
    {  2 unexpected clusteroperator state transitions during e2e test run 

    May 03 14:28:58.817 - 3490s E clusteroperator/image-registry condition/Available status/False reason/Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
    2 tests failed during this blip (2022-05-03 14:28:58.817414102 +0000 UTC to 2022-05-03 14:28:58.817414102 +0000 UTC): [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]
    [sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods [Suite:openshift/conformance/parallel] [Suite:k8s]}

With:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade/1521470801340010496/build-log.txt | grep 'clusteroperator/image-registry condition/Available.*changed'
  May 03 14:28:58.817 E clusteroperator/image-registry condition/Available status/False reason/NoReplicasAvailable changed: Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
  May 03 15:27:08.975 W clusteroperator/image-registry condition/Available status/True reason/MinimumAvailability changed: Available: The registry has minimum availability\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created
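The two `changed` lines bracket the outage: the False transition at 14:28:58.817 and the True transition at 15:27:08.975 are about 58 minutes apart, consistent with the 3490s interval in the failure message. A minimal awk sketch that measures this gap from the grep output follows; `unavailable_duration` is a hypothetical helper, and it assumes both transitions fall on the same day, because the log timestamps carry no year and the arithmetic ignores the date.

```shell
# Hypothetical helper: read "condition/Available ... changed" lines on stdin and
# print how long the operator stayed Available=False.
# ASSUMPTION: False and True transitions happen on the same calendar day.
unavailable_duration() {
  awk '
    function secs(ts,   t) {        # "14:28:58.817" -> seconds since midnight
      split(ts, t, ":")
      return t[1] * 3600 + t[2] * 60 + t[3]
    }
    # Field 3 is the HH:MM:SS.mmm timestamp, field 7 is status/False or status/True.
    $7 == "status/False" { start = secs($3) }
    $7 == "status/True" && start != "" {
      printf "Available=False for %.0fs\n", secs($3) - start
      start = ""
    }'
}
```

Appending `| unavailable_duration` to the `curl ... | grep` command above should print `Available=False for 3490s` for this run.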

The test case is flake-only, so this isn't impacting CI success rates. But having the operator claim Available=False for nearly an hour (3490s in the run above) is not a great customer experience. The UX impact may not be big enough to be worth backports, but it is certainly big enough to be worth fixing in the development branch.

[1]: https://sippy.ci.openshift.org/sippy-ng/tests/4.11/analysis?test=%5Bbz-Image%20Registry%5D%20clusteroperator%2Fimage-registry%20should%20not%20change%20condition%2FAvailable
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade/1521470801340010496

Comment 1 Oleg Bulatov 2022-05-05 14:14:44 UTC
from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade/1521470801340010496

May 03 14:26:19.012 - 3671s W clusteroperator/image-registry condition/Progressing status/True reason/Progressing: Unable to apply resources: unable to sync storage configuration: Get "https://storage.googleapis.com/storage/v1/b/ci-op-77pjlmh2-d3bee-f5mr2-image-registry-us-central1-rvudplqn?alt=json&prettyPrint=false&projection=full": oauth2: cannot fetch token: 400 Bad Request\nProgressing: Response: {"error":"invalid_grant","error_description":"Invalid grant: account not found"}

The credentials provided by the Cloud Credential Operator (CCO) didn't work for about an hour, which suggests the cluster had serious problems. Moving to CCO for insight into the credentials.

Comment 3 Shiftzilla 2023-03-09 01:18:29 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9249

