Bug 1753778
| Summary: | OpenShift fails to upgrade when image-registry operator is unmanaged or removed |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Image Registry |
| Version: | 4.1.z |
| Target Release: | 4.3.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | high |
| Reporter: | Paul Gozart <pgozart> |
| Assignee: | Ricardo Maraschini <rmarasch> |
| QA Contact: | Wenjing Zheng <wzheng> |
| CC: | adam.kaplan, aos-bugs, xiuwang |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Last Closed: | 2020-01-23 11:06:22 UTC |
| : | 1769690 1769691 (view as bug list) |
| Bug Blocks: | 1769690, 1770658 |

Doc Text:

- Cause: the image registry operator did not report itself as Available and at the correct version when its management state was set to `Removed`.
- Consequence: upgrades failed if the image registry operator was set to `Removed`.
- Fix: the image registry operator now reports itself as Available and at the correct version when it is set to `Removed`.
- Result: upgrades can complete when the image registry is removed from the cluster.
Description
Paul Gozart
2019-09-19 20:29:10 UTC
Noting a short-term workaround from Oleg in case others encounter this issue in 4.1:

1. Set the management state back to Managed.
2. Set the storage config to emptyDir (assuming they don't want to configure real storage for the registry / don't already have a valid storage config).

This will allow the registry operator to achieve the new version of the cluster and report it. You can then set the registry back to Removed (until the next time you want to upgrade, anyway).

They should *not* allow the registry to keep running with emptyDir storage, however: if images are pushed to that registry and it then restarts for any reason, the storage is lost, leaving metadata in etcd (imagestream objects) that no longer matches any blobs in the registry storage, which is a real mess.

Also note that without an internal registry, the imagestreams and templates that the samples operator installs in the openshift namespace are not going to work, as they rely on pullthrough via the internal registry to handle authentication to registry.redhat.io without requiring every user to provide their own credentials.

Upgrade 4.3.0-0.nightly-2019-10-17-202206 to 4.3.0-0.nightly-2019-10-18-004604.
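The workaround from Oleg above amounts to two edits of the registry operator's config resource. A sketch of the relevant spec fields (the field names below match the `configs.imageregistry.operator.openshift.io` API as I understand it; treat the exact shape as an assumption for your version):

```yaml
# configs.imageregistry.operator.openshift.io/cluster (spec excerpt)
spec:
  # Step 1: hand the registry back to the operator
  managementState: Managed
  # Step 2: throwaway storage so the operator can roll out the new version.
  # Do NOT leave emptyDir in place after the upgrade completes -- a restart
  # loses the blobs while the imagestream metadata stays in etcd.
  storage:
    emptyDir: {}
# After the operator reports the new version, set managementState: Removed again.
```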
When the imageregistry config is set to Unmanaged, the image-registry clusteroperator reports:
Last Transition Time: 2019-10-18T01:45:26Z
Message: The registry configuration is set to unmanaged mode
Reason: Unmanaged
Status: True
Type: Available
Last Transition Time: 2019-10-18T01:45:32Z
Message: The registry configuration is set to unmanaged mode
Reason: Unmanaged
Status: False
Type: Progressing
Last Transition Time: 2019-10-18T01:43:47Z
Message: The registry configuration is set to unmanaged mode
Reason: Unmanaged
Status: False
Type: Degraded
Extension: <nil>
When set to Removed, it reports:
Last Transition Time: 2019-10-18T02:50:17Z
Message: The registry is removed
Reason: Removed
Status: True
Type: Available
Last Transition Time: 2019-10-18T03:20:27Z
Message: All registry resources are removed
Reason: Removed
Status: False
Type: Progressing
Last Transition Time: 2019-10-18T01:32:38Z
Status: False
Type: Degraded
Extension: <nil>
These work as designed, but the upgrade fails whether the registry is set to Unmanaged or Removed.
$ oc get clusterversion -o json | jq -r '.items[].status.conditions[]'
{
"lastTransitionTime": "2019-10-18T01:39:57Z",
"message": "Done applying 4.3.0-0.nightly-2019-10-17-202206",
"status": "True",
"type": "Available"
}
{
"lastTransitionTime": "2019-10-18T03:14:08Z",
"message": "Cluster operator image-registry is still updating",
"reason": "ClusterOperatorNotAvailable",
"status": "True",
"type": "Failing"
}
{
"lastTransitionTime": "2019-10-18T02:55:12Z",
"message": "Unable to apply 4.3.0-0.nightly-2019-10-18-004604: the cluster operator image-registry has not yet successfully rolled out",
"reason": "ClusterOperatorNotAvailable",
"status": "True",
"type": "Progressing"
}
{
"lastTransitionTime": "2019-10-18T02:54:06Z",
"status": "True",
"type": "RetrievedUpdates"
}
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.3.0-0.nightly-2019-10-18-004604 True False False 109m
cloud-credential 4.3.0-0.nightly-2019-10-18-004604 True False False 126m
cluster-autoscaler 4.3.0-0.nightly-2019-10-18-004604 True False False 115m
console 4.3.0-0.nightly-2019-10-18-004604 True False False 112m
dns 4.3.0-0.nightly-2019-10-17-202206 True False False 125m
image-registry 4.3.0-0.nightly-2019-10-17-202206 True False False 39m
ingress 4.3.0-0.nightly-2019-10-18-004604 True False False 116m
insights 4.3.0-0.nightly-2019-10-18-004604 True False False 126m
kube-apiserver 4.3.0-0.nightly-2019-10-18-004604 True False False 123m
kube-controller-manager 4.3.0-0.nightly-2019-10-18-004604 True False False 123m
kube-scheduler 4.3.0-0.nightly-2019-10-18-004604 True False False 123m
machine-api 4.3.0-0.nightly-2019-10-18-004604 True False False 124m
machine-config 4.3.0-0.nightly-2019-10-17-202206 True False False 124m
marketplace 4.3.0-0.nightly-2019-10-18-004604 True False False 27m
monitoring 4.3.0-0.nightly-2019-10-18-004604 True False False 113m
network 4.3.0-0.nightly-2019-10-17-202206 True False False 125m
node-tuning 4.3.0-0.nightly-2019-10-18-004604 True False False 28m
openshift-apiserver 4.3.0-0.nightly-2019-10-18-004604 True False False 122m
openshift-controller-manager 4.3.0-0.nightly-2019-10-18-004604 True False False 123m
openshift-samples 4.3.0-0.nightly-2019-10-18-004604 True False False 28m
operator-lifecycle-manager 4.3.0-0.nightly-2019-10-18-004604 True False False 125m
operator-lifecycle-manager-catalog 4.3.0-0.nightly-2019-10-18-004604 True False False 125m
operator-lifecycle-manager-packageserver 4.3.0-0.nightly-2019-10-18-004604 True False False 26m
service-ca 4.3.0-0.nightly-2019-10-18-004604 True False False 126m
service-catalog-apiserver 4.3.0-0.nightly-2019-10-18-004604 True False False 121m
service-catalog-controller-manager 4.3.0-0.nightly-2019-10-18-004604 True False False 121m
storage 4.3.0-0.nightly-2019-10-18-004604 True False False 28m
$ oc logs -f cluster-version-operator-6457b678f-txjqb -n openshift-cluster-version | grep "image-registry"
I1018 03:24:45.501861 1 sync_worker.go:592] Done syncing for deployment "openshift-image-registry/cluster-image-registry-operator" (171 of 448)
I1018 03:24:45.501883 1 sync_worker.go:579] Running sync for clusteroperator "image-registry" (172 of 448)
E1018 03:30:12.297744 1 task.go:77] error running apply for clusteroperator "image-registry" (172 of 448): Cluster operator image-registry is still updating
I1018 03:30:12.297952 1 task_graph.go:611] Result of work: [Cluster operator image-registry is still updating]
I1018 03:30:12.297971 1 sync_worker.go:745] Update error 172 of 448: ClusterOperatorNotAvailable Cluster operator image-registry is still updating (*errors.errorString: cluster operator image-registry is still updating)
E1018 03:30:12.297991 1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Cluster operator image-registry is still updating
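The "still updating" gate in the log above can be sketched as follows. This is illustrative Go, not the actual cluster-version operator code: the CVO only counts a ClusterOperator as rolled out once it is Available *and* reports the payload's target version, which is why an Available registry stuck on the old nightly still blocks the upgrade.

```go
package main

import "fmt"

// operatorUpdated sketches the CVO's rollout gate (illustrative names only):
// a ClusterOperator counts as rolled out when it is Available and its
// reported version matches the payload's target version.
func operatorUpdated(available bool, reportedVersion, targetVersion string) (bool, string) {
	if !available {
		return false, "ClusterOperatorNotAvailable"
	}
	if reportedVersion != targetVersion {
		// The branch the registry hit: Available=True, but still reporting
		// the previous nightly, so the CVO keeps logging "still updating".
		return false, "Cluster operator image-registry is still updating"
	}
	return true, ""
}

func main() {
	ok, msg := operatorUpdated(true,
		"4.3.0-0.nightly-2019-10-17-202206", // version the registry reported
		"4.3.0-0.nightly-2019-10-18-004604") // version the CVO wants
	fmt.Println(ok, msg)
}
```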
@Xiuwang We have no guarantees for Unmanaged when it comes to upgrades, so please do not include this in your test matrix. Can you please try the test again and attach the data from must-gather if the upgrade fails again?

Assigning to Ricardo - there's potential that the current upgrade issue overlaps/duplicates https://bugzilla.redhat.com/show_bug.cgi?id=1768357

@Ricardo @Adam The upgrade also failed when the image-registry operator was set to Removed; here is the must-gather log: http://virt-openshift-05.lab.eng.nay.redhat.com/xiuwang/1753778/

The upgrade could complete successfully with image-registry set to Removed. Upgrade from 4.3.0-0.nightly-2019-11-06-184828 to 4.3.0-0.nightly-2019-11-07-010532.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062
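The fix described in the Doc Text boils down to the operator-side condition reporting: even with the registry Removed, publish Available=True and the target version. A minimal sketch, with illustrative struct and function names rather than the actual openshift/api types:

```go
package main

import "fmt"

// Condition mirrors the shape of a ClusterOperator status condition
// (names are illustrative, not the real openshift/api types).
type Condition struct {
	Type   string
	Status string
	Reason string
}

// conditionsFor sketches the fixed behavior: when the registry's
// managementState is "Removed", the operator still reports Available=True
// and claims the target version, so the CVO can consider the rollout done.
func conditionsFor(managementState, targetVersion string) ([]Condition, string) {
	switch managementState {
	case "Removed":
		return []Condition{
			{Type: "Available", Status: "True", Reason: "Removed"},
			{Type: "Progressing", Status: "False", Reason: "Removed"},
			{Type: "Degraded", Status: "False"},
		}, targetVersion // the fix: report the new version despite removal
	default:
		// Managed path elided; the real operator derives these conditions
		// from the state of the registry deployment.
		return []Condition{{Type: "Available", Status: "True"}}, targetVersion
	}
}

func main() {
	conds, v := conditionsFor("Removed", "4.3.0-0.nightly-2019-11-07-010532")
	fmt.Println(conds[0].Type, conds[0].Status, v)
}
```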