Bug 1775518
| Summary: | Service mesh auto upgrade fails each time | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> |
| Component: | OLM | Assignee: | Ben Luddy <bluddy> |
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aos-bugs, bluddy, dsover, ecordell, eparis, jokerman, vdinh |
| Version: | 4.2.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-06 17:21:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jaspreet Kaur 2019-11-22 07:20:06 UTC
Hi Jaspreet, I have a few questions:

1) What version of OpenShift was your cluster running?

2) Were there any other CSVs that went into a failed state during the auto upgrade? Service mesh depends on a few other operators, and if any of those operators fails to install for some reason, service mesh will fail to install. If that's the case, it would not be accurate to simply say **ServiceMesh** failed to install during the auto upgrade in the bug report.

3) How reproducible is this? Could you provide more detailed steps to reproduce it? If this was a one-off, for example the Elasticsearch operator (which the service mesh operator depends on) had a transient glitch in your cluster, we may not be able to classify this as a bug. However, if, with the steps you provide, the Elasticsearch operator fails to install more than once, and only during upgrade, then we could investigate this further as a potential bug.

This may have the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1789920, which prevents garbage collection of copied CSVs, but we don't have quite enough information to confirm or rule it out. If your cluster has reproduced that issue, the "conflicting CRD owner in namespace" failures would be expected, since the copied CSV asserts itself as the owner of the same CRD that the new CSV wants to own. During a normal upgrade, the new version specifies that it replaces the previous version, so the existing CRD ownership is not considered a conflict. However, if the zombie CSV is two or more versions older than the newest CSV, it results in a CRD ownership conflict.

You can query your cluster for CSVs that are in this state:

```shell
$ oc get -A -o json csv | jq '.items[] | select((.status.reason == "Copied" and .metadata.annotations["olm.operatorNamespace"] == .metadata.namespace))'
```

Any such CSVs can be safely deleted.
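For anyone who wants to understand what the selector above matches before running it against a live cluster, here is a minimal sketch that exercises the same jq filter on a small hand-written JSON file mimicking `oc get -A -o json csv` output. The CSV names and namespaces are hypothetical, chosen only to illustrate the zombie pattern (a copied CSV sitting in its own `olm.operatorNamespace`):

```shell
# Hypothetical CSV list: the first item is a normal copy (it lives in a
# different namespace than the operator); the second is a "zombie" copy
# whose own namespace equals its olm.operatorNamespace annotation.
cat > csvs.json <<'EOF'
{
  "items": [
    {
      "metadata": {
        "name": "servicemeshoperator.v1.0.0",
        "namespace": "istio-system",
        "annotations": {"olm.operatorNamespace": "openshift-operators"}
      },
      "status": {"reason": "Copied"}
    },
    {
      "metadata": {
        "name": "servicemeshoperator.v0.9.0",
        "namespace": "openshift-operators",
        "annotations": {"olm.operatorNamespace": "openshift-operators"}
      },
      "status": {"reason": "Copied"}
    }
  ]
}
EOF

# Same selector as the cluster query; print only the names of matches.
# Only the second item should be flagged.
jq -r '.items[]
       | select(.status.reason == "Copied"
                and .metadata.annotations["olm.operatorNamespace"] == .metadata.namespace)
       | .metadata.name' csvs.json
```

On a real cluster you would pipe `oc get -A -o json csv` into the same filter and then `oc delete` the flagged CSVs in their namespaces.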
The 4.4.0 release will contain changes that prevent CSVs from entering this state and clean up any existing CSVs that are already in it. The fixes will also be backported to 4.3.z (https://bugzilla.redhat.com/show_bug.cgi?id=1797019) and 4.2.z (https://bugzilla.redhat.com/show_bug.cgi?id=1797021). If you can reproduce your original issue but no CSVs match the above query, please respond and we can consider more avenues of investigation.

*** This bug has been marked as a duplicate of bug 1789920 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.