Bug 2113831
| Summary: | Hive Project is stuck with terminating state after attempt to upgrade to ACM 2.5 | ||
|---|---|---|---|
| Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Mihir Lele <mlele> |
| Component: | Installer | Assignee: | Ray Harris <raharris> |
| Status: | CLOSED NOTABUG | QA Contact: | txue |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | rhacm-2.5.z | CC: | daliu, dhuynh, efried, huichen, jagray, jfindysz |
| Target Milestone: | --- | Flags: | txue: qe_test_coverage+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-06 12:55:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
We have been able to get past the issue with the following workaround, with help from Hive engineering:

1. Run `kubectl get apiservice`.
2. Look for entries whose AVAILABLE column is `False`.
3. Delete each of them with `kubectl delete apiservice <service-name>`.

@efried @jagray I am not sure whether Hive or the ACM installer should fix this. Could you help take a look?

I believe the fix here is going to be on the ACM side. If deleting an existing deployment's namespace is part of their upgrade process, they'll need to add a step to delete the APIService (and any other non-namespaced resources). On the hive side, we need to look into better documenting the uninstallation process. (We may also try to look into having the hive-operator clean up after itself a bit better when its deployment is deleted.)

(In reply to Eric Fried from comment #7)
> On the hive side, we need to look into better documenting the
> uninstallation process. (We may also try to look into having the
> hive-operator clean up after itself a bit better when its deployment is
> deleted.)

https://issues.redhat.com/browse/HIVE-1998

G2Bsync 1222456466 comment by ray-harris, Mon, 22 Aug 2022 14:40:04 UTC:

ACM was unable to reproduce this issue. We're not going to add code to check for this, as this is the first and only time it's been reported. We'll make our SRE aware of the potential issue in case they run into it again. This issue can be closed.
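The workaround above can be scripted. The following Python sketch is an illustration only and is not part of ACM or Hive: it reads the JSON form of `kubectl get apiservice` and lists the APIServices whose `Available` condition has status `False`, optionally restricted to those backed by a Service in one namespace (for example `hive`). That restriction roughly mirrors the ACM-side suggestion of removing the APIService before deleting the namespace. The function names are hypothetical, and the script only builds the delete commands so they can be reviewed before anything is run.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional


def fetch_apiservices() -> Dict[str, Any]:
    """Return `kubectl get apiservice -o json` parsed into a dict."""
    out = subprocess.run(
        ["kubectl", "get", "apiservice", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def unavailable_apiservices(
    apiservices: Dict[str, Any], backing_namespace: Optional[str] = None
) -> List[str]:
    """Names of APIServices whose Available condition has status "False".

    With backing_namespace set (e.g. "hive"), keep only APIServices whose
    spec.service points into that namespace. Local APIServices have no
    spec.service, so they are skipped in that mode.
    """
    names: List[str] = []
    for item in apiservices.get("items", []):
        service = item.get("spec", {}).get("service") or {}
        if backing_namespace is not None and service.get("namespace") != backing_namespace:
            continue
        conditions = item.get("status", {}).get("conditions") or []
        if any(c.get("type") == "Available" and c.get("status") == "False"
               for c in conditions):
            names.append(item["metadata"]["name"])
    return names


def delete_commands(names: List[str]) -> List[str]:
    """Build (but do not run) one delete command per APIService name."""
    return [f"kubectl delete apiservice {name}" for name in names]
```

For example, `delete_commands(unavailable_apiservices(fetch_apiservices(), "hive"))` returns the commands for Hive-backed APIServices only. Filtering by namespace is a deliberate choice: an unrelated APIService that is only briefly unavailable would otherwise be deleted too.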
Description of the problem:
The hive project (namespace) is stuck in the Terminating state after an attempt to upgrade to ACM 2.5. The MultiClusterHub (MCH) is also stuck in the Installing state.

Additional info:

Status condition on the hive namespace:

    - lastTransitionTime: "2022-07-26T01:35:33Z"
      message: 'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: admission.hive.openshift.io/v1: the server is currently unable to handle the request'
      reason: DiscoveryFailed
      status: "True"
      type: NamespaceDeletionDiscoveryFailure

Subscription pod log from the must-gather:

    less 0020-acm-must-gather-2.tar.gz/acm-must-gather-2/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-385e5fb24b0f50ba4ada884ea8c4d5013393769261d221ab37f79e9e96e461e1/namespaces/rhacm/pods/multicluster-operators-standalone-subscription-7bc8d49776-wr28f/multicluster-operators-standalone-subscription/multicluster-operators-standalone-subscription/logs/current.log

    2022-07-30T12:45:31.540674414Z E0730 12:45:31.540597 1 gitrepo.go:303] Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOF Failed to git clone with the primary channel: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOF
    2022-07-30T12:45:31.540674414Z E0730 12:45:31.540655 1 git_subscriber_item.go:265] Failed to clone git: https://github.com/stolostron/acm-hive-openshift-releases.git err: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOFUnable to clone the git repo https://github.com/stolostron/acm-hive-openshift-releases.git
    2022-07-30T12:45:31.540748415Z I0730 12:45:31.540674 1 git_subscriber_item.go:268] exit doSubscription: rhacm/hive-clusterimagesets-subscription-fast-0
    2022-07-30T12:45:31.540748415Z E0730 12:45:31.540680 1 git_subscriber_item.go:160] Failed to clone git: https://github.com/stolostron/acm-hive-openshift-releases.git err: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOFSubscription error.

I am not sure which tasks run in the background when upgrading ACM 2.4 to 2.5. However, Hive was managed by MCH in 2.4 and is managed by MCE in 2.5, so my guess is that Hive needs to be redeployed. Also, I didn't see any evidence that the MCE deployment was triggered. The must-gather indicates a connected setup, so I am assuming we don't need to add the MCE annotation to the MCH manually.
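The `NamespaceDeletionDiscoveryFailure` condition on the hive namespace names the API group/version whose discovery failed, and Kubernetes APIService objects are named `<version>.<group>`, so the condition points directly at the APIService that the workaround deletes. The Python sketch below, which assumes the namespace JSON comes from `kubectl get namespace hive -o json`, extracts that mapping. The function names are hypothetical, and the regular expression is an assumption fitted to the message format seen in this bug; it may not match every message Kubernetes emits (for example, entries for the core API group, which have no `group/` prefix).

```python
import re
from typing import Any, Dict, List

# Assumed message shape: "... server APIs: <group>/<version>: <error>[, ...]".
# Matches e.g. "admission.hive.openshift.io/v1: " and captures group and version.
_GROUP_VERSION = re.compile(
    r"([a-z0-9][a-z0-9.-]*)/(v[0-9]+(?:(?:alpha|beta)[0-9]+)?): "
)


def failing_group_versions(namespace: Dict[str, Any]) -> List[str]:
    """Group/versions listed in an active NamespaceDeletionDiscoveryFailure condition."""
    for cond in namespace.get("status", {}).get("conditions") or []:
        if (cond.get("type") == "NamespaceDeletionDiscoveryFailure"
                and cond.get("status") == "True"):
            message = cond.get("message", "")
            return [f"{g}/{v}" for g, v in _GROUP_VERSION.findall(message)]
    return []


def apiservice_name(group_version: str) -> str:
    """APIService objects are named "<version>.<group>"."""
    group, version = group_version.split("/", 1)
    return f"{version}.{group}"
```

Applied to the condition above, `[apiservice_name(gv) for gv in failing_group_versions(ns)]` should yield `v1.admission.hive.openshift.io`, the APIService that would then be checked with `kubectl get apiservice` before deleting it.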