Bug 2113831 - Hive Project is stuck with terminating state after attempt to upgrade to ACM 2.5
Summary: Hive Project is stuck with terminating state after attempt to upgrade to ACM 2.5
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Installer
Version: rhacm-2.5.z
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ray Harris
QA Contact: txue
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-02 06:01 UTC by Mihir Lele
Modified: 2025-10-03 12:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-06 12:55:46 UTC
Target Upstream Version:
Embargoed:
txue: qe_test_coverage+



Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 24769 0 None None None 2022-08-02 09:55:22 UTC

Description Mihir Lele 2022-08-02 06:01:58 UTC
Description of the problem:

Hive Project is stuck with terminating state after attempt to upgrade to ACM 2.5

Also, MCH is stuck in installing state.

Additional info:

Hive ns:

  - lastTransitionTime: "2022-07-26T01:35:33Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: admission.hive.openshift.io/v1: the server is
      currently unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure


less 0020-acm-must-gather-2.tar.gz/acm-must-gather-2/registry-redhat-io-rhacm2-acm-must-gather-rhel8-sha256-385e5fb24b0f50ba4ada884ea8c4d5013393769261d221ab37f79e9e96e461e1/namespaces/rhacm/pods/multicluster-operators-standalone-subscription-7bc8d49776-wr28f/multicluster-operators-standalone-subscription/multicluster-operators-standalone-subscription/logs/current.log

2022-07-30T12:45:31.540674414Z E0730 12:45:31.540597       1 gitrepo.go:303] Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOF Failed to git clone with the primary channel: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOF
2022-07-30T12:45:31.540674414Z E0730 12:45:31.540655       1 git_subscriber_item.go:265] Failed to clone git: https://github.com/stolostron/acm-hive-openshift-releases.git err: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOFUnable to clone the git repo https://github.com/stolostron/acm-hive-openshift-releases.git
2022-07-30T12:45:31.540748415Z I0730 12:45:31.540674       1 git_subscriber_item.go:268] exit doSubscription: rhacm/hive-clusterimagesets-subscription-fast-0
2022-07-30T12:45:31.540748415Z E0730 12:45:31.540680       1 git_subscriber_item.go:160] Failed to clone git: https://github.com/stolostron/acm-hive-openshift-releases.git err: Get "https://github.com/stolostron/acm-hive-openshift-releases.git/info/refs?service=git-upload-pack": EOFSubscription error.


I am not sure which tasks run in the background when upgrading ACM 2.4 to 2.5. But I can see that hive was managed by MCH in 2.4 and is managed by MCE in 2.5, so my guess is that hive needs to be redeployed? Also, I didn't see any evidence to suggest that the MCE deployment was triggered.

Judging from the must-gather, this looks like a connected setup, so I am assuming that we don't need to add the MCE annotation on the MCH manually.

Comment 4 Mihir Lele 2022-08-04 05:19:55 UTC
We were able to get past the issue with the following workaround, with help from hive engineering:

Run: kubectl get apiservice


Look for the entries whose AVAILABLE column is False and delete them


Run: kubectl delete apiservice <service-name>
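
For anyone who wants to review the candidates before deleting anything, the filtering step above can be sketched in Python. This is only a rough illustration, not something we ran on the affected cluster: it assumes you have saved the output of `kubectl get apiservice -o json` and parsed it into a dict, and the function names `unavailable_apiservices` and `delete_commands` are made up for this example.

```python
def unavailable_apiservices(api_service_list):
    """Return names of APIServices whose Available condition is explicitly "False".

    `api_service_list` is assumed to be the parsed JSON from
    `kubectl get apiservice -o json` (an APIServiceList object).
    """
    names = []
    for item in api_service_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        for condition in conditions:
            # Only an explicit Available=False is flagged; entries with no
            # status at all are skipped rather than guessed at.
            if condition.get("type") == "Available" and condition.get("status") == "False":
                names.append(item["metadata"]["name"])
                break
    return names


def delete_commands(api_service_list):
    """Build (but do not run) one delete command per unavailable APIService."""
    return [
        f"kubectl delete apiservice {name}"
        for name in unavailable_apiservices(api_service_list)
    ]
```

The commands are only returned as strings so that each one can be checked by hand before it is run; some unavailable APIServices may belong to components that are merely starting up.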

Comment 5 daliu 2022-08-15 02:54:19 UTC
@efried 
@jagray 

I am not sure whether hive or the ACM installer should fix this.
Could you help take a look?

Comment 7 Eric Fried 2022-08-15 17:13:46 UTC
I believe the fix here is going to be on the ACM side. If deleting an existing deployment's namespace is part of their upgrade process, they'll need to add a step to delete the APIService (and any other non-namespaced resources). On the hive side, we need to look into better documenting the uninstallation process. (We may also try to look into having the hive-operator clean up after itself a bit better when its deployment is deleted.)

Comment 8 Eric Fried 2022-08-15 19:14:26 UTC
(In reply to Eric Fried from comment #7)
> On the hive side, we need to look into better documenting the
> uninstallation process. (We may also try to look into having the
> hive-operator clean up after itself a bit better when its deployment is
> deleted.)

https://issues.redhat.com/browse/HIVE-1998

Comment 9 bot-tracker-sync 2022-08-22 14:51:21 UTC
G2Bsync 1222456466 comment 
 ray-harris Mon, 22 Aug 2022 14:40:04 UTC 
 G2Bsync

ACM was unable to reproduce this issue. We're not going to add code to check for this as this is the first and only time it's been reported. We'll make our SRE aware of the potential issue in case they run into it again.

This issue can be closed.

