Bug 2063596

Summary: claim clusters from clusterpool throws errors
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Hui Chen <huichen>
Component: Console
Assignee: Kevin Cormier <kcormier>
Status: CLOSED ERRATA
QA Contact: Xiang Yin <xiyin>
Severity: high
Docs Contact: Christopher Dawson <cdawson>
Priority: high
Version: rhacm-2.5
CC: daliu, dho, dhuynh, efried, huichen, kcormier, yuhe
Target Milestone: ---
Keywords: Reopened
Target Release: rhacm-2.5
Flags: huichen: qe_test_coverage+, bot-tracker-sync: rhacm-2.5+, kcormier: needinfo-, efried: needinfo-, kcormier: needinfo-
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-09 02:09:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 daliu 2022-03-14 07:32:32 UTC
@efried 

It looks like the clusterpool had already assigned one clusterdeployment, and both the clusterclaim status and the clusterdeployment status changed to running after a few minutes.

But the clusterclaim status shown before it reaches running is not right. Could you take a look?

Comment 2 Eric Fried 2022-03-14 14:12:10 UTC
The timeout is coming from the ACM side, not Hive. The ClusterDeployments aren't ready to be claimed yet, so the claim stays pending. This is working as designed from the Hive side. More...

We recently changed the claim logic to only fulfill claims with *running* clusters [1] to mitigate problems with clusters failing to resume after they were claimed. I discussed this with @kcormier and @gbuchana last week, recommending that ACM take this into account, and take advantage of the other knobs we've put into place to improve the UX in this area, including:
- ClusterPool.Spec.RunningCount to maintain a subset of clusters Running so they can be claimed immediately
- ClusterPool.Spec.HibernationConfig.ResumeTimeout to make sure clusters that get stuck resuming are thrown out and replaced
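
As a rough illustration of where those two knobs sit, the sketch below builds a partial ClusterPool manifest as a Python dict and prints it as JSON. The lower-camel-case keys (`runningCount`, `hibernationConfig.resumeTimeout`, `size`) and the `hive.openshift.io/v1` API version are assumed serialized forms of the Go field names above; the pool name, namespace, and values are hypothetical, and other required fields (platform, image set, base domain, and so on) are deliberately left out.

```python
import json


def build_cluster_pool(name, namespace, size, running_count, resume_timeout):
    """Return a partial ClusterPool manifest as a plain dict.

    Only the fields discussed in this comment are included; this is not a
    complete, appliable manifest.
    """
    return {
        "apiVersion": "hive.openshift.io/v1",  # assumed API group/version
        "kind": "ClusterPool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "size": size,
            # Number of pool clusters kept Running so a claim can be
            # fulfilled without waiting for a resume.
            "runningCount": running_count,
            # Clusters that take longer than this to resume are expected
            # to be thrown out and replaced.
            "hibernationConfig": {"resumeTimeout": resume_timeout},
        },
    }


# Hypothetical values, for illustration only.
pool = build_cluster_pool(
    "example-pool", "example-ns", size=3, running_count=1, resume_timeout="20m"
)
print(json.dumps(pool, indent=2))
```

The duration string format (`"20m"`) is also an assumption; check the Hive API reference for the accepted format before using it.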

[1] https://issues.redhat.com/browse/HIVE-1679
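
For readers following along, here is a minimal model of the claim behavior described in this comment: a claim is fulfilled only by an unclaimed cluster that is already running, and otherwise stays pending. It is a simplified sketch, not Hive's actual implementation; the `Cluster` class, the `try_fulfill_claim` helper, and the power-state labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    power_state: str  # e.g. "Running", "Hibernating", "Resuming" (assumed labels)
    claimed_by: Optional[str] = None


def try_fulfill_claim(claim_name: str, pool: List[Cluster]) -> Optional[Cluster]:
    """Assign the first unclaimed Running cluster to the claim.

    Returns None when no such cluster exists, which models the claim
    staying pending rather than failing.
    """
    for cluster in pool:
        if cluster.claimed_by is None and cluster.power_state == "Running":
            cluster.claimed_by = claim_name
            return cluster
    return None


pool = [Cluster("cd-a", "Hibernating"), Cluster("cd-b", "Resuming")]

# No cluster is running yet, so the claim is not fulfilled.
first_attempt = try_fulfill_claim("claim-1", pool)

# Once a cluster finishes resuming, the same claim can be fulfilled.
pool[1].power_state = "Running"
second_attempt = try_fulfill_claim("claim-1", pool)
print(first_attempt, second_attempt)
```

Under this model, a caller such as a UI would need to treat the `None` result as "still waiting" rather than as an error.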

Comment 3 daliu 2022-03-15 02:23:45 UTC
Thanks @efried 

@kcormier Please help to enhance UI logic here...

Comment 4 bot-tracker-sync 2022-03-21 22:21:44 UTC
G2Bsync 1073963158 comment 
 KevinFCormier Mon, 21 Mar 2022 14:24:24 UTC 
 G2Bsync This has been addressed through UI logic updates.

Comment 5 Hui Chen 2022-04-06 14:43:36 UTC
I think we can close this issue now since the UI logic was changed already.

Comment 9 errata-xmlrpc 2022-06-09 02:09:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster
Management 2.5 security updates, images, and bug fixes), and where to
find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956