Bug 2037209

Summary: [IPI on Alibabacloud] worker nodes are put in the default resource group unexpectedly
Product: OpenShift Container Platform
Reporter: Jianli Wei <jiwei>
Component: Installer
Assignee: Michael McCune <mimccune>
Installer sub component: openshift-installer
QA Contact: Jianli Wei <jiwei>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: brlu, bteng, gpei, mimccune, mstaeble
Version: 4.10
Target Release: 4.10.0
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Last Closed: 2022-03-10 16:37:09 UTC
Type: Bug

Description Jianli Wei 2022-01-05 09:12:48 UTC
Created attachment 1849003
alibabacloud web console - default resource group

./openshift-install 4.10.0-0.nightly-2022-01-05-052228
built from commit 22d874c8d0751d5645de95121662e32d17d6eada
release image registry.ci.openshift.org/ocp/release@sha256:934dfba08338fbb64926f77950ab69d1fe23d5e1efe3f4ed66aa1740bb181c72
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)

What happened?
The worker nodes are not in the cluster's resource group. Instead, they are in the Default Resource Group (rg-acfnw6kdej3hyai), which is unexpected. See the attachment.

What did you expect to happen?
All nodes of the cluster should be in the same resource group, i.e. the cluster's resource group when none is specified explicitly.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?
FYI, the QE Flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/64145/
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-05-052228   True        False         9m32s   Cluster version is 4.10.0-0.nightly-2022-01-05-052228
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
jiwei-306-x5kbj-master-0                     Ready    master   41m   v1.22.1+6859754
jiwei-306-x5kbj-master-1                     Ready    master   40m   v1.22.1+6859754
jiwei-306-x5kbj-master-2                     Ready    master   39m   v1.22.1+6859754
jiwei-306-x5kbj-worker-eu-central-1a-j9zgs   Ready    worker   29m   v1.22.1+6859754
jiwei-306-x5kbj-worker-eu-central-1b-tfxgz   Ready    worker   29m   v1.22.1+6859754

Comment 1 Michael McCune 2022-01-13 17:25:21 UTC
I'm assigning this to myself since I'm creating some patches for this, and I plan to talk with Alibaba about the changes needed.

Comment 2 Michael McCune 2022-01-25 01:52:06 UTC
All of the linked PRs have merged, but we still need one more to fix the problem. We are coordinating with engineers from Alibaba to implement the last fix.

Comment 4 Michael McCune 2022-01-25 13:46:14 UTC
All PRs for this fix have merged.

Comment 6 Jianli Wei 2022-01-26 04:21:58 UTC
Tested with 4.10.0-0.ci-2022-01-25-204950: all resources of the cluster (except the OSS bucket for the image registry, see https://bugzilla.redhat.com/show_bug.cgi?id=2039304) are placed in the cluster's resource group as expected. Marking as verified.

./openshift-install 4.10.0-0.ci-2022-01-25-204950
built from commit f07482a5683d99ff9c767eefcd9b2feb027353fb
release image registry.ci.openshift.org/ocp/release@sha256:c55892e607d41986466a24cf291d08acb9bf4335d8d3a8e254f05c8a910e112e
release architecture amd64

$ aliyun ecs DescribeInstances --RegionId eu-central-1 --VpcId vpc-gw8ycw1wqv9rd4o945fvb --endpoint ecs.eu-central-1.aliyuncs.com --output cols=ZoneId,InstanceName,ResourceGroupId,InstanceType,Status rows=Instances.Instance[]
ZoneId        | InstanceName                               | ResourceGroupId    | InstanceType  | Status
------        | ------------                               | ---------------    | ------------  | ------
eu-central-1a | jiwei-303-kp7lw-worker-eu-central-1a-mgq97 | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1b | jiwei-303-kp7lw-worker-eu-central-1b-27trd | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1b | jiwei-303-kp7lw-worker-eu-central-1b-nx9zh | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1a | jiwei-303-kp7lw-master-1                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
eu-central-1b | jiwei-303-kp7lw-master-2                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
eu-central-1b | jiwei-303-kp7lw-master-0                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
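The check above (every ECS instance reporting the same ResourceGroupId) can be automated. Below is a minimal sketch, assuming only the pipe-separated layout that `--output cols=... rows=...` produced here (a header row, a dashed separator row, then one row per instance). The helper names `parse_cols_table` and `misplaced_instances` are hypothetical and not part of the aliyun CLI or the installer.

```python
"""Sketch: check that all ECS instances in an aliyun `cols=` table share one resource group."""


def parse_cols_table(text: str) -> list[dict[str, str]]:
    """Parse the pipe-separated table printed with `--output cols=... rows=...`."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|")]
        # Skip the dashed separator row that follows the header.
        if all(set(c) <= {"-"} for c in cells):
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def misplaced_instances(rows: list[dict[str, str]], expected_rg: str) -> list[str]:
    """Return the names of instances whose ResourceGroupId is not expected_rg."""
    return [r["InstanceName"] for r in rows if r["ResourceGroupId"] != expected_rg]


# Table copied from the DescribeInstances output in Comment 6.
SAMPLE = """\
ZoneId        | InstanceName                               | ResourceGroupId    | InstanceType  | Status
------        | ------------                               | ---------------    | ------------  | ------
eu-central-1a | jiwei-303-kp7lw-worker-eu-central-1a-mgq97 | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1b | jiwei-303-kp7lw-worker-eu-central-1b-27trd | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1b | jiwei-303-kp7lw-worker-eu-central-1b-nx9zh | rg-aekzzbrzgx5g5lq | ecs.g6.large  | Running
eu-central-1a | jiwei-303-kp7lw-master-1                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
eu-central-1b | jiwei-303-kp7lw-master-2                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
eu-central-1b | jiwei-303-kp7lw-master-0                   | rg-aekzzbrzgx5g5lq | ecs.g6.xlarge | Running
"""

rows = parse_cols_table(SAMPLE)
print(misplaced_instances(rows, "rg-aekzzbrzgx5g5lq"))  # prints []
```

An empty list means no instance is outside the cluster's resource group. Against the originally reported build, the workers would instead appear in the result because their ResourceGroupId was the Default Resource Group (rg-acfnw6kdej3hyai).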

$ aliyun resourcemanager ListResourceGroups --ResourceGroupId rg-aekzzbrzgx5g5lq --endpoint resourcemanager.eu-central-1.aliyuncs.com --output cols=CreateDate,Name,DisplayName,Id,Status rows=ResourceGroups.ResourceGroup[]
CreateDate                | Name               | DisplayName        | Id                 | Status
----------                | ----               | -----------        | --                 | ------
2022-01-26T11:20:55+08:00 | jiwei-303-kp7lw-rg | jiwei-303-kp7lw-rg | rg-aekzzbrzgx5g5lq | OK
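The ListResourceGroups output ties the ID rg-aekzzbrzgx5g5lq back to a group named after the cluster. Below is a minimal sketch of that cross-check. It assumes two naming patterns inferred only from this transcript, not from installer source: instance names have the form `<infraID>-master-<n>` or `<infraID>-worker-<zone>-<suffix>`, and the cluster resource group is named `<infraID>-rg`. The helper names are hypothetical.

```python
import re

# Assumption: instance names look like "<infraID>-master-<n>" or
# "<infraID>-worker-<zone>-<suffix>", as in the DescribeInstances output above.
ROLE_RE = re.compile(r"^(?P<infra>.+?)-(?:master|worker)-")


def infra_id_from_instance(name: str) -> str:
    """Extract the infrastructure ID prefix from an ECS instance name."""
    m = ROLE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized instance name: {name!r}")
    return m.group("infra")


def expected_rg_name(instance_names: list[str]) -> str:
    """Derive the expected resource group Name, assuming the '<infraID>-rg' pattern."""
    infra_ids = {infra_id_from_instance(n) for n in instance_names}
    if len(infra_ids) != 1:
        raise ValueError(f"instances span several infra IDs: {sorted(infra_ids)}")
    return infra_ids.pop() + "-rg"


# InstanceName values and the resource group Name from Comment 6.
NAMES = [
    "jiwei-303-kp7lw-worker-eu-central-1a-mgq97",
    "jiwei-303-kp7lw-worker-eu-central-1b-27trd",
    "jiwei-303-kp7lw-worker-eu-central-1b-nx9zh",
    "jiwei-303-kp7lw-master-1",
    "jiwei-303-kp7lw-master-2",
    "jiwei-303-kp7lw-master-0",
]
RG_NAME = "jiwei-303-kp7lw-rg"

print(expected_rg_name(NAMES) == RG_NAME)  # prints True
```

Deriving the infrastructure ID from the instance names rather than hard-coding it keeps the check usable for other clusters, provided the same naming pattern holds.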


Comment 9 errata-xmlrpc 2022-03-10 16:37:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.