Created attachment 1852159
.openshift_install.log

Version:
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-20-082726
built from commit 9eade28a9ce4862a6ef092bc5f5fcfb499342d4d
release image registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c
release architecture amd64
$

Platform: alibabacloud

Please specify:
* IPI

What happened?
(1) Although the installation failed due to https://bugzilla.redhat.com/show_bug.cgi?id=2041694, 'destroy cluster' is still expected to work, but it hung after 'stage=Nat gateways'. The next stage after 'stage=Nat gateways' should be 'stage=EIPs', but the EIP is never deleted.
(2) Although the output reports 'INFO ECS instances deleted', one master instance ('jiwei-405-bbrrc-master-1') is still running.
(3) '.openshift_install.log' keeps repeating:
time="2022-01-20T12:29:59Z" level=debug msg="Revoking dependency for security groups" securityGroupIDs="[sg-a2dc4z6pxx3pz3rpueyi sg-a2d9yj0vthluex0ro9bg sg-a2d9yj0vthlueyzssbal]" stage="ECS security groups"
time="2022-01-20T12:29:59Z" level=debug msg=Revoking securityGroupID=sg-a2dc4z6pxx3pz3rpueyi stage="ECS security groups"
time="2022-01-20T12:29:59Z" level=debug msg="Error executing stage" error="SDK.ServerError\nErrorCode: InvalidSecurityGroupId.NotFound\nRecommend: https://error-center.aliyun.com/status/search?Keyword=InvalidSecurityGroupId.NotFound&source=PopGw\nRequestId: 73B85607-90C9-36A5-900B-726609EB5A8E\nMessage: The specified SecurityGroupId does not exist." stage="ECS security groups"

What did you expect to happen?
Destroying the cluster should succeed.

How to reproduce it (as minimally and precisely as possible)?
Always.

Anything else we need to know?
$ yq e '.compute[].platform' work/install-config.yaml
alibabacloud:
  systemDiskCategory: cloud_efficiency
$ yq e '.controlPlane.platform' work/install-config.yaml
alibabacloud:
  systemDiskCategory: cloud_efficiency
$ yq e '.platform' work/install-config.yaml
alibabacloud:
  region: ap-south-1
  resourceGroupID: rg-aek2c4huej7f3ni
$ yq e '.credentialsMode' work/install-config.yaml
Manual
$
$ openshift-install create manifests --dir work
INFO Consuming Install Config from target directory
INFO Manifests created in: work/manifests and work/openshift
$ openshift-install create cluster --dir work --log-level info
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Creating infrastructure resources...
ERROR
ERROR Error: [ERROR] terraform-provider-alicloud/alicloud/resource_alicloud_instance.go:452: Resource alicloud_instance RunInstances Failed!!! [SDK alibaba-cloud-sdk-go ERROR]:
ERROR SDK.ServerError
ERROR ErrorCode: InvalidResourceType.NotSupported
ERROR Recommend: https://error-center.aliyun.com/status/search?Keyword=InvalidResourceType.NotSupported&source=PopGw
ERROR RequestId: 8AB5BF0F-C4D1-3866-AC4A-3890388D12EB
ERROR Message: user order resource type [[cloud_essd]] not exists in [ap-south-1a]
ERROR
ERROR on ../../tmp/openshift-install-bootstrap-608796246/main.tf line 133, in resource "alicloud_instance" "bootstrap":
ERROR 133: resource "alicloud_instance" "bootstrap" {
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
$ aliyun vpc DescribeEipAddresses --RegionId ap-south-1 --EipName jiwei-405-bbrrc-eip --output cols=AllocationId,InstanceRegionId,InstanceType,Description,InternetChargeType,IpAddress,Tags.Tag[] rows=EipAddresses.EipAddress[]
AllocationId | InstanceRegionId | InstanceType | Description | InternetChargeType | IpAddress | Tags.Tag[]
------------ | ---------------- | ------------ | ----------- | ------------------ | --------- | ----------
eip-a2dkmtv16me8rv4sr16wb | ap-south-1 | Nat | Created By OpenShift Installer | PayByTraffic | 149.129.184.209 | [map[Key:Name Value:jiwei-405-bbrrc-eip] map[Key:sigs.k8s.io/cloud-provider-alibaba/origin Value:ocp] map[Key:GISV Value:ocp] map[Key:kubernetes.io/cluster/jiwei-405-bbrrc Value:owned]]
$
$ openshift-install destroy cluster --dir work --log-level info
INFO ECS instances deleted stage=ECS instances
INFO OSS bucket deleted bucketName=jiwei-405-bbrrc-bootstrap stage=OSS buckets
INFO OSS buckets deleted stage=OSS buckets
INFO RAM roles deleted stage=RAM roles
INFO Private zones deleted stage=private zones
INFO SLB instances deleted stage=SLBs
INFO NAT gateways deleted stage=Nat gateways
^C
$
$ aliyun vpc DescribeEipAddresses --RegionId ap-south-1 --EipName jiwei-405-bbrrc-eip --output cols=AllocationId,InstanceRegionId,InstanceType,Description,InternetChargeType,IpAddress,Tags.Tag[] rows=EipAddresses.EipAddress[]
AllocationId | InstanceRegionId | InstanceType | Description | InternetChargeType | IpAddress | Tags.Tag[]
------------ | ---------------- | ------------ | ----------- | ------------------ | --------- | ----------
eip-a2dkmtv16me8rv4sr16wb | | | Created By OpenShift Installer | PayByTraffic | 149.129.184.209 | [map[Key:Name Value:jiwei-405-bbrrc-eip] map[Key:sigs.k8s.io/cloud-provider-alibaba/origin Value:ocp] map[Key:GISV Value:ocp] map[Key:kubernetes.io/cluster/jiwei-405-bbrrc Value:owned]]
$
$ aliyun ecs DescribeInstances --RegionId ap-south-1 --InstanceName jiwei-405-bbrrc-master-1 --endpoint ecs.ap-south-1.aliyuncs.com --output cols=ZoneId,InstanceId,Status,SecurityGroupIds.SecurityGroupId[],VpcAttributes.VpcId rows=Instances.Instance[]
ZoneId | InstanceId | Status | SecurityGroupIds.SecurityGroupId[] | VpcAttributes.VpcId
------ | ---------- | ------ | ---------------------------------- | -------------------
ap-south-1a | i-a2dc4z6pxx3pz3rpf5tk | Running | [sg-a2d9yj0vthluex0ro9bg] | vpc-a2d05l0atni35cloe8u6h
$ aliyun ecs DescribeSecurityGroups --RegionId ap-south-1 --VpcId vpc-a2d05l0atni35cloe8u6h --endpoint ecs.ap-south-1.aliyuncs.com --output cols=SecurityGroupName,SecurityGroupId,Description rows=SecurityGroups.SecurityGroup[]
SecurityGroupName | SecurityGroupId | Description
----------------- | --------------- | -----------
jiwei-405-bbrrc_sg_bootstrap | sg-a2d9yj0vthlueyzssbal | Created By OpenShift Installer
jiwei-405-bbrrc-sg-master | sg-a2d9yj0vthluex0ro9bg | Created By OpenShift Installer
$
The destroyer should not error when it attempts to delete a resource that does not exist. It should treat a not-found error as a successful deletion. See https://github.com/openshift/installer/blob/303a3c7adcc718f48e1aa372acd92f31b4685642/pkg/destroy/alibabacloud/alibabacloud.go#L832 for the problematic area in the specific case outlined in this BZ.
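To illustrate the idea of treating not-found as success, here is a minimal Go sketch. It is not the installer's code: the `codedError` interface, the `serverError` type, and the `forEachIgnoringNotFound` helper are hypothetical names introduced for this example, and the only value taken from this report is the error code `InvalidSecurityGroupId.NotFound` seen in the log.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundCode is the error code reported in the log attached to this bug.
const notFoundCode = "InvalidSecurityGroupId.NotFound"

// codedError is a hypothetical stand-in for an SDK error that exposes
// its error code. It is an assumption made for this sketch.
type codedError interface {
	error
	ErrorCode() string
}

// serverError is a hypothetical error type used only for illustration.
type serverError struct {
	code    string
	message string
}

func (e *serverError) Error() string {
	return fmt.Sprintf("SDK.ServerError\nErrorCode: %s\nMessage: %s", e.code, e.message)
}

func (e *serverError) ErrorCode() string { return e.code }

// isSecurityGroupNotFound reports whether err (or an error it wraps)
// carries the not-found code for security groups.
func isSecurityGroupNotFound(err error) bool {
	var ce codedError
	if errors.As(err, &ce) {
		return ce.ErrorCode() == notFoundCode
	}
	return false
}

// forEachIgnoringNotFound applies op to every ID. A not-found error means
// the resource is already gone, so it is skipped instead of failing the
// whole stage. Any other error is returned to the caller.
func forEachIgnoringNotFound(ids []string, op func(id string) error) error {
	for _, id := range ids {
		if err := op(id); err != nil && !isSecurityGroupNotFound(err) {
			return fmt.Errorf("processing %s: %w", id, err)
		}
	}
	return nil
}
```

With a helper of this shape, a stage that iterates over the three security group IDs from the log would continue past the one that no longer exists instead of retrying the stage indefinitely.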
@jiwei This scenario does not always occur. Are your cluster resources and logs still available? If so:
1. Does the log contain a record like the following?
   Deleting ECS instances ecsIDs=[<the ID of jiwei-405-bbrrc-master-1>] stage=ECS instances
2. Does the master instance ('jiwei-405-bbrrc-master-1') have tags?
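The first check can be done by hand with a text search. As a rough sketch, the Go helper below does the same thing on the contents of '.openshift_install.log'. The function name `findDeletionRecords` is hypothetical, and the sketch assumes only that a matching log line contains both the text "Deleting ECS instances" and the instance ID. It makes no assumption about the rest of the line format.

```go
package main

import "strings"

// findDeletionRecords returns every line of logText that mentions both the
// "Deleting ECS instances" message and the given instance ID.
// The caller is expected to read the log file into logText first.
func findDeletionRecords(logText, instanceID string) []string {
	var matches []string
	for _, line := range strings.Split(logText, "\n") {
		if strings.Contains(line, "Deleting ECS instances") &&
			strings.Contains(line, instanceID) {
			matches = append(matches, line)
		}
	}
	return matches
}
```

For this report the instance ID to search for would be i-a2dc4z6pxx3pz3rpf5tk, as shown in the DescribeInstances output. An empty result would suggest the destroyer never attempted to delete that instance.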
I did not hit the issue during yesterday's and today's testing, so I am marking this as verified for now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056