Bug 2040160 - [IPI on Alibabacloud] installation fails when region does not support pay-by-bandwidth
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: aos-install
QA Contact: Jianli Wei
URL:
Whiteboard:
Duplicates: 2041664
Depends On:
Blocks: 2042356
Reported: 2022-01-13 06:53 UTC by Jianli Wei
Modified: 2022-03-10 16:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2042356
Environment:
Last Closed: 2022-03-10 16:39:13 UTC
Target Upstream Version:
Embargoed:


Attachments
alibaba web console - creating EIP and associate to NAT gateway (135.76 KB, image/png)
2022-01-13 06:53 UTC, Jianli Wei


Links
System                 | ID                            | Status | Summary                                        | Last Updated
------                 | --                            | ------ | -------                                        | ------------
Github                 | openshift installer pull 5547 | open   | Bug 2040160: [Alibaba] fix EIP metering method | 2022-01-19 15:23:55 UTC
Red Hat Product Errata | RHSA-2022:0056                | None   | None                                           | 2022-03-10 16:39:26 UTC

Description Jianli Wei 2022-01-13 06:53:26 UTC
Created attachment 1850513
alibaba web console - creating EIP and associate to NAT gateway

Version:
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-13-000150
built from commit 28cfc831cee01eb503a2340b4d5365fd281bf867
release image registry.ci.openshift.org/ocp/release@sha256:089541f3c2bb64b9561fefeed3dae688e422ad1f50a17400525c6cd0bab61f46
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI

What happened?
IPI installation in region 'me-east-1' failed at alicloud_pvtz_zone_attachment (BindZoneVpc) and alicloud_eip_address (AllocateEipAddress), even with a valid instanceType and systemDiskCategory.

What did you expect to happen?
Binding the private zone to the VPC and allocating an EIP are both fundamental steps of an IPI installation, so please confirm whether this region should be supported.

How to reproduce it (as minimally and precisely as possible)?
Always.

Anything else we need to know?
>(1) The customized install-config.yaml and the final error message.
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-13-000150
built from commit 28cfc831cee01eb503a2340b4d5365fd281bf867
release image registry.ci.openshift.org/ocp/release@sha256:089541f3c2bb64b9561fefeed3dae688e422ad1f50a17400525c6cd0bab61f46
release architecture amd64
$ yq e .platform Dubai/install-config.yaml
alibabacloud:
  region: me-east-1
  defaultMachinePlatform:
    instanceType: ecs.sn2ne.xlarge
    systemDiskCategory: cloud_ssd
$ openshift-install create manifests --dir Dubai
INFO Consuming Install Config from target directory 
INFO Manifests created in: Dubai/manifests and Dubai/openshift 
$ 
$ tree Dubai
Dubai
├── manifests
│   ├── cloud-provider-config.yaml
│   ├── cluster-config.yaml
│   ├── cluster-dns-02-config.yml
│   ├── cluster-infrastructure-02-config.yml
│   ├── cluster-ingress-02-config.yml
│   ├── cluster-network-01-crd.yml
│   ├── cluster-network-02-config.yml
│   ├── cluster-proxy-01-config.yaml
│   ├── cluster-scheduler-02-config.yml
│   ├── cvo-overrides.yaml
│   ├── kube-cloud-config.yaml
│   ├── kube-system-configmap-root-ca.yaml
│   ├── machine-config-server-tls-secret.yaml
│   └── openshift-config-secret-pull-secret.yaml
└── openshift
    ├── 99_kubeadmin-password-secret.yaml
    ├── 99_openshift-cluster-api_master-machines-0.yaml
    ├── 99_openshift-cluster-api_master-machines-1.yaml
    ├── 99_openshift-cluster-api_master-machines-2.yaml
    ├── 99_openshift-cluster-api_master-user-data-secret.yaml
    ├── 99_openshift-cluster-api_worker-machineset-0.yaml
    ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
    ├── 99_openshift-machineconfig_99-master-ssh.yaml
    ├── 99_openshift-machineconfig_99-worker-ssh.yaml
    └── openshift-install-manifests.yaml

2 directories, 24 files
$ 
$ grep zoneId Dubai -r
Dubai/openshift/99_openshift-cluster-api_worker-machineset-0.yaml:          zoneId: me-east-1a
Dubai/openshift/99_openshift-cluster-api_master-machines-2.yaml:      zoneId: me-east-1a
Dubai/openshift/99_openshift-cluster-api_master-machines-0.yaml:      zoneId: me-east-1a
Dubai/openshift/99_openshift-cluster-api_master-machines-1.yaml:      zoneId: me-east-1a
$ 
$ openshift-install create cluster --dir Dubai --log-level info
INFO Consuming Common Manifests from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Consuming Openshift Manifests from target directory 
INFO Consuming Worker Machines from target directory 
INFO Consuming Master Machines from target directory 
INFO Creating infrastructure resources...         
ERROR
ERROR Error: [ERROR] terraform-provider-alicloud/alicloud/resource_alicloud_pvtz_zone_attachment.go:141: Resource alicloud_pvtz_zone_attachment BindZoneVpc Failed!!! [SDK alibaba-cloud-sdk-go ERROR]:
ERROR SDKError:
ERROR    Code: ZoneVpc.Invalid.RegionId            
ERROR    Message: code: 400, Zone-Vpc regionId is invalid. request id: 48C6D230-E0DE-5710-84C5-3524C3E9E21B 
ERROR    Data: {"Code":"ZoneVpc.Invalid.RegionId","HostId":"pvtz.aliyuncs.com","Message":"Zone-Vpc regionId is invalid.","Recommend":"https://error-center.aliyun.com/status/search?Keyword=ZoneVpc.Invalid.RegionId\u0026source=PopGw","RequestId":"48C6D230-E0DE-5710-84C5-3524C3E9E21B"} 
ERROR                                              
ERROR                                              
ERROR   on ../../tmp/openshift-install-cluster-3444940882/dns/privatezone.tf line 34, in resource "alicloud_pvtz_zone_attachment" "pvtz_attachment": 
ERROR   34: resource "alicloud_pvtz_zone_attachment" "pvtz_attachment" { 
ERROR                                              
ERROR                                              
ERROR                                              
ERROR Error: [ERROR] terraform-provider-alicloud/alicloud/resource_alicloud_eip_address.go:200: Resource alicloud_eip_address AllocateEipAddress Failed!!! [SDK alibaba-cloud-sdk-go ERROR]: 
ERROR SDKError:                                    
ERROR    Code: OrderError.EIP                      
ERROR    Message: code: 400, The Account failed to create order. request id: 0A2FA3EA-6635-371C-861F-6F699984C507 
ERROR    Data: {"Code":"OrderError.EIP","HostId":"vpc.me-east-1.aliyuncs.com","Message":"The Account failed to create order.","Recommend":"https://error-center.aliyun.com/status/search?Keyword=OrderError.EIP\u0026source=PopGw","RequestId":"0A2FA3EA-6635-371C-861F-6F699984C507"} 
ERROR                                              
ERROR
ERROR   on ../../tmp/openshift-install-cluster-3444940882/vpc/eip.tf line 1, in resource "alicloud_eip_address" "eip":
ERROR    1: resource "alicloud_eip_address" "eip" {
ERROR
ERROR
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
$ 
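Every machine manifest above was assigned zoneId me-east-1a, which is consistent with Comment 1's finding that this region has only one availability zone. The following is a hypothetical shell sketch (the `distinct_zones` and `check_multi_zone` helpers are illustrative, not part of openshift-install) that could run between `create manifests` and `create cluster` to flag this situation. It assumes the manifest layout shown in the tree above and is a heuristic only: a single zone in the manifests does not by itself prove that the region has a single AZ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of openshift-install).
# distinct_zones DIR: print each unique zoneId found in the manifests that
# `openshift-install create manifests` wrote under DIR, one per line.
distinct_zones() {
  { grep -rhoE 'zoneId: [a-z0-9-]+' "$1" || true; } | awk '{print $2}' | sort -u
}

# Hypothetical helper. check_multi_zone DIR: heuristic only. Returns 1 with a
# warning when every machine manifest uses the same zone (as with me-east-1a
# in this report), which is consistent with a single-AZ region.
check_multi_zone() {
  local zones count
  zones=$(distinct_zones "$1")
  count=$(distinct_zones "$1" | wc -l)
  if (( count <= 1 )); then
    echo "WARNING: all machines use a single zone: ${zones:-none}" >&2
    return 1
  fi
  echo "machines span $((count)) zones"
}
```

For the Dubai manifests above, `check_multi_zone Dubai` would return 1 with a warning naming me-east-1a.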

>(2) I also tried BindZoneVpc and AllocateEipAddress with the "aliyun" CLI, and both failed. However, in the Alibaba Cloud web console I could create an EIP in the region successfully and bind it to the NAT gateway (see the attached screenshot), although the region was not listed on the Bind VPC page of the private zone.

$ aliyun vpc DescribeVpcs --RegionId me-east-1 --VpcName jiwei-403-xr2df-vpc --endpoint vpc.me-east-1.aliyuncs.com --output cols=VpcName,VpcId,NatGatewayIds.NatGatewayIds rows=Vpcs.Vpc[]
VpcName             | VpcId                     | NatGatewayIds.NatGatewayIds
-------             | -----                     | ---------------------------
jiwei-403-xr2df-vpc | vpc-eb3509o1571lfbkqlluoc | [ngw-eb3xiic4gc5pdszbh4wpe]

$ aliyun pvtz DescribeZones --endpoint pvtz.me-east-1.aliyuncs.com --Keyword jiwei-403
{
        "PageNumber": 1,
        "PageSize": 20,
        "RequestId": "CA232F7D-DEAE-5F31-A4FF-65C02829A260",
        "TotalItems": 1,
        "TotalPages": 1,
        "Zones": {
                "Zone": [
                        {
                                "CreateTime": "2022-01-13T05:31Z",
                                "CreateTimestamp": 1642051886000,
                                "IsPtr": false,
                                "ProxyPattern": "ZONE",
                                "RecordCount": 2,
                                "ResourceGroupId": "rg-aeky7nftfwradjy",
                                "UpdateTime": "2022-01-13T05:31Z",
                                "UpdateTimestamp": 1642051916000,
                                "ZoneId": "36f2f2e5f53133ae4b24252bcdc75788",
                                "ZoneName": "jiwei-403.alicloud-qe.devcluster.openshift.com",
                                "ZoneType": "AUTH_ZONE"
                        }
                ]
        }
}
$ 
$ aliyun pvtz BindZoneVpc --ZoneId 36f2f2e5f53133ae4b24252bcdc75788 --Vpcs.1.RegionId me-east-1 --Vpcs.1.VpcId vpc-eb3509o1571lfbkqlluoc --endpoint pvtz.me-east-1.aliyuncs.com
ERROR: SDK.ServerError
ErrorCode: ZoneVpc.Invalid.RegionId
Recommend: https://error-center.aliyun.com/status/search?Keyword=ZoneVpc.Invalid.RegionId&source=PopGw
RequestId: 8883DEC9-9F0F-5EDE-B022-3451FEB52CDF
Message: Zone-Vpc regionId is invalid.
$ 
$ aliyun vpc AllocateEipAddress --RegionId me-east-1 --endpoint vpc.me-east-1.aliyuncs.com
ERROR: SDK.ServerError
ErrorCode: OrderError.EIP
Recommend: https://error-center.aliyun.com/status/search?Keyword=OrderError.EIP&source=PopGw
RequestId: 2A140267-51DA-3181-9A9C-0C6DBA14DFEF
Message: The Account failed to create order.
$

Comment 1 bteng 2022-01-13 11:31:25 UTC
I tested AllocateEipAddress in my environment and the EIP was allocated successfully. Maybe your account has reached its EIP quota? I will send you a screenshot on Slack.
shell@Alicloud:~$ aliyun vpc AllocateEipAddress --RegionId me-east-1 --endpoint vpc.me-east-1.aliyuncs.com
{
        "AllocationId": "eip-eb30z1urfwd3jcs2ew5iz",
        "EipAddress": "47.91.126.141",
        "RequestId": "8DD7C858-F24D-39F2-A033-051AEB156998",
        "ResourceGroupId": "rg-acfm2ia66e2qy4y"
}

The PVTZ service is not supported in Dubai (me-east-1) because the region has only one AZ, which led to the failure.

Comment 3 Jianli Wei 2022-01-14 03:41:44 UTC
FYI: as suggested by Bo Teng, adding the parameter '--InternetChargeType PayByTraffic' allows the EIP to be allocated successfully.

$ aliyun vpc AllocateEipAddress --RegionId me-east-1 --Name jiwei-test1-eip --InternetChargeType PayByTraffic --endpoint vpc.me-east-1.aliyuncs.com
{
        "AllocationId": "eip-eb3w4ooz8p61e9kfhyl17",
        "EipAddress": "47.91.109.174",
        "RequestId": "EF514AB0-E05F-384A-B0C1-20B0FD874528",
        "ResourceGroupId": "rg-acfnw6kdej3hyai"
}
$ 

The default metering method is pay-by-bandwidth (PayByBandwidth), which Dubai does not support.
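Comment 3's workaround can be wrapped as a reusable shell function that always passes the metering method explicitly instead of relying on the API default. This is a minimal sketch that uses only the aliyun flags appearing in this report; `allocate_eip` and the `DRY_RUN` switch are hypothetical. With `DRY_RUN=1` (the default here) it prints the command rather than calling the API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. allocate_eip REGION NAME [CHARGE_TYPE]: allocate an EIP
# with an explicit metering method instead of the API default (PayByBandwidth),
# which me-east-1 rejects with OrderError.EIP. Only flags shown in this report
# are used. DRY_RUN=1 (the default) prints the command instead of running it;
# set DRY_RUN=0 to call the aliyun CLI for real.
allocate_eip() {
  local region="$1" name="$2" charge_type="${3:-PayByTraffic}"
  local cmd=(aliyun vpc AllocateEipAddress
    --RegionId "$region"
    --Name "$name"
    --InternetChargeType "$charge_type"
    --endpoint "vpc.${region}.aliyuncs.com")
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=0 allocate_eip me-east-1 jiwei-test1-eip` should issue the same request as the manual command in Comment 3.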

Comment 4 bteng 2022-01-14 11:28:25 UTC
Regarding the EIP: Dubai does not support EIPs metered by bandwidth; it only supports pay-by-traffic. Since PVTZ is not supported either, OCP cannot support this region.
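With both root causes identified, the two SDK error codes from the install log in (1) can be mapped to actionable messages. This is a hypothetical shell sketch: `explain_sdk_error` and `explain_log` are illustrative names, not installer code, and the mapping covers only the two codes seen in this report.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. explain_sdk_error CODE: translate the Alibaba Cloud SDK
# error codes seen in this report into the causes found in Comments 1, 3 and 4.
explain_sdk_error() {
  case "$1" in
    ZoneVpc.Invalid.RegionId)
      echo "Private Zone (PVTZ) cannot bind a VPC in this region; the region is not supported for IPI" ;;
    OrderError.EIP)
      echo "EIP order rejected; the region may not support PayByBandwidth (retry with InternetChargeType PayByTraffic), or the account's EIP quota may be exhausted" ;;
    *)
      echo "no known explanation for $1" ;;
  esac
}

# Hypothetical helper. explain_log: read an openshift-install log on stdin and
# explain each distinct "Code: ..." value it contains.
explain_log() {
  local code
  { grep -oE 'Code: [A-Za-z.]+' || true; } | awk '{print $2}' | sort -u |
    while read -r code; do
      printf '%s: %s\n' "$code" "$(explain_sdk_error "$code")"
    done
}
```

Piping the `create cluster` output from (1) through `explain_log` would print one line for each of the two codes.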

Comment 5 Matthew Staebler 2022-01-17 16:30:33 UTC
@bteng Are there API calls that the installer could make to verify whether PVTZ is supported in a region? Ideally, the installer would fail before running Terraform and give the user a more meaningful error.
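One possible shape for the fail-fast check described above, sketched in shell under a loud assumption: no PVTZ region-discovery API is called. Instead, the check consults a hard-coded denylist containing only me-east-1, the region where BindZoneVpc failed in this report. `preflight_region` and `PVTZ_UNSUPPORTED_REGIONS` are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical denylist: me-east-1 is the only region where this report saw
# BindZoneVpc fail with ZoneVpc.Invalid.RegionId. It is NOT an authoritative
# list from Alibaba Cloud; a real check needs a supported-region source.
PVTZ_UNSUPPORTED_REGIONS=(me-east-1)

# Hypothetical helper. preflight_region REGION: fail fast, before any
# Terraform runs, when the region is known not to support PVTZ.
preflight_region() {
  local region="$1" r
  for r in "${PVTZ_UNSUPPORTED_REGIONS[@]}"; do
    if [[ "$r" == "$region" ]]; then
      echo "ERROR: region $region does not support Private Zone (PVTZ); the installer cannot bind the cluster's private zone to the VPC" >&2
      return 1
    fi
  done
  echo "region $region passed pre-flight checks"
}
```

The denylist only encodes what this report observed; replacing it with an authoritative source of PVTZ-enabled regions is exactly the open question in this comment.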

Comment 6 Matthew Staebler 2022-01-19 10:31:16 UTC
I have moved the issue with the PVTZ service to https://bugzilla.redhat.com/show_bug.cgi?id=2042356. I am changing the focus of this BZ to be only the issue with the EIP.

Comment 7 Matthew Staebler 2022-01-19 10:32:00 UTC
*** Bug 2041664 has been marked as a duplicate of this bug. ***

Comment 9 Jianli Wei 2022-01-20 11:30:23 UTC
Tested with region 'ap-south-1' (India (Mumbai)), the region from bug https://bugzilla.redhat.com/show_bug.cgi?id=2041664, and the installer now creates the EIP with InternetChargeType PayByTraffic. Marking as verified.

$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-20-082726
built from commit 9eade28a9ce4862a6ef092bc5f5fcfb499342d4d
release image registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c
release architecture amd64
$ 
$ aliyun vpc DescribeEipAddresses --RegionId ap-south-1 --EipName jiwei-405-bbrrc-eip --output cols=AllocationId,InstanceRegionId,InstanceType,Description,InternetChargeType,IpAddress,Tags.Tag[] rows=EipAddresses.EipAddress[]
AllocationId              | InstanceRegionId | InstanceType | Description                    | InternetChargeType | IpAddress       | Tags.Tag[]
------------              | ---------------- | ------------ | -----------                    | ------------------ | ---------       | ----------
eip-a2dkmtv16me8rv4sr16wb | ap-south-1       | Nat          | Created By OpenShift Installer | PayByTraffic       | 149.129.184.209 | [map[Key:Name Value:jiwei-405-bbrrc-eip] map[Key:sigs.k8s.io/cloud-provider-alibaba/origin Value:ocp] map[Key:GISV Value:ocp] map[Key:kubernetes.io/cluster/jiwei-405-bbrrc Value:owned]]

$
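The verification above reads the InternetChargeType column by eye. A small shell sketch can extract one named column from the `--output cols=... rows=...` table format shown above so that the check can be scripted; `column_values` is a hypothetical helper and assumes no cell value contains a `|` character.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. column_values COLUMN: read a table produced by
# `aliyun ... --output cols=... rows=...` on stdin and print the trimmed value
# of COLUMN for each data row. Assumes no cell contains a '|' character.
column_values() {
  awk -F'|' -v col="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NR == 1 { for (i = 1; i <= NF; i++) if (trim($i) == col) idx = i; next }
    NR == 2 { next }                 # dashed separator under the header
    idx && NF >= idx { print trim($idx) }
  '
}
```

For example, piping the DescribeEipAddresses command above into `column_values InternetChargeType` should print PayByTraffic for the verified cluster.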

Comment 13 errata-xmlrpc 2022-03-10 16:39:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

