Bug 2035720 - [IPI on Alibabacloud] deploying a private cluster by 'publish: Internal' failed due to 'dns_public_record'
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Brian Lu
QA Contact: Jianli Wei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-27 08:34 UTC by Jianli Wei
Modified: 2022-10-14 15:58 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-14 15:58:33 UTC
Target Upstream Version:
Embargoed:


Attachments
the resources of the private cluster (3.53 KB, text/plain)
2022-07-28 14:21 UTC, Jianli Wei


Links
System ID                            | Private | Priority | Status | Summary                                                   | Last Updated
Github openshift installer pull 5534 | 0       | None     | Draft  | Bug 2035720: [Alibaba] support internal publish strategy | 2022-01-13 13:07:58 UTC

Description Jianli Wei 2021-12-27 08:34:54 UTC
Version:
$ openshift-install version
openshift-install 4.10.0-0.nightly-2021-12-23-153012
built from commit 94a3ed9cbe4db66dc50dab8b85d2abf40fb56426
release image registry.ci.openshift.org/ocp/release@sha256:39cacdae6214efce10005054fb492f02d26b59fe9d23686dc17ec8a42f428534
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)

What happened?
The installer failed with an unexpected error: 'Internal publish strategy is not supported on "alibabacloud" platform'. The Internal publish strategy should be supported on "alibabacloud"; please clarify if that is not the case. Thanks!

$ openshift-install create install-config --dir work
? SSH Public Key /home/jiwei/.ssh/openshift-qe.pub
? Platform alibabacloud
? Region us-east-1
? Base Domain alicloud-qe.devcluster.openshift.com
? Cluster Name jiwei-uu
? Pull Secret [? for help] *********
INFO Install-Config created in: work              
$ 
$ vim work/install-config.yaml
$ yq e '.publish' work/install-config.yaml
Internal
$ openshift-install create cluster --dir work --log-level info
FATAL failed to fetch Metadata: failed to load asset "Install Config": invalid "install-config.yaml" file: publish: Invalid value: "Internal": Internal publish strategy is not supported on "alibabacloud" platform 
$ 

What did you expect to happen?
"publish: Internal" should be supported for platform "alibabacloud".

How to reproduce it (as minimally and precisely as possible)?
Always

Comment 1 Jianli Wei 2022-01-25 01:01:21 UTC
FYI: in a private cluster, the bootstrap instance won't have a public IP address.

Comment 3 Jianli Wei 2022-01-29 02:16:04 UTC
Tested again and got the error below.

$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-28-213019
built from commit 4fc9fa88c22221b6cede2456b1c33847943b75c9
release image registry.ci.openshift.org/ocp/release@sha256:08421fc455ec6686257afe0b09dacaa811425fb0ef7e8cd7c123312f40352b9a
release architecture amd64
$ 
$ yq e .platform work/install-config.yaml 
alibabacloud:
  region: us-east-1
  vpcID: vpc-0xil4gt2y7n4yj2ge0shk
  vswitchIDs:
    - vsw-0xihmrzxkr6xujzlanhyz
    - vsw-0xi6n2r6c33jflp8vc5kb
$ yq e .publish work/install-config.yaml 
Internal
$ yq e .credentialsMode work/install-config.yaml 
Manual
$ openshift-install create manifests --dir work
INFO Consuming Install Config from target directory 
INFO Manifests created in: work/manifests and work/openshift 
$ 
$ export http_proxy=http://proxy-user1:HIDDEN@47.90.219.61:3128
$ export https_proxy=http://proxy-user1:HIDDEN@47.90.219.61:3128
$ 
$ openshift-install create cluster --dir work --log-level info
INFO Consuming Master Machines from target directory 
INFO Consuming Openshift Manifests from target directory 
INFO Consuming Worker Machines from target directory 
INFO Consuming Common Manifests from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Creating infrastructure resources...         
ERROR                                              
ERROR Error: "value": required field is not set    
ERROR                                              
ERROR   on ../../tmp/openshift-install-cluster-678510377/dns/privatezone.tf line 14, in resource "alicloud_alidns_record" "dns_public_record": 
ERROR   14: resource "alicloud_alidns_record" "dns_public_record" { 
ERROR                                              
ERROR                                              
ERROR Failed to read tfstate: open /tmp/openshift-install-cluster-678510377/terraform.cluster.tfstate: no such file or directory 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change 
$

Comment 4 Jianli Wei 2022-01-29 02:19:23 UTC
FYI: although the original error no longer occurs, deploying a private cluster with "publish: Internal" still fails, so I am reopening the bug.
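The failure above is the Terraform `alicloud_alidns_record` resource `dns_public_record` being created with an empty `value`. That is consistent with Comment 1: an Internal (private) cluster has no public address to put in a public DNS record. A minimal sketch of one way to avoid it is to create the public record only for External clusters by gating it with `count`. All variable names here (`publish_strategy`, `base_domain`, `cluster_name`, `public_api_address`) and the `api.<cluster_name>` record name are hypothetical. They are not taken from the installer's actual `privatezone.tf` or from pull 5534.

```hcl
# Hypothetical inputs; the installer's real variable names may differ.
variable "publish_strategy" {
  type    = string
  default = "External"

  validation {
    condition     = contains(["External", "Internal"], var.publish_strategy)
    error_message = "The publish_strategy value must be External or Internal."
  }
}

variable "base_domain" {
  type = string
}

variable "cluster_name" {
  type = string
}

# Empty for Internal clusters, which have no public address.
variable "public_api_address" {
  type    = string
  default = ""
}

resource "alicloud_alidns_record" "dns_public_record" {
  # Create the public record only for External clusters. With count = 0 the
  # resource is skipped, so an empty "value" no longer fails the apply.
  count = var.publish_strategy == "External" ? 1 : 0

  domain_name = var.base_domain
  rr          = "api.${var.cluster_name}"
  type        = "A"
  value       = var.public_api_address
}
```

With `publish_strategy = "Internal"`, Terraform plans zero instances of `dns_public_record`, and any private-zone records are left unchanged.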

Comment 5 Jianli Wei 2022-01-29 02:55:07 UTC
Created attachment 1857513
.openshift_install.log

Comment 6 Jianli Wei 2022-01-29 03:30:49 UTC
FYI: I retried once and hit the same issue.

> the QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/72043/
01-29 11:18:16.938  level=error msg=Error: "value": required field is not set
01-29 11:18:16.938  level=error
01-29 11:18:16.938  level=error msg=  on ../../../../../../../../tmp/openshift-install-cluster-3695075468/dns/privatezone.tf line 14, in resource "alicloud_alidns_record" "dns_public_record":
01-29 11:18:16.938  level=error msg=  14: resource "alicloud_alidns_record" "dns_public_record" {
01-29 11:18:16.938  level=error
01-29 11:18:16.938  level=error
01-29 11:18:16.938  level=error msg=Failed to read tfstate: open /tmp/openshift-install-cluster-3695075468/terraform.cluster.tfstate: no such file or directory
01-29 11:18:16.938  level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

> the install-config.yaml
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    alibabacloud:
      instanceType: ecs.g6.xlarge
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    alibabacloud:
      instanceType: ecs.g6.large
  replicas: 2
metadata:
  name: jiwei-601
platform:
  alibabacloud:
    region: us-east-1
    resourceGroupID: rg-aek2wky7lxk4f5y
    vpcID: vpc-0xi6h9s2713tmqc5bpyhc
    vswitchIDs:
    - vsw-0xi183q0g3xqdmkhpgc93
    - vsw-0xi3nk4nu9366f623vtb9
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: Internal
credentialsMode: Manual
baseDomain: alicloud-qe.devcluster.openshift.com
sshKey: HIDDEN

> the vpc/vswitch/nat resources
$ aliyun vpc DescribeVpcs --RegionId us-east-1 --VpcName jiwei-601-vpc --endpoint vpc.aliyuncs.com --output cols=CreationTime,VpcId,CidrBlock rows=Vpcs.Vpc[]
CreationTime         | VpcId                     | CidrBlock
------------         | -----                     | ---------
2022-01-29T03:11:59Z | vpc-0xi6h9s2713tmqc5bpyhc | 10.0.0.0/16

$ aliyun vpc DescribeVSwitches --RegionId us-east-1 --VpcId vpc-0xi6h9s2713tmqc5bpyhc --endpoint vpc.aliyuncs.com --output cols=Status,VSwitchName,VSwitchId,CidrBlock,ZoneId rows=VSwitches.VSwitch[]
Status    | VSwitchName                  | VSwitchId                 | CidrBlock     | ZoneId
------    | -----------                  | ---------                 | ---------     | ------
Available | jiwei-601-bastion            | vsw-0xi0ue5qls66rx3lezuyq | 10.0.128.0/20 | us-east-1b
Available | jiwei-601-vswitch-us-east-1b | vsw-0xi3nk4nu9366f623vtb9 | 10.0.224.0/20 | us-east-1b
Available | jiwei-601-vswitch-us-east-1a | vsw-0xi183q0g3xqdmkhpgc93 | 10.0.240.0/20 | us-east-1a
Available | jiwei-601-vswitch-natgw      | vsw-0xi13e792cuhi5aedeq2y | 10.0.176.0/20 | us-east-1a

$ aliyun vpc DescribeNatGateways --RegionId us-east-1 --VpcId vpc-0xi6h9s2713tmqc5bpyhc --endpoint vpc.aliyuncs.com --output cols=NatGatewayId,NetworkType,IpLists.IpList[].IpAddress,SnatTableIds.SnatTableId rows=NatGateways.NatGateway[]
NatGatewayId              | NetworkType | IpLists.IpList[].IpAddress | SnatTableIds.SnatTableId
------------              | ----------- | -------------------------- | ------------------------
ngw-0xif5wyxj7lwr2gvnemj5 | internet    | [47.90.166.191]            | [stb-0xibjsydjfbll6cqa6qrj]

$ aliyun vpc DescribeSnatTableEntries --RegionId us-east-1 --SnatTableId stb-0xibjsydjfbll6cqa6qrj --endpoint vpc.aliyuncs.com --output cols=SnatEntryId,Status,SnatIp,SourceCIDR,SourceVSwitchId rows=SnatTableEntries.SnatTableEntry[]
SnatEntryId                | Status    | SnatIp        | SourceCIDR    | SourceVSwitchId
-----------                | ------    | ------        | ----------    | ---------------
snat-0xiktzhin9xur3dtm95zu | Available | 47.90.166.191 | 10.0.224.0/20 | vsw-0xi3nk4nu9366f623vtb9
snat-0xi80lglebs0vtkklxkgl | Available | 47.90.166.191 | 10.0.240.0/20 | vsw-0xi183q0g3xqdmkhpgc93

$

Comment 12 Rohit Patil 2022-04-27 11:43:15 UTC
Same issue observed in this run:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/97644/console

build: 4.11.0-0.nightly-2022-04-26-181148
template: aos-4_11/ipi-on-alicloud/versioned-installer-private_cluster

Error: 
04-27 11:20:04.944  level=debug msg=[INFO] running Terraform command: /home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/terraform/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-cluster-103485472/terraform.tfvars.json -var-file=/tmp/openshift-install-cluster-103485472/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
04-27 11:20:11.656  level=error
04-27 11:20:11.656  level=error msg=Error: "value": required field is not set

cc: jiwei

Comment 14 Jianli Wei 2022-06-16 02:21:39 UTC
Set to High severity because CEE has a customer who wants to use this scenario. Thanks!

Comment 16 Beth White 2022-10-14 15:58:33 UTC
Cloned to the Jira project (https://issues.redhat.com/browse/OCPBUGS-2384) and communicated directly with Brian Lu (assignee).

