Bug 2035720
| Summary: | [IPI on Alibabacloud] deploying a private cluster by 'publish: Internal' failed due to 'dns_public_record' | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianli Wei <jiwei> | ||||
| Component: | Installer | Assignee: | Brian Lu <brlu> | ||||
| Installer sub component: | openshift-installer | QA Contact: | Jianli Wei <jiwei> | ||||
| Status: | CLOSED DEFERRED | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | beth.white, brlu, bteng, gpei, padillon, ropatil, yqu | ||||
| Version: | 4.10 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-10-14 15:58:33 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Jianli Wei
2021-12-27 08:34:54 UTC
FYI, in the case of a private cluster, the bootstrap instance won't have a public IP address. Tested and got the error below.
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-28-213019
built from commit 4fc9fa88c22221b6cede2456b1c33847943b75c9
release image registry.ci.openshift.org/ocp/release@sha256:08421fc455ec6686257afe0b09dacaa811425fb0ef7e8cd7c123312f40352b9a
release architecture amd64
$
$ yq e .platform work/install-config.yaml
alibabacloud:
  region: us-east-1
  vpcID: vpc-0xil4gt2y7n4yj2ge0shk
  vswitchIDs:
  - vsw-0xihmrzxkr6xujzlanhyz
  - vsw-0xi6n2r6c33jflp8vc5kb
$ yq e .publish work/install-config.yaml
Internal
$ yq e .credentialsMode work/install-config.yaml
Manual
$ openshift-install create manifests --dir work
INFO Consuming Install Config from target directory
INFO Manifests created in: work/manifests and work/openshift
$
$ export http_proxy=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@47.90.219.61:3128
$ export https_proxy=http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@47.90.219.61:3128
$
$ openshift-install create cluster --dir work --log-level info
INFO Consuming Master Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Creating infrastructure resources...
ERROR
ERROR Error: "value": required field is not set
ERROR
ERROR on ../../tmp/openshift-install-cluster-678510377/dns/privatezone.tf line 14, in resource "alicloud_alidns_record" "dns_public_record":
ERROR 14: resource "alicloud_alidns_record" "dns_public_record" {
ERROR
ERROR
ERROR Failed to read tfstate: open /tmp/openshift-install-cluster-678510377/terraform.cluster.tfstate: no such file or directory
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
$
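For triage, the failing Terraform resource can be pulled out of a saved copy of the console output. The following is only a minimal sketch: `failing_resources` is a hypothetical helper name, and it assumes the log uses the same `in resource "<type>" "<name>"` wording with unescaped quotes as the ERROR lines above. Logs that escape the quotes would need a different pattern.

```shell
#!/bin/sh
# Hypothetical triage helper (not part of openshift-install).
# Prints each Terraform resource address named in an
# 'in resource "<type>" "<name>"' error line, one per line, de-duplicated.
# Assumes the quotes are unescaped, as in the console output quoted above.
failing_resources() {
  # $1: path to a saved copy of the installer console output
  sed -n 's/.*in resource "\([^"]*\)" "\([^"]*\)".*/\1.\2/p' "$1" | sort -u
}
```

Run against the output in this report, `failing_resources install.out` (where `install.out` is an assumed file name for the saved output) should print the single address of the DNS record resource that Terraform rejected.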
FYI, although this is not the original error, deploying a private cluster with "publish: Internal" still fails, so I am reopening the bug.

Created attachment 1857513
.openshift_install.log
FYI, I retried once and hit the same issue.

> the QE flexy-install job
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/72043/

01-29 11:18:16.938 level=error msg=Error: "value": required field is not set
01-29 11:18:16.938 level=error
01-29 11:18:16.938 level=error msg= on ../../../../../../../../tmp/openshift-install-cluster-3695075468/dns/privatezone.tf line 14, in resource "alicloud_alidns_record" "dns_public_record":
01-29 11:18:16.938 level=error msg= 14: resource "alicloud_alidns_record" "dns_public_record" {
01-29 11:18:16.938 level=error
01-29 11:18:16.938 level=error
01-29 11:18:16.938 level=error msg=Failed to read tfstate: open /tmp/openshift-install-cluster-3695075468/terraform.cluster.tfstate: no such file or directory
01-29 11:18:16.938 level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

> the install-config.yaml
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    alibabacloud:
      instanceType: ecs.g6.xlarge
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    alibabacloud:
      instanceType: ecs.g6.large
  replicas: 2
metadata:
  name: jiwei-601
platform:
  alibabacloud:
    region: us-east-1
    resourceGroupID: rg-aek2wky7lxk4f5y
    vpcID: vpc-0xi6h9s2713tmqc5bpyhc
    vswitchIDs:
    - vsw-0xi183q0g3xqdmkhpgc93
    - vsw-0xi3nk4nu9366f623vtb9
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: Internal
credentialsMode: Manual
baseDomain: alicloud-qe.devcluster.openshift.com
sshKey: HIDDEN

> the vpc/vswitch/nat resources
$ aliyun vpc DescribeVpcs --RegionId us-east-1 --VpcName jiwei-601-vpc --endpoint vpc.aliyuncs.com --output cols=CreationTime,VpcId,CidrBlock rows=Vpcs.Vpc[]
CreationTime | VpcId | CidrBlock
------------ | ----- | ---------
2022-01-29T03:11:59Z | vpc-0xi6h9s2713tmqc5bpyhc | 10.0.0.0/16
$ aliyun vpc DescribeVSwitches --RegionId us-east-1 --VpcId vpc-0xi6h9s2713tmqc5bpyhc --endpoint vpc.aliyuncs.com --output cols=Status,VSwitchName,VSwitchId,CidrBlock,ZoneId rows=VSwitches.VSwitch[]
Status | VSwitchName | VSwitchId | CidrBlock | ZoneId
------ | ----------- | --------- | --------- | ------
Available | jiwei-601-bastion | vsw-0xi0ue5qls66rx3lezuyq | 10.0.128.0/20 | us-east-1b
Available | jiwei-601-vswitch-us-east-1b | vsw-0xi3nk4nu9366f623vtb9 | 10.0.224.0/20 | us-east-1b
Available | jiwei-601-vswitch-us-east-1a | vsw-0xi183q0g3xqdmkhpgc93 | 10.0.240.0/20 | us-east-1a
Available | jiwei-601-vswitch-natgw | vsw-0xi13e792cuhi5aedeq2y | 10.0.176.0/20 | us-east-1a
$ aliyun vpc DescribeNatGateways --RegionId us-east-1 --VpcId vpc-0xi6h9s2713tmqc5bpyhc --endpoint vpc.aliyuncs.com --output cols=NatGatewayId,NetworkType,IpLists.IpList[].IpAddress,SnatTableIds.SnatTableId rows=NatGateways.NatGateway[]
NatGatewayId | NetworkType | IpLists.IpList[].IpAddress | SnatTableIds.SnatTableId
------------ | ----------- | -------------------------- | ------------------------
ngw-0xif5wyxj7lwr2gvnemj5 | internet | [47.90.166.191] | [stb-0xibjsydjfbll6cqa6qrj]
$ aliyun vpc DescribeSnatTableEntries --RegionId us-east-1 --SnatTableId stb-0xibjsydjfbll6cqa6qrj --endpoint vpc.aliyuncs.com --output cols=SnatEntryId,Status,SnatIp,SourceCIDR,SourceVSwitchId rows=SnatTableEntries.SnatTableEntry[]
SnatEntryId | Status | SnatIp | SourceCIDR | SourceVSwitchId
----------- | ------ | ------ | ---------- | ---------------
snat-0xiktzhin9xur3dtm95zu | Available | 47.90.166.191 | 10.0.224.0/20 | vsw-0xi3nk4nu9366f623vtb9
snat-0xi80lglebs0vtkklxkgl | Available | 47.90.166.191 | 10.0.240.0/20 | vsw-0xi183q0g3xqdmkhpgc93
$

The same observation:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/97644/console
build: 4.11.0-0.nightly-2022-04-26-181148
template: aos-4_11/ipi-on-alicloud/versioned-installer-private_cluster

Error:
04-27 11:20:04.944 level=debug msg=[INFO] running Terraform command: /home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/install-dir/terraform/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-cluster-103485472/terraform.tfvars.json -var-file=/tmp/openshift-install-cluster-103485472/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
04-27 11:20:11.656 level=error
04-27 11:20:11.656 level=error msg=Error: "value": required field is not set

cc: jiwei

Setting severity to High because CEE has a customer who wants to use this scenario. Thanks!

Created attachment 1899976
the resources of the private cluster

Tested with a build that includes the PR https://github.com/openshift/installer/pull/5671 (see https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1552602010828673024): deploying a private cluster on alibabacloud succeeds. The attachment lists the resources of the cluster.

FYI the flexy-install & flexy-destroy jobs:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/125320/
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-destroy/118387/

Cloned to the Jira project and communicated directly with Brian Lu (assignee):
https://issues.redhat.com/browse/OCPBUGS-2384