Bug 2046025 - [IPI on Alibabacloud] pre-configured alicloud DNS private zone is deleted after destroying cluster, please clarify
Summary: [IPI on Alibabacloud] pre-configured alicloud DNS private zone is deleted after destroying cluster, please clarify
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: aos-install
QA Contact: Jianli Wei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-26 08:13 UTC by Jianli Wei
Modified: 2022-03-12 04:42 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-12 04:41:56 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/installer pull 5583 (open): "Bug 2046025: [Alibaba] fix destory exist private zone" (last updated 2022-01-27 04:53:58 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-12 04:42:08 UTC)

Description Jianli Wei 2022-01-26 08:13:54 UTC
Version:
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-25-023600
built from commit 6bd4f3ecafb39f0ea2f62b7b27b548ca74bab020
release image registry.ci.openshift.org/ocp/release@sha256:19fd4b9a313f2dfcdc982f088cffcc5859484af3cb8966cde5ec0be90b262dbc
release architecture amd64

Platform: alibabacloud

Please specify:
* IPI

What happened?
The installer supports the optional field "privateZoneID" for the alibabacloud platform. When it is specified along with vpcID and vswitchIDs, the installation goes well and ends with a working cluster. However, after destroying the cluster, the pre-configured alicloud DNS private zone is deleted, which seems unexpected; please clarify.

What did you expect to happen?
The pre-configured alicloud DNS private zone should remain after destroying the cluster, just like the VPC and vSwitches.

How to reproduce it (as minimally and precisely as possible)?
Always.

Anything else we need to know?
$ openshift-install explain installconfig.platform.alibabacloud.privateZoneID
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <string>
  PrivateZoneID is the ID of an existing private zone into which to add DNS records for the cluster's internal API. An existing private zone can only be used when also using existing VPC. The private zone must be associated with the VPC containing the subnets. Leave the private zone unset to have the installer create the private zone on your behalf.

$ openshift-install create install-config --dir work
? SSH Public Key /home/fedora/.ssh/openshift-qe.pub
? Platform alibabacloud
? Region cn-hongkong
? Base Domain alicloud-qe.devcluster.openshift.com
? Cluster Name jiwei-302
? Pull Secret [? for help] *******
INFO Install-Config created in: work
$ 
$ echo 'credentialsMode: Manual' >> work/install-config.yaml
$ vim work/install-config.yaml
$ yq e .platform work/install-config.yaml 
alibabacloud:
  region: cn-hongkong
  resourceGroupID: rg-aek2aognijpinoy
  vpcID: vpc-j6cgz9esl8lyawer8s44f
  vswitchIDs:
    - vsw-j6cduab4a6xur18mvjwbx
  privateZoneID: aabf0115fb79f473c0df093a267ce40d
$ yq e .metadata work/install-config.yaml 
creationTimestamp: null
name: jiwei-302
$ 
$ openshift-install create manifests --dir work
FATAL failed to fetch Master Machines: failed to load asset "Install Config": platform.alibabacloud.privateZoneID: Invalid value: "aabf0115fb79f473c0df093a267ce40d": the name jiwei-300.alicloud-qe.devcluster.openshift.com of the existing private zone does not match the expected zone name jiwei-302.alicloud-qe.devcluster.openshift.com 
$ 
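For illustration, the check behind the FATAL error above can be sketched in Go as below. This is not the installer's actual code; the function name and arguments are made up for this example. The expected zone name is derived from metadata.name plus the base domain, and renaming the cluster to match the existing zone (as done next in this transcript) makes the check pass.

package main

import (
	"fmt"
	"strings"
)

// validatePrivateZoneName illustrates the check behind the FATAL above:
// the pre-existing private zone's name must equal "<metadata.name>.<baseDomain>".
// This is a sketch for illustration only, not the installer's actual code.
func validatePrivateZoneName(existingZoneName, clusterName, baseDomain string) error {
	expected := fmt.Sprintf("%s.%s", clusterName, strings.TrimSuffix(baseDomain, "."))
	if existingZoneName != expected {
		return fmt.Errorf("the name %s of the existing private zone does not match the expected zone name %s", existingZoneName, expected)
	}
	return nil
}

func main() {
	// Values taken from the transcript above: zone aabf0115... is named after
	// jiwei-300, while the install-config first used cluster name jiwei-302.
	err := validatePrivateZoneName("jiwei-300.alicloud-qe.devcluster.openshift.com", "jiwei-302", "alicloud-qe.devcluster.openshift.com")
	fmt.Println(err)
}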
$ vim work/install-config.yaml
$ yq e .metadata work/install-config.yaml 
creationTimestamp: null
name: jiwei-300
$ yq e .platform work/install-config.yaml 
alibabacloud:
  region: cn-hongkong
  resourceGroupID: rg-aek2aognijpinoy
  vpcID: vpc-j6cgz9esl8lyawer8s44f
  vswitchIDs:
    - vsw-j6cduab4a6xur18mvjwbx
  privateZoneID: aabf0115fb79f473c0df093a267ce40d
$ openshift-install create manifests --dir work
INFO Consuming Install Config from target directory 
INFO Manifests created in: work/manifests and work/openshift 
$ 
$ openshift-install create cluster --dir work --log-level info
INFO Consuming Master Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Worker Machines from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 4:07AM) for the Kubernetes API at https://api.jiwei-300.alicloud-qe.devcluster.openshift.com:6443...
INFO API v1.23.0+06791f6 up
INFO Waiting up to 30m0s (until 4:21AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 4:51AM) for the cluster at https://api.jiwei-300.alicloud-qe.devcluster.openshift.com:6443 to initialize
...
W0126 04:11:29.781257  427525 reflector.go:324] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: 
Get "https://api.jiwei-300.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.
name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost
I0126 04:11:29.781511  427525 trace.go:205] Trace[420993082]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (26-Jan-2022 04:11:13.859) (total time: 15922ms):
Trace[420993082]: ---"Objects listed" error:Get "https://api.jiwei-300.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost 15921ms (04:11:29.781)
Trace[420993082]: [15.922029888s] [15.922029888s] END
E0126 04:11:29.781550  427525 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.jiwei-300.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost
INFO Waiting up to 10m0s (until 4:33AM) for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/fedora/work/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.jiwei-300.alicloud-qe.devcluster.openshift.com
INFO Login to the console with user: "kubeadmin", and password: "MGfDQ-WvXI5-NL95B-oToxz"
INFO Time elapsed: 37m59s  
$ 
$ export KUBECONFIG=/home/fedora/work/auth/kubeconfig
$ oc get nodes
NAME                             STATUS   ROLES    AGE   VERSION
jiwei-300-dbd8j-master-0         Ready    master   80m   v1.23.0+06791f6
jiwei-300-dbd8j-master-1         Ready    master   78m   v1.23.0+06791f6
jiwei-300-dbd8j-master-2         Ready    master   79m   v1.23.0+06791f6
jiwei-300-dbd8j-worker-b-96l9b   Ready    worker   59m   v1.23.0+06791f6
jiwei-300-dbd8j-worker-b-lh4r5   Ready    worker   59m   v1.23.0+06791f6
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-25-023600   True        False         50m     Cluster version is 4.10.0-0.nightly-2022-01-25-023600
$ 
$ openshift-install destroy cluster --dir work --log-level info
INFO OSS bucket deleted                            bucketName=jiwei-300-dbd8j-image-registry-cn-hongkong-cwgtmfixfdnvwnpfgba stage=OSS buckets
INFO OSS buckets deleted                           stage=OSS buckets
INFO ECS instances deleted                         stage=ECS instances
INFO RAM roles deleted                             stage=RAM roles
INFO Private zones deleted                         stage=private zones
INFO SLB instances deleted                         stage=SLBs
INFO Security groups deleted                       stage=ECS security groups
INFO NAT gateways deleted                          stage=Nat gateways
INFO EIPs deleted                                  stage=EIPs
INFO VSwitches deleted                             stage=VSwitchs
INFO Time elapsed: 1m27s                          
$ aliyun pvtz DescribeZones --QueryVpcId vpc-j6cgz9esl8lyawer8s44f --endpoint pvtz.cn-hongkong.aliyuncs.com
{
        "PageNumber": 1,
        "PageSize": 20,
        "RequestId": "9727089F-60D8-5354-A73B-6BFAF0CAB28A",
        "TotalItems": 0,
        "TotalPages": 1,
        "Zones": {
                "Zone": []
        }
}
$

Comment 1 husun 2022-01-27 04:54:50 UTC
The installer should not destroy the pre-configured alicloud DNS private zone; the fix is in PR: https://github.com/openshift/installer/pull/5583
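For illustration only, a minimal Go sketch of the destroy-time behavior this comment asks for: skip deleting the private zone when it was pre-configured (privateZoneID set in install-config) rather than created by the installer. The client interface and names below are hypothetical and are not the installer's real API or the code in PR 5583.

package destroy

// privateZoneDeleter is a hypothetical stand-in for the Alibaba Cloud
// PrivateZone client, used here purely for illustration.
type privateZoneDeleter interface {
	DeleteZone(zoneID string) error
}

// deletePrivateZone removes the cluster's private zone only when the
// installer created it; a pre-configured zone (supplied via
// platform.alibabacloud.privateZoneID) is left in place, which is the
// behavior requested in this bug.
func deletePrivateZone(client privateZoneDeleter, zoneID string, installerCreated bool) error {
	if !installerCreated {
		// The zone was supplied by the user: leave it alone. Its leftover
		// cluster records are discussed in comments 2 and 4 below.
		return nil
	}
	return client.DeleteZone(zoneID)
}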

Comment 2 Patrick Dillon 2022-01-27 19:36:46 UTC
I have approved the PR, but once it is merged we should check whether the private records are leaked. If so, we should create a new BZ for that.

Comment 4 Jianli Wei 2022-01-29 05:28:56 UTC
(In reply to Patrick Dillon from comment #2)
> I have approved the PR, but we should check if once it is merged that the
> private records are leaked. If so, we should create a new BZ for that.

@Patrick Yes, the PR fixed the original issue, but the private records are indeed leaked; I will file another bug. Thanks!

>(1) before installation, create the VPC and PVTZ via alicloud UI
$ aliyun vpc DescribeVpcs --RegionId us-east-1 --VpcName jiwei-603-vpc --endpoint vpc.aliyuncs.com --output cols=CreationTime,VpcId,CidrBlock rows=Vpcs.Vpc[]
CreationTime         | VpcId                     | CidrBlock
------------         | -----                     | ---------
2022-01-29T03:39:07Z | vpc-0xifjl4lq21834b2z7p52 | 10.0.0.0/16

$ aliyun vpc DescribeVSwitches --RegionId us-east-1 --VpcId vpc-0xifjl4lq21834b2z7p52 --endpoint vpc.aliyuncs.com --output cols=Status,VSwitchName,VSwitchId,CidrBlock,ZoneId rows=VSwitches.VSwitch[]
Status    | VSwitchName                  | VSwitchId                 | CidrBlock     | ZoneId
------    | -----------                  | ---------                 | ---------     | ------
Available | jiwei-603-vswitch-us-east-1b | vsw-0xiaf8vw0talxcx2muhga | 10.0.224.0/20 | us-east-1b
Available | jiwei-603-vswitch-us-east-1a | vsw-0xi6k07lku8uq96iid66n | 10.0.240.0/20 | us-east-1a
Available | jiwei-603-vswitch-natgw      | vsw-0xixv7g336a9mssnzul1c | 10.0.176.0/20 | us-east-1a

$ aliyun vpc DescribeNatGateways --RegionId us-east-1 --VpcId vpc-0xifjl4lq21834b2z7p52 --endpoint vpc.aliyuncs.com --output cols=NatGatewayId,NetworkType,IpLists.IpList[].IpAddress,SnatTableIds.SnatTableId rows=NatGateways.NatGateway[]
NatGatewayId              | NetworkType | IpLists.IpList[].IpAddress | SnatTableIds.SnatTableId
------------              | ----------- | -------------------------- | ------------------------
ngw-0xi0fs66zfnkl4vnm8n1f | internet    | [47.253.97.183]            | [stb-0xikajj0t26dfu5haq8rz]

$ aliyun vpc DescribeSnatTableEntries --RegionId us-east-1 --SnatTableId stb-0xikajj0t26dfu5haq8rz --endpoint vpc.aliyuncs.com --output cols=SnatEntryId,Status,SnatIp,SourceCIDR,SourceVSwitchId rows=SnatTableEntries.SnatTableEntry[]
SnatEntryId                | Status    | SnatIp        | SourceCIDR    | SourceVSwitchId
-----------                | ------    | ------        | ----------    | ---------------
snat-0xi3vstuvlavvfsdbroml | Available | 47.253.97.183 | 10.0.224.0/20 | vsw-0xiaf8vw0talxcx2muhga
snat-0xi9020z483auqgid9hiw | Available | 47.253.97.183 | 10.0.240.0/20 | vsw-0xi6k07lku8uq96iid66n

$

>(2) do the installation and then destroy the cluster
$ openshift-install version
openshift-install 4.10.0-0.nightly-2022-01-28-213019
built from commit 4fc9fa88c22221b6cede2456b1c33847943b75c9
release image registry.ci.openshift.org/ocp/release@sha256:08421fc455ec6686257afe0b09dacaa811425fb0ef7e8cd7c123312f40352b9a
release architecture amd64
$ openshift-install create install-config --dir work
? SSH Public Key /home/fedora/.ssh/openshift-qe.pub
? Platform alibabacloud
? Region us-east-1
? Base Domain alicloud-qe.devcluster.openshift.com
? Cluster Name jiwei-603
? Pull Secret [? for help] ****
INFO Install-Config created in: work
$ echo 'credentialsMode: Manual' >> work/install-config.yaml
$ vim work/install-config.yaml
$ yq e .platform work/install-config.yaml 
alibabacloud:
  region: us-east-1
  vpcID: vpc-0xifjl4lq21834b2z7p52
  vswitchIDs:
    - vsw-0xi6k07lku8uq96iid66n
    - vsw-0xiaf8vw0talxcx2muhga
  privateZoneID: 1d59c05ad3fde7feca6e432669d7115f
$ yq e .metadata work/install-config.yaml 
creationTimestamp: null
name: jiwei-603
$ yq e .credentialsMode work/install-config.yaml 
Manual
$ openshift-install create manifests --dir work
INFO Consuming Install Config from target directory 
INFO Manifests created in: work/manifests and work/openshift 
$ openshift-install create cluster --dir work --log-level info
INFO Consuming Master Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Common Manifests from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 4:14AM) for the Kubernetes API at https://api.jiwei-603.alicloud-qe.devcluster.openshift.com:6443...
INFO API v1.23.3+b63be7f up
INFO Waiting up to 30m0s (until 4:26AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 4:46AM) for the cluster at https://api.jiwei-603.alicloud-qe.devcluster.openshift.com:6443 to initialize...
W0129 04:06:30.527842  432691 reflector.go:324] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: Get "https://api.jiwei-603.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost
I0129 04:06:30.528548  432691 trace.go:205] Trace[893700782]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (29-Jan-2022 04:06:17.699) (total time: 12828ms):
Trace[893700782]: ---"Objects listed" error:Get "https://api.jiwei-603.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost 12828ms (04:06:30.527)
Trace[893700782]: [12.828465688s] [12.828465688s] END
E0129 04:06:30.528967  432691 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.jiwei-603.alicloud-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": http2: client connection lost
INFO Waiting up to 10m0s (until 4:32AM) for the openshift-console route to be created... 
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/fedora/work/auth/kubeconfig' 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.jiwei-603.alicloud-qe.devcluster.openshift.com 
INFO Login to the console with user: "kubeadmin", and password: "SLCLa-4rye8-tcAxv-hLsr6" 
INFO Time elapsed: 31m50s                         
$ export KUBECONFIG=/home/fedora/work/auth/kubeconfig
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-28-213019   True        False         10m     Cluster version is 4.10.0-0.nightly-2022-01-28-213019
$ oc get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
jiwei-603-nbtdt-master-0                  Ready    master   34m   v1.23.3+b63be7f
jiwei-603-nbtdt-master-1                  Ready    master   33m   v1.23.3+b63be7f
jiwei-603-nbtdt-master-2                  Ready    master   34m   v1.23.3+b63be7f
jiwei-603-nbtdt-worker-us-east-1a-tknwg   Ready    worker   24m   v1.23.3+b63be7f
jiwei-603-nbtdt-worker-us-east-1b-6vkf5   Ready    worker   21m   v1.23.3+b63be7f
jiwei-603-nbtdt-worker-us-east-1b-v4wzm   Ready    worker   22m   v1.23.3+b63be7f
$ aliyun pvtz DescribeZones --QueryVpcId vpc-0xifjl4lq21834b2z7p52 --endpoint pvtz.us-east-1.aliyuncs.com
{
        "PageNumber": 1,
        "PageSize": 20,
        "RequestId": "4AACB315-444B-5541-AC89-706E305EBDF7",
        "TotalItems": 1,
        "TotalPages": 1,
        "Zones": {
                "Zone": [
                        {
                                "CreateTime": "2022-01-29T03:44Z",
                                "CreateTimestamp": 1643427891000,
                                "IsPtr": false,
                                "ProxyPattern": "ZONE",
                                "RecordCount": 3,
                                "ResourceGroupId": "rg-acfnw6kdej3hyai",
                                "UpdateTime": "2022-01-29T04:04Z",
                                "UpdateTimestamp": 1643429075000,
                                "ZoneId": "1d59c05ad3fde7feca6e432669d7115f",
                                "ZoneName": "jiwei-603.alicloud-qe.devcluster.openshift.com",
                                "ZoneType": "AUTH_ZONE"
                        }
                ]
        }
}
$ 
$ openshift-install destroy cluster --dir work --log-level info
INFO OSS bucket deleted                            bucketName=jiwei-603-nbtdt-image-registry-us-east-1-abrwlmcckaiupnbkbomxx stage=OSS buckets
INFO OSS buckets deleted                           stage=OSS buckets
INFO ECS instances deleted                         stage=ECS instances
INFO RAM roles deleted                             stage=RAM roles
INFO SLB instances deleted                         stage=SLBs
INFO Security groups deleted                       stage=ECS security groups
INFO Resource group deleted                        name=jiwei-603-nbtdt-rg stage=resource groups
INFO Time elapsed: 47s                            
$ 

>(3) after destroying the cluster, make sure the PVTZ stays and check whether the cluster's zone records are deleted
$ aliyun pvtz DescribeZones --QueryVpcId vpc-0xifjl4lq21834b2z7p52 --endpoint pvtz.us-east-1.aliyuncs.com
{
        "PageNumber": 1,
        "PageSize": 20,
        "RequestId": "68D83C85-B6B3-504C-B643-05251764C04F",
        "TotalItems": 1,
        "TotalPages": 1,
        "Zones": {
                "Zone": [
                        {
                                "CreateTime": "2022-01-29T03:44Z",
                                "CreateTimestamp": 1643427891000,
                                "IsPtr": false,
                                "ProxyPattern": "ZONE",
                                "RecordCount": 3,
                                "ResourceGroupId": "rg-acfnw6kdej3hyai",
                                "UpdateTime": "2022-01-29T04:04Z",
                                "UpdateTimestamp": 1643429075000,
                                "ZoneId": "1d59c05ad3fde7feca6e432669d7115f",
                                "ZoneName": "jiwei-603.alicloud-qe.devcluster.openshift.com",
                                "ZoneType": "AUTH_ZONE"
                        }
                ]
        }
}
$ aliyun pvtz DescribeZoneRecords --ZoneId 1d59c05ad3fde7feca6e432669d7115f --endpoint pvtz.us-east-1.aliyuncs.com --output cols=Rr,Value,Status,Ttl rows=Records.Record[]
Rr      | Value       | Status | Ttl
--      | -----       | ------ | ---
*.apps  | 47.252.20.1 | ENABLE | 30
api     | 10.0.240.83 | ENABLE | 60
api-int | 10.0.240.83 | ENABLE | 60

$
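For illustration, a minimal Go sketch of the record cleanup that would prevent the leak shown above: when the zone is pre-configured, delete only the cluster's own records (the api, api-int and *.apps entries listed in the DescribeZoneRecords output) and keep the zone. The interface and names are hypothetical, not the installer's actual code; this is the behavior the follow-up bug mentioned in comment 4 would cover.

package destroy

import "fmt"

// zoneRecordClient is a hypothetical stand-in for the PrivateZone records API.
type zoneRecordClient interface {
	// ListRecords returns recordID -> Rr (host part) for the given zone.
	ListRecords(zoneID string) (map[int64]string, error)
	DeleteRecord(recordID int64) error
}

// deleteClusterRecords removes only the records the installer added to a
// pre-configured private zone (the api, api-int and *.apps entries shown
// above), leaving the zone itself and any unrelated records untouched.
func deleteClusterRecords(client zoneRecordClient, zoneID string) error {
	clusterOwned := map[string]bool{"api": true, "api-int": true, "*.apps": true}
	records, err := client.ListRecords(zoneID)
	if err != nil {
		return fmt.Errorf("listing records in zone %s: %w", zoneID, err)
	}
	for id, rr := range records {
		if !clusterOwned[rr] {
			continue
		}
		if err := client.DeleteRecord(id); err != nil {
			return fmt.Errorf("deleting record %q (%d): %w", rr, id, err)
		}
	}
	return nil
}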

Comment 7 errata-xmlrpc 2022-03-12 04:41:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

