Created attachment 1855504 [details]
attach1.1

Background:
Generally there are 4 fundamental scenarios that QE testing needs to cover:
(1) IPI a cluster
(2) IPI a private cluster
(3) IPI a cluster in a disconnected network
(4) IPI a cluster in a disconnected network behind an HTTP proxy

With the above said, we need to figure out as soon as possible what's wrong, either in the QE test environment or in the installer. It may turn out not to be a bug, but high attention is appreciated. Thanks in advance!

Version:
./openshift-install 4.10.0-0.nightly-2022-01-25-023600
built from commit 6bd4f3ecafb39f0ea2f62b7b27b548ca74bab020
release image registry.ci.openshift.org/ocp/release@sha256:19fd4b9a313f2dfcdc982f088cffcc5859484af3cb8966cde5ec0be90b262dbc
release architecture amd64

Platform:
alibabacloud

Please specify:
* IPI

What happened?
Installation in a disconnected network failed with 'Bootstrap failed to complete'. The bootstrap machine seems OK, but all 3 masters are NotReady and have pulled very few images.

FYI there are 2 scenarios:
(1) disconnected, with a local mirror registry inside the VPC
(2) disconnected, with an HTTP proxy for Internet access

We are using alicloud VPC network ACLs to construct the disconnected network.

What did you expect to happen?
The installation should succeed.

How to reproduce it (as minimally and precisely as possible)?
Always.

Anything else we need to know?
Scenario (1), disconnected with a local mirror registry:

FYI the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/70906/

The imageContentSources section in "install-config.yaml":

imageContentSources:
- mirrors:
  - jiwei-304.mirror-registry.alicloud-qe.devcluster.openshift.com:5000/ocp/release
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - jiwei-304.mirror-registry.alicloud-qe.devcluster.openshift.com:5000/ocp/release
  source: registry.ci.openshift.org/ocp/release

Please see attach1.1 for the nodes status and related alicloud resources, and attach1.2 for the gathered bootstrap logs.

Scenario (2), disconnected with an HTTP proxy:

FYI the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/70932/

The proxy section in "install-config.yaml":

proxy:
  httpProxy: http://username:password@10.0.143.21:3128
  httpsProxy: http://username:password@10.0.143.21:3128
  noProxy: test.no-proxy.com

BTW, the installer itself uses the bastion's public IP address for the HTTP proxy.

Please see attach2.1 for the nodes status and related alicloud resources, and attach2.2 for the gathered bootstrap logs.
Created attachment 1855505 [details] attach1.2
Created attachment 1855506 [details] attach2.1
Created attachment 1855507 [details] attach2.2
Created attachment 1855508 [details] attach2.2
Jianli Wei,

> We are using alicloud VPC network ACL to construct a disconnected network.

Could you please describe all the steps to create the disconnected network, including the commands? I found only the destroy scripts in the build artifacts.
(In reply to Marco Braga from comment #5)
> Jianli Wei,
>
> > We are using alicloud VPC network ACL to construct a disconnected network.
>
> Could you please describe all the steps to create the disconnected network,
> including the commands? I found only the destroy scripts on build artifacts.

@Marco please refer to https://gitlab.cee.redhat.com/jiwei/flexy-templates/-/blob/ipi-on-ali/functionality-testing/aos-4_10/hosts/libs/alicloud/utils_v2.sh#L371-457, thanks!
Sorry, please ignore the 2nd attach2.2, which is a duplicate upload caused by a network issue at that time, thanks!
Created attachment 1896245 [details]
[4.11] bootstrap logs

I retried the scenario "IPI a cluster in a disconnected network behind http proxy" with 4.11.0-0.nightly-2022-07-11-080250; the attachment contains the gathered bootstrap logs.

Because the network is disconnected, the VPC has no NAT gateway configured, so all control-plane and compute nodes are expected to use the HTTP proxy when accessing the Internet.

FYI the content of install-config.yaml:

apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    alibabacloud:
      instanceType: ecs.g6.xlarge
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    alibabacloud:
      instanceType: ecs.g6.large
  replicas: 2
metadata:
  name: jiwei-0712-05
platform:
  alibabacloud:
    region: ap-northeast-1
    resourceGroupID: rg-aek2c4huej7f3ni
    vpcID: vpc-6we8dsk71y9ldriddscdq
    vswitchIDs:
    - vsw-6weguf7vesewzhxzwq4f0
    - vsw-6werzddz3hqwl4nrkrooj
pullSecret: <pull secret>
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External
proxy:
  httpProxy: http://<username>:<password>@10.0.175.232:3128
  httpsProxy: http://<username>:<password>@10.0.175.232:3128
  noProxy: test.no-proxy.com
credentialsMode: Manual
baseDomain: alicloud-qe.devcluster.openshift.com
sshKey: <ssh keys>
Cloned to Jira project https://issues.redhat.com/browse/OCPBUGS-2388