+++ Scenario
* dual-stack
* KubeAPI
* multi-node
* 2 Cluster Networks (IPv4 + IPv6)
* 2 Service Networks (IPv4 + IPv6)
* 1 Machine Network (IPv4)
* 1 API and 1 Ingress VIP (IPv4)

+++ Current state
The cluster (i.e. the ACI resource) is flapping in "preparing for installation".

+++ Desired state
The cluster should be failing validation.

+++ Internal info
What I have observed is that we try to generate the ignition, but the OCP installer has its own validator for network configuration and rejects this one, so we cannot create a correct ignition. We do not handle the error coming from ignition.go in any meaningful way, so the cluster goes back to "preparing for installation" and the process starts again. We should either create a new validator or plug some logic into an existing one that checks the network configuration (i.e. the alignment of all the *-networks) and fails before we even try to pass anything to ignition.go.
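To make the proposed check concrete, here is a minimal sketch of such a validator, not the assisted-service implementation. The function names (`countFamilies`, `isDualStack`, `validateNetworkAlignment`) are hypothetical, plain CIDR strings stand in for the real network types, and the machine network `192.168.111.0/24` is only an illustrative value. The rule it assumes is: if the cluster or service networks are dual-stack, the machine networks must contain exactly one IPv4 and one IPv6 entry.

```go
package main

import (
	"fmt"
	"net"
)

// countFamilies parses each CIDR and counts IPv4 and IPv6 entries.
func countFamilies(cidrs []string) (v4, v6 int, err error) {
	for _, c := range cidrs {
		ip, _, perr := net.ParseCIDR(c)
		if perr != nil {
			return 0, 0, fmt.Errorf("invalid CIDR %q: %w", c, perr)
		}
		if ip.To4() != nil {
			v4++
		} else {
			v6++
		}
	}
	return v4, v6, nil
}

// isDualStack reports whether the list holds at least one IPv4 and one IPv6 CIDR.
func isDualStack(cidrs []string) (bool, error) {
	v4, v6, err := countFamilies(cidrs)
	return v4 > 0 && v6 > 0, err
}

// validateNetworkAlignment is a hypothetical validator: when the cluster or
// service networks are dual-stack, it requires two machine networks, one per
// address family. Single-stack configurations are not checked here.
func validateNetworkAlignment(cluster, service, machine []string) error {
	clusterDual, err := isDualStack(cluster)
	if err != nil {
		return err
	}
	serviceDual, err := isDualStack(service)
	if err != nil {
		return err
	}
	if !clusterDual && !serviceDual {
		return nil
	}
	if len(machine) != 2 {
		// Wording mirrors the InputError message quoted in this issue.
		return fmt.Errorf("Expected 2 machine networks, found %d", len(machine))
	}
	machineDual, err := isDualStack(machine)
	if err != nil {
		return err
	}
	if !machineDual {
		return fmt.Errorf("machine networks must include one IPv4 and one IPv6 CIDR")
	}
	return nil
}

func main() {
	// The scenario from this issue: dual-stack cluster and service networks,
	// but a single IPv4 machine network.
	err := validateNetworkAlignment(
		[]string{"10.128.0.0/14", "fd01::/48"},
		[]string{"172.30.0.0/16", "fd02::/112"},
		[]string{"192.168.111.0/24"},
	)
	fmt.Println(err)
}
```

Running a check like this before ignition generation would surface the misconfiguration as a validation failure instead of a retry loop.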
There seem to be two scenarios for this validator, and only one of them fails. It looks like the cluster creation flow bypasses the validator function, while the cluster update flow is correct.

+++ Scenario 1
The initial ACI contains
* 2 Cluster Networks (IPv4 + IPv6)
* 2 Service Networks (IPv4 + IPv6)
* 1 Machine Network (IPv4)

In this case the validation does not fail and we get the cluster with the following conditions
```
- lastProbeTime: "2021-10-08T12:19:37Z"
  lastTransitionTime: "2021-10-08T12:19:37Z"
  message: SyncOK
  reason: SyncOK
  status: "True"
  type: SpecSynced
- lastProbeTime: "2021-10-08T12:19:37Z"
  lastTransitionTime: "2021-10-08T12:19:37Z"
  message: 'The cluster''s validations are failing: Clusters must have exactly 3 dedicated masters. Please either add hosts, or disable the worker host,Hosts have not been discovered yet,Hosts have not been discovered yet,Hosts have not been discovered yet'
  reason: ValidationsFailing
  status: "False"
  type: Validated
```

+++ Scenario 2
The initial ACI contains
* 2 Cluster Networks (IPv4 + IPv6)
* 2 Service Networks (IPv4 + IPv6)
* 2 Machine Networks (IPv4 + IPv6)

The updated ACI contains
* 2 Cluster Networks (IPv4 + IPv6)
* 2 Service Networks (IPv4 + IPv6)
* 1 Machine Network (IPv4)

In this case the validation fails and we see
```
- lastProbeTime: "2021-10-08T12:16:37Z"
  lastTransitionTime: "2021-10-08T12:16:37Z"
  message: 'The Spec could not be synced due to an input error: Expected 2 machine networks, found 1'
  reason: InputError
  status: "False"
  type: SpecSynced
```
Creating a cluster via ACI does not include Machine Networks in the `params.NewClusterParams`. The object looks like this:
```
time="2021-10-08T12:58:12Z" level=info msg="CHOCOBOMB: Creating cluster with params: &{AdditionalNtpSource:<nil> BaseDNSDomain:hive.example.com ClusterNetworkCidr:<nil> ClusterNetworkHostPrefix:0 ClusterNetworks:[0xc0025f9830 0xc0025f9860] CPUArchitecture: DiskEncryption:<nil> HighAvailabilityMode:<nil> HTTPProxy:<nil> HTTPSProxy:<nil> Hyperthreading:<nil> IngressVip:192.168.111.101 MachineNetworks:[] Name:0xc001fcd710 NetworkType:<nil> NoProxy:<nil> OcpReleaseImage: OlmOperators:[] OpenshiftVersion:0xc001fcd720 Platform:<nil> PullSecret:0xc001fcd730 SchedulableMasters:<nil> ServiceNetworkCidr:<nil> ServiceNetworks:[0xc0015f2ae0 0xc0015f2b00] SSHPublicKey:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC1b/IibQkel9sU5OYuNkoL3qda0vzgx2Sb2lmF5hFsZ3L2D+w+Ixkwjw1g0jQAsQ+00rlKYgdxVmUWYpGE2ZKLQ75kHzs4qChupTMb1rJL5YH8xVeKuCN86WkW2rn5vT7gY8r+m/odCBkL4WQDxGVXdHcevhO6klehsb2PdhqKkbm+xNMrHSOWOnxbV2O7U4VdWgHMcPt9vlSf4ewNHMNer0cTmmqIIg9Lqbp5p8zcM20uSdMQBjar+A2PHu29CyjqVMczu7S6G/DLbTG4GnovcPJwOiNUgOLEt13kNLRbODXl610DmESS4Si4bAZvi555fXmoAgrW4uLCZ8zOEgMaz+G6yhcMqJ47WjznhbJRJeWmqz3pjd+252SCrznAmXrbD/mpjYZulDLPIejENJzd7LRBp3DBDQtgrWeP+04CosNYD2vXWV+Xlofd/uSdVzyY+kKkuatGx7R13PHK+WlgxW3albEPEgz8T+3IRKNNfDmwtEem6R0KAhTuC0volGk= root.lab.eng.rdu2.redhat.com UserManagedNetworking:0xc0018d5ba9 VipDhcpAllocation:0xc0018d5ba8}" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:477" cluster_id=ba34e502-a244-4406-961e-a9fd78a3c0fa go-id=592 pkg=Inventory request_id=f5508ad2-c4a7-4734-b036-cf9fffd8db9d
[...]
time="2021-10-08T12:58:12Z" level=info msg="ClusterDeployment Reconcile started" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:117" cluster_deployment=dual-aci cluster_deployment_namespace=assisted-installer-2 go-id=592 request_id=ed803828-2f16-476a-a868-f1bab0f5864e
time="2021-10-08T12:58:12Z" level=info msg="update cluster ba34e502-a244-4406-961e-a9fd78a3c0fa with params: &{AdditionalNtpSource:<nil> APIVip:0xc001b2a820 APIVipDNSName:<nil> BaseDNSDomain:<nil> ClusterNetworkCidr:<nil> ClusterNetworkHostPrefix:<nil> ClusterNetworks:[] DiskEncryption:<nil> HTTPProxy:<nil> HTTPSProxy:<nil> Hyperthreading:<nil> IngressVip:<nil> MachineNetworkCidr:<nil> MachineNetworks:[0xc001c09dc0] Name:<nil> NetworkType:0xc001b2a810 NoProxy:<nil> OlmOperators:[] Platform:<nil> PullSecret:<nil> SchedulableMasters:<nil> ServiceNetworkCidr:<nil> ServiceNetworks:[] SSHPublicKey:<nil> UserManagedNetworking:<nil> VipDhcpAllocation:<nil>}" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).v2UpdateClusterInternal" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2375" go-id=592 pkg=Inventory request_id=ed803828-2f16-476a-a868-f1bab0f5864e
time="2021-10-08T12:58:12Z" level=info msg="CHOCOBOMB: Updating cluster with params: &{AdditionalNtpSource:<nil> APIVip:0xc001b2a820 APIVipDNSName:<nil> BaseDNSDomain:<nil> ClusterNetworkCidr:<nil> ClusterNetworkHostPrefix:<nil> ClusterNetworks:[] DiskEncryption:<nil> HTTPProxy:<nil> HTTPSProxy:<nil> Hyperthreading:<nil> IngressVip:<nil> MachineNetworkCidr:<nil> MachineNetworks:[0xc001c09dc0] Name:<nil> NetworkType:0xc001b2a810 NoProxy:<nil> OlmOperators:[] Platform:<nil> PullSecret:<nil> SchedulableMasters:<nil> ServiceNetworkCidr:<nil> ServiceNetworks:[] SSHPublicKey:<nil> UserManagedNetworking:<nil> VipDhcpAllocation:<nil>}" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).validateAndUpdateClusterParams" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2153" go-id=592 pkg=Inventory request_id=ed803828-2f16-476a-a868-f1bab0f5864e
time="2021-10-08T12:58:12Z" level=info msg="Updated clusterDeployment assisted-installer-2/dual-aci" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).updateIfNeeded" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:739" agent_cluster_install=dual-aci agent_cluster_install_namespace=assisted-installer-2 cluster_deployment=dual-aci cluster_deployment_namespace=assisted-installer-2 go-id=592 request_id=ed803828-2f16-476a-a868-f1bab0f5864e
time="2021-10-08T12:58:12Z" level=info msg="ClusterDeployment Reconcile ended" func="github.com/openshift/assisted-service/internal/controller/controllers.(*ClusterDeploymentsReconciler).Reconcile.func1" file="/go/src/github.com/openshift/origin/internal/controller/controllers/clusterdeployments_controller.go:114" agent_cluster_install=dual-aci agent_cluster_install_namespace=assisted-installer-2 cluster_deployment=dual-aci cluster_deployment_namespace=assisted-installer-2 go-id=592 request_id=ed803828-2f16-476a-a868-f1bab0f5864e
```
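The logs above show the create call carrying the cluster and service networks with `MachineNetworks:[]`, followed by an update call carrying only the machine networks. A check that looks at either set of params in isolation cannot see the mismatch. The sketch below illustrates one possible way around that: validate the merged view of stored state plus update. Everything in it is hypothetical (the `networks` type, `merge`, `checkMachineNetworkCount`), it assumes an empty list in an update means "leave unchanged", it compares counts only rather than address families, and the CIDR values are illustrative.

```go
package main

import "fmt"

// networks is a hypothetical, simplified stand-in for the network fields
// carried by the create and update params.
type networks struct {
	Cluster []string
	Service []string
	Machine []string
}

// merge returns the configuration that results from applying update on top of
// stored. Assumption: an empty list in the update means "leave unchanged".
func merge(stored, update networks) networks {
	out := stored
	if len(update.Cluster) > 0 {
		out.Cluster = update.Cluster
	}
	if len(update.Service) > 0 {
		out.Service = update.Service
	}
	if len(update.Machine) > 0 {
		out.Machine = update.Machine
	}
	return out
}

// checkMachineNetworkCount is a count-only simplification: two cluster or
// service networks are treated as dual-stack and require two machine networks.
func checkMachineNetworkCount(n networks) error {
	dualStack := len(n.Cluster) == 2 || len(n.Service) == 2
	if dualStack && len(n.Machine) != 2 {
		return fmt.Errorf("Expected 2 machine networks, found %d", len(n.Machine))
	}
	return nil
}

func main() {
	// State after the create call: no machine networks yet.
	created := networks{
		Cluster: []string{"10.128.0.0/14", "fd01::/48"},
		Service: []string{"172.30.0.0/16", "fd02::/112"},
	}
	// The follow-up update call: machine networks only.
	update := networks{Machine: []string{"192.168.111.0/24"}}

	fmt.Println("update alone:", checkMachineNetworkCount(update))
	fmt.Println("merged view: ", checkMachineNetworkCount(merge(created, update)))
}
```

In this sketch the update on its own passes because it carries no cluster or service networks, while the merged view is rejected.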
G2Bsync 941430821 comment CrystalChun Tue, 12 Oct 2021 20:04:17 UTC G2Bsync
PRs are already merged:
https://github.com/openshift/assisted-service/pull/2731
https://github.com/openshift/assisted-service/pull/2660
Verified using ACM 2.5.0-DOWNSTREAM-2022-04-11-09-21-38.

On creation of a dual-stack ACI with a single machineNetwork:

oc get agentclusterinstalls.extensions.hive.openshift.io spoke-0 -o json | jq '.spec.networking'
{
  "clusterNetwork": [
    {
      "cidr": "10.128.0.0/14",
      "hostPrefix": 23
    },
    {
      "cidr": "fd01::/48",
      "hostPrefix": 64
    }
  ],
  "machineNetwork": [
    {
      "cidr": "fd2e:6f44:5dd8:5::/64"
    }
  ],
  "serviceNetwork": [
    "172.30.0.0/16",
    "fd02::/112"
  ]
}

the spec is not synced, with a clear message:

oc get agentclusterinstalls.extensions.hive.openshift.io spoke-0 -o json | jq '.status.conditions | map(select(.type=="SpecSynced"))'
[
  {
    "lastProbeTime": "2022-04-13T07:08:04Z",
    "lastTransitionTime": "2022-04-13T07:08:04Z",
    "message": "The Spec could not be synced due to an input error: Expected 2 machine networks, found 1",
    "reason": "InputError",
    "status": "False",
    "type": "SpecSynced"
  }
]

Patching in the missing machine CIDR allows the ACI to sync. Updating the ACI to remove the second machine CIDR brings the ACI back to the not-synced state, so this flow did not regress.
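For anyone who wants to script this verification instead of reading the jq output by eye, the following is a small sketch that selects a condition by type from the `.status.conditions` JSON. The `condition` struct and `findCondition` helper are hypothetical and only model the fields shown above; the input is assumed to be the conditions array as printed by `oc ... -o json | jq '.status.conditions'`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition models only the fields used in this check; other fields in the
// JSON (such as lastProbeTime) are ignored by the decoder.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// findCondition decodes a JSON array of conditions and returns the first one
// whose type matches condType.
func findCondition(raw []byte, condType string) (*condition, error) {
	var conds []condition
	if err := json.Unmarshal(raw, &conds); err != nil {
		return nil, fmt.Errorf("decoding conditions: %w", err)
	}
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i], nil
		}
	}
	return nil, fmt.Errorf("condition %q not found", condType)
}

func main() {
	// Conditions array copied from the verification output above.
	raw := []byte(`[
	  {
	    "lastProbeTime": "2022-04-13T07:08:04Z",
	    "lastTransitionTime": "2022-04-13T07:08:04Z",
	    "message": "The Spec could not be synced due to an input error: Expected 2 machine networks, found 1",
	    "reason": "InputError",
	    "status": "False",
	    "type": "SpecSynced"
	  }
	]`)

	c, err := findCondition(raw, "SpecSynced")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("status=%s reason=%s\n%s\n", c.Status, c.Reason, c.Message)
}
```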