Created attachment 1904978 [details]
install-config.yaml

Description:
When controlPlane.replicas = 1 and compute.replicas is missing from install-config.yaml, assisted-service validation fails because it is looking for more worker nodes and expecting api_vip and ingress_vip to be set. It doesn't recognize that I'm trying to install a SNO cluster. Only when compute.replicas is set to 0 does it recognize it to be SNO.

We should add a validation to warn users that compute.replicas needs to be set to 0 if controlPlane.replicas = 1.

Steps to reproduce:
1. Create agent.iso using install-config.yaml and agent-config.yaml
2. Deploy a SNO cluster using agent.iso
3. openshift-install agent wait-for install-complete

Expected: Cluster installation is successful
Actual: Validation fails

[rwsu@hardprov-fx2-22 openshift-installer]$ ./openshift-install agent wait-for install-complete
INFO Waiting for cluster install to initialize. Sleeping for 30 seconds
INFO Waiting for cluster install to initialize. Sleeping for 30 seconds
INFO Waiting for cluster install to initialize. Sleeping for 30 seconds
INFO Cluster is not ready for install. Check host validations
WARNING Cluster has stopped installing... working to recover installation
WARNING Cluster has stopped installing... working to recover installation
WARNING Cluster has stopped installing... working to recover installation
INFO Checking for validation failures ----------------------------------------------
ERROR Validation failure found for cluster category=hosts-data label=all-hosts-are-ready-to-install message=The cluster has hosts that are not ready to install.
ERROR Validation failure found for cluster category=hosts-data label=sufficient-masters-count message=Clusters must have exactly 3 dedicated masters and if workers are added, there should be at least 2 workers. Please check your configuration and add or remove hosts as to meet the above requirement.
ERROR Validation failure found for cluster category=network label=Machine CIDR message=The Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs.
ERROR Validation failure found for cluster category=network label=api-vip-defined message=The API virtual IP is undefined and must be provided.
ERROR Validation failure found for cluster category=network label=ingress-vip-defined message=The Ingress virtual IP is undefined and must be provided.
INFO Checking for validation failures ----------------------------------------------
ERROR Validation failure found for cluster category=hosts-data label=all-hosts-are-ready-to-install message=The cluster has hosts that are not ready to install.
ERROR Validation failure found for cluster category=hosts-data label=sufficient-masters-count message=Clusters must have exactly 3 dedicated masters and if workers are added, there should be at least 2 workers. Please check your configuration and add or remove hosts as to meet the above requirement.
ERROR Validation failure found for cluster category=network label=Machine CIDR message=The Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs.
ERROR Validation failure found for cluster category=network label=api-vip-defined message=The API virtual IP is undefined and must be provided.
ERROR Validation failure found for cluster category=network label=ingress-vip-defined message=The Ingress virtual IP is undefined and must be provided.
ERROR Validation failure found for control1.ostest.test.metalkube.org category=network label=DNS wildcard not configured message=Parse error for domain name resolutions result
ERROR Validation failure found for control1.ostest.test.metalkube.org category=network label=Machine CIDR message=Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs
ERROR Validation failure found for control1.ostest.test.metalkube.org category=network label=NTP synchronization message=Host couldn't synchronize with any NTP server
INFO Checking for validation failures ----------------------------------------------
ERROR Validation failure found for cluster category=hosts-data label=all-hosts-are-ready-to-install message=The cluster has hosts that are not ready to install.
ERROR Validation failure found for cluster category=hosts-data label=sufficient-masters-count message=Clusters must have exactly 3 dedicated masters and if workers are added, there should be at least 2 workers. Please check your configuration and add or remove hosts as to meet the above requirement.
ERROR Validation failure found for cluster category=network label=Machine CIDR message=The Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs.
ERROR Validation failure found for cluster category=network label=api-vip-defined message=The API virtual IP is undefined and must be provided.
ERROR Validation failure found for cluster category=network label=ingress-vip-defined message=The Ingress virtual IP is undefined and must be provided.
ERROR Validation failure found for control1.ostest.test.metalkube.org category=network label=Machine CIDR message=Machine Network CIDR is undefined; the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs
ERROR Validation failure found for control1.ostest.test.metalkube.org category=network label=NTP synchronization message=Host couldn't synchronize with any NTP server
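
For illustration, the validation proposed in the description could look roughly like the sketch below. This is not the installer's code: checkSNO and its signature are hypothetical, and the only rule it encodes is the one stated above (compute.replicas must be 0 when controlPlane.replicas is 1). The compute value is passed as a pointer so that a missing field can be distinguished from an explicit 0.

```go
package main

import (
	"errors"
	"fmt"
)

// checkSNO is a hypothetical helper, not the installer's actual code.
// compute is a pointer so that "missing from install-config.yaml" (nil)
// can be told apart from an explicit 0.
func checkSNO(controlPlaneReplicas int64, compute *int64) error {
	if controlPlaneReplicas != 1 {
		return nil // not a single-node request; nothing to check here
	}
	if compute == nil {
		return errors.New("compute.replicas must be set to 0 when controlPlane.replicas is 1")
	}
	if *compute != 0 {
		return fmt.Errorf("compute.replicas must be 0 when controlPlane.replicas is 1, found %d", *compute)
	}
	return nil
}

func main() {
	zero := int64(0)
	fmt.Println(checkSNO(1, nil))   // reports the missing field
	fmt.Println(checkSNO(1, &zero)) // valid SNO configuration
}
```

Failing early with a message like this would surface the problem when the ISO is built instead of later, during wait-for install-complete.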
Created attachment 1904979 [details] agent-config.yaml
PR https://github.com/openshift/installer/pull/6223
It still fails when compute.replicas is missing from install-config.yaml.

DEBUG OpenShift Installer unreleased-master-7004-g1fb1397635c89ff8b3645fed4c4c264e4119fa84-dirty
DEBUG Built from commit 1fb1397635c89ff8b3645fed4c4c264e4119fa84
DEBUG Fetching Agent Installer ISO...
DEBUG Loading Agent Installer ISO...
DEBUG Loading Agent Installer Ignition...
DEBUG Loading Agent Manifests...
DEBUG Loading Agent PullSecret...
DEBUG Loading Install Config...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x321d83b]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset/agent.(*OptionalInstallConfig).validateSNOConfiguration(0x2?, 0xc000fc7400)
	/home/mhans/installer/pkg/asset/agent/installconfig.go:169 +0x81b
github.com/openshift/installer/pkg/asset/agent.(*OptionalInstallConfig).validateInstallConfig(0xc00123ede8?, 0x1ab1aa00?)
	/home/mhans/installer/pkg/asset/agent/installconfig.go:107 +0x1a5
github.com/openshift/installer/pkg/asset/agent.(*OptionalInstallConfig).Load(0xc000437f00, {0x1ab1aa00, 0xc00083d2b0})
	/home/mhans/installer/pkg/asset/agent/installconfig.go:62 +0x45
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc000b1e510, {0x1ab21ed0, 0xc000437d80}, {0xc000335e18, 0x8})
	/home/mhans/installer/pkg/asset/store/store.go:264 +0x2b2
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc000b1e510, {0x1ab21ff0, 0xc000141c10}, {0xc000335de6, 0x6})
	/home/mhans/installer/pkg/asset/store/store.go:247 +0xc05
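
The trace does not show which value was nil at installconfig.go:169. Assuming it was the compute pool or its replicas pointer (plausible when the field is left out of the YAML, but an assumption), a nil-safe way to read the value might look like the sketch below. machinePool and totalReplicas are hypothetical names for illustration, not the installer's types.

```go
package main

import "fmt"

// machinePool is a hypothetical stand-in for an install-config machine pool.
type machinePool struct {
	Name     string
	Replicas *int64 // nil when replicas is omitted from the YAML
}

// totalReplicas sums the replicas that are explicitly set and counts the
// pools that left the field unset. It never dereferences a nil pointer and
// also works for a nil or empty slice (the "compute is missing" case).
func totalReplicas(pools []machinePool) (total int64, unset int) {
	for _, p := range pools {
		if p.Replicas == nil {
			unset++
			continue
		}
		total += *p.Replicas
	}
	return total, unset
}

func main() {
	// A worker pool declared without replicas.
	total, unset := totalReplicas([]machinePool{{Name: "worker"}})
	fmt.Println(total, unset) // prints "0 1"
}
```

Returning the count of unset pools lets the caller decide whether an omitted field is an error or should fall back to a default.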
With PR https://github.com/openshift/installer/pull/6462, the validation message when compute.replicas is missing from install-config.yaml will be:

FATAL failed to fetch Agent Installer ISO: failed to load asset "Install Config": invalid install-config configuration: Compute.Replicas: Required value: Total number of Compute.Replicas must be 0 for none platform. Found 3

The installer's default install-config settings set Compute.Replicas to 3, hence the error message saying "Found 3".

Sample install config 1: Compute is missing altogether

    apiVersion: v1
    baseDomain: test.metalkube.org
    controlPlane:
      hyperthreading: Enabled
      name: master
      replicas: 1
    metadata:
      namespace: cluster-0
      name: ostest
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      networkType: OVNKubernetes
      machineNetwork:
      - cidr: 192.168.122.0/23
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      none: {}
    fips: false
    pullSecret:
    sshKey:

Sample install config 2: Only Compute.Replicas is missing

    apiVersion: v1
    baseDomain: test.metalkube.org
    compute:
    - hyperthreading: Enabled
      name: worker
    controlPlane:
      hyperthreading: Enabled
      name: master
    metadata:
      namespace: cluster-0
      name: ostest
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      networkType: OVNKubernetes
      machineNetwork:
      - cidr: 192.168.122.0/23
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      none: {}
    fips: false
    pullSecret:
    sshKey:
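
To show why the message reports "Found 3" even though the file never mentions 3, here is a small sketch of defaulting followed by validation. It is not the code from the PR: machinePool, withDefaults and validateNonePlatform are hypothetical, the default of 3 is taken from the comment above, and the assumption that a missing compute section is defaulted to a single "worker" pool is mine.

```go
package main

import "fmt"

// defaultComputeReplicas is the default reported in the comment above.
const defaultComputeReplicas int64 = 3

// machinePool is a hypothetical stand-in for an install-config machine pool.
type machinePool struct {
	Name     string
	Replicas *int64 // nil when replicas is omitted from the YAML
}

// withDefaults returns a copy of the compute pools in which every unset
// replicas field is filled with the default. Assumption: a missing compute
// section becomes one default "worker" pool.
func withDefaults(compute []machinePool) []machinePool {
	if len(compute) == 0 {
		compute = []machinePool{{Name: "worker"}}
	}
	out := make([]machinePool, len(compute))
	for i, p := range compute {
		if p.Replicas == nil {
			n := defaultComputeReplicas
			p.Replicas = &n
		}
		out[i] = p
	}
	return out
}

// validateNonePlatform runs after defaulting, so an omitted field is seen as 3.
func validateNonePlatform(compute []machinePool) error {
	var total int64
	for _, p := range withDefaults(compute) {
		total += *p.Replicas
	}
	if total != 0 {
		return fmt.Errorf("Compute.Replicas: Required value: Total number of Compute.Replicas must be 0 for none platform. Found %d", total)
	}
	return nil
}

func main() {
	fmt.Println(validateNonePlatform(nil)) // sample 1: compute missing altogether
	fmt.Println(validateNonePlatform([]machinePool{{Name: "worker"}})) // sample 2
}
```

Under these assumptions both sample configs take the same path: the default is applied first, so the check sees 3 replicas and rejects the config until compute.replicas is set to 0 explicitly.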
Bug has been verified with master branch. It's working as expected.