Version: 4.8.0-0.nightly-2021-04-13-171608
Platform: OSP13

Defining the following section in install-config.yaml:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3

leads to the following assignments on the resulting OCP cluster:

master-0: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
master-1: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
master-2: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-0: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-1: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-2: [novaAZ: AZ-0, cinderAZ: cinderAZ1]

Therefore, no rootVolume is created on cinderAZ0, even though it is listed in install-config.yaml.

What did you expect to happen?
The install-config.yaml above should fail the preflight checks, since it defines Cinder AZs that will never be used. The user should be instructed to define a number of Nova zones equal to or greater than the number of Cinder zones. Additionally, the expected pairing between Nova and Cinder zones must be properly documented: the current behavior is misleading, and predicting the resulting assignments requires knowledge of the implementation details.
I think the way it works is that it tries to schedule a volume in the first AZ, and if that is not possible it tries the second one. If so, I'm not sure we want to match the number of Nova zones with the number of Cinder zones. In the real world, AZs are used for Edge-type deployments, where one cluster is deployed per Edge site and the workers are in the same zone, including their servers and volumes, so in your case you wouldn't have cinderAZ1 defined at all. @mfedosin, what do you think?
Talked to Mike; we agreed that this is a LOW-priority bug for now. Until we get proper validation, we can work around it with a doc patch saying that if you have one AZ for Nova, you want one AZ for Cinder too.
I was using:

rootVolume:
  size: 30
  type: sdd

which worked on 4.7 but errors on 4.8:

level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal

Is this by design?
I hit the same error when installing 4.8.0-0.nightly-2021-04-19-121657:

09:28:27 [INFO] Generating manifests files.....
09:28:30 level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

I did not set rootVolume.zones when installing, and we have a Nova AZ for both compute and Cinder. After checking with @rlobillo, assigning it back.
(In reply to Mark Hamzy from comment #4)
> I was using:
> rootVolume:
>   size: 30
>   type: sdd
>
> Which worked for 4.7 but errors in 4.8:
> level=fatal msg=failed to fetch Master Machines: failed to load asset
> "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid
> value: []string(nil): there must be either just one volume availability zone
> common to all nodes or the number of compute and volume availability zones
> must be equal
>
> Is this by design?

It's not. The new validation was wrongly failing when the user specified a Nova AZ but no Cinder AZ. This should be fixed by https://github.com/openshift/installer/pull/4860.
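For reference, the rule that the error message states can be sketched as below. This is a hypothetical Python rendition for illustration only (the real check lives in the installer's Go validation code, and the function name is invented): rootVolume.zones may be unset, contain a single zone common to all nodes, or contain exactly as many zones as the Nova (compute) zones list.

```python
def root_volume_zones_valid(compute_zones, volume_zones):
    """Sketch of the install-config rule for rootVolume.zones.

    Valid when the volume zones list is empty (no Cinder AZ pinning,
    the case that the pre-fix validation wrongly rejected), when it
    holds exactly one zone shared by all nodes, or when its length
    matches the Nova (compute) zones list.
    """
    if not volume_zones:
        return True
    if len(volume_zones) == 1:
        return True
    return len(volume_zones) == len(compute_zones)
```

For example, the original report's config (one Nova zone, two Cinder zones) is rejected, while three Nova zones with one shared Cinder zone is accepted.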
Verified on 4.8.0-0.nightly-2021-05-15-141455

# Test 1: This config:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3

triggers the error below (as expected):

FATAL failed to fetch Metadata: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

# Test 2: If there are several Nova AZs and several Cinder AZs but the lengths of these lists are not equal, the installer also fails (as expected):

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1']
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1']
  replicas: 3

FATAL failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

# Test 3: Configuring several Nova AZs and only one Cinder AZ:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0']
  replicas: 3

generates the manifests below:

ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: cinderAZ0

# Test 4: Same number of Cinder and Nova AZs:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
  replicas: 3

Output:

ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: cinderAZ1
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: cinderAZ1
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-3.yaml: availabilityZone: AZ-1

Backward compatibility was also checked on OCP QE CI.
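The manifests from Tests 3 and 4 are consistent with index-based pairing: a single Cinder zone is reused for every Nova zone, while equal-length lists are matched position by position. A minimal Python sketch of that apparent pairing, assuming a hypothetical function name (this is an inference from the test output above, not the installer's actual code):

```python
def pair_zones(nova_zones, cinder_zones):
    """Pair each Nova (compute) AZ with a Cinder (volume) AZ.

    A single Cinder zone is broadcast to all Nova zones; otherwise
    the two lists are matched index by index, which is why the
    validation requires their lengths to be equal in that case.
    """
    if len(cinder_zones) == 1:
        return [(nova, cinder_zones[0]) for nova in nova_zones]
    if len(nova_zones) != len(cinder_zones):
        raise ValueError("zone list lengths must match")
    return list(zip(nova_zones, cinder_zones))
```

With the Test 4 inputs, this yields AZ-0/cinderAZ0, AZ-1/cinderAZ1, AZ-2/cinderAZ0, matching the master machine manifests above.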
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438