Summary: | some defined rootVolumes zones not used on installation | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | rlobillo |
Component: | Installer | Assignee: | Mike Fedosin <mfedosin> |
Installer sub component: | OpenShift on OpenStack | QA Contact: | rlobillo |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | low | CC: | emacchi, juriarte, m.andre, mfedosin, mhamzy, wduan |
Version: | 4.8 | Keywords: | Triaged |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-27 23:00:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: |
Description
rlobillo
2021-04-15 12:04:13 UTC
I think the way it works is that it will try to schedule a volume in the first AZ, and if that is not possible it will try the second one. If so, I'm not sure we want to match the number of Nova zones with the Cinder zones. In the real world, AZs are used for Edge-type deployments, where one cluster is deployed per Edge site and the workers are in the same zone, including their servers and volumes, so in your case you wouldn't have CinderAZ1 defined. @mfedosin, what do you think?

Talked to Mike; we agreed that it's a LOW bug for now. Until we get proper validation, we can work around it with a doc patch saying that if you have one AZ for Nova, you want one AZ for Cinder too.

I was using:

    rootVolume:
      size: 30
      type: sdd

which worked for 4.7 but errors out in 4.8:

    level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal

Is this by design?

I hit the same error when installing 4.8.0-0.nightly-2021-04-19-121657:

    09:28:27 [INFO] Generating manifests files.....
    09:28:30 level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

I did not set rootVolume.zones when installing, and our cloud has availability zones for both Nova and Cinder. After checking with @rlobillo, assigning it back.

(In reply to Mark Hamzy from comment #4)
> I was using:
>     rootVolume:
>       size: 30
>       type: sdd
>
> which worked for 4.7 but errors out in 4.8:
>     level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal
>
> Is this by design?

It's not. The new validation was wrongly failing when the user specified a Nova AZ but no Cinder AZ. This should be fixed by https://github.com/openshift/installer/pull/4860.
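For clarity, below is a minimal Go sketch of the validation rule that the error message describes, including the relaxation from openshift/installer PR #4860 that treats an unset rootVolume.zones as valid even when Nova zones are set. The function and parameter names are illustrative only and are not the installer's actual identifiers.

    package main

    import "fmt"

    // validateRootVolumeZones sketches the rule from the error message: the
    // root-volume (Cinder) zone list may be empty, contain a single zone shared
    // by all nodes, or match the compute (Nova) zone list in length.
    func validateRootVolumeZones(computeZones, volumeZones []string) error {
        switch {
        case len(volumeZones) == 0:
            // The fix: an unset rootVolume.zones is valid even when Nova zones are specified.
            return nil
        case len(volumeZones) == 1:
            // One volume AZ common to all nodes.
            return nil
        case len(volumeZones) == len(computeZones):
            // One volume AZ per compute AZ.
            return nil
        default:
            return fmt.Errorf("there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal")
        }
    }

    func main() {
        // Mark Hamzy's case: a rootVolume without zones; accepted after the fix.
        fmt.Println(validateRootVolumeZones([]string{"AZ-0"}, nil))
        // Two Cinder AZs against one Nova AZ (see Test 1 in the verification below); still rejected.
        fmt.Println(validateRootVolumeZones([]string{"AZ-0"}, []string{"cinderAZ1", "cinderAZ0"}))
    }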
Verified on 4.8.0-0.nightly-2021-05-15-141455.

# Test 1: This config:

    compute:
    - name: worker
      platform:
        openstack:
          zones: ['AZ-0']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ1', 'cinderAZ0']
      replicas: 3
    controlPlane:
      name: master
      platform:
        openstack:
          zones: ['AZ-0']
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ1', 'cinderAZ0']
      replicas: 3

triggers the error below (as expected):

    FATAL failed to fetch Metadata: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

# Test 2: If there are several Nova AZs and several Cinder AZs but the lengths of these lists are not equal, the installer also fails (as expected):

    - name: worker
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0','cinderAZ1']
      replicas: 1
    controlPlane:
      name: master
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0','cinderAZ1']
      replicas: 3

    FATAL failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

# Test 3: Configuring several Nova AZs and only one Cinder AZ:

    - name: worker
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0']
      replicas: 3
    controlPlane:
      name: master
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0']
      replicas: 3

generates the manifests below:

    ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: AZ-0
    ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: AZ-1
    ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: AZ-2
    ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: AZ-0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: AZ-1
    ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: AZ-2
    ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: cinderAZ0
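As a side note, here is a minimal Go sketch (assumed behavior inferred from the generated manifests, not the installer's actual code) of how each machine's Nova AZ and root-volume Cinder AZ appear to be paired in the output above and in Test 4 below: a single Cinder AZ is reused for every machine, while equal-length lists are paired by machine index.

    package main

    import "fmt"

    // zoneForIndex returns the availability zone for the machine at the given
    // index: with one zone it is reused for every machine, with several zones
    // the index selects one. Illustrative only.
    func zoneForIndex(zones []string, index int) string {
        if len(zones) == 0 {
            return "" // no explicit AZ requested; leave the field unset
        }
        return zones[index%len(zones)]
    }

    func main() {
        novaZones := []string{"AZ-0", "AZ-1", "AZ-2"}
        cinderZones := []string{"cinderAZ0"} // Test 3; three entries reproduce Test 4

        for i := 0; i < 3; i++ {
            fmt.Printf("master-%d: availabilityZone: %s / rootVolume availabilityZone: %s\n",
                i, zoneForIndex(novaZones, i), zoneForIndex(cinderZones, i))
        }
    }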
# Test 4: Same number of Cinder and Nova AZs:

    compute:
    - name: worker
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
      replicas: 1
    controlPlane:
      name: master
      platform:
        openstack:
          zones: ['AZ-0','AZ-1','AZ-2']
          additionalNetworkIDs: []
          rootVolume:
            size: 25
            type: tripleo
            zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
      replicas: 3

Output:

    ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: AZ-0
    ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: AZ-1
    ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml: availabilityZone: cinderAZ1
    ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: AZ-2
    ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: AZ-0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: AZ-1
    ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml: availabilityZone: cinderAZ1
    ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: AZ-2
    ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml: availabilityZone: cinderAZ0
    ostest/openshift/99_openshift-cluster-api_worker-machineset-3.yaml: availabilityZone: AZ-1

Backward compatibility was also checked in OCP QE CI.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438