Bug 1949923 - some defined rootVolumes zones not used on installation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: All
OS: All
Priority: low
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Mike Fedosin
QA Contact: rlobillo
Reported: 2021-04-15 12:04 UTC by rlobillo
Modified: 2021-07-27 23:01 UTC
CC: 6 users

Doc Type: No Doc Update
Last Closed: 2021-07-27 23:00:58 UTC




Links
- GitHub openshift/installer pull 4851 (open): Bug 1949923: OpenStack: validate root volume availability zones. Last updated 2021-04-15 15:17:04 UTC
- GitHub openshift/installer pull 4860 (open): Bug 1949923: validate root volume AZs only if they are set. Last updated 2021-04-20 11:02:34 UTC
- Red Hat Product Errata RHSA-2021:2438. Last updated 2021-07-27 23:01:12 UTC

Description rlobillo 2021-04-15 12:04:13 UTC
Version: 4.8.0-0.nightly-2021-04-13-171608
Platform: OSP13

Defining the section below in install-config.yaml:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3

leads to the following zone assignments in the resulting OCP cluster:
master-0: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
master-1: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
master-2: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-0: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-1: [novaAZ: AZ-0, cinderAZ: cinderAZ1]
worker-2: [novaAZ: AZ-0, cinderAZ: cinderAZ1]

Therefore, no root volume is created in cinderAZ0, even though that zone is specified in the install-config.yaml.

What did you expect to happen?

The install-config.yaml above should fail the preflight checks, since it defines Cinder AZs that will never be used. The user must be instructed to define a number of Nova zones equal to or greater than the number of Cinder zones.

Additionally, the expected pairing between the Nova and Cinder zones must be properly documented: the current behavior is misleading, and predicting the resulting assignments requires understanding the implementation details.
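
For illustration, a minimal Go sketch of the kind of preflight check being requested here (hypothetical code, not the installer's implementation; the rule that was eventually implemented differs slightly, see the comments below):

package validation

import "fmt"

// checkZoneCounts sketches the requested check: every Cinder AZ needs a
// Nova AZ to be paired with, so any surplus Cinder AZs would never be used.
func checkZoneCounts(novaZones, cinderZones []string) error {
	if len(cinderZones) > len(novaZones) {
		return fmt.Errorf("%d volume availability zones defined but only %d compute availability zones: the surplus volume zones would never be used", len(cinderZones), len(novaZones))
	}
	return nil
}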

Comment 1 Emilien Macchi 2021-04-15 13:57:07 UTC
I think the way it works is that it'll try to schedule a volume in the first AZ, and if that's not possible it'll try the second one. If so, I'm not sure we want to match the number of Nova zones with the Cinder zones.

In the real world, AZs are used for edge-type deployments, where one cluster is deployed per edge site and the workers, including their servers and volumes, are in the same zone, so in your case you wouldn't have cinderAZ1 defined.

@mfedosin what do you think?

Comment 2 Emilien Macchi 2021-04-15 14:10:38 UTC
Talked to Mike; we agreed that it's a LOW bug for now.
Until we get proper validation, we can work around it with a doc patch saying that if you have one AZ for Nova, you want one AZ for Cinder too.

Comment 4 Mark Hamzy 2021-04-18 01:06:57 UTC
I was using:
      rootVolume:
        size: 30
        type: sdd

Which worked for 4.7 but errors in 4.8:
level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal

Is this by design?

Comment 5 Wei Duan 2021-04-20 08:53:51 UTC
I hit the same error when installing 4.8.0-0.nightly-2021-04-19-121657.

09:28:27  [INFO] Generating manifests files.....
09:28:30  level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string(nil): there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal]

I did not set rootVolume.zones when installing, and we have AZs for both Nova (compute) and Cinder.
After checking with @rlobillo, I am assigning it back.

Comment 7 Martin André 2021-05-03 09:15:50 UTC
(In reply to Mark Hamzy from comment #4)
> I was using:
>       rootVolume:
>         size: 30
>         type: sdd
> 
> Which worked for 4.7 but errors in 4.8:
> level=fatal msg=failed to fetch Master Machines: failed to load asset
> "Install Config": controlPlane.platform.openstack.rootVolume.zones: Invalid
> value: []string(nil): there must be either just one volume availability zone
> common to all nodes or the number of compute and volume availability zones
> must be equal
> 
> Is this by design?

It's not. The new validation wrongly failed when the user specified a Nova AZ but no Cinder AZ. This should be fixed by https://github.com/openshift/installer/pull/4860.
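
For reference, a minimal sketch of what the corrected validation amounts to (hypothetical code, not the actual installer implementation; see PRs 4851 and 4860 for the real changes):

package validation

import "errors"

// validateRootVolumeZones skips the check entirely when no Cinder AZs are
// set, so configs without rootVolume.zones keep working as they did in 4.7;
// otherwise it requires either a single volume AZ common to all nodes or
// exactly one volume AZ per compute AZ.
func validateRootVolumeZones(computeZones, volumeZones []string) error {
	if len(volumeZones) == 0 {
		return nil
	}
	if len(volumeZones) == 1 || len(volumeZones) == len(computeZones) {
		return nil
	}
	return errors.New("there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal")
}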

Comment 9 rlobillo 2021-05-17 11:38:39 UTC
Verified on 4.8.0-0.nightly-2021-05-15-141455


#Test 1: This config:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ1', 'cinderAZ0']
  replicas: 3

triggers the error below (as expected):


FATAL failed to fetch Metadata: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal] 

#Test 2: If there are several Nova AZs and several Cinder AZs but the lists are of different lengths, the installer also fails (as expected):

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1']
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1']
  replicas: 3

FATAL failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal, compute[0].platform.openstack.rootVolume.zones: Invalid value: []string{"cinderAZ0", "cinderAZ1"}: there must be either just one volume availability zone common to all nodes or the number of compute and volume availability zones must be equal] 


#Test 3: Configuring several nova AZs and only one cinder AZ:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0']
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0']
  replicas: 3


generates the manifests below:

ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml:      availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml:        availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml:      availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml:        availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml:      availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml:        availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml:          availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml:            availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml:          availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml:            availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml:          availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml:            availabilityZone: cinderAZ0


#Test 4: Same number of Cinder and Nova AZs:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
      rootVolume:
        size: 25
        type: tripleo
        zones: ['cinderAZ0','cinderAZ1','cinderAZ0']
  replicas: 3

Output: 

ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml:      availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_master-machines-0.yaml:        availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml:      availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_master-machines-1.yaml:        availabilityZone: cinderAZ1
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml:      availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_master-machines-2.yaml:        availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml:          availabilityZone: AZ-0
ostest/openshift/99_openshift-cluster-api_worker-machineset-0.yaml:            availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml:          availabilityZone: AZ-1
ostest/openshift/99_openshift-cluster-api_worker-machineset-1.yaml:            availabilityZone: cinderAZ1
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml:          availabilityZone: AZ-2
ostest/openshift/99_openshift-cluster-api_worker-machineset-2.yaml:            availabilityZone: cinderAZ0
ostest/openshift/99_openshift-cluster-api_worker-machineset-3.yaml:          availabilityZone: AZ-1
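
The assignments in Tests 3 and 4 are consistent with a simple index pairing between the Nova and Cinder zone lists. A hypothetical Go illustration of the observed behavior (the installer's actual manifest generation may be implemented differently):

package illustration

// zonesForMachine pairs machine i with a Nova AZ by index; a single Cinder
// AZ is shared by all nodes (Test 3), while equal-length zone lists are
// paired index by index (Test 4).
func zonesForMachine(i int, computeZones, volumeZones []string) (novaAZ, cinderAZ string) {
	idx := i % len(computeZones)
	novaAZ = computeZones[idx]
	if len(volumeZones) == 1 {
		cinderAZ = volumeZones[0]
	} else {
		cinderAZ = volumeZones[idx]
	}
	return novaAZ, cinderAZ
}

With computeZones = []string{"AZ-0", "AZ-1", "AZ-2"} and volumeZones = []string{"cinderAZ0", "cinderAZ1", "cinderAZ0"}, machines 0 through 2 get exactly the Nova/Cinder pairs shown in the Test 4 manifests above.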


Backward compatibility was also checked in OCP QE CI.

Comment 12 errata-xmlrpc 2021-07-27 23:00:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

