Bug 1451023
Summary: | Changes to the default clusterNetworkCIDR & hostSubnetLength via the installer do not take into account the old default values when adding a new master. | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
Component: | Installer | Assignee: | Andrew Butcher <abutcher> |
Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 3.4.0 | CC: | aos-bugs, erich, jiajliu, jokerman, mmccomas, mwoodson, rhowe, sdodson, smilner, weshi |
Target Milestone: | --- | Keywords: | NeedsTestCase |
Target Release: | 3.7.0 | Flags: | jiajliu: needinfo-
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch | Doc Type: | Bug Fix |
Doc Text: |
Cause:
When upgrading between versions (specifically from 3.3/1.3 or earlier to 3.4 or later), the default values for clusterNetworkCIDR and hostSubnetLength changed. If the inventory file did not specify the corresponding inventory variables, the upgrade would fail.
Consequence:
Controller service fails to start back up.
Fix:
The following are now required inventory variables when upgrading or installing:
- osm_cluster_network_cidr
- osm_host_subnet_length
- openshift_portal_net
Result:
If the required variables are not set, the upgrade/install will stop early and let the admin know that the variables must be set and where to find the corresponding values.
Message:
osm_cluster_network_cidr, osm_host_subnet_length, and openshift_portal_net are required inventory
variables when upgrading. These variables should match what is currently used in the cluster. If
you don't remember what these values are you can find them in /etc/origin/master/master-config.yaml
on a master with the names clusterNetworkCIDR (osm_cluster_network_cidr),
hostSubnetLength (osm_host_subnet_length), and serviceNetworkCIDR (openshift_portal_net).
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-11-28 21:54:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ryan Howe
2017-05-15 15:03:50 UTC
Change to the defaults: https://github.com/openshift/openshift-ansible/commit/b50b4ea0b03feb9431abd7294fe4fb6b549ddfc0

A workaround is of course to set osm_cluster_network_cidr and osm_host_subnet_length to the old values before running the scaleup playbook. While we should fix this, I'm lowering severity to medium based on the easy workaround.

Added an inventory check before upgrade which makes sure that these two variables are explicitly set. This is what it looks like when one or both are not set via inventory:

    Tuesday 29 August 2017 11:35:50 -0400 (0:00:00.013) 0:00:03.567 ********
    fatal: [192.168.124.234]: FAILED! => {
        "assertion": "osm_cluster_network_cidr is defined",
        "changed": false,
        "evaluated_to": false,
        "failed": true
    }

    MSG:

    osm_cluster_network_cidr and openshift_portal_net are required inventory variables when upgrading.
    These variables should match what is currently used in the cluster. If you don't remember what these
    values are you can find them in /etc/origin/master/master-config.yaml on a master with the names
    clusterNetworkCIDR(osm_cluster_network_cidr) and hostSubnetLength (openshift_portal_net).

PR: https://github.com/openshift/openshift-ansible/pull/5256

Merged.

Updated message: https://github.com/openshift/openshift-ansible/pull/5386

Added doc text.

From QE's perspective, the resolution described in comment 2 should be the best one, and the fix should land in the scaleup playbook, not the upgrade playbook.

> When performing a scaleup we need to read the CIDR values from an existing master then set that fact on the scaled up masters.

Even if that fix cannot be implemented in the short term, as a compromise, when performing a scaleup, once the installer finds that the CIDR values from an existing master do not match the new CIDR values for the new master, the installer should exit and prompt the user to set osm_cluster_network_cidr and osm_host_subnet_length to the old values before running the scaleup playbook.

From the customer's perspective, when running an upgrade it is not reasonable to force the user to set osm_cluster_network_cidr and osm_host_subnet_length in the inventory host file; that is relevant to scaleup, not upgrade. So assigning this bug back.

*** Bug 1493268 has been marked as a duplicate of this bug. ***

The final fix for the issue: https://github.com/openshift/openshift-ansible/pull/5473

Moving to MODIFIED as it's not yet built into an rpm package.

Tested with openshift-ansible-3.7.0-0.134.0.git.0.6f43fc3.el7.noarch.rpm

1. Trigger master HA installation with the original network parameters:
   # cat inventory_host
   <--snip-->
   osm_cluster_network_cidr=11.0.0.0/16
   osm_host_subnet_length=8
   openshift_master_portal_net=172.31.0.0/16
   <--snip-->
2. Remove the above network parameters from the inventory file.
3. Scale up one master against the env above.

## Result:
Installation succeeded, but master-controllers on the new master was reporting an error:

    "failed to start SDN plugin controller: cannot change the serviceNetworkCIDR of an already-deployed cluster"

Digging more found that the new master was still using the new default portal net:

    # grep -nri "serviceNetworkCIDR:" /etc/origin/master/master-config.yaml
    162: serviceNetworkCIDR: 172.30.0.0/16

As `initialize_facts.yml` was executed prior to `set_network_facts.yml`, the installer would still take the default `portal_net` for the subsequent tasks:

    ./playbooks/common/openshift-cluster/initialize_facts.yml:143: portal_net: "{{ openshift_portal_net | default(openshift_master_portal_net) | default(None) }}"
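As a minimal sketch of the workaround and of the now-required variables (the values below are examples taken from this report and must match what is already deployed in the cluster), the inventory would keep the original network settings set when running the scaleup or upgrade playbooks, assuming the usual [OSEv3:vars] group:

    [OSEv3:vars]
    # Example values only; copy the values from the existing cluster's
    # /etc/origin/master/master-config.yaml (clusterNetworkCIDR, hostSubnetLength,
    # serviceNetworkCIDR).
    osm_cluster_network_cidr=11.0.0.0/16
    osm_host_subnet_length=8
    openshift_portal_net=172.31.0.0/16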
Verified with openshift-ansible-3.7.0-0.147.0.git.0.2fb41ee.el7.noarch.rpm

Test steps:

1. Trigger master HA installation with the original network parameters:
   # cat inventory_host
   <--snip-->
   osm_cluster_network_cidr=10.1.0.0/16
   osm_host_subnet_length=8
   openshift_portal_net=172.31.0.0/16
   <--snip-->
2. Remove the above network parameters from the inventory file.
3. Scale up one master against the env above.
4. Check the network parameters on the new master:
   # grep -E "NetworkCIDR|Length" /etc/origin/master/master-config.yaml
   clusterNetworkCIDR: 10.1.0.0/16
   externalIPNetworkCIDRs:
   hostSubnetLength: 8
   serviceNetworkCIDR: 172.31.0.0/16
5. Trigger an S2I build against the new master; it works well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
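For reference, a minimal sketch of how the existing values can be looked up on an existing master, assuming the default config path referenced in the message above:

    # Read the currently deployed network settings from an existing master
    grep -E "clusterNetworkCIDR|hostSubnetLength|serviceNetworkCIDR" /etc/origin/master/master-config.yaml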