Bug 1473858
Field | Value |
---|---|
Summary | Installer does not configure flannel correctly for openstack installs. |
Product | OpenShift Container Platform |
Component | Installer |
Version | 3.5.0 |
Status | CLOSED ERRATA |
Severity | high |
Priority | high |
Reporter | Ryan Howe <rhowe> |
Assignee | Scott Dodson <sdodson> |
QA Contact | Gan Huang <ghuang> |
CC | aos-bugs, eminguez, erich, jack.ottofaro, jokerman, misalunk, mmccomas, sdodson |
Target Release | 3.7.0 |
Hardware | Unspecified |
OS | Unspecified |
Fixed In Version | openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch |
Doc Type | Bug Fix |
Doc Text | The flannel network was previously defined using the same subnet as the Kubernetes services subnet. This caused a conflict between services and SDN networks. The flannel network is now correctly defined by the osm_cluster_network_cidr variable. |
| 1491412 1491413 1594310 |
Last Closed | 2017-11-28 22:05:05 UTC |
Type | Bug |
Bug Blocks | 1490388, 1491412, 1491413, 1594310 |
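
To illustrate the conflict the Doc Text describes, here is a minimal Python sketch that checks whether two CIDR ranges overlap. The helper name `networks_conflict` is hypothetical, and 172.30.0.0/16 is assumed here as the services subnet (this report does not state it); 10.128.0.0/14 is the cluster network value that appears in the comments.

```python
import ipaddress


def networks_conflict(cluster_cidr: str, services_cidr: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    # strict=False tolerates CIDRs written with host bits set.
    cluster = ipaddress.ip_network(cluster_cidr, strict=False)
    services = ipaddress.ip_network(services_cidr, strict=False)
    return cluster.overlaps(services)


# Assumed values: flannel defined on the same range as the services subnet.
print(networks_conflict("172.30.0.0/16", "172.30.0.0/16"))  # True
# Assumed values: flannel defined on a separate cluster network.
print(networks_conflict("10.128.0.0/14", "172.30.0.0/16"))  # False
```

A check like this only shows why sharing one range between services and the SDN is a problem; it is not part of the installer.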
Description
Ryan Howe
2017-07-21 21:39:00 UTC
3.6 as well.

Tested with openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm

Installation failed:

TASK [flannel_register : Generate etcd configuration for etcd] *****************
Monday 18 September 2017 06:37:14 +0000 (0:00:00.130) 0:05:32.474 ******
fatal: [host-8-241-75.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
  "changed": false,
  "failed": true
}

MSG:

AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'

After setting ->

osm_cluster_network_cidr=10.128.0.0/14
osm_host_subnet_length=9

I get the following error ->

AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'

The following helped to set the CIDR ->

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9

So, I've created a cluster from scratch using the latest 3.6 bits, manually modifying the files provided in the PR and using the default values (so I didn't touch osm_cluster_network_cidr or osm_host_subnet_length), and it worked for me:

# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
# oetcdctl get /openshift.com/network/config
{
  "Network": "10.128.0.0/14",
  "SubnetLen": 23,
  "Backend": {
    "Type": "host-gw"
  }
}
# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.10.0-23
/openshift.com/network/subnets/10.128.108.0-23
/openshift.com/network/subnets/10.128.118.0-23
/openshift.com/network/subnets/10.128.28.0-23
/openshift.com/network/subnets/10.128.40.0-23
/openshift.com/network/subnets/10.128.140.0-23
/openshift.com/network/subnets/10.128.98.0-23
/openshift.com/network/subnets/10.128.12.0-23
# oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
{"PublicIP":"192.168.98.10","BackendType":"host-gw"}

Is there any other modification that could affect this since the version currently in GA (the one I used)?

$ rpm -qf /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/flannel-config.json
openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch

Eduardo, your change is only on the master branch, so you'll want to test a 3.7 version of the installer, right?

(In reply to Scott Dodson from comment #8)
> Eduardo, your change is only on the master branch you'll want to test a 3.7
> version of the installer, right?

I can do it if needed, but I think it should be backported to older releases as well. I've tested 3.6 with those files manually patched, as those are the GA bits I can use (IDK how to test 3.7 TBH).

(In reply to Eduardo Minguez from comment #7)
> So, I've created a cluster from scratch using the latest 3.6 bits +
> modifying manually the files provided in the PR, and using the default
> values (so I didn't touch osm_cluster_network_cidr nor
> osm_host_subnet_length) and it worked for me:
>
> # alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
> # oetcdctl get /openshift.com/network/config
> {
>   "Network": "10.128.0.0/14",
>   "SubnetLen": 23,
>   "Backend": {
>     "Type": "host-gw"
>   }
> }
> # oetcdctl ls /openshift.com/network/subnets
> /openshift.com/network/subnets/10.128.10.0-23
> /openshift.com/network/subnets/10.128.108.0-23
> /openshift.com/network/subnets/10.128.118.0-23
> /openshift.com/network/subnets/10.128.28.0-23
> /openshift.com/network/subnets/10.128.40.0-23
> /openshift.com/network/subnets/10.128.140.0-23
> /openshift.com/network/subnets/10.128.98.0-23
> /openshift.com/network/subnets/10.128.12.0-23
> # oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
> {"PublicIP":"192.168.98.10","BackendType":"host-gw"}
>
> Is there any other modification that can affect that since the version in GA
> right now (the one I've used)?
>
> $ rpm -qf /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/flannel-config.json
> openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch

So was it that the following values we used were ignored and the default values were taken?

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9

Also, what if a customer wants to set a custom cluster CIDR and host subnet length?

So, I deployed the OCP cluster without setting any values for the pod network or the services network, to check whether it worked with the default values (the most common scenario AFAIK). I haven't had the chance to test it with different values for the CIDR or subnet yet.

I've tested with custom values and it failed.
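
The templating failure reported above boils down to subtracting a text value from an integer. Below is a minimal Python sketch of the same arithmetic, assuming the SubnetLen written to etcd is derived as 32 minus the host subnet length (as the `{{ 32 - openshift.master.sdn_host_subnet_length }}` expression in the error suggests). `flannel_subnet_len` is a hypothetical helper, and the explicit integer conversion is only one plausible remedy, not necessarily what the PR does.

```python
def flannel_subnet_len(host_subnet_length) -> int:
    """Compute a flannel SubnetLen from the number of host bits per node.

    Accepts an int or a numeric string, since inventory values arrive as text.
    """
    return 32 - int(host_subnet_length)


# Reproduce the class of error: an int minus an unconverted text value.
try:
    32 - "9"
except TypeError as exc:
    print(f"TypeError: {exc}")

# With the conversion in place, text and integer inputs both work.
print(flannel_subnet_len("9"))  # 23
print(flannel_subnet_len(8))    # 24
```

With a host subnet length of 9 this gives 23, which is consistent with the SubnetLen shown in the etcd output above.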
I've created a new PR[1] that fixes the issue in my tests:

[cloud-user@bastion ~]$ grep -E 'osm|portal' /etc/ansible/hosts
osm_default_node_selector="role=app"
osm_use_cockpit=true
osm_cluster_network_cidr=10.130.0.0/14
osm_host_subnet_length=8
openshift_portal_net=10.111.0.0/16

After the installation:

[root@master-0 ~]# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
[root@master-0 ~]# oetcdctl get /openshift.com/network/config
{
  "Network": "10.130.0.0/14",
  "SubnetLen": 24,
  "Backend": {
    "Type": "host-gw"
  }
}

But the subnets assigned to the nodes are on a different subnet:

[root@master-0 ~]# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.83.0-24
/openshift.com/network/subnets/10.128.18.0-24
/openshift.com/network/subnets/10.128.77.0-24
/openshift.com/network/subnets/10.128.101.0-24
/openshift.com/network/subnets/10.128.20.0-24
/openshift.com/network/subnets/10.128.92.0-24
/openshift.com/network/subnets/10.128.58.0-24
/openshift.com/network/subnets/10.128.48.0-24

I think I will need some help with that, as TBH I'm not an openshift-ansible expert.

[1] https://github.com/openshift/openshift-ansible/pull/5493

So the PR has been merged, as the subnets were OK (it was just me not knowing how to subnet :D). Is there anything I can do in order to push it? Will it be backported to <3.7 releases? Thx!

PRs against the release-3.6, release-1.5, and release-1.4 branches would be helpful. I'll try to get to those today if you don't. We should include both of your fixes in each of those PRs.

(In reply to Scott Dodson from comment #14)
> PRs against release-3.6, release-1.5, and release-1.4 branches would be
> helpful. I'll try to get to those today if you don't. We should include both
> of your fixes in each of those PRs.
I think I got it:

* PR against release-1.4 -> https://github.com/openshift/openshift-ansible/pull/5592
* PR against release-1.5 -> https://github.com/openshift/openshift-ansible/pull/5591
* PR against release-3.6 -> https://github.com/openshift/openshift-ansible/pull/5590

Thanks!

Verified with openshift-ansible-3.7.0-0.147.0.git.0.2fb41ee.el7.noarch.rpm

Both default values and custom values work.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
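
For readers puzzled by the custom-value test earlier in this thread, where the configured network was 10.130.0.0/14 but the nodes received 10.128.x.0/24 subnets, a short Python sketch with the standard `ipaddress` module shows why those subnets were in fact fine. The helper names `normalized_network` and `inside_cluster` are hypothetical; the addresses are taken from the comments above.

```python
import ipaddress


def normalized_network(cidr: str) -> ipaddress.IPv4Network:
    """Return the network a CIDR actually denotes, masking any host bits."""
    return ipaddress.ip_network(cidr, strict=False)


def inside_cluster(node_subnet: str, cluster_cidr: str) -> bool:
    """Return True if node_subnet lies within the cluster network."""
    subnet = ipaddress.ip_network(node_subnet)
    return subnet.subnet_of(normalized_network(cluster_cidr))


# 10.130.0.0 has bits set beyond the /14 mask, so the network starts lower.
print(normalized_network("10.130.0.0/14"))  # 10.128.0.0/14

# etcd keys such as 10.128.83.0-24 correspond to 10.128.83.0/24.
for node_subnet in ["10.128.83.0/24", "10.128.18.0/24", "10.128.101.0/24"]:
    print(node_subnet, inside_cluster(node_subnet, "10.130.0.0/14"))
```

A /14 covers four consecutive values of the second octet, here 10.128 through 10.131, so node subnets under 10.128 belong to the configured cluster network.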