Description of problem:
For flannel installs we are setting the pod network equal to the service network. Instead, the pod network needs to be set to the value passed for osm_cluster_network_cidr when configuring flannel.

Version-Release number of the following components:
OCP 3.5

Additional info:
The flannel configuration uses portal_net, which defaults to 172.30.0.0/16. We also hard-code the minimum network to 172.30.5.0. This value should be set from the installer host variables that are passed in.

https://github.com/openshift/openshift-ansible/blob/master/roles/flannel/tasks/main.yml#L15
https://github.com/openshift/openshift-ansible/blob/master/roles/flannel_register/defaults/main.yaml

Kubernetes                  OpenShift           Ansible_Installer
==========                  =========           =================
--cluster-cidr              clusterNetworkCIDR  osm_cluster_network_cidr
--service-cluster-ip-range  serviceNetworkCIDR  openshift_portal_net
3.6 as well.
Tested with openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm

Installation failed:

TASK [flannel_register : Generate etcd configuration for etcd] *****************
Monday 18 September 2017  06:37:14 +0000 (0:00:00.130)       0:05:32.474 ******
fatal: [host-8-241-75.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'
After setting:

osm_cluster_network_cidr=10.128.0.0/14
osm_host_subnet_length=9

I get the following error:

AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'

The following helped to set the CIDR:

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9
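The type error above can be reproduced outside Ansible. The sketch below assumes the inventory value reaches the template as text; `UnsafeText` is a hypothetical stand-in for Ansible's `AnsibleUnsafeText` (a string subclass), and `subnet_len` is an illustrative helper, not installer code. Casting before subtracting is one common remedy (in a Jinja2 template, the `int` filter); this thread does not show whether the eventual fix does exactly that.

```python
class UnsafeText(str):
    """Hypothetical stand-in for Ansible's AnsibleUnsafeText (a str subclass)."""


def subnet_len(host_subnet_length):
    # Cast first: a value such as osm_host_subnet_length=9 may arrive as text.
    return 32 - int(host_subnet_length)


raw = UnsafeText("9")

error = None
try:
    32 - raw  # int minus a string subclass is not defined
except TypeError as exc:
    error = str(exc)

print(error)            # the same kind of message as in the failed task
print(subnet_len(raw))  # subtraction succeeds once the value is cast
```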
So, I've created a cluster from scratch using the latest 3.6 bits, manually applying the file changes from the PR, and using the default values (so I didn't touch osm_cluster_network_cidr or osm_host_subnet_length), and it worked for me:

# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
# oetcdctl get /openshift.com/network/config
{
  "Network": "10.128.0.0/14",
  "SubnetLen": 23,
  "Backend": {
    "Type": "host-gw"
  }
}
# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.10.0-23
/openshift.com/network/subnets/10.128.108.0-23
/openshift.com/network/subnets/10.128.118.0-23
/openshift.com/network/subnets/10.128.28.0-23
/openshift.com/network/subnets/10.128.40.0-23
/openshift.com/network/subnets/10.128.140.0-23
/openshift.com/network/subnets/10.128.98.0-23
/openshift.com/network/subnets/10.128.12.0-23
# oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
{"PublicIP":"192.168.98.10","BackendType":"host-gw"}

Has anything else changed since the version currently in GA (the one I used) that could affect this?

$ rpm -qf /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/flannel-config.json
openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch
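For reference, the relationship between the inventory values and the flannel config stored in etcd can be sketched as follows. This is an illustration only: `flannel_config` is a hypothetical helper, not the installer's template, and it assumes the `32 - host_subnet_length` expression from the error message in this bug is what produces `SubnetLen`.

```python
import json


def flannel_config(cluster_network_cidr, host_subnet_length, backend="host-gw"):
    """Build a dict shaped like the /openshift.com/network/config value.

    Assumption: SubnetLen is 32 minus the host subnet length, as in the
    templated expression quoted in this bug.
    """
    return {
        "Network": cluster_network_cidr,
        "SubnetLen": 32 - int(host_subnet_length),
        "Backend": {"Type": backend},
    }


# Values matching the etcd output above: a /14 pod network and a host
# subnet length of 9 give per-node /23 subnets.
print(json.dumps(flannel_config("10.128.0.0/14", 9), indent=2))
```

With a host subnet length of 8, the same calculation gives per-node /24 subnets.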
Eduardo, your change is only on the master branch, so you'll want to test a 3.7 version of the installer, right?
(In reply to Scott Dodson from comment #8)
> Eduardo, your change is only on the master branch you'll want to test a 3.7
> version of the installer, right?

I can do it if needed, but I think it should be backported to older releases as well. I tested 3.6 with those files patched manually, as those are the GA bits I can use (IDK how to test 3.7, TBH).
(In reply to Eduardo Minguez from comment #7)
> So, I've created a cluster from scratch using the latest 3.6 bits +
> modifying manually the files provided in the PR, and using the default
> values (so I didn't touched osm_cluster_network_cidr nor
> osm_host_subnet_length) and it worked for me:
>
>
> # alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt
> --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt
> --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.
> com:2379,https://master-2.edu.flannel.com:2379"'
> # oetcdctl get /openshift.com/network/config
> {
>   "Network": "10.128.0.0/14",
>   "SubnetLen": 23,
>   "Backend": {
>     "Type": "host-gw"
>   }
> }
> # oetcdctl ls /openshift.com/network/subnets
> /openshift.com/network/subnets/10.128.10.0-23
> /openshift.com/network/subnets/10.128.108.0-23
> /openshift.com/network/subnets/10.128.118.0-23
> /openshift.com/network/subnets/10.128.28.0-23
> /openshift.com/network/subnets/10.128.40.0-23
> /openshift.com/network/subnets/10.128.140.0-23
> /openshift.com/network/subnets/10.128.98.0-23
> /openshift.com/network/subnets/10.128.12.0-23
> # oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
> {"PublicIP":"192.168.98.10","BackendType":"host-gw"}
>
> Is there any other modification that can affect that since the version in GA
> right now (the one I've used)?
>
> $ rpm -qf
> /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/
> flannel-config.json
> openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch

So were the following values that we used ignored, and the default values taken instead?

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9

Also, what if a customer wants to set a custom cluster CIDR and host subnet length?
So, I deployed the OCP cluster without setting any values for the pod network or the services network, to check whether it worked with the default values (the most common scenario, AFAIK). I haven't had the chance to test it with different values for the CIDR or subnet length yet.
I've tested with custom values and it failed. I've created a new PR [1] that fixes the issue in my tests:

[cloud-user@bastion ~]$ grep -E 'osm|portal' /etc/ansible/hosts
osm_default_node_selector="role=app"
osm_use_cockpit=true
osm_cluster_network_cidr=10.130.0.0/14
osm_host_subnet_length=8
openshift_portal_net=10.111.0.0/16

After the installation:

[root@master-0 ~]# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
[root@master-0 ~]# oetcdctl get /openshift.com/network/config
{
  "Network": "10.130.0.0/14",
  "SubnetLen": 24,
  "Backend": {
    "Type": "host-gw"
  }
}

However, the subnets assigned to the nodes appear to be in a different network:

[root@master-0 ~]# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.83.0-24
/openshift.com/network/subnets/10.128.18.0-24
/openshift.com/network/subnets/10.128.77.0-24
/openshift.com/network/subnets/10.128.101.0-24
/openshift.com/network/subnets/10.128.20.0-24
/openshift.com/network/subnets/10.128.92.0-24
/openshift.com/network/subnets/10.128.58.0-24
/openshift.com/network/subnets/10.128.48.0-24

I think I will need some help with that, as TBH I'm not an openshift-ansible expert.

[1] https://github.com/openshift/openshift-ansible/pull/5493
So the PR has been merged, as the subnets were OK (it was just me not knowing how to subnet :D). Is there anything I can do to help push it? Will it be backported to <3.7 releases? Thx!
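To spell out why the subnets were fine: a /14 prefix fixes only the first 14 bits, so 10.130.0.0/14 names the same network as 10.128.0.0/14, which spans 10.128.0.0 through 10.131.255.255. A minimal check with Python's standard `ipaddress` module, using the custom CIDR and the etcd subnet keys reported earlier in this bug (the `key_to_network` helper is illustrative only):

```python
import ipaddress

# strict=False accepts a CIDR with host bits set and normalises it.
configured = ipaddress.ip_network("10.130.0.0/14", strict=False)
print(configured, configured.broadcast_address)

etcd_keys = [
    "/openshift.com/network/subnets/10.128.83.0-24",
    "/openshift.com/network/subnets/10.128.18.0-24",
    "/openshift.com/network/subnets/10.128.77.0-24",
    "/openshift.com/network/subnets/10.128.101.0-24",
    "/openshift.com/network/subnets/10.128.20.0-24",
    "/openshift.com/network/subnets/10.128.92.0-24",
    "/openshift.com/network/subnets/10.128.58.0-24",
    "/openshift.com/network/subnets/10.128.48.0-24",
]


def key_to_network(key):
    # flannel stores "10.128.83.0-24" where CIDR notation is "10.128.83.0/24".
    return ipaddress.ip_network(key.rsplit("/", 1)[-1].replace("-", "/"))


all_inside = all(key_to_network(k).subnet_of(configured) for k in etcd_keys)
print(all_inside)
```

Note that `subnet_of` requires Python 3.7 or later.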
PRs against release-3.6, release-1.5, and release-1.4 branches would be helpful. I'll try to get to those today if you don't. We should include both of your fixes in each of those PRs.
(In reply to Scott Dodson from comment #14)
> PRs against release-3.6, release-1.5, and release-1.4 branches would be
> helpful. I'll try to get to those today if you don't. We should include both
> of your fixes in each of those PRs.

I think I got it:

* PR against release-1.4 -> https://github.com/openshift/openshift-ansible/pull/5592
* PR against release-1.5 -> https://github.com/openshift/openshift-ansible/pull/5591
* PR against release-3.6 -> https://github.com/openshift/openshift-ansible/pull/5590

Thanks!
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1499651
Verified with openshift-ansible-3.7.0-0.147.0.git.0.2fb41ee.el7.noarch.rpm

Both default values and custom values work.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188