Bug 1473858 - Installer does not configure flannel correctly for openstack installs.
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
: ---
: 3.7.0
Assigned To: Scott Dodson
Gan Huang
:
Depends On:
Blocks: 1594310 1490388 1491412 1491413
Reported: 2017-07-21 17:39 EDT by Ryan Howe
Modified: 2018-06-22 11:19 EDT

See Also:
Fixed In Version: openshift-ansible-3.7.0-0.126.1.git.0.0bb5b0c.el7.noarch
Doc Type: Bug Fix
Doc Text:
The flannel network was previously defined using the same subnet as the kubernetes services subnet. This caused a conflict between services and SDN networks. The flannel network is now correctly defined by the osm_cluster_network_cidr variable.
Story Points: ---
Clone Of:
: 1491412 1491413 1594310
Environment:
Last Closed: 2017-11-28 17:05:05 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:3188
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-28 21:34:54 EST

Description Ryan Howe 2017-07-21 17:39:00 EDT
Description of problem:

For flannel installs, we set the pod network equal to the service network. Instead, when configuring flannel, the pod network needs to be set to the value passed for osm_cluster_network_cidr.

Version-Release number of the following components:
OCP 3.5 


Additional info:

The flannel configuration uses portal_net, which defaults to 172.30.0.0/16. We also hard-code the minimum network to 172.30.5.0. This value should instead be set from the host variables passed to the installer.

https://github.com/openshift/openshift-ansible/blob/master/roles/flannel/tasks/main.yml#L15

https://github.com/openshift/openshift-ansible/blob/master/roles/flannel_register/defaults/main.yaml

Kubernetes                   OpenShift            Ansible_Installer
==========                   =========            =================
--cluster-cidr               clusterNetworkCIDR   osm_cluster_network_cidr
--service-cluster-ip-range   serviceNetworkCIDR   openshift_portal_net
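
The conflict described above can be illustrated with a minimal sketch that checks whether a pod network and a service network share any addresses. The function name networks_conflict is hypothetical and is not part of openshift-ansible. 172.30.0.0/16 is the service network default mentioned above, and 10.128.0.0/14 is the cluster network value that appears later in this report.

```python
import ipaddress


def networks_conflict(pod_cidr: str, service_cidr: str) -> bool:
    """Return True if the pod network and the service network share addresses."""
    pod_net = ipaddress.ip_network(pod_cidr, strict=False)
    service_net = ipaddress.ip_network(service_cidr, strict=False)
    return pod_net.overlaps(service_net)


# Behaviour reported in this bug: flannel reuses the service network.
print(networks_conflict("172.30.0.0/16", "172.30.0.0/16"))  # True

# Intended behaviour: flannel uses the cluster network instead.
print(networks_conflict("10.128.0.0/14", "172.30.0.0/16"))  # False
```

A check like this could be run against the inventory values before installation, assuming both CIDRs are known up front.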
Comment 2 Eduardo Minguez 2017-08-11 08:16:27 EDT
This affects 3.6 as well.
Comment 5 Gan Huang 2017-09-18 02:38:53 EDT
Tested with openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm


Installation failed: 

TASK [flannel_register : Generate etcd configuration for etcd] *****************
Monday 18 September 2017  06:37:14 +0000 (0:00:00.130)       0:05:32.474 ****** 
fatal: [host-8-241-75.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'
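
The error above is a plain type mismatch: an integer is being subtracted from a text value. The following sketch reproduces the same class of failure outside Ansible. UnsafeText is a hypothetical stand-in for AnsibleUnsafeText, which is a string type, and the value "9" is only an example. The cast shown at the end is one possible way to avoid the error (in a Jinja2 template, such a cast is usually written with the int filter); it is not necessarily the change that was made in the installer.

```python
class UnsafeText(str):
    """Hypothetical stand-in for Ansible's AnsibleUnsafeText (a string type)."""


# Values read from an inventory or from facts can arrive as text.
host_subnet_length = UnsafeText("9")

error_message = None
try:
    subnet_len = 32 - host_subnet_length  # int minus text is not defined
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Casting to int first makes the arithmetic well defined.
subnet_len = 32 - int(host_subnet_length)
print(subnet_len)  # 23
```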
Comment 6 Miheer Salunke 2017-09-18 02:52:44 EDT
After setting ->
osm_cluster_network_cidr=10.128.0.0/14
osm_host_subnet_length=9

I get the following error ->
AnsibleError: {{ 32 - openshift.master.sdn_host_subnet_length }}: Unexpected templating type error occurred on ({{ 32 - openshift.master.sdn_host_subnet_length }}): unsupported operand type(s) for -: 'int' and 'AnsibleUnsafeText'




The following helped to set the CIDR ->

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9
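
For reference, here is a rough sketch of how these two values would be expected to map onto the flannel network configuration stored in etcd. It assumes that SubnetLen is 32 minus the host subnet length, as the template expression in the error message suggests, and that the backend is host-gw, as in the etcd output elsewhere in this report. The function flannel_config is hypothetical and is not the installer's actual template.

```python
import json


def flannel_config(cluster_network_cidr, host_subnet_length, backend="host-gw"):
    """Build a flannel network config document from installer-style values."""
    config = {
        "Network": cluster_network_cidr,
        # Cast to int so that a text value such as "9" is accepted too.
        "SubnetLen": 32 - int(host_subnet_length),
        "Backend": {"Type": backend},
    }
    return json.dumps(config, indent=4)


print(flannel_config("10.128.0.0/14", "9"))
```

With a host subnet length of 9, each node would get a /23 subnet under this assumption.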
Comment 7 Eduardo Minguez 2017-09-19 05:08:27 EDT
So, I've created a cluster from scratch using the latest 3.6 bits plus manually modifying the files provided in the PR, and using the default values (so I didn't touch osm_cluster_network_cidr or osm_host_subnet_length), and it worked for me:


# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
# oetcdctl get /openshift.com/network/config
{
    "Network": "10.128.0.0/14",
    "SubnetLen": 23,
    "Backend": {
        "Type": "host-gw"
     }
}
# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.10.0-23
/openshift.com/network/subnets/10.128.108.0-23
/openshift.com/network/subnets/10.128.118.0-23
/openshift.com/network/subnets/10.128.28.0-23
/openshift.com/network/subnets/10.128.40.0-23
/openshift.com/network/subnets/10.128.140.0-23
/openshift.com/network/subnets/10.128.98.0-23
/openshift.com/network/subnets/10.128.12.0-23
# oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
{"PublicIP":"192.168.98.10","BackendType":"host-gw"}

Has any other modification been made since the current GA version (the one I used) that could affect this?

$ rpm -qf /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/flannel-config.json 
openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch
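
The etcd output above can be sanity-checked with a short sketch that confirms every registered node subnet lies inside the configured Network. The key format (address, a dash, then the prefix length) is taken from the listing above; the helper name key_to_network is hypothetical.

```python
import ipaddress


def key_to_network(etcd_key):
    """Convert a key such as '.../subnets/10.128.98.0-23' to an IPv4 network."""
    address, prefix = etcd_key.rsplit("/", 1)[-1].split("-")
    return ipaddress.ip_network(f"{address}/{prefix}")


cluster_network = ipaddress.ip_network("10.128.0.0/14")

keys = [
    "/openshift.com/network/subnets/10.128.10.0-23",
    "/openshift.com/network/subnets/10.128.108.0-23",
    "/openshift.com/network/subnets/10.128.118.0-23",
    "/openshift.com/network/subnets/10.128.28.0-23",
    "/openshift.com/network/subnets/10.128.40.0-23",
    "/openshift.com/network/subnets/10.128.140.0-23",
    "/openshift.com/network/subnets/10.128.98.0-23",
    "/openshift.com/network/subnets/10.128.12.0-23",
]

# Collect any subnet that falls outside the cluster network.
outside = [k for k in keys if not key_to_network(k).subnet_of(cluster_network)]
print(outside)  # []
```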
Comment 8 Scott Dodson 2017-09-19 09:35:43 EDT
Eduardo, your change is only on the master branch, so you'll want to test a 3.7 version of the installer, right?
Comment 9 Eduardo Minguez 2017-09-20 04:30:07 EDT
(In reply to Scott Dodson from comment #8)
> Eduardo, your change is only on the master branch you'll want to test a 3.7
> version of the installer, right?

I can do it if needed, but I think it should be backported to older releases as well. I tested 3.6 with those files manually patched, as those are the GA bits I can use (TBH, I don't know how to test 3.7).
Comment 10 Miheer Salunke 2017-09-21 02:37:04 EDT
(In reply to Eduardo Minguez from comment #7)
> So, I've created a cluster from scratch using the latest 3.6 bits +
> modifying manually the files provided in the PR, and using the default
> values (so I didn't touch osm_cluster_network_cidr nor
> osm_host_subnet_length) and it worked for me:
> 
> 
> # alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt
> --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt
> --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.
> com:2379,https://master-2.edu.flannel.com:2379"'
> # oetcdctl get /openshift.com/network/config
> {
>     "Network": "10.128.0.0/14",
>     "SubnetLen": 23,
>     "Backend": {
>         "Type": "host-gw"
>      }
> }
> # oetcdctl ls /openshift.com/network/subnets
> /openshift.com/network/subnets/10.128.10.0-23
> /openshift.com/network/subnets/10.128.108.0-23
> /openshift.com/network/subnets/10.128.118.0-23
> /openshift.com/network/subnets/10.128.28.0-23
> /openshift.com/network/subnets/10.128.40.0-23
> /openshift.com/network/subnets/10.128.140.0-23
> /openshift.com/network/subnets/10.128.98.0-23
> /openshift.com/network/subnets/10.128.12.0-23
> # oetcdctl get /openshift.com/network/subnets/10.128.98.0-23
> {"PublicIP":"192.168.98.10","BackendType":"host-gw"}
> 
> Is there any other modification that can affect that since the version in GA
> right now (the one I've used)?
> 
> $ rpm -qf
> /usr/share/ansible/openshift-ansible/roles/flannel_register/templates/
> flannel-config.json 
> openshift-ansible-roles-3.6.173.0.21-2.git.0.44a4038.el7.noarch



So were the following values that we set ignored, and the defaults used instead?

openshift.master.sdn_cluster_network_cidr=10.128.0.0/14
openshift.master.sdn_host_subnet_length=9

Also, what if a customer wants to set a custom cluster CIDR and host subnet length?
Comment 11 Eduardo Minguez 2017-09-21 03:47:14 EDT
So, I deployed the OCP cluster without setting any values for the pod network or the services network, to check whether it worked with the default values (the most common scenario, AFAIK).

I didn't have the chance to test it with different values for the cidr or subnet yet.
Comment 12 Eduardo Minguez 2017-09-21 12:04:41 EDT
I've tested with custom values and it failed. I've created a new PR[1] that fixes the issue in my tests.


[cloud-user@bastion ~]$ grep -E 'osm|portal' /etc/ansible/hosts
osm_default_node_selector="role=app"
osm_use_cockpit=true
osm_cluster_network_cidr=10.130.0.0/14
osm_host_subnet_length=8
openshift_portal_net=10.111.0.0/16

After the installation:


[root@master-0 ~]# alias oetcdctl='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --peers="https://master-0.edu.flannel.com:2379,https://master-1.edu.flannel.com:2379,https://master-2.edu.flannel.com:2379"'
[root@master-0 ~]# oetcdctl get /openshift.com/network/config
{
    "Network": "10.130.0.0/14",
    "SubnetLen": 24,
    "Backend": {
        "Type": "host-gw"
     }
}


But the subnets assigned to the nodes are in a different subnet:

[root@master-0 ~]# oetcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/10.128.83.0-24
/openshift.com/network/subnets/10.128.18.0-24
/openshift.com/network/subnets/10.128.77.0-24
/openshift.com/network/subnets/10.128.101.0-24
/openshift.com/network/subnets/10.128.20.0-24
/openshift.com/network/subnets/10.128.92.0-24
/openshift.com/network/subnets/10.128.58.0-24
/openshift.com/network/subnets/10.128.48.0-24

I think I will need some help with that, as TBH I'm not an openshift-ansible expert.

[1] https://github.com/openshift/openshift-ansible/pull/5493
Comment 13 Eduardo Minguez 2017-09-29 05:02:23 EDT
So the PR has been merged as the subnets were ok (it was just me not knowing how to subnet :D)
Is there anything I can do in order to push it? Will it be backported to <3.7 releases?
Thx!
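
The subnetting point can be made concrete with a small sketch. 10.130.0.0/14 is not aligned on a /14 boundary, so the network that actually contains it is 10.128.0.0/14 (10.128.0.0 through 10.131.255.255). That is consistent with the 10.128.x.0/24 node subnets listed in comment 12. The variable names below are illustrative only.

```python
import ipaddress

configured = "10.130.0.0/14"

# Strict parsing rejects a CIDR whose address has host bits set.
try:
    ipaddress.ip_network(configured)
    aligned = True
except ValueError:
    aligned = False

# Non-strict parsing masks the host bits and yields the enclosing network.
effective = ipaddress.ip_network(configured, strict=False)

print(aligned)    # False
print(effective)  # 10.128.0.0/14

# One of the node subnets from comment 12 falls inside that network.
node_subnet = ipaddress.ip_network("10.128.83.0/24")
print(node_subnet.subnet_of(effective))  # True
```

Writing the inventory value as 10.128.0.0/14 would describe the same range without the ambiguity.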
Comment 14 Scott Dodson 2017-09-29 09:21:46 EDT
PRs against release-3.6, release-1.5, and release-1.4 branches would be helpful. I'll try to get to those today if you don't. We should include both of your fixes in each of those PRs.
Comment 15 Eduardo Minguez 2017-09-29 10:41:24 EDT
(In reply to Scott Dodson from comment #14)
> PRs against release-3.6, release-1.5, and release-1.4 branches would be
> helpful. I'll try to get to those today if you don't. We should include both
> of your fixes in each of those PRs.

I think I got it:

* PR against release-1.4 -> https://github.com/openshift/openshift-ansible/pull/5592
* PR against release-1.5 -> https://github.com/openshift/openshift-ansible/pull/5591
* PR against release-3.6 -> https://github.com/openshift/openshift-ansible/pull/5590

Thanks!
Comment 19 Gan Huang 2017-10-09 04:18:17 EDT
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1499651
Comment 20 Gan Huang 2017-10-11 04:48:27 EDT
Verified with openshift-ansible-3.7.0-0.147.0.git.0.2fb41ee.el7.noarch.rpm

Both default values and custom values work.
Comment 23 errata-xmlrpc 2017-11-28 17:05:05 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
