Bug 1459193

Summary: Service is unreachable on the newly added node while manually scaling up nodes in Flannel network mode.
Product: OpenShift Container Platform
Component: Reference Architecture
Version: 3.4.1
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Gan Huang <ghuang>
Assignee: Mark Lamourine <mlamouri>
QA Contact: Gan Huang <ghuang>
CC: mlamouri
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-07-03 18:48:56 UTC

Description Gan Huang 2017-06-06 13:55:45 UTC
Description of problem:
An S2I build failed on the newly added node after manually scaling up nodes, because the docker-registry service was unreachable from that node.

Version-Release number of selected component (if applicable):
openshift-heat-templates-0.9.9-2.el7ost.noarch
openshift-ansible-3.4.89-1.git.0.ac29ce8.el7.noarch
flannel-0.7.0-1.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create an openshift-on-openstack stack using the following parameters template:

https://github.com/ganhuang/shell-learning/blob/master/ocp-on-osp-scritps/ocp34-on-osp10/ocp-templates/ha-master-dedicated-flannel.yaml

2. Manually edit `node_count` to 2, and update the stack

3. Trigger an S2I build after scaling completes, and make sure the app is assigned to the newly added node

Actual results:
#oc build-logs dancer-mysql-example-1
<--snip-->
Pushing image 172.30.10.181:5000/ghuang-test/dancer-mysql-example:latest ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Put http://172.30.10.181:5000/v1/repositories/ghuang-test/dancer-mysql-example/: dial tcp 172.30.10.181:5000: getsockopt: no route to host


[root@ha-master-dedicated-flannel-master-0 ~]# oc get po -o wide --all-namespaces=true
<--snip-->
ghuang-test    dancer-mysql-example-1-build    0/1       Error       0          43m       172.30.37.2     ha-master-dedicated-flannel-node-5m6m3589.ocp3.ghuang.com
<--snip-->

After logging in to the newly added node, curl to the docker-registry service fails:
[cloud-user@ha-master-dedicated-flannel-node-5m6m3589 ~]$ curl 172.30.10.181:5000                                                       
curl: (7) Failed connect to 172.30.10.181:5000; No route to host

The same request works on the other nodes.

Expected results:
The S2I build should succeed on any node, including newly added ones.

Additional info:

Comment 1 Mark Lamourine 2017-06-16 11:40:00 UTC
The scaleup playbook does not include two rules which are needed to complete the firewall configuration.  These rules are present in the deployment playbook but not in scaleup.

Adding these two rules, conditional on sdn_flannel, should resolve this.

+  - name: Set up masquerading on flannel interface
+    shell: iptables -t nat -A POSTROUTING -o {{ flannel_interface }} -j MASQUERADE
+
+  - name: Make iptables rules permanent
+    shell: /usr/libexec/iptables/iptables.init save
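
For anyone checking an already scaled-up node by hand, here is a minimal sketch of how the presence of the masquerade rule could be tested. The helper name has_masquerade_rule and the FLANNEL_IFACE variable are hypothetical and not part of the playbooks; the function only inspects rule listings in the format printed by `iptables -t nat -S POSTROUTING`.

```shell
#!/bin/sh
# has_masquerade_rule IFACE  (hypothetical helper, not part of the playbooks)
# Reads rule listings in `iptables -t nat -S POSTROUTING` format on stdin.
# Exits 0 if a MASQUERADE rule for outgoing interface IFACE is present,
# non-zero otherwise.
has_masquerade_rule() {
    # -F: match a fixed string, -q: quiet, --: the pattern starts with a dash
    grep -Fq -- "-A POSTROUTING -o $1 -j MASQUERADE"
}

# Example use on a node (needs root). FLANNEL_IFACE is an assumption and
# must be set to whatever flannel_interface resolves to in the deployment:
#   iptables -t nat -S POSTROUTING | has_masquerade_rule "$FLANNEL_IFACE" \
#       && echo "masquerade rule present" \
#       || echo "masquerade rule missing"
```

If the rule is reported missing on a node that was added through scaleup, that would be consistent with the analysis above. `iptables -t nat -C POSTROUTING -o <interface> -j MASQUERADE` performs a similar check directly against the live ruleset.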

Comment 2 Mark Lamourine 2017-06-30 11:48:51 UTC
Fixed upstream:

https://github.com/redhat-openstack/openshift-on-openstack/commit/9b9f90f44bb9d032ee11e7dcf7ad30370ffdd10a

Creating a package for testing.

Comment 3 Mark Lamourine 2017-07-02 23:35:39 UTC
Fixed and OCP version corrected in RPM 

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=570510

Comment 4 Gan Huang 2017-07-03 06:30:49 UTC
Verified with openshift-heat-templates-0.9.9-5.el7ost.noarch

Manual scaling succeeds with the Flannel network enabled. Services can be accessed successfully from both the newly added node and the existing nodes.

# openshift version
openshift v3.4.1.37
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# rpm -q flannel
flannel-0.7.1-1.el7.x86_64