Bug 1654343

Summary: router deployment fails in openshift ansible installer 3.7
Product: OpenShift Container Platform
Reporter: Paul Gier <pgier>
Component: Networking
Assignee: Dan Mace <dmace>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: aos-bugs, kurt.stam, vrutkovs
Version: 3.7.1
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-29 19:20:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1644546    
Attachments:
Description Flags
ansible.log none

Description Paul Gier 2018-11-28 14:58:24 UTC
Description of problem:
When installing OpenShift 3.7 on AWS, the install fails during the openshift-hosted phase.

Version-Release number of the following components:

rpm -q openshift-ansible
  using latest from release-3.7 branch
rpm -q ansible
  ansible-2.4.6.0-1.el7ae.noarch
ansible --version
  ansible 2.4.6.0

How reproducible:  This happens every time for me.

Steps to Reproduce:
1. RHEL 7.6 image on AWS
2. Using all-in-one inventory
3. Latest installer on release-3.7 branch

Actual results:

STDOUT: Waiting for rollout to finish: 0 out of 1 new replicas have been updated...
STDERR: error: replication controller "router-1" has failed progressing
MSG: non-zero return code

Comment 1 Paul Gier 2018-11-28 15:00:30 UTC
The router-1 deployment log contains this:

error: couldn't get deployment router-1: Get https://172.30.0.1:443/api/v1/namespaces/default/replicationcontrollers/router-1: dial tcp 172.30.0.1:443: getsockopt: no route to host
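To narrow down a failure like the one above, it can help to pull the target address out of the log line and try a plain TCP connection to it from the affected node or pod. The following is a minimal sketch, not part of the installer: the function names `parse_dial_error` and `probe` are hypothetical, and the regular expression only assumes the `dial tcp HOST:PORT: REASON` shape seen in this log line.

```python
import re
import socket

# Assumed shape of the error tail, taken from the log line in this comment:
#   "dial tcp 172.30.0.1:443: getsockopt: no route to host"
DIAL_ERROR = re.compile(r"dial tcp (?P<host>[\d.]+):(?P<port>\d+): (?P<reason>.+)$")


def parse_dial_error(line):
    """Return (host, port, reason) from a 'dial tcp' error line, or None."""
    match = DIAL_ERROR.search(line.strip())
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("reason")


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and report the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except OSError as exc:  # includes timeouts and 'No route to host'
        return "unreachable: {}".format(exc)


if __name__ == "__main__":
    log_line = (
        "error: couldn't get deployment router-1: Get "
        "https://172.30.0.1:443/api/v1/namespaces/default/"
        "replicationcontrollers/router-1: dial tcp 172.30.0.1:443: "
        "getsockopt: no route to host"
    )
    parsed = parse_dial_error(log_line)
    if parsed is not None:
        host, port, reason = parsed
        print("log reports {}:{} -> {}".format(host, port, reason))
        print("probe from this machine: {}".format(probe(host, port)))
```

Running the probe on the master host and again from inside a pod would show whether the service IP is unreachable only over the SDN.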

Comment 2 Paul Gier 2018-11-28 15:09:36 UTC
Created attachment 1509541
ansible.log

Comment 3 Vadim Rutkovsky 2018-11-28 15:10:13 UTC
The API seems to be up (we verify that during master install), but the routers can't reach it via the SDN.

Comment 5 Kurt Stam 2019-04-20 21:46:38 UTC
Is there no workaround? I'm using the aarch64 distro and am running into the same issue. Any pointers would be super helpful. Thanks! --Kurt (at Red Hat)

Comment 6 Kurt Stam 2019-04-21 01:32:37 UTC
BTW, the exact same issue occurs on a CentOS 7.6 x86_64 VM with OpenShift 3.7.1 running on my local box.

Comment 7 Kurt Stam 2019-04-22 01:37:18 UTC
The workaround can be found here: https://github.com/Project31/rh-middleware-on-arm/issues/10. Downgrade iptables to the version that ships with 7.5.
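
Before and after applying the downgrade, it may be useful to record which iptables build is actually installed. The sketch below is an illustration only: `parse_nvra` and `installed_package` are hypothetical helper names, and it assumes `rpm -q` prints the usual `name-version-release.arch` form, as in the `ansible-2.4.6.0-1.el7ae.noarch` output in the description. The exact iptables version to downgrade to is not stated in this report, so none is hard-coded here.

```python
import subprocess


def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts."""
    rest, arch = package.rsplit(".", 1)
    # The name may itself contain hyphens, so split from the right.
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def installed_package(name):
    """Return the parsed output of `rpm -q NAME`, or None if not installed."""
    result = subprocess.run(
        ["rpm", "-q", name], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None
    return parse_nvra(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    # Only meaningful on an RPM-based host such as RHEL or CentOS.
    print(installed_package("iptables"))
```

Comparing the printed release before and after the downgrade confirms that the older package is the one in use.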