Bug 1551088 - Listen address of 0.0.0.0 causes OpenShift to hang during failure mode
Summary: Listen address of 0.0.0.0 causes OpenShift to hang during failure mode
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.6.z
Assignee: Casey Callendrello
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-02 17:18 UTC by Greg Rodriguez II
Modified: 2021-12-10 15:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-20 14:53:36 UTC
Target Upstream Version:
Embargoed:



Description Greg Rodriguez II 2018-03-02 17:18:24 UTC
Description of problem:
From Customer - 
I have OpenShift systems with multiple NICs. The OpenShift network is configured to run on a bonded, VLAN-tagged NIC (not the default-gateway NIC).

I also have an external HAProxy load balancer for the API connection to the 3 masters.

While doing a failure test (taking down networking on one of the 3 masters), the API hangs completely even though the HAProxy LB IP still responds to ping.

The issue I have found is that, by default, /etc/sysconfig/atomic-openshift-master-api and /etc/sysconfig/atomic-openshift-master-controllers are both configured with a LISTEN address of 0.0.0.0. This doesn't seem to work well during a failure and causes OpenShift to hang.

If I set the LISTEN address to the respective IP of each master, restart the master controllers and API services, and redo the network failure test, the API connection tests work great.

< OPTIONS=--loglevel=2 --listen=https://10.1.202.11:8443 --master=https://master01.os01.infra.gen.adm1.prod.bamtech.co:8443
---
> OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8443 --master=https://master01.os01.infra.gen.adm1.prod.bamtech.co:8443
[root@master01 sysconfig]# diff atomic-openshift-master-controllers atomic-openshift-master-controllers.mike
1c1
< OPTIONS=--loglevel=2 --listen=https://10.1.202.11:8444
---
> OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8444
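For illustration only, the change shown in the diffs above could be scripted roughly as follows. This is a minimal sketch, not something the installer or the customer is known to use: the helper name pin_listen_address is hypothetical, and it only rewrites an OPTIONS line passed in as a string (reading and writing the sysconfig files and restarting the services are left out).

```python
import re

# Matches the wildcard listen address in an OPTIONS line such as
#   OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8444
# Group 1 is the flag and scheme, group 2 is the ":port" suffix.
LISTEN_RE = re.compile(r"(--listen=https://)0\.0\.0\.0(:\d+)")


def pin_listen_address(options_line: str, master_ip: str) -> str:
    """Return options_line with 0.0.0.0 in --listen replaced by master_ip.

    Hypothetical helper for illustration. Lines that already use a
    specific listen address are returned unchanged.
    """
    return LISTEN_RE.sub(
        lambda m: m.group(1) + master_ip + m.group(2), options_line
    )


if __name__ == "__main__":
    line = "OPTIONS=--loglevel=2 --listen=https://0.0.0.0:8444"
    print(pin_listen_address(line, "10.1.202.11"))
```

The port is preserved, so the same helper would cover both the API (8443) and controllers (8444) lines.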

Is this a bug in how Ansible sets up these files? Or should there be specific entries in the Ansible inventory file when setting up the cluster to ensure that the LISTEN address equals the openshift_ip of each master?

Version-Release number of selected component (if applicable):
OCP 3.6

How reproducible:
Customer Verified

Steps to Reproduce:
Information is in Description

Actual results:
OpenShift hangs during failure with listen address of 0.0.0.0

Expected results:
Expect not to hang

Additional info:
N/A

Comment 1 Ben Bennett 2018-03-02 17:56:02 UTC
@dcbw: Can you clarify the expected behavior please?

Comment 2 Rajat Chopra 2018-05-03 18:17:21 UTC
Need some more details.
1. haproxy config details
   - what is the load balancing policy (roundrobin? stick on srcIP?)
   - what are the endpoint IPs
   - what are the health checks performed for the endpoint IPs
2. How is the network brought down during the failure test?
3. What are the addresses of the other interfaces?
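To make the health-check question concrete: a load balancer IP answering ping says nothing about whether each master's API port accepts connections. A minimal sketch of a per-endpoint TCP check, run from the load balancer host, might look like the following. This is not how haproxy implements its checks; the function name tcp_probe and the 2-second timeout are assumptions chosen for illustration.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. Connection refusals, unreachable
    hosts, and timeouts all raise OSError subclasses and count as failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (not executed here): probe the API port on one master, using the
# address that appears in the description.
#   tcp_probe("10.1.202.11", 8443)
```

Running such a probe against every endpoint IP before and during the failure test would show whether the downed master is still being treated as a live backend.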

Clearly, the workaround in the interim is to use specific listen addresses.

Comment 5 Stephen Cuppett 2019-11-20 14:53:36 UTC
OCP 3.6 has reached the end of full support [1]. Closing this BZ as WONTFIX. If there is a customer case to be attached with a valid support exception and we still need a fix here, please post those details and reopen.

[1] - https://access.redhat.com/support/policy/updates/openshift

