Created attachment 1291101 [details]
haproxy stats page

[customer case 01789221]

Description of problem:
After an upgrade to OCP 3.4.1.24 to solve Bug 1440977 (Router hangs on deadlock), the following issue has been observed: some (2 out of 4) of the router's haproxy config files become de-synchronized, as they are missing some relevant "server" lines. Redeploying the application restores the information in the haproxy config, but as soon as the pod is recreated (oc delete), the server disappears from the config file again. I'm attaching the router config files and logs from the case.

Version-Release number of selected component (if applicable):
OCP 3.4.1.24

How reproducible:
Not yet reproducible in a predictable way; it appears only on a few routers.

Actual results:
Some servers are not included in the haproxy config file.

Expected results:
All servers are included in the haproxy config file.

Additional info:
All servers are reachable with curl. The HAProxy stats page shows the backends as active.
Created attachment 1291102 [details]
Router configurations

Created attachment 1291103 [details]
router log files

Created attachment 1291105 [details]
Deployment config and ha-proxy config template
Per @fmarchio: I've checked through the router files and found the following "broken" configurations:

./router-cpyi0018-10-5255r/haproxy.config
./router-cpyi0087-11-rbfed/haproxy.config

In more detail, they are missing the following servers (diff output against a working config):

< server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
4606d4604
< server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
4670d4667
< server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
routes
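A quick way to spot which router copies dropped the backend is to count occurrences of the server's cookie hash in each extracted config. A sketch against the attached files (the ./router-*/ paths mirror the attachment layout and are illustrative; adjust to wherever the archive was extracted):

```shell
# Count occurrences of the missing server line in each router's config.
# Healthy routers should all report the same count; the "broken" ones
# from the list above will report fewer.
for f in ./router-*/haproxy.config; do
    printf '%s: ' "$f"
    grep -c 'server 7dd6fb3777b32ca1383ef13f0622de63' "$f"
done
```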
It could be https://bugzilla.redhat.com/show_bug.cgi?id=1464567 (or one of the other event queue bugs that we fixed). I know you aren't seeing the panics, but other queue problems were identified when we fixed that bug. Can they try enabling the debugging endpoint as outlined in the comment here: https://github.com/openshift/ose/pull/700 ? Then send me the output from that curl command and we can check it for anything suspicious. Alternatively, would they be willing to run the router with an elevated log level?
How to enable the debugging endpoints: the router exposes an HTTP profiling endpoint when OPENSHIFT_PROFILE=web is set; it is disabled by default until that variable is set. You can override the address it listens on (default 127.0.0.1) and the port (default 6061) with the OPENSHIFT_PROFILE_HOST and OPENSHIFT_PROFILE_PORT environment variables, respectively. With the default setup, you can do:

curl http://127.0.0.1:6061/debug/pprof/goroutine?debug=1
bbennett: comment 4 lists the same server 3 times. This suggests the router didn't see the event for it.
Can we get the current pod spec please?
*** This bug has been marked as a duplicate of bug 1464567 ***