Bug 1464475 - [GSS] [OCP 3.4] haproxy config files are missing some servers
Status: CLOSED DUPLICATE of bug 1464567
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.4.z
Assigned To: Phil Cameron
QA Contact: zhaozhanqi
Depends On:
Blocks:
Reported: 2017-06-23 10:06 EDT by Francesco Marchioni
Modified: 2017-08-02 10:10 EDT
CC List: 10 users

Last Closed: 2017-08-02 10:10:00 EDT
Type: Bug


Attachments:
haproxy stats page (75.79 KB, application/pdf)
2017-06-23 10:06 EDT, Francesco Marchioni
Router configurations (307.40 KB, application/zip)
2017-06-23 10:08 EDT, Francesco Marchioni
router log files (143.62 KB, text/plain)
2017-06-23 10:10 EDT, Francesco Marchioni
Deployment config and ha-proxy config template (12.85 KB, application/zip)
2017-06-23 10:12 EDT, Francesco Marchioni

Description Francesco Marchioni 2017-06-23 10:06:57 EDT
Created attachment 1291101
haproxy stats page

[customer case 01789221]

Description of problem:
After an upgrade to OCP 3.4.1.24 to resolve Bug 1440977 (Router hangs on deadlock), the following issue has been observed:

Some of the routers' haproxy config files (2 out of 4) are getting out of sync: they are missing some relevant "server" entries.
Redeploying the application restores the entries in the haproxy config; however, as soon as the pod is recreated (oc delete), the server disappears from the haproxy config file again.

I'm attaching the router config files and logs from the case. 

Version-Release number of selected component (if applicable):
OCP 3.4.1.24

How reproducible:
The issue cannot yet be reproduced predictably, and it appears on only a few routers.


Actual results:
Some servers are not included in the haproxy config file.

Expected results:
All servers are included in the haproxy config file.

Additional info:
All servers are reachable with curl.
The HAProxy stats page shows that the backends are active.
Comment 1 Francesco Marchioni 2017-06-23 10:08 EDT
Created attachment 1291102
Router configurations
Comment 2 Francesco Marchioni 2017-06-23 10:10 EDT
Created attachment 1291103
router log files
Comment 3 Francesco Marchioni 2017-06-23 10:12 EDT
Created attachment 1291105
Deployment config and ha-proxy config template
Comment 4 Ben Bennett 2017-06-23 11:11:00 EDT
Per @fmarchio:

I've checked through the router files and found the following "broken" configurations:

./router-cpyi0018-10-5255r/haproxy.config
./router-cpyi0087-11-rbfed/haproxy.config

In more detail, they are missing the following servers:

<   server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
4606d4604
<   server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
4670d4667
<   server 7dd6fb3777b32ca1383ef13f0622de63 10.1.22.144:8080 check inter 5000ms cookie 7dd6fb3777b32ca1383ef13f0622de63 weight 100
routes
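
The comparison above can also be done per backend rather than with a raw diff, which makes it easier to see which route lost its server. The following is only a minimal sketch, not the procedure used in this case: it assumes each haproxy.config uses plain "backend <name>" section headers followed by "server <id> <address> ..." lines, and the function names and the backend name in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Section headers in an haproxy config; only "backend" sections are tracked.
SECTION_RE = re.compile(r"^\s*(global|defaults|frontend|listen|backend)\b\s*(\S*)")
# "server <id> <ip:port> ..." lines inside a backend section.
SERVER_RE = re.compile(r"^\s*server\s+(\S+)\s+(\S+)")


def servers_by_backend(config_text):
    """Return {backend name: set of (server id, address)} for one config."""
    result = defaultdict(set)
    backend = None
    for line in config_text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            # Leave backend as None while inside any non-backend section.
            backend = section.group(2) if section.group(1) == "backend" else None
            continue
        server = SERVER_RE.match(line)
        if server and backend is not None:
            result[backend].add((server.group(1), server.group(2)))
    return dict(result)


def missing_servers(reference_text, suspect_text):
    """Return servers present in the reference config but absent from the suspect one."""
    reference = servers_by_backend(reference_text)
    suspect = servers_by_backend(suspect_text)
    missing = {}
    for backend, servers in reference.items():
        gone = servers - suspect.get(backend, set())
        if gone:
            missing[backend] = gone
    return missing
```

To use it, read a known-good router's haproxy.config and one of the "broken" ones into strings and pass them to missing_servers(); a result such as {"be_http_example": {("7dd6fb3777b32ca1383ef13f0622de63", "10.1.22.144:8080")}} (backend name made up here) would point at the affected route.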
Comment 5 Ben Bennett 2017-06-23 15:19:19 EDT
It could be https://bugzilla.redhat.com/show_bug.cgi?id=1464567 (or one of the other event queue bugs that we fixed).  I know you aren't seeing the panics, but there are other queue problems that were identified when we fixed that bug.

Can they try enabling the debugging endpoint as the comment here outlines:
  https://github.com/openshift/ose/pull/700

Then get me the output from that curl command and we can see if anything looks funky there.

Alternatively, would they be willing to run the router with an elevated log level?
Comment 6 Ben Bennett 2017-06-26 10:32:07 EDT
How to enable the debugging endpoints:

  This implements an http endpoint controlled by setting
  OPENSHIFT_PROFILE=web and then you can override the address it listens
  on (default is 127.0.0.1) and the port (default 6061) using the
  OPENSHIFT_PROFILE_HOST and OPENSHIFT_PROFILE_PORT environment
  variables respectively.

  This is disabled by default until OPENSHIFT_PROFILE=web is set.

  With the default setup, you can do:
    curl http://127.0.0.1:6061/debug/pprof/goroutine?debug=1
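
If the host or port has been overridden, the URL for the curl command changes accordingly. As a rough sketch only (the helper names are hypothetical and not part of the router), the following builds the goroutine dump URL from the same environment variables, falling back to the defaults quoted above, and fetches it. It assumes the endpoint is reachable from wherever the script runs; with the default 127.0.0.1 address that means inside the router pod.

```python
import os
import urllib.request


def pprof_goroutine_url(env=None):
    """Build the goroutine dump URL from OPENSHIFT_PROFILE_HOST/PORT.

    Defaults (127.0.0.1 and 6061) are the ones stated in the comment above.
    """
    env = os.environ if env is None else env
    host = env.get("OPENSHIFT_PROFILE_HOST", "127.0.0.1")
    port = env.get("OPENSHIFT_PROFILE_PORT", "6061")
    return f"http://{host}:{port}/debug/pprof/goroutine?debug=1"


def fetch_goroutine_dump(url, timeout=10):
    """Fetch the dump as text; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_goroutine_dump(pprof_goroutine_url()) returns the same text the curl command would print, which can then be saved and attached to the case.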
Comment 7 Phil Cameron 2017-06-26 14:51:13 EDT
Comment 4 (bbennett@redhat.com) lists the same server 3 times. This suggests the router didn't see the event for it.
Comment 18 Ben Bennett 2017-07-25 14:43:58 EDT
Can we get the current pod spec please?
Comment 27 Phil Cameron 2017-08-02 10:10:00 EDT

*** This bug has been marked as a duplicate of bug 1464567 ***
