Bug 1507664 - The health checks are disabled when there are multiple services
Summary: The health checks are disabled when there are multiple services
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 3.7.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-30 20:52 UTC by Ben Bennett
Modified: 2022-08-04 22:20 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A code change did not consider that a route could have more than one associated service. Consequence: Health checks could be disabled even if there were multiple backing servers if defined in multiple services. Fix: Corrected the code so the check was correct. Result: Health checks are disabled when there is one, and only one, backing server.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:20:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 17077 0 None None None 2017-10-30 21:18:24 UTC
Origin (Github) 17155 0 None None None 2017-11-03 12:22:43 UTC
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Ben Bennett 2017-10-30 20:52:45 UTC
Description of problem:

There was a change made to prevent health checks when there was only one endpoint for a service.  But that was too aggressive.  When a route has multiple services then it would disable health checks incorrectly.  We need to disable the checks when there is only one backend server for the whole route.

Version-Release number of selected component (if applicable):

3.7.0 pre-release


How reproducible:

Every time.



Steps to Reproduce:
1. Make a route that selects two services
2. Make each service have one endpoint


Actual results:

Health checks are disabled for both servers in the haproxy config.

Expected results:

Health checks are performed.


Additional info:

Comment 1 Ben Bennett 2017-10-30 21:18:25 UTC
PR https://github.com/openshift/origin/pull/17077

Comment 2 openshift-github-bot 2017-10-31 05:21:28 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/9835dca2c14d387a5a15ee14883b48e1f9073604
Skip health checks when there is one server that backs the route

Previously we had attempted to suppress the health checks, but it was
looking at the endpoints of the service, not the servers that back the
route.  The problem is that if we just look at endpoints, then if a
route has two backing services, each with one endpoint, we would
incorrectly suppress the health checks.

Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)

https://github.com/openshift/origin/commit/1bae363268fec9a0df7ac62fc1823ccc3df1cf94
Merge pull request #17077 from knobunc/fix/health-supression

Automatic merge from submit-queue.

Fix the "supress health checks when only one backing service" logic

This reverts commit 4833eb5d9f770fe8ab2b991f1c4d114cd09bf99c and then fixes the implementation.

Previously we had attempted to suppress the health checks, but it was looking at the endpoints of the service, not the servers that back the route.  The problem is that if we just look at endpoints, then if a route has two backing services, each with one endpoint, we would incorrectly suppress the health checks.

Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)

Comment 4 Hongan Li 2017-11-02 03:53:43 UTC
verified in v3.7.0-0.189.0 but failed, no "check inter" if route has two services  and service has one endpoint.


a) two services, one have two endpoints and another has one endpoint:
  server pod:test-rc-dsx7g:service-unsecure:10.129.0.11:8080 10.129.0.11:8080 cookie 51b8a3813373ee7b1281fc933f9d1ec3 weight 64 check inter 5000ms
  server pod:test-rc-7vcs2:service-unsecure:10.130.0.11:8080 10.130.0.11:8080 cookie b59a676bd37c1865c9b8c40d9db15c61 weight 64 check inter 5000ms
  server pod:caddy-docker-2:service-unsecure-2:10.130.0.9:8080 10.130.0.9:8080 cookie fbfdb60f74200c055330efafa409a7f5 weight 256


b) two services, each service has one endpoint:
  server pod:test-rc-dsx7g:service-unsecure:10.129.0.11:8080 10.129.0.11:8080 cookie 51b8a3813373ee7b1281fc933f9d1ec3 weight 128
  server pod:caddy-docker-2:service-unsecure-2:10.130.0.9:8080 10.130.0.9:8080 cookie fbfdb60f74200c055330efafa409a7f5 weight 256

Comment 5 Weibin Liang 2017-11-02 20:21:59 UTC
@hongli,

I got same results yesterday with no "check inter" for the services when I create one route with three servers, each service has only one endpoint.

Comment 6 Ben Bennett 2017-11-03 12:22:30 UTC
New PR https://github.com/openshift/origin/pull/17155

Comment 7 openshift-github-bot 2017-11-03 17:18:04 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/8cb7b957e092fd73d6d84feeb16c77d68ef69d28
Fix the "supress health checks when only one backing service" logic

Previously we had attempted to suppress the health checks, but it was looking at the endpoints of the service, not the servers that back the route. The problem is that if we just look at endpoints, then if a route has two backing services, each with one endpoint, we would incorrectly suppress the health checks.

This is the third attempt.  I had missed a case in the last one, so now the logic is moved to be computed and recorded by the router controller, but used in the template.

Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)

https://github.com/openshift/origin/commit/fc8725b6d2f46261f12862dd6185be67352fd53a
Merge pull request #17155 from knobunc/bug/bz1507664-suppress-health-checks

Automatic merge from submit-queue.

Bug/bz1507664 suppress health checks

Fix the "supress health checks when only one backing service" logic

This reverts commit 9835dca2c14d387a5a15ee14883b48e1f9073604 because it is wrong, then the second commit implements the correct change.
    
Previously we had attempted to suppress the health checks, but it was looking at the endpoints of the service, not the servers that back the route. The problem is that if we just look at endpoints, then if a route has two backing services, each with one endpoint, we would incorrectly suppress the health checks.
    
This is the third attempt.  I had missed a case in the last one, so now the logic is moved to be computed and recorded by the router controller, but used in the template.
    
Fixes bug 1507664 (https://bugzilla.redhat.com/show_bug.cgi?id=1507664)

Comment 9 Hongan Li 2017-11-07 03:34:47 UTC
verified in OCP v3.7.0-0.196.0 and issue has been fixed.

a) two service and each service has one endpoint
  server pod:test-rc-qrxjt:service-unsecure:10.128.0.9:8080 10.128.0.9:8080 cookie 5708008fda5943a37452de2ab223d0bc weight 256 check inter 5000ms
  server pod:caddy-docker-2:service-unsecure-2:10.128.0.8:8080 10.128.0.8:8080 cookie 0b0853154930d7393cdbcb1c3f06306a weight 256 check inter 5000ms


b) one service has two endpoints
  server pod:test-rc-qrxjt:service-unsecure:10.128.0.9:8080 10.128.0.9:8080 cookie 5708008fda5943a37452de2ab223d0bc weight 256 check inter 5000ms
  server pod:test-rc-9t69p:service-unsecure:10.129.0.6:8080 10.129.0.6:8080 cookie 991bec356d21c2a6ea189d9f69e0b244 weight 256 check inter 5000ms


c) one service has one endpoint
server pod:test-rc-qrxjt:service-unsecure:10.128.0.9:8080 10.128.0.9:8080 cookie 5708008fda5943a37452de2ab223d0bc weight 256

Comment 12 errata-xmlrpc 2017-11-28 22:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188


Note You need to log in before you can comment on or make changes to this bug.