Bug 1840366

Summary: Haproxy should use readyz endpoint for kube-apiserver on Bare Metal deployments
Product: OpenShift Container Platform
Component: Installer
Installer sub component: OpenShift on Bare Metal IPI
Version: 4.4
Target Release: 4.4.z
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Sai Sindhur Malleni <smalleni>
Assignee: Egor Lunin <elunin>
QA Contact: Eldar Weiss <eweiss>
CC: augol, beth.white, bperkins, dblack, eweiss, hpokorny, jtaleric, sdasu, vvoronko
Keywords: Reopened, Triaged, UpcomingSprint
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1837676
Bug Depends On: 1837676
Last Closed: 2020-09-01 19:41:34 UTC

Description Sai Sindhur Malleni 2020-05-26 19:25:53 UTC
+++ This bug was initially created as a clone of Bug #1837676 +++

Description of problem:

Right now, the haproxy config that the baremetal installer lays down, and that the haproxy pods on the masters consume through hostPath volumes, looks like this:
defaults
  maxconn 20000
  mode    tcp
  log     /var/run/haproxy/haproxy-log.sock local0
  option  dontlognull
  retries 3
  timeout http-request 10s
  timeout queue        1m
  timeout connect      10s
  timeout client       86400s
  timeout server       86400s
  timeout tunnel       86400s
frontend  main
  bind :::9443 v4v6
  default_backend masters
listen health_check_http_url
  bind :::50936 v4v6
  mode http
  monitor-uri /healthz
  option dontlognull
listen stats
  bind localhost:50000
  mode http
  stats enable
  stats hide-version
  stats uri /haproxy_stats
  stats refresh 30s
  stats auth Username:Password
backend masters
   option  httpchk GET /healthz HTTP/1.0
   option  log-health-checks
   balance roundrobin
   server master-0 192.168.222.10:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
   server master-1 192.168.222.11:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
   server master-2 192.168.222.12:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3

However, we should switch the health-check endpoint from /healthz to /readyz, per https://github.com/openshift/installer/blob/master/docs/dev/kube-apiserver-health-check.md
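
For illustration, the two affected sections would presumably look like the fragment below after the switch. This is only a sketch derived from the config above with the path changed on the monitor-uri and httpchk lines; it is not the exact template shipped by the machine-config-operator, and the server lines are elided because they stay unchanged.

```
listen health_check_http_url
  bind :::50936 v4v6
  mode http
  # assumed change: report readiness instead of /healthz
  monitor-uri /readyz
  option dontlognull
backend masters
   # assumed change: probe the kube-apiserver readiness endpoint
   option  httpchk GET /readyz HTTP/1.0
   option  log-health-checks
   balance roundrobin
```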


We believe this could be causing https://bugzilla.redhat.com/show_bug.cgi?id=1834914


Version-Release number of the following components:

4.5.0-0.nightly-2020-05-08-222601

How reproducible:

Steps to Reproduce:
1. Install OCP on BM using IPI

--- Additional comment from Sai Sindhur Malleni on 2020-05-19 20:13:04 UTC ---

https://github.com/openshift/machine-config-operator/commit/022933c07a4e37bed097f1cd1fa4cd2d637decc0 fixes it for 4.5; however, I wonder whether we should backport this to 4.4.

--- Additional comment from Honza Pokorny on 2020-05-26 16:08:02 UTC ---

If you need this in 4.4, please clone the original bug to track the backport.

--- Additional comment from Sai Sindhur Malleni on 2020-05-26 18:09:07 UTC ---

I cannot find the BZ for 4.5; I only found the PR. Regardless, I believe this should be in 4.4. Can you link me to the 4.5 bug? I don't believe this bug should have been closed the way it was.

Comment 1 Victor Voronkov 2020-06-09 17:33:44 UTC
The original 4.5 bug is verified: https://bugzilla.redhat.com/show_bug.cgi?id=1837676
Please proceed with the backport.

Comment 4 Scott Dodson 2020-08-13 19:19:26 UTC
Bumping to high because this will affect stability during upgrades. Just for purposes of managing the patch queue, no action necessary from you.

Comment 8 Eldar Weiss 2020-08-23 07:56:03 UTC
Version release tested for fix:

4.5.0-0.nightly-2020-08-19-232923


How To verify:
[core@master-0-0 ~]$ cat /etc/haproxy/haproxy.cfg | grep readyz
  monitor-uri /readyz
   option  httpchk GET /readyz HTTP/1.0
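
The grep above can also be expressed as a small script that checks every health-check directive rather than only looking for the string. This is a minimal sketch, not part of any OpenShift tooling: the helper names (health_check_paths, uses_readyz) and the SAMPLE text are hypothetical, and it assumes the only directives of interest are monitor-uri and option httpchk.

```python
import re

# Hypothetical helper, for illustration only. Matches either
#   monitor-uri <path>
# or
#   option httpchk <METHOD> <path> ...
HEALTH_DIRECTIVE = re.compile(
    r"^\s*(?:monitor-uri\s+(?P<monitor>\S+)"
    r"|option\s+httpchk\s+\S+\s+(?P<check>\S+))"
)


def health_check_paths(cfg_text):
    """Return the paths used by monitor-uri and 'option httpchk' lines."""
    paths = []
    for line in cfg_text.splitlines():
        match = HEALTH_DIRECTIVE.match(line)
        if match:
            paths.append(match.group("monitor") or match.group("check"))
    return paths


def uses_readyz(cfg_text):
    """True only if at least one health directive exists and all use /readyz."""
    paths = health_check_paths(cfg_text)
    return bool(paths) and all(path == "/readyz" for path in paths)


# Assumed sample, shaped like the two lines printed by the grep above.
SAMPLE = """\
listen health_check_http_url
  monitor-uri /readyz
backend masters
   option  httpchk GET /readyz HTTP/1.0
"""

if __name__ == "__main__":
    print(health_check_paths(SAMPLE))
    print(uses_readyz(SAMPLE))
```

On a master, the same functions could be fed the contents of /etc/haproxy/haproxy.cfg instead of SAMPLE.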


[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.ci-2020-08-20-023616   True        False         31m     Cluster version is 4.4.0-0.ci-2020-08-20-023616

Comment 10 errata-xmlrpc 2020-09-01 19:41:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.19 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3514