1840366 – Haproxy should use readyz endpoint for kube-apiserver on Bare Metal deployments

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	4.4.z
Assignee:	Egor Lunin
QA Contact:	Eldar Weiss
Docs Contact:
URL:
Whiteboard:
Depends On:	1837676
Blocks:

Reported:	2020-05-26 19:25 UTC by Sai Sindhur Malleni
Modified:	2020-09-01 19:41 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1837676
Environment:
Last Closed:	2020-09-01 19:41:34 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 1997	0	None	closed	Bug 1840366: [baremetal] Switch to /readyz for haproxy healthchecking	2020-10-12 01:03:29 UTC
Red Hat Product Errata	RHBA-2020:3514	0	None	None	None	2020-09-01 19:41:54 UTC

Description Sai Sindhur Malleni 2020-05-26 19:25:53 UTC

+++ This bug was initially created as a clone of Bug #1837676 +++

Description of problem:

Right now, the haproxy config laid down by the baremetal installer, and consumed by the haproxy pods on the masters through hostpath volumes, looks like this:
defaults
  maxconn 20000
  mode    tcp
  log     /var/run/haproxy/haproxy-log.sock local0
  option  dontlognull
  retries 3
  timeout http-request 10s
  timeout queue        1m
  timeout connect      10s
  timeout client       86400s
  timeout server       86400s
  timeout tunnel       86400s
frontend  main
  bind :::9443 v4v6
  default_backend masters
listen health_check_http_url
  bind :::50936 v4v6
  mode http
  monitor-uri /healthz
  option dontlognull
listen stats
  bind localhost:50000
  mode http
  stats enable
  stats hide-version
  stats uri /haproxy_stats
  stats refresh 30s
  stats auth Username:Password
backend masters
   option  httpchk GET /healthz HTTP/1.0
   option  log-health-checks
   balance roundrobin
   server master-0 192.168.222.10:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
   server master-1 192.168.222.11:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
   server master-2 192.168.222.12:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3

However, we should switch the health-check endpoint from /healthz to /readyz, as recommended in https://github.com/openshift/installer/blob/master/docs/dev/kube-apiserver-health-check.md
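
For illustration, a minimal sketch of the two affected sections after the switch, assuming the only change is the health-check path and every other setting in the config above stays as-is. The two /readyz lines match what the verification in comment 8 reports; the template actually shipped by machine-config-operator may differ in other details.

```
# Sketch only: same sections as above, with /healthz replaced by /readyz.
listen health_check_http_url
  bind :::50936 v4v6
  mode http
  # URI on which haproxy itself answers monitor requests
  monitor-uri /readyz
  option dontlognull
backend masters
   # Probe each kube-apiserver's readiness endpoint instead of /healthz
   option  httpchk GET /readyz HTTP/1.0
   option  log-health-checks
   balance roundrobin
   # server lines unchanged from the config above
   server master-0 192.168.222.10:6443 weight 1 verify none check check-ssl inter 3s fall 2 rise 3
```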


We believe this could be causing https://bugzilla.redhat.com/show_bug.cgi?id=1834914


Version-Release number of the following components:

4.5.0-0.nightly-2020-05-08-222601

How reproducible:

Steps to Reproduce:
1. Install OCP on bare metal using IPI.

--- Additional comment from Sai Sindhur Malleni on 2020-05-19 20:13:04 UTC ---

https://github.com/openshift/machine-config-operator/commit/022933c07a4e37bed097f1cd1fa4cd2d637decc0 fixes it for 4.5; however, I wonder if we should backport this to 4.4.

--- Additional comment from Honza Pokorny on 2020-05-26 16:08:02 UTC ---

If you need this in 4.4, please clone the original bug to track the backport.

--- Additional comment from Sai Sindhur Malleni on 2020-05-26 18:09:07 UTC ---

I cannot find the BZ for 4.5; I only found the PR. Regardless, I believe this should be in 4.4. Can you link me to the 4.5 bug? I don't believe this should have been closed the way it was.

Comment 1 Victor Voronkov 2020-06-09 17:33:44 UTC

The original 4.5 bug has been verified: https://bugzilla.redhat.com/show_bug.cgi?id=1837676
Please proceed with the backport.

Comment 4 Scott Dodson 2020-08-13 19:19:26 UTC

Bumping to high because this will affect stability during upgrades. This is just for the purposes of managing the patch queue; no action is necessary from you.

Comment 8 Eldar Weiss 2020-08-23 07:56:03 UTC

Version release tested for fix:

4.5.0-0.nightly-2020-08-19-232923


How to verify:
[core@master-0-0 ~]$ cat /etc/haproxy/haproxy.cfg | grep readyz
  monitor-uri /readyz
   option  httpchk GET /readyz HTTP/1.0


[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.ci-2020-08-20-023616   True        False         31m     Cluster version is 4.4.0-0.ci-2020-08-20-023616

Comment 10 errata-xmlrpc 2020-09-01 19:41:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.19 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3514
