Bug 1828382
| Field | Value |
|---|---|
| Summary | Azure IPI: Both Internal and External load balancers for kube-apiserver should use /readyz |
| Product | OpenShift Container Platform |
| Component | Installer |
| Installer sub component | openshift-installer |
| Reporter | Abu Kashem <akashem> |
| Assignee | Abu Kashem <akashem> |
| QA Contact | Etienne Simard <esimard> |
| CC | bleanhar, esimard, ffranz, jminter, mgahagan, mjudeiki, wking, xxia |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 4.5 |
| Keywords | Upgrades |
| Target Release | 4.5.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Clones | 1836016 1836017 1836018 |
| Bug Depends On | 1831760 |
| Bug Blocks | 1836038 |
| Last Closed | 2020-07-13 17:31:52 UTC |
Description
Abu Kashem
2020-04-27 15:53:21 UTC
*** Bug 1820577 has been marked as a duplicate of this bug. ***

The fix for this will be made for 4.4 too, right?

I tried and failed to make this work on ARO today. AFAICS the gap is as follows: the bootstrap node never passes the /readyz probe, so it never joins the ILB and the install fails. AFAICS Azure does not permit different backends to listen on the same frontend port with different probe configurations. So you'll need to get the bootstrap node to serve /readyz as a precondition for making this work.

Hi jminter,
If I am not mistaken, the bootstrap node should run the version of kube-apiserver in the release image, which should offer /readyz. You mentioned in the Slack thread that you have tried with "4.3.13", which has /readyz. Also, the bootstrap logic seems to be the same for AWS/GCP/Azure: https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L202
We would like to debug this further. Is it possible to ssh to the bootstrap node and directly probe the kube-apiserver (while installation is in progress)? We expect to see an "ok" response in this case.
# curl -k https://localhost:6443/readyz
ok
This will help us narrow it down.

This is not purely an update-time issue, but updates need node reboots, and node reboots need LB target adjustment. So this issue will impact API connectivity on Azure and other platforms that aren't using /readyz.

@Abu Kashem
> We would like to debug this further. Is it possible to ssh to the bootstrap node and directly probe the kube-apiserver (while installation is in progress)? We expect to see an "ok" response in this case.
> # curl -k https://localhost:6443/readyz
> ok
I did that yesterday and got a '404 not found' text, hence my message on Slack.
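To make the failure mode concrete: according to the Azure Load Balancer documentation, an HTTP(S) health probe counts as passing only on an HTTP 200 response. A bootstrap node that answers /readyz with 404 (or refuses the connection) therefore never becomes a healthy ILB backend. The toy model below is a minimal sketch, not Azure's implementation. It models a `numberOfProbes` threshold like the field in the probe listing later in this bug as "consecutive failures before removal". Two further points are simplifying assumptions: backends start out of rotation, and they are reinstated after one passing probe. `probe_succeeded` and `BackendHealth` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


def probe_succeeded(status: Optional[int]) -> bool:
    # Only HTTP 200 passes an HTTP(S) probe; 403, 404 and 5xx all fail.
    # None stands for "no HTTP response at all" (connection refused, timeout).
    return status == 200


@dataclass
class BackendHealth:
    # Loosely modeled on "numberOfProbes": the backend leaves rotation after
    # this many consecutive failed probes.
    number_of_probes: int = 2
    # Assumption for this sketch: a new backend starts out of rotation and
    # joins after its first passing probe.
    healthy: bool = False
    consecutive_failures: int = 0

    def record(self, status: Optional[int]) -> bool:
        """Feed one probe result; return whether the backend is in rotation."""
        if probe_succeeded(status):
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.number_of_probes:
                self.healthy = False
        return self.healthy


# Bootstrap node: connection refused, then 404 on /readyz. It never joins.
bootstrap = BackendHealth()
for status in [None, None, 404, 404]:
    bootstrap.record(status)
print(bootstrap.healthy)  # False

# Control-plane node: healthy, then unreachable while rebooting.
master = BackendHealth()
for status in [200, 200, None, None]:
    master.record(status)
print(master.healthy)  # False after two consecutive failures
```

The same model shows why the reboot case matters. Traffic only stops reaching a node once enough probes fail, so the probe has to reflect actual API readiness.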
Leaving this as Component Installer but assigning to Abu from the apiserver team to define the exact implementation details, as the apiserver team should be outlining exactly how this works. The Installer team can help if necessary.

Hi jminter,
I ran a test and the bootstrap node does offer /readyz with 4.3.13. I don't have access to Azure, so I did this on GCP. These are the steps I followed:
- kick off a 4.4 cluster with "oc adm release extract --tools quay.io/openshift-release-dev/ocp-release:4.3.13-x86_64"
- ssh into the bootstrap node as soon as it comes up in the GCE console.
- run the probe: while true; do curl -k https://localhost:6443/readyz; sleep 2; done

[core@akashe-2fxlj-b ~]$ while true; do curl -k https://localhost:6443/readyz; sleep 2; done
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
curl: (7) Failed to connect to localhost port 6443: Connection refused
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/readyz\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok

> While we do this, can somebody give more background if https endpoint check was working in the first place at any time?
Yes, kube-apiserver offers /readyz over HTTPS only. To my knowledge, AWS and GCP use the /readyz probe over HTTPS.
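For reproducing the GCP test without a shell loop, the `while true; do curl -k https://localhost:6443/readyz; sleep 2; done` loop can be sketched with the Python standard library. This is a hedged debugging sketch, not part of the installer. The names `probe_readyz` and `wait_for_readyz`, the 2-second interval and the 300-attempt cap are illustrative choices. Certificate verification is disabled to mirror `curl -k`, which is only appropriate for ad hoc debugging like this. The progression seen in the output above (connection refused, then a 403 for `system:anonymous`, then `ok`) would print as `unreachable:` lines, then `403`, then `200 ok`.

```python
import ssl
import time
import urllib.error
import urllib.request


def probe_readyz(url: str, timeout: float = 5.0) -> str:
    """Probe the endpoint once and return a short summary, like one curl call."""
    # Equivalent of `curl -k`: skip certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return f"{resp.status} {resp.read().decode()}"
    except urllib.error.HTTPError as err:
        # Non-2xx answer, e.g. 403 for system:anonymous or 404 if /readyz is missing.
        return f"{err.code}"
    except OSError as err:
        # URLError (e.g. connection refused) and timeouts are both OSError subclasses.
        return f"unreachable: {err}"


def wait_for_readyz(url: str, interval: float = 2.0, max_attempts: int = 300) -> bool:
    """Poll until the endpoint answers 200 "ok" or the attempts run out."""
    for _ in range(max_attempts):
        result = probe_readyz(url)
        print(result)
        if result == "200 ok":
            return True
        time.sleep(interval)
    return False
```

On a bootstrap node this would be run as `wait_for_readyz("https://localhost:6443/readyz")`.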
Looks like installer#3600 addressed installer-provisioned Azure. Are we spinning off a new bug for vSphere (which, per comment 0, was also not using /readyz) and for user-provisioned Azure (also called out in comment 0)?

(In reply to W. Trevor King from comment #13)
> Looks like installer#3600 addressed installer-provisioned Azure. Are we
> spinning off a new bug for vSphere (which, per comment 0, was also not using
> /readyz) and for user-provisioned Azure (also called out in comment 0)?
It would be a good idea. I will only be able to verify the fix for IPI on Azure.

Spun off into:
* Bug 1836016: user-provisioned Azure
* Bug 1836017: user-provisioned vSphere
* Bug 1836018: user-provisioned AWS

Thanks Trevor! This Azure IPI bug fix also depends on https://bugzilla.redhat.com/show_bug.cgi?id=1831760, and it would be best to test those together for any backport.

> This Azure IPI bug fix also depends on https://bugzilla.redhat.com/show_bug.cgi?id=1831760 and it would be best to test those together for any backport.
Bug 1832137 is already ON_QA in 4.4, so we should be good to go there. I dunno if these are going to go all the way back to 4.3 or not.

Verified with: https://openshift-release.svc.ci.openshift.org/releasestream/4.5.0-0.nightly/release/4.5.0-0.nightly-2020-05-14-231228
The cluster was installed without issue and I did not notice any visible issue.
Load balancer probes seen after installation:

Internal:
  "name": "sint-probe",
  "numberOfProbes": 2,
  "port": 22623,
  "protocol": "Https",
  "provisioningState": "Succeeded",
  "requestPath": "/healthz",
  "resourceGroup": "qeipi-98s24-rg",
  "type": "Microsoft.Network/loadBalancers/probes"

  "name": "api-internal-probe",
  "numberOfProbes": 2,
  "port": 6443,
  "protocol": "Https",
  "provisioningState": "Succeeded",
  "requestPath": "/readyz",
  "resourceGroup": "qeipi-98s24-rg",
  "type": "Microsoft.Network/loadBalancers/probes"

Public:
  "name": "api-internal-probe",
  "numberOfProbes": 2,
  "port": 6443,
  "protocol": "Https",
  "provisioningState": "Succeeded",
  "requestPath": "/readyz",
  "resourceGroup": "qeipi-98s24-rg",
  "type": "Microsoft.Network/loadBalancers/probes"

*** Bug 1820577 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
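The probe listing from the verification comment above can also be checked mechanically. This is a minimal sketch that assumes the probes are exported as a JSON array, for example with `az network lb probe list --resource-group <rg> --lb-name <lb>` (resource group and load balancer names are placeholders). It flags any probe on the kube-apiserver port 6443 that is not an HTTPS probe of /readyz and ignores probes on other ports, such as the Machine Config Server probe on 22623. The function name `check_apiserver_probes` is hypothetical.

```python
import json
from typing import List


def check_apiserver_probes(probes_json: str, port: int = 6443) -> List[str]:
    """Return problems found in a JSON array of load balancer probes.

    Every probe on the kube-apiserver port is expected to be an HTTPS probe
    of /readyz; probes on other ports are ignored.
    """
    probes = [p for p in json.loads(probes_json) if p.get("port") == port]
    if not probes:
        return [f"no probe on port {port}"]
    problems = []
    for p in probes:
        name = p.get("name", "<unnamed>")
        if p.get("protocol") != "Https":
            problems.append(f"{name}: protocol {p.get('protocol')!r}, expected 'Https'")
        if p.get("requestPath") != "/readyz":
            problems.append(f"{name}: requestPath {p.get('requestPath')!r}, expected '/readyz'")
    return problems


# Fields taken from the internal load balancer probes listed above.
sample = """[
  {"name": "sint-probe", "port": 22623, "protocol": "Https", "requestPath": "/healthz"},
  {"name": "api-internal-probe", "port": 6443, "protocol": "Https", "requestPath": "/readyz"}
]"""
print(check_apiserver_probes(sample))  # []
```

Running it once per load balancer (internal and public) covers both halves of this bug's summary.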