Bug 1684547
Summary: | kube-apiserver certificate rotation causes API service impact | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sebastian Jug <sejug> |
Component: | Master | Assignee: | David Eads <deads> |
Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
Severity: | high | Docs Contact: | |
Priority: | urgent | | |
Version: | 4.1.0 | CC: | akrzos, aos-bugs, ekuric, florin-alexandru.peter, hongkliu, jeder, jmencak, jokerman, maszulik, mifiedle, mmccomas, nelluri, sponnaga, wsun, xtian, xxia |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | 4.1.0 | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | aos-scalability-41 | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-06-04 10:44:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Sebastian Jug
2019-03-01 14:14:42 UTC
Sounds similar to bug 1678847? That bug has more comments and logs, FYI.

@Xingxing I agree it's the same symptom. I suppose the difference is that we now expect the kube-apiserver pods to restart, but we want to ensure that the restarts don't cause apiserver outages and errors. Bug 1684602 appears to "avoid unnecessary restarts".

We are going to avoid restarts on cert rotations before we ship 4.0. David is working on dynamic cert reloading.

Just so I'm clear: "dynamic cert reloading" means that a client connection to the apiserver whose cert is reloaded will *not* be interrupted?

We'll see how the golang stack handles it. If the default stack doesn't terminate connections, they'll remain open. If it does terminate connections, then the connections will be broken. (An illustrative sketch of this approach appears after the comment stream below.)

*** Bug 1688503 has been marked as a duplicate of this bug. ***

The fix landed in https://github.com/openshift/origin/pull/22322 and https://github.com/openshift/installer/pull/1421

Same reliability bug 1678847 is closed. This bug 1684547 should be kept for verification. Hongkai Liu, could you help check this bug?

(In reply to Xingxing Xia from comment #12)
> Same reliability bug 1678847 is closed. This bug 1684547 should be kept for
> verification. Hongkai Liu, could you help check this bug?

Hi Sebastian, can you help verify whether the bug is fixed? The PRs in Comment 11 have been merged. Thanks.

(In reply to Hongkai Liu from comment #13)
> Hi Sebastian,
>
> can you help verify whether the bug is fixed? The PRs in Comment 11 have been
> merged.
>
> Thanks.

Yes, I see that. What build are the fixes in?

Both PRs got merged 3-4 days ago. My best guess would be to just use the latest nightly builds. ^_^

Hongkai, Sebastian, after the fix lands in a payload, please help verify this bug; since it is a reliability issue, the SVT team seems better placed to check it. Thank you!

Thanks, Xingxing. I will keep checking whether the PRs are included. Glad to see the commands to check; I did not know them before. Checked the latest green build for the moment, 4.0.0-0.nightly-2019-03-20-153904: the PR is not there yet.

    # BUILD_TAG=4.0.0-0.nightly-2019-03-20-153904
    # IMAGE_NAME=hyperkube
    # oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:${BUILD_TAG} | grep "${IMAGE_NAME}"
    hyperkube https://github.com/openshift/ose bfd0e7ce8aa0777eb7d8022bee8eb831c08ecb28
    # COMMIT_HASH=bfd0e7ce8aa0777eb7d8022bee8eb831c08ecb28
    # PR_NUMBER=#22322
    # git clone https://github.com/openshift/ose
    # cd ose/
    # git log --oneline "${COMMIT_HASH}" | grep "${PR_NUMBER}"

FYI, all above PRs landed in 4.0.0-0.nightly-2019-03-23-222829 (the latest Accepted build as of now), please have a check, thanks.

Please help check whether it can be verified, thanks.

Hi Sebastian, please help verify. Thanks.

Yes... I'm not having any luck installing new builds. I was able to get the clusters up yesterday afternoon, but now the issue is that there's no way to manually trigger cert rotation, and as of now no user-configurable way to change the rotation duration either. With the increase of the cert rotation period to 1 month, this is not very easy to verify.

Some workarounds were tried without success. However, since we do have the 31-day cert rotation and we know from bug 1688820 that we can run commands without error for more than 24 hours, I am removing the BetaBlocker flag. The current state of things should be fine for beta customers.
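For illustration, a minimal Go sketch of the dynamic cert reloading idea discussed above, assuming a server that hands out certificates through a tls.Config.GetCertificate callback; the file paths, reload interval, and listen address are hypothetical and not taken from the OpenShift PRs:

```go
// Hedged sketch: dynamic serving-cert reloading with crypto/tls.
// Paths, interval, and address are illustrative, not the kube-apiserver's configuration.
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var current atomic.Value // holds the latest *tls.Certificate

	load := func() error {
		cert, err := tls.LoadX509KeyPair("/etc/certs/tls.crt", "/etc/certs/tls.key")
		if err != nil {
			return err
		}
		current.Store(&cert)
		return nil
	}
	if err := load(); err != nil {
		log.Fatal(err)
	}

	// Periodically re-read the key pair; rotation on disk is picked up
	// without restarting the process or closing existing connections.
	go func() {
		for range time.Tick(time.Minute) {
			if err := load(); err != nil {
				log.Printf("cert reload failed: %v", err)
			}
		}
	}()

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			// Each new TLS handshake asks for the certificate, so it always
			// sees the latest one; connections negotiated with the old cert
			// stay open until the client closes them.
			GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
				return current.Load().(*tls.Certificate), nil
			},
		},
	}
	// Empty cert/key file names: the certificate comes from GetCertificate.
	log.Fatal(server.ListenAndServeTLS("", ""))
}
```

Whether existing client connections survive a rotation then depends only on the TLS stack's behavior described above, not on restarting the server process.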
Verified the /readyz endpoint is available and returning the correct status (a hedged example of such a check appears at the end of this report). Build verified on 4.0.0-0.nightly-2019-03-28-030453.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
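For illustration, a minimal Go sketch of the kind of /readyz availability check described in the verification comment above, assuming an external poller; the API URL, polling interval, and TLS settings are hypothetical:

```go
// Hedged sketch: poll the kube-apiserver /readyz endpoint and log any
// failures observed while certificates rotate. The API host, polling
// interval, and InsecureSkipVerify setting are illustrative assumptions.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// For a real check, trust the cluster CA instead of skipping verification.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	url := "https://api.example-cluster:6443/readyz"
	for {
		resp, err := client.Get(url)
		switch {
		case err != nil:
			fmt.Printf("%s ERROR %v\n", time.Now().Format(time.RFC3339), err)
		case resp.StatusCode != http.StatusOK:
			fmt.Printf("%s NOT READY %d\n", time.Now().Format(time.RFC3339), resp.StatusCode)
			resp.Body.Close()
		default:
			resp.Body.Close()
		}
		time.Sleep(2 * time.Second)
	}
}
```

Any ERROR or NOT READY lines printed during a certificate rotation would indicate the kind of API service impact this bug tracks.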