Bug 1717602
| Summary: | incoming machines make csrs indefinitely if original is not approved | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Justin Pierce <jupierce> | |
| Component: | Cloud Compute | Assignee: | Michael Gugino <mgugino> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jianwei Hou <jhou> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.1.0 | CC: | agarcial, bbreard, brad.ison, brad.williams, dustymabe, imcleod, jligon, nstielau | |
| Target Milestone: | --- | |||
| Target Release: | 4.2.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | ||||
| Clones: | 1746513 | Environment: | |
| Last Closed: | 2019-08-28 16:18:26 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1746513 | |||
Description
Justin Pierce
2019-06-05 19:12:34 UTC
Either the node has to be modified to generate new CSRs properly, or the CSR approver needs to react properly (e.g. by deleting new requests). Assigning to the RHCOS team to decide whether this is a node issue.

I don't think RHCOS is the cause here. I could be wrong, but I believe it's much more likely to be in the code that generates and/or approves CSRs. The OpenShift installer does create a unit called `approve-csr.service` (https://github.com/openshift/installer/blob/master/data/data/bootstrap/systemd/units/approve-csr.service), which executes `/usr/local/bin/approve-csr.sh` (https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/approve-csr.sh). Moving to the installer team for further review.

As I've been told, cluster-machine-approver is under the cloud team, so moving.

Justin Pierce, can we get an answer to "did this happen because they have disabled bootstrap CSR approval?" and the machine approval logs? Thanks!

This did not happen because the CSR approver was disabled. I offered that as a way to reproduce the issue easily. Reproducing it normally requires hitting an AWS limit or scaling machines extremely rapidly.

There are a couple of issues at play here. I believe that this bug, as originally filed, was due to the time-restriction element of CSR approval, as evidenced by comment #8. If a Machine object is created but the machine-controller fails to provision an actual VM due to an API quota or another temporary condition, the CSRs will never be approved. This has been (partially) addressed by raising the time limit from 10 minutes to 2 hours: https://github.com/openshift/cluster-machine-approver/pull/37

Long term, we should try to capture the network-address creation time. That would allow a long window between machine creation and CSR approval but a very tight window between actual instance provisioning and CSR approval, and would support a variety of use cases.
The other issue is too many machines being scaled at once; a potential fix was referenced here: https://github.com/openshift/cluster-machine-approver/pull/33. Unfortunately, we didn't come to a consensus on how best to fix the issue. In any case, I don't believe this issue is a release blocker, as workarounds exist and it is not specific to 4.2.

We think this is fixed in the 4.2 release. I've cloned it for the 4.1.z stream: https://bugzilla.redhat.com/show_bug.cgi?id=1746513