Bug 1957951
Summary: | [aws] destroy can get blocked on instances stuck in shutting-down state | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Matthew Staebler <mstaeble> |
Component: | Installer | Assignee: | Aditya Narayanaswamy <anarayan> |
Installer sub component: | openshift-installer | QA Contact: | Yunfei Jiang <yunjiang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | ||
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | Some of the instances in AWS were stuck in shutting-down state and were never terminated. In order to make sure that all the instances are removed, a fresh termination will now be requested after 10 minutes to ensure that they are destroyed. | ||
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-07-27 23:07:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Matthew Staebler
2021-05-06 19:54:49 UTC
Recently, there were 6 separate CI clusters in us-west-2 that were all blocked by a shutting-down instance. Each instance had a State Transition Reason of Server.InternalError. Manually terminating the instances resolved the issue.

Comment 3
Yunfei Jiang
Hello Matthew, is there a way to reproduce this issue? I don't remember us hitting this issue before. I just searched all instances under the QE account; they are all `Terminated` or `Running`.

Thanks.

Matthew Staebler
(In reply to Yunfei Jiang from comment #3)
> Hello Matthew, is there a way to reproduce this issue? I don't remember us
> hitting this issue before. I just searched all instances under the QE
> account; they are all `Terminated` or `Running`.
>
> Thanks.

I unfortunately do not know of a way to reproduce this issue. It is something that happens very rarely due to AWS issues and not something that we control.

Comment 5
Yunfei Jiang
Hello Matthew, since this PR merged, have you seen the issue again on your side? If the fix works well, I'm going to set the status to VERIFIED, since the issue is related to the AWS platform and cannot be reproduced on the QE side.

Matthew Staebler
(In reply to Yunfei Jiang from comment #5)
> Hello Matthew, since this PR merged, have you seen the issue again on your
> side? If the fix works well, I'm going to set the status to VERIFIED, since
> the issue is related to the AWS platform and cannot be reproduced on the QE
> side.

I only know of 2 cases in the past 6 months where there have been instances stuck shutting down. In both cases, there were multiple instances across multiple clusters, implying a temporary error in AWS itself. I have not seen the issue since the PR merged, but I have no indication whether AWS has had the issue or not since then, unfortunately.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
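
For readers who want to see the retry behavior from the Doc Text in concrete terms, here is a minimal Python sketch of the idea: any instance that is not yet terminated gets a fresh termination request once 10 minutes have passed since the last one. This is not the installer's actual code. The names `Instance`, `instances_to_terminate`, and `RETERMINATE_AFTER_SECONDS` are hypothetical, and the sketch only decides which instance IDs are due for a request; it makes no AWS calls.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumption: the 10-minute interval comes from the Doc Text; the constant name is hypothetical.
RETERMINATE_AFTER_SECONDS = 10 * 60


@dataclass
class Instance:
    """Hypothetical, simplified view of an EC2 instance."""
    instance_id: str
    state: str  # EC2 state name, e.g. "running", "shutting-down", "terminated"


def instances_to_terminate(
    instances: Iterable[Instance],
    last_requested: Dict[str, float],
    now: float,
) -> List[str]:
    """Return the IDs that should receive a termination request at time `now`.

    An instance is due if it is not terminated and either has never had a
    request or its last request is at least RETERMINATE_AFTER_SECONDS old.
    `last_requested` is updated in place for every ID returned.
    """
    due = []
    for inst in instances:
        if inst.state == "terminated":
            continue  # nothing left to do for this instance
        requested_at = last_requested.get(inst.instance_id)
        if requested_at is None or now - requested_at >= RETERMINATE_AFTER_SECONDS:
            due.append(inst.instance_id)
            last_requested[inst.instance_id] = now
    return due
```

In a destroy loop, the returned IDs would be passed to whatever call actually requests termination, and the function would be invoked again on each polling pass until every instance reports `terminated`.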