Bug 1523142
Summary: | timeout expired waiting for volumes to attach/mount for pod
---|---
Product: | OpenShift Container Platform
Reporter: | Vladislav Walek <vwalek>
Component: | Storage
Assignee: | Tomas Smetana <tsmetana>
Status: | CLOSED ERRATA
QA Contact: | Qin Ping <piqin>
Severity: | high
Docs Contact: |
Priority: | high
Version: | 3.7.0
CC: | aos-bugs, aos-storage-staff, bchilds, cstark, dzhukous, hekumar, nnosenzo, tsmetana
Target Milestone: | ---
Target Release: | 3.9.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Story Points: | ---
Clone Of: |
: | 1590243 (view as bug list)
Environment: |
Last Closed: | 2018-03-28 14:14:24 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1590243

Doc Text:
Cause: When a node with an OpenStack Cinder persistent volume attached was shut down or crashed, the attached volume was never detached.
Consequence: Pods could not be migrated from the failed node because their persistent volumes remained unavailable, and the volumes could not be accessed from any other node or pod.
Fix: The problem was fixed in the OpenShift code.
Result: When a node fails, all of its attached OpenStack Cinder volumes are correctly detached after a time-out.
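The detach-after-timeout behaviour described in the Doc Text can be pictured with a small, self-contained Go sketch. This is illustrative only — it is not the code from the Kubernetes PR referenced below, and all names (attachment, reconciler, forceDetachTimeout) are hypothetical — but it shows the idea: volumes attached to a node that has not been seen for longer than a timeout are force-detached so that pods rescheduled to healthy nodes can mount them.

```go
// Illustrative sketch only: NOT the actual Kubernetes attach/detach controller
// code. It models the behaviour described in the Doc Text: volumes attached to
// a node that has been unreachable longer than a timeout are force-detached.
package main

import (
	"fmt"
	"time"
)

type attachment struct {
	volumeID string
	nodeName string
}

type reconciler struct {
	attachments        []attachment
	nodeLastSeen       map[string]time.Time // last heartbeat seen per node
	forceDetachTimeout time.Duration
}

// detach stands in for the cloud-provider call (e.g. a Cinder detach request).
func (r *reconciler) detach(a attachment) {
	fmt.Printf("detaching volume %s from node %s\n", a.volumeID, a.nodeName)
}

// reconcile walks the known attachments and force-detaches any volume whose
// node has not been seen for longer than the timeout. Before the fix, the
// Cinder path never took this branch for shut-down nodes, so rescheduled pods
// timed out waiting for their volumes to attach/mount.
func (r *reconciler) reconcile(now time.Time) {
	remaining := r.attachments[:0]
	for _, a := range r.attachments {
		lastSeen, known := r.nodeLastSeen[a.nodeName]
		if !known || now.Sub(lastSeen) > r.forceDetachTimeout {
			r.detach(a)
			continue
		}
		remaining = append(remaining, a)
	}
	r.attachments = remaining
}

func main() {
	r := &reconciler{
		attachments: []attachment{{volumeID: "cinder-vol-1", nodeName: "node-1"}},
		nodeLastSeen: map[string]time.Time{
			"node-1": time.Now().Add(-10 * time.Minute), // node went down 10 minutes ago
		},
		forceDetachTimeout: 6 * time.Minute,
	}
	r.reconcile(time.Now())
}
```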
Description
Vladislav Walek
2017-12-07 10:02:56 UTC
https://github.com/kubernetes/kubernetes/pull/56846 — the PR is ready for merge. We are just waiting for someone with approver access to approve it (I have already lgtm'ed it).

Yeah, I was about to post: the aforementioned patch isn't supposed to fix the Multi-Attach error. It fixes two cases:

1. On Cinder, we were never detaching volumes from shutdown nodes. So if a node was running a DC and you brought it down, the pod on the new node would fail to start. Can we verify that this is fixed?
2. If volume information is lost from the A/D controller's ActualStateOfWorld, the patch uses the same dangling-volume mechanism as on AWS to correct the error.

What I did:

1. Started up a cluster with 1 master and 2 nodes.
2. Created a Cinder PVC/PV.
3. Created a pod using the PVC.
4. Shut down the node the pod was running on and waited for the pod to disappear from the API server.
5. Started the same pod (using the same, already attached PV) again.

I verified the pod came up again. This looks to be case #1. I guess I need one more test (restarting the controller after the pod disappears).

In OCP version v3.9.0-0.36.0, the pod's status changed to Running after 8 minutes. In OCP version v3.7.27, the pod's status was still ContainerCreating after 22 minutes. So, changed the bug to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
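As a rough companion to the verification timings above, here is a minimal sketch of how the failover time could be measured from outside the cluster: poll the rescheduled pod until it reports Running and print the elapsed time. It is an assumption-laden sketch, not part of the bug report: it assumes client-go (the call signatures shown are those of recent releases), a kubeconfig at the default location, and hypothetical pod/namespace names (cinder-pod, cinder-test). The node shutdown itself happens outside the Kubernetes API and is not shown.

```go
// Minimal sketch for timing how long a rescheduled pod takes to reach Running.
// Assumes client-go and a kubeconfig at the default path; pod and namespace
// names are hypothetical placeholders for the Cinder test pod.
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical names; substitute the pod and namespace used in the test.
	const namespace, podName = "cinder-test", "cinder-pod"

	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	start := time.Now()
	for {
		pod, err := client.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
		if err == nil && pod.Status.Phase == corev1.PodRunning {
			fmt.Printf("pod Running after %s\n", time.Since(start).Round(time.Second))
			return
		}
		if err == nil {
			fmt.Printf("pod phase: %s (still waiting for volume attach/mount)\n", pod.Status.Phase)
		}
		time.Sleep(30 * time.Second)
	}
}
```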