Bug 1710168
Summary: | [e2e] Attach followed by a detach of same volume is stuck waiting for volume attachment on old device path | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.0 | CC: | aos-bugs, aos-storage-staff |
Target Milestone: | --- | ||
Target Release: | 4.1.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-08-15 14:24:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Hemant Kumar
2019-05-15 03:16:19 UTC
This is not a blocker. We will fix this in 4.1.z.

Okay, I think I know why kubelet saw the old device path. The problem is this: the attach for the second pod failed with `Error attaching EBS volume "vol-04f136c2462a930a1" to instance "i-018e05c99167cc3f9" since volume is currently attached to "i-018e05c99167cc3f9"`, and this caused a dangling-volume error from AWS. The dangling-volume error is supposed to auto-correct any volume that is attached to a node it shouldn't be attached to. But in this case the dangling-volume error was triggered not because the volume was actually attached to the node, but because the `describeVolume` call for the second attachment returned stale data (because of AWS eventual consistency). The dangling-volume exception caused the volume to be added to the actual state of the world (so that it can be detached later), which in turn caused the volume to be reported in the node's Status field. As a result, the second pod assumed the volume was attached at an invalid device path, and so the second pod was stuck for 10 minutes waiting for the volume to attach on a device path (/dev/xvdbl) which does not exist.

Logged https://github.com/kubernetes/kubernetes/issues/77949 upstream to track it and discuss a fix.

Opened a PR for fixing this: https://github.com/kubernetes/kubernetes/pull/78595

Fix for OpenShift, already merged: https://github.com/openshift/origin/pull/23013

@Hemant Could you direct QE on how to reproduce and/or verify this bug? Thanks.

Checked the builds in https://openshift-gce-devel.appspot.com/builds/origin-ci-test/pr-logs/pull/22819/pull-ci-openshift-origin-master-e2e-aws/. All the builds after 8711 (the failed one reported in #comment 0), namely 8716, 8720, 8722, and 8725, passed. Moving bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2417