Bug 1772189
Summary: | Failed to create client pod: timed out waiting for the condition | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michal Fojtik <mfojtik> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 4.3.0 | CC: | aos-bugs, jsafrane, lxia |
Target Milestone: | --- | ||
Target Release: | 4.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-04 11:15:35 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Michal Fojtik
2019-11-13 20:50:58 UTC
Something is wrong with attach. A pod that uses a volume was deleted and another pod that uses the same volume was scheduled to the same node: Nov 13 12:12:27.676: INFO: At 2019-11-13 12:02:10 +0000 UTC - event for aws-client: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "aws-volume-0" : disk attachment of "aws://us-east-1a/vol-01a363c6613e81301" to "ip-10-0-137-61.ec2.internal" failed: requested device "/dev/xvdbn" but found "/dev/xvdbb" AWS eventual consistency strikes again? In controller-manager logs, I can see: 1. The volume gets detached I1113 12:02:08.141511 1 operation_generator.go:558] DetachVolume.Detach succeeded for volume "aws-volume-0" (UniqueName: "kubernetes.io/aws-ebs/aws://us-east-1a/vol-01a363c6613e81301") on node "ip-10-0-137-61.ec2.internal" 2. Immediately after that A/D controller tries to attach the volume again: I1113 12:02:08.258146 1 aws.go:1942] Assigned mount device bn -> volume vol-01a363c6613e81301 ... I1113 12:02:08.584687 1 aws.go:2321] AttachVolume volume="vol-01a363c6613e81301" instance="i-096f624460ef7d2f9" request returned { AttachTime: 2019-11-13 12:02:08.564 +0000 UTC, Device: "/dev/xvdbn", InstanceId: "i-096f624460ef7d2f9", State: "attaching", VolumeId: "vol-01a363c6613e81301" } ... I1113 12:02:10.716451 1 aws.go:1965] Releasing in-process attachment entry: bn -> volume vol-01a363c6613e81301 3. DescribeVolumes shows that the volume is still attached as xvdbb (= the attachment from 1.): E1113 12:02:10.716596 1 nestedpendingoperations.go:270] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1a/vol-01a363c6613e81301\"" failed. No retries permitted until 2019-11-13 12:02:11.216560511 +0000 UTC m=+2010.013859030 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume \"aws-volume-0\" (UniqueName: \"kubernetes.io/aws-ebs/aws://us-east-1a/vol-01a363c6613e81301\") from node \"ip-10-0-137-61.ec2.internal\" : disk attachment of \"aws://us-east-1a/vol-01a363c6613e81301\" to \"ip-10-0-137-61.ec2.internal\" failed: requested device \"/dev/xvdbn\" but found \"/dev/xvdbb\"" 4. A/D controller tries again. This time, the old attachment is already gone, but AWS plugin sees the volume is already attached: W1113 12:02:11.399486 1 aws.go:1906] Got assignment call for already-assigned volume: bn@vol-01a363c6613e81301, volume status: attaching ... W1113 12:02:12.505496 1 aws.go:1906] Got assignment call for already-assigned volume: bn@vol-01a363c6613e81301, volume status: attached ... We can see that the volume is already attached, but the AWS plugin cannot be sure if it's the old attachment (that's being torn down) or the new one we're waiting for. Upstream PR: https://bugzilla.redhat.com/show_bug.cgi?id=1774582 Sorry, this is the upstream PR: https://github.com/kubernetes/kubernetes/pull/85675 Passed on version 4.4.0-0.nightly-2020-02-02-201619 True False 4h22m Cluster version is 4.4.0-0.nightly-2020-02-02-201619 1.Create pvc, pod1 2.oc delete pod pod1;oc create -f pod.yaml Tried several times. did not meet above issue. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581 |