Bug 1523142 - timeout expired waiting for volumes to attach/mount for pod
Summary: timeout expired waiting for volumes to attach/mount for pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Tomas Smetana
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks: 1590243
Reported: 2017-12-07 10:02 UTC by Vladislav Walek
Modified: 2018-06-12 09:53 UTC (History)
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a node with an OpenStack Cinder persistent volume attached was shut down or crashed, the attached volume was never detached. Consequence: Pods could not be migrated from the failed node because their persistent volumes remained unavailable, and the volumes could not be accessed from any other node or pod. Fix: The problem was fixed in the OpenShift code. Result: When a node fails, all of its attached OpenStack Cinder volumes are correctly detached after a time-out.
Clone Of:
: 1590243 (view as bug list)
Environment:
Last Closed: 2018-03-28 14:14:24 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:14:59 UTC

Description Vladislav Walek 2017-12-07 10:02:56 UTC
Description of problem:

Cinder volumes are taking too long to be reattached.
Related k8s PR on GitHub:
https://github.com/kubernetes/kubernetes/pull/56846

Possibly related to 
https://bugzilla.redhat.com/show_bug.cgi?id=1481729

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.7

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
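The PV/PVC/StorageClass dump fields above were left empty in the report. For context, a minimal Cinder-backed StorageClass and PVC of the kind involved would typically look like the following (the names are illustrative, not taken from this report; OCP 3.7/3.9 used the in-tree `kubernetes.io/cinder` provisioner):

```yaml
# Illustrative StorageClass using the in-tree Cinder provisioner
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cinder-standard
provisioner: kubernetes.io/cinder
---
# Illustrative PVC that binds to a dynamically provisioned Cinder volume
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cinder-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cinder-standard
  resources:
    requests:
      storage: 1Gi
```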

Comment 9 Hemant Kumar 2018-01-10 21:54:08 UTC
https://github.com/kubernetes/kubernetes/pull/56846 PR is ready for merge. We are just waiting for someone with approver access to approve it (I have already lgtmed it).

Comment 22 Hemant Kumar 2018-01-17 12:49:56 UTC
Yeah, I was about to post: the aforementioned patch isn't supposed to fix the Multi-Attach error. It fixes two cases:

1. On Cinder, we were never detaching volumes from shut-down nodes. So if a node was running a DC and you brought it down, then the pod on the new node would fail to start. Can we verify that this is fixed?
2. If volume information is lost from the A/D controller's ActualStateOfWorld, the patch uses the same dangling-volume mechanism as on AWS to correct the error.
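Symptom-wise, case #1 shows up on the replacement pod as the event in this bug's title. A hedged sketch of how it is usually inspected (the pod name is illustrative, not from this report):

```shell
# The replacement pod is stuck in ContainerCreating; its events
# typically include the "timeout expired waiting for volumes to
# attach/mount" warning from the kubelet
oc describe pod cinder-pod
oc get events --sort-by='.lastTimestamp'
```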

Comment 23 Tomas Smetana 2018-01-17 13:06:32 UTC
What I did:

1. Started up a cluster with 1 master and 2 nodes
2. Created a cinder PVC/PV
3. Created a pod using the PVC
4. Shut down the node the pod was running on and waited for the pod to disappear from the API server
5. Started the same pod (using the same, already attached PV) again

I verified the pod came up again. This looks to be case #1. I guess I need one more test (restarting the controller after the pod disappears).
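The test steps above can be sketched with oc commands (the PVC, pod, and file names are assumptions, not from this report; shutting the node down happens outside oc, e.g. via the OpenStack API or console):

```shell
# 1. Create a Cinder-backed PVC and a pod that mounts it
oc create -f cinder-pvc.yaml
oc create -f cinder-pod.yaml
oc get pvc cinder-claim            # wait for STATUS=Bound

# 2. Find the node the pod landed on
NODE=$(oc get pod cinder-pod -o jsonpath='{.spec.nodeName}')
echo "Pod is on node: $NODE"

# 3. Shut that node down out-of-band, then wait for the pod
#    to disappear from the API server
oc get pod cinder-pod -w

# 4. Recreate the pod using the same, already-attached PV; with the
#    fix, the Cinder volume is detached from the failed node after a
#    time-out and the new pod starts
oc create -f cinder-pod.yaml
oc get pod cinder-pod -w           # expect it to reach Running
```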

Comment 25 Tomas Smetana 2018-01-17 17:01:28 UTC
https://github.com/openshift/origin/pull/18140

Comment 27 Qin Ping 2018-02-05 07:21:03 UTC
In OCP version v3.9.0-0.36.0: after 8 minutes, the pod's status becomes Running.
In OCP version v3.7.27: after 22 minutes, the pod's status is still ContainerCreating.

So, changed the bug to VERIFIED.

Comment 30 errata-xmlrpc 2018-03-28 14:14:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

