Bug 1523142 - timeout expired waiting for volumes to attach/mount for pod
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Tomas Smetana
QA Contact: Qin Ping
Depends On:
Blocks: 1590243
Reported: 2017-12-07 05:02 EST by Vladislav Walek
Modified: 2018-06-12 05:53 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a node with an OpenStack Cinder persistent volume attached was shut down or crashed, the attached volume was never detached. Consequence: Pods could not be migrated away from the failed node because their persistent volumes remained unavailable, and the volumes could not be accessed from any other node or pod. Fix: The problem was fixed in the OpenShift code. Result: When a node fails, all OpenStack Cinder volumes attached to it are now correctly detached after a time-out.
Story Points: ---
Clone Of:
Clones: 1590243
Environment:
Last Closed: 2018-03-28 10:14:24 EDT
Type: Bug
Regression: ---




External Trackers:
Red Hat Product Errata RHBA-2018:0489 (Last Updated: 2018-03-28 10:14 EDT)

Description Vladislav Walek 2017-12-07 05:02:56 EST
Description of problem:

Cinder volumes take too long to be reattached.
Related upstream Kubernetes PR:
https://github.com/kubernetes/kubernetes/pull/56846

Possibly related to 
https://bugzilla.redhat.com/show_bug.cgi?id=1481729

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.7

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
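For illustration, the symptom surfaces as an attach/mount timeout event on the affected pod. A minimal sketch of what to look for, using placeholder pod and project names (not taken from this report); the exact event wording may differ between releases:

$ oc describe pod mysql-1-abcde -n demo
...
Events:
  Warning  FailedMount  Unable to mount volumes for pod "mysql-1-abcde_demo": timeout expired waiting for volumes to attach/mount for pod "demo"/"mysql-1-abcde"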
Comment 9 Hemant Kumar 2018-01-10 16:54:08 EST
https://github.com/kubernetes/kubernetes/pull/56846 PR is ready for merge. We are just waiting for someone with approver access to approve it (I have already lgtmed it).
Comment 22 Hemant Kumar 2018-01-17 07:49:56 EST
Yeah, I was about to post: the aforementioned patch isn't supposed to fix the Multi-Attach error. It fixes two cases:

1. On Cinder, we were never detaching volumes from shutdown nodes. So if a node was running a DC and you brought it down, the pod on the new node would fail to start. Can we verify if that is fixed?
2. If volume information is lost from the A/D controller's ActualStateOfWorld, the patch uses the same dangling-volume mechanism as AWS to correct the error.
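One way to check case 1 from the OpenStack side is to look at the Cinder volume's attachment record while the node is down. A rough sketch with placeholder names (the exact output columns depend on the OpenStack client version):

# Find the Cinder volume ID backing the PV
$ oc get pv <pv-name> -o jsonpath='{.spec.cinder.volumeID}'

# Inspect its state and attachments in OpenStack
$ openstack volume show <volume-id> -c status -c attachments

Before the fix the volume stays attached to the failed node; with the fix it should be detached (and reattachable elsewhere) once the detach time-out expires.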
Comment 23 Tomas Smetana 2018-01-17 08:06:32 EST
What I did:

1. Started up a cluster with 1 master and 2 nodes
2. Created a cinder PVC/PV
3. Created a pod using the PVC
4. Shut down the node the pod was running on and waited for the pod to disappear from the API server
5. Started the same pod (using the same, already attached PV) again

I verified the pod came up again. This looks to be case #1. I guess I need one more test (restarting the controller after the pod disappears).
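A minimal sketch of the PVC and pod used in a test like this, assuming a Cinder-backed StorageClass named "standard" (names, size, and image are placeholders, not from this report):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cinder-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cinder-pod
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/rhel7
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cinder-claim

Step 4 can be done from the OpenStack side (for example, openstack server stop <node-instance>), and step 5 by recreating the same pod so it lands on the surviving node with the same claim.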
Comment 25 Tomas Smetana 2018-01-17 12:01:28 EST
https://github.com/openshift/origin/pull/18140
Comment 27 Qin Ping 2018-02-05 02:21:03 EST
In OCP version v3.9.0-0.36.0, the pod's status changed to Running after about 8 minutes.
In OCP version v3.7.27, the pod was still in ContainerCreating after 22 minutes.

So, changed the bug to verified.
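For reference, the recovery time in a test like this can be measured by watching the pod after the node is shut down, for example:

$ oc get pod <pod-name> -o wide -w

On the fixed version the pod should move from ContainerCreating to Running once the detach time-out expires (roughly the 8 minutes observed above); on 3.7.27 it remains stuck in ContainerCreating.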
Comment 30 errata-xmlrpc 2018-03-28 10:14:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
