Description of problem: Cinder volumes are taking too much time to be reloaded.
Related PR in GitHub for k8s: https://github.com/kubernetes/kubernetes/pull/56846
Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1481729
Version-Release number of selected component (if applicable): OpenShift Container Platform 3.7
https://github.com/kubernetes/kubernetes/pull/56846 PR is ready for merge. We are just waiting for someone with approver access to approve it (I have already lgtmed it).
Yeah, I was about to post: the aforementioned patch isn't supposed to fix the Multi-Attach error. It fixes two cases:
1. On Cinder, we were never detaching volumes from shut-down nodes. So if a node was running a DC and you brought it down, the pod on the new node would fail to start. Can we verify that this is fixed?
2. If volume information is lost from the A/D controller's ActualStateOfWorld, the patch uses the same dangling-volume mechanism as in AWS to correct the error.
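For readers unfamiliar with case 2, here is a rough toy model of the idea. It is not the Kubernetes code: `ToyCloud`, `ToyController`, and `DanglingAttachError` are hypothetical names, and the behaviour is simplified to a single attachment per volume. The assumption is that the cloud reports which node a volume is really attached to, and the controller records that in its actual state so that the normal detach path can clean it up.

```python
from dataclasses import dataclass, field


class DanglingAttachError(Exception):
    """Hypothetical error: the volume is attached to a node other than the requested one."""

    def __init__(self, volume, current_node):
        super().__init__(f"volume {volume} is still attached to {current_node}")
        self.volume = volume
        self.current_node = current_node


@dataclass
class ToyCloud:
    # volume -> node, i.e. what the cloud itself reports as attached
    attachments: dict = field(default_factory=dict)

    def attach(self, volume, node):
        current = self.attachments.get(volume)
        if current is not None and current != node:
            raise DanglingAttachError(volume, current)
        self.attachments[volume] = node

    def detach(self, volume, node):
        if self.attachments.get(volume) == node:
            del self.attachments[volume]


@dataclass
class ToyController:
    cloud: ToyCloud
    desired: dict = field(default_factory=dict)  # volume -> node the pod needs it on
    actual: set = field(default_factory=set)     # (volume, node) pairs believed attached

    def reconcile(self):
        # Detach everything recorded as attached that is no longer desired.
        for volume, node in sorted(self.actual):
            if self.desired.get(volume) != node:
                self.cloud.detach(volume, node)
                self.actual.discard((volume, node))
        # Attach what is desired; on a dangling attachment, record the real
        # location so that the next pass detaches it.
        for volume, node in self.desired.items():
            if (volume, node) in self.actual:
                continue
            try:
                self.cloud.attach(volume, node)
                self.actual.add((volume, node))
            except DanglingAttachError as err:
                self.actual.add((err.volume, err.current_node))
```

In this sketch, a controller that starts with an empty `actual` set while the cloud still has the volume on the old node needs two `reconcile()` passes: the first records the dangling attachment, the second detaches it and attaches the volume to the new node.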
What I did:
1. Started up a cluster with 1 master and 2 nodes
2. Created a Cinder PVC/PV
3. Created a pod using the PVC
4. Shut down the node the pod was running on and waited for the pod to disappear from the API server
5. Started the same pod (using the same, already attached PV) again
I verified that the pod came up again. This appears to cover case #1. I guess I need one more test (restarting the controller after the pod disappears).
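To put a number on "the pod came up again", a small polling helper along these lines could be used. This is only a sketch: `wait_for_phase` and its `get_phase` callback are hypothetical, and `get_phase` is assumed to return the pod's current phase string from wherever you query it (for example, a wrapper around `oc get pod`). Note that `ContainerCreating` is shown in the status column while the pod phase itself is still `Pending`.

```python
import time


def wait_for_phase(get_phase, want="Running", timeout=1800.0, interval=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until it returns `want`.

    Returns the elapsed seconds on success, or None if `timeout` seconds
    pass first. `clock` and `sleep` are injectable so the helper can be
    exercised without real waiting.
    """
    start = clock()
    while True:
        if get_phase() == want:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Returning the elapsed time rather than a boolean makes it easy to compare runs across versions.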
https://github.com/openshift/origin/pull/18140
In OCP version v3.9.0-0.36.0, the Pod's status becomes Running after 8 minutes. In OCP version v3.7.27, the Pod's status is still ContainerCreating after 22 minutes. So, changed the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489