Bug 1542781
| Summary: | Pod with Azure Persistent Volume stuck in "Container creating" after node shutdown | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Greg Rodriguez II <grodrigu> |
| Component: | Storage | Assignee: | hchen |
| Status: | CLOSED ERRATA | QA Contact: | Wenqi He <wehe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.6.0 | CC: | aos-bugs, aos-storage-staff, bchilds, erich, jcrumple, rhowe, smunilla |
| Target Milestone: | --- | Keywords: | Reopened, UpcomingRelease |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:26:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
This is similar to a known issue with Cinder volumes [1].

[1] https://github.com/kubernetes/kubernetes/issues/57497

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
Description of problem:

We have an OCP 3.6 cluster in Azure. The nodes, master API, and master controller are configured for Azure, and we use dynamically provisioned Azure disks as persistent volumes.

A PostgreSQL pod with an Azure disk as a persistent volume was running on node01. As a test, we shut down node01 in the Azure portal. OpenShift tried to start a new pod on node03, but was unable to attach the PV to node03.

In the log file of node03, we see this repeatedly:

Jan 18 17:16:53 node03 atomic-openshift-node[1506]: E0118 17:16:53.340451 1506 kubelet.go:1556] Unable to mount volumes for pod "postgresql-1-flt8m_testprj2(0533a2bc-fbf6-11e7-b2a9-000d3a3612dc)": timeout expired waiting for volumes to attach/mount for pod "testprj2"/"postgresql-1-flt8m". list of unattached/unmounted volumes=[postgresql-data]; skipping pod

Jan 18 17:16:53 node03 atomic-openshift-node[1506]: E0118 17:16:53.340495 1506 pod_workers.go:182] Error syncing pod 0533a2bc-fbf6-11e7-b2a9-000d3a3612dc ("postgresql-1-flt8m_testprj2(0533a2bc-fbf6-11e7-b2a9-000d3a3612dc)"), skipping: timeout expired waiting for volumes to attach/mount for pod "testprj2"/"postgresql-1-flt8m". list of unattached/unmounted volumes=[postgresql-data]

On the active master controller, we see these log entries:

Jan 18 16:16:54 master1 atomic-openshift-master-controllers[39459]: I0118 16:16:54.091419 39459 actual_state_of_world.go:310] Volume "kubernetes.io/azure-disk/kubernetes-dynamic-pvc-ed84ece7-fbf4-11e7-b20f-000d3a371192.vhd" is already added to attachedVolume list to node "node01", update device path "2"

Jan 18 16:16:54 master1 atomic-openshift-master-controllers[39459]: W0118 16:16:54.092262 39459 reconciler.go:269] (Volume : "kubernetes.io/azure-disk/kubernetes-dynamic-pvc-ed84ece7-fbf4-11e7-b20f-000d3a371192.vhd") from node "node03" failed to attach - volume is already exclusively attached to another node

Jan 18 16:16:54 master1 atomic-openshift-master-controllers[39459]: I0118 16:16:54.092553 39459 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"testprj2", Name:"postgresql-1-flt8m", UID:"0533a2bc-fbf6-11e7-b2a9-000d3a3612dc", APIVersion:"v1", ResourceVersion:"8340", FieldPath:""}): type: 'Warning' reason: 'FailedAttachVolume' (Volume : "kubernetes.io/azure-disk/kubernetes-dynamic-pvc-ed84ece7-fbf4-11e7-b20f-000d3a371192.vhd") from node "node03" failed to attach - volume is already exclusively attached to another node

Jan 18 16:16:54 master1 atomic-openshift-master-controllers[39459]: I0118 16:16:54.107198 39459 node_status_updater.go:136] Updating status for node "node01" succeeded. patchBytes: "{}" VolumesAttached: [{kubernetes.io/azure-disk/kubernetes-dynamic-pvc-ed84ece7-fbf4-11e7-b20f-000d3a371192.vhd 2}]

The annotation volumes.kubernetes.io/controller-managed-attach-detach=true is set on all nodes, so the master controller should be able to detach the volume from node01.

Version-Release number of selected component (if applicable):
openshift v3.6.173.0.83
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Customer verified repeatable
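For anyone reproducing this, a minimal diagnostic sketch is below. It is not taken from this report; it only illustrates how to compare what the attach/detach controller believes is attached with what Azure reports. The node, namespace, and pod names (node01, testprj2, postgresql-1-flt8m) come from the description above, and <resource-group> is a placeholder for the environment-specific Azure resource group.

```sh
# Confirm the node uses controller-managed attach/detach
# (the annotation mentioned in the description).
oc get node node01 -o yaml | grep controller-managed-attach-detach

# List the volumes the controller still considers attached to / in use by
# the shut-down node; the stale Azure disk should appear here.
oc get node node01 -o jsonpath='{.status.volumesAttached}{"\n"}'
oc get node node01 -o jsonpath='{.status.volumesInUse}{"\n"}'

# Inspect the pending pod and its events for the FailedAttachVolume warning.
oc describe pod postgresql-1-flt8m -n testprj2

# On the Azure side, list the data disks still attached to the VM backing
# node01 (resource group and VM name are environment-specific placeholders).
az vm show --resource-group <resource-group> --name node01 \
  --query "storageProfile.dataDisks[].name" -o tsv
```

If the disk still shows up under node01 in both outputs long after the node was powered off, the controller has not performed the detach, which matches the behavior reported here.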