Description of problem:
If a node that mounts ceph PVs crashes, subsequent ceph PV mounts on other nodes are blocked until the locks are manually cleared.

Version-Release number of selected component (if applicable):
Reported in upstream issue referencing origin versions v1.1.3 - 1.3.3.

How reproducible:
I have not had time to reproduce with OSE, but based on the upstream report this should be 100% reproducible.

Steps to Reproduce:
1. Have a node running pods that access ceph-backed PVs
2. Crash that node

Actual results:
Pods starting on other nodes that use ceph PVs remain in ContainerCreating status; manual admin intervention is required to clean up the locks.

Expected results:
Automatic recovery.

Additional info:
Lacking a fencing mechanism, I'm not sure how much of this can be safely automated.
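For reference, the manual cleanup mentioned above looks roughly like the following sketch (the image name, lock id, and locker are placeholders, not values from this report; the real ones have to be read from the lock list output first):

  # list the lockers on the affected rbd image
  rbd lock list <image-name>
  # remove the stale lock left by the crashed node, using the lock id and locker from the list output
  rbd lock remove <image-name> <lock-id> <locker>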
Huamin, can you please take a look?
This is a known issue; there is a Trello card and k8s/OpenShift issues tracking it [1]. The plan is to use attach/detach, so that the controller on the master initiates the detach call and releases the lock.

1. https://trello.com/c/Y1j4dTBO/131-bug-the-ceph-rbd-volume-plugin-seems-to-hold-a-lock-when-the-container-fails
https://github.com/kubernetes/kubernetes/pull/12502/files
Upstream fix is proposed at https://github.com/kubernetes/kubernetes/pull/33660
33660 depends on the following:
https://github.com/kubernetes/kubernetes/pull/35433
https://github.com/kubernetes/kubernetes/pull/35434
*** Bug 1409237 has been marked as a duplicate of this bug. ***
Still reproducible in openshift v3.6.86

Steps:
1. Create a StorageClass for the rbd provisioner.
2. Create a PVC that dynamically provisions a PV, and create a ReplicationController (replicas=1).
3. After the Pod is running, stop the node service on its node.
4. A new Pod is recreated on another node but is stuck in status 'ContainerCreating'. The old Pod became 'Unknown'. The new Pod only becomes 'Running' when the original node is recovered or the lock is manually removed.

# oc get pods
NAME          READY     STATUS              RESTARTS   AGE
rbdpd-8n8zd   0/1       ContainerCreating   0          8m
rbdpd-xwn25   1/1       Unknown             0          1h

# oc describe pod rbdpd-8n8zd
Name:            rbdpd-8n8zd
Namespace:       jhou
Security Policy: restricted
Node:            ip-172-18-6-78.ec2.internal/172.18.6.78
Start Time:      Fri, 02 Jun 2017 16:12:37 +0800
Labels:          app=rbd
Annotations:     kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"jhou","name":"rbdpd","uid":"31b871f0-475a-11e7-8550-0e259545e72a","api...
                 openshift.io/scc=restricted
Status:          Pending
IP:
Controllers:     ReplicationController/rbdpd
Containers:
  myfrontend:
    Container ID:
    Image:          jhou/hello-openshift
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/rbd from pvol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xk75q (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  pvol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbdc
    ReadOnly:   false
  default-token-xk75q:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-xk75q
    Optional:   false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                                  SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                  -------------  ----     ------       -------
  9m         9m        1      default-scheduler                                    Normal   Scheduled    Successfully assigned rbdpd-8n8zd to ip-172-18-6-78.ec2.internal
  9m         1m        12     kubelet, ip-172-18-6-78.ec2.internal                 Warning  FailedMount  MountVolume.SetUp failed for volume "kubernetes.io/rbd/3d88d0cc-476b-11e7-8550-0e259545e72a-pvc-3874b558-4757-11e7-8550-0e259545e72a" (spec.Name: "pvc-3874b558-4757-11e7-8550-0e259545e72a") pod "3d88d0cc-476b-11e7-8550-0e259545e72a" (UID: "3d88d0cc-476b-11e7-8550-0e259545e72a") with: rbd: image kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a is locked by other nodes
  7m         56s       4      kubelet, ip-172-18-6-78.ec2.internal                 Warning  FailedMount  Unable to mount volumes for pod "rbdpd-8n8zd_jhou(3d88d0cc-476b-11e7-8550-0e259545e72a)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"rbdpd-8n8zd". list of unattached/unmounted volumes=[pvol]
  7m         56s       4      kubelet, ip-172-18-6-78.ec2.internal                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "jhou"/"rbdpd-8n8zd". list of unattached/unmounted volumes=[pvol]

[root@ip-172-18-3-46 ~]# rbd lock list kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a
There is 1 exclusive lock on this image.
Locker       ID                                               Address
client.4193  kubelet_lock_magic_ip-172-18-1-167.ec2.internal  172.18.1.167:0/1037989

[root@ip-172-18-3-46 ~]# rbd lock remove kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a kubelet_lock_magic_ip-172-18-1-167.ec2.internal client.4193
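For anyone re-running these steps, the objects used above look roughly like the sketch below. Only the claim name (rbdc), controller name (rbdpd), container name/image/port, and mount path are taken from the describe output; the StorageClass name, monitor address, pool, and secret names are placeholders that have to match the actual ceph cluster:

  # oc create -f - <<EOF
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: rbd-sc                              # placeholder StorageClass name
  provisioner: kubernetes.io/rbd
  parameters:
    monitors: 192.168.0.1:6789                # placeholder ceph monitor
    adminId: admin
    adminSecretName: ceph-admin-secret        # placeholder secret
    adminSecretNamespace: default
    pool: rbd
    userId: admin
    userSecretName: ceph-user-secret          # placeholder secret
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: rbdc                                # claim name from the output above
    annotations:
      volume.beta.kubernetes.io/storage-class: rbd-sc
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi
  ---
  apiVersion: v1
  kind: ReplicationController
  metadata:
    name: rbdpd                               # controller name from the output above
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: rbd
      spec:
        containers:
        - name: myfrontend
          image: jhou/hello-openshift
          ports:
          - containerPort: 80
          volumeMounts:
          - name: pvol
            mountPath: /mnt/rbd
        volumes:
        - name: pvol
          persistentVolumeClaim:
            claimName: rbdc
  EOF

On releases that support it, spec.storageClassName can be used on the PVC instead of the beta storage-class annotation.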
Verified on openshift v3.6.106. Given the node is down (I shut it down), the replication controller creates a Pod on another functioning node and the Pod is able to become Running. The rbd lock no longer prevents other nodes from mounting.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716
This is resolved through the rbd attach/detach refactoring.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days