Bug 1365867 - Ceph: Unable to mount volumes for pod : rbd: image is locked by other nodes
Summary: Ceph: Unable to mount volumes for pod : rbd: image is locked by other nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.2.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: hchen
QA Contact: Jianwei Hou
URL:
Whiteboard:
Duplicates: 1409237
Depends On:
Blocks: 1267746
 
Reported: 2016-08-10 11:32 UTC by Josep 'Pep' Turro Mauri
Modified: 2023-09-14 03:29 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When an OpenShift node crashes before unmapping an rbd volume, the advisory lock held on the rbd volume is not released, which prevents other nodes from mapping it.
Consequence: The rbd volume cannot be used by other nodes unless the advisory lock is manually removed.
Fix: If no rbd client is using the rbd volume, the advisory lock is removed automatically.
Result: The rbd volume can be used by other nodes without manually removing the lock.
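The fix amounts to checking whether any RBD client still has the image open before clearing the stale advisory lock. A roughly equivalent manual sequence looks like the following sketch (the image name, lock ID and locker are placeholders, and the default 'rbd' pool is assumed):

# Check for watchers; a listed watcher means some client still has the image open
rbd status kubernetes-dynamic-pvc-EXAMPLE --pool rbd

# Inspect the advisory lock left behind by the crashed node
rbd lock list kubernetes-dynamic-pvc-EXAMPLE --pool rbd

# If there are no watchers, the stale lock can be removed
rbd lock remove kubernetes-dynamic-pvc-EXAMPLE <lock-id> <locker> --pool rbd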
Clone Of:
Environment:
Last Closed: 2017-08-10 05:15:47 UTC
Target Upstream Version:
Embargoed:




Links:
Origin (GitHub) issue 7983 - 2016-08-10 11:32:04 UTC
Red Hat Product Errata RHEA-2017:1716 (SHIPPED_LIVE) - Red Hat OpenShift Container Platform 3.6 RPM Release Advisory - 2017-08-10 09:02:50 UTC

Description Josep 'Pep' Turro Mauri 2016-08-10 11:32:04 UTC
Description of problem:

If a node that mounts ceph PVs crashes, subsequent ceph PV mounts on other nodes are blocked until the locks are manually cleared.

Version-Release number of selected component (if applicable):
Reported in upstream issue referencing origin versions v1.1.3 - 1.3.3.

How reproducible:
I have not had time to reproduce with OSE, but based on the upstream report this should be 100% reproducible.

Steps to Reproduce:
1. Have a node running pods that access ceph-backed PVs
2. Crash that node

Actual results:
Pods that use Ceph PVs and start on other nodes remain in ContainerCreating status; manual admin intervention is required to clean up the locks.

Expected results:
Automatic recovery.

Additional info:
Lacking a fencing mechanism, I'm not sure how much of this can be safely automated.

Comment 1 Jan Safranek 2016-08-10 15:03:22 UTC
Huamin, can you please take a look?

Comment 2 hchen 2016-08-10 17:44:37 UTC
This is a known issue; there are a Trello card and k8s/OpenShift issues tracking it [1]. The plan is to use the attach/detach controller so that the detach call is initiated by the master, which releases the lock.

1. https://trello.com/c/Y1j4dTBO/131-bug-the-ceph-rbd-volume-plugin-seems-to-hold-a-lock-when-the-container-fails

Comment 5 hchen 2016-09-29 14:20:39 UTC
Upstream fix is proposed at 
https://github.com/kubernetes/kubernetes/pull/33660

Comment 11 hchen 2017-02-01 15:44:54 UTC
*** Bug 1409237 has been marked as a duplicate of this bug. ***

Comment 20 Jianwei Hou 2017-06-02 08:40:02 UTC
Still reproducible in openshift v3.6.86

Steps:
1. Create a StorageClass for the rbd provisioner.
2. Create a PVC that dynamically provisions a PV, and create a ReplicationController (replicas=1) whose Pod mounts the PVC.
3. After the Pod is running, stop the node service on its node.
4. A new Pod is created on another node but gets stuck in 'ContainerCreating', while the old Pod becomes 'Unknown'. The new Pod only becomes 'Running' once the original node recovers or the lock is removed manually. (A rough command sketch of these steps follows below.)
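A rough shell sketch of these steps (the object definition file names are hypothetical, and the node service name assumes an OCP 3.x install; Origin uses origin-node instead):

# 1-2. Create the StorageClass, the PVC (dynamically provisions a PV) and the ReplicationController
oc create -f rbd-storageclass.yaml
oc create -f rbd-pvc.yaml
oc create -f rbd-rc.yaml

# 3. Once the Pod is Running, stop the node service on the node hosting it
systemctl stop atomic-openshift-node

# 4. Watch the replacement Pod on another node; it stays in ContainerCreating
oc get pods -o wide -w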

# oc get pods  
NAME          READY     STATUS              RESTARTS   AGE
rbdpd-8n8zd   0/1       ContainerCreating   0          8m
rbdpd-xwn25   1/1       Unknown             0          1h

# oc describe pod rbdpd-8n8zd
Name:			rbdpd-8n8zd
Namespace:		jhou
Security Policy:	restricted
Node:			ip-172-18-6-78.ec2.internal/172.18.6.78
Start Time:		Fri, 02 Jun 2017 16:12:37 +0800
Labels:			app=rbd
Annotations:		kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"jhou","name":"rbdpd","uid":"31b871f0-475a-11e7-8550-0e259545e72a","api...
			openshift.io/scc=restricted
Status:			Pending
IP:			
Controllers:		ReplicationController/rbdpd
Containers:
  myfrontend:
    Container ID:	
    Image:		jhou/hello-openshift
    Image ID:		
    Port:		80/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /mnt/rbd from pvol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xk75q (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  pvol:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	rbdc
    ReadOnly:	false
  default-token-xk75q:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-xk75q
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----					-------------	--------	------		-------
  9m		9m		1	default-scheduler					Normal		Scheduled	Successfully assigned rbdpd-8n8zd to ip-172-18-6-78.ec2.internal
  9m		1m		12	kubelet, ip-172-18-6-78.ec2.internal			Warning		FailedMount	MountVolume.SetUp failed for volume "kubernetes.io/rbd/3d88d0cc-476b-11e7-8550-0e259545e72a-pvc-3874b558-4757-11e7-8550-0e259545e72a" (spec.Name: "pvc-3874b558-4757-11e7-8550-0e259545e72a") pod "3d88d0cc-476b-11e7-8550-0e259545e72a" (UID: "3d88d0cc-476b-11e7-8550-0e259545e72a") with: rbd: image kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a is locked by other nodes
  7m		56s		4	kubelet, ip-172-18-6-78.ec2.internal			Warning		FailedMount	Unable to mount volumes for pod "rbdpd-8n8zd_jhou(3d88d0cc-476b-11e7-8550-0e259545e72a)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"rbdpd-8n8zd". list of unattached/unmounted volumes=[pvol]
  7m		56s		4	kubelet, ip-172-18-6-78.ec2.internal			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "jhou"/"rbdpd-8n8zd". list of unattached/unmounted volumes=[pvol]



[root@ip-172-18-3-46 ~]# rbd lock list kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a
There is 1 exclusive lock on this image.
Locker      ID                                              Address                
client.4193 kubelet_lock_magic_ip-172-18-1-167.ec2.internal 172.18.1.167:0/1037989 
[root@ip-172-18-3-46 ~]# rbd lock remove kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a kubelet_lock_magic_ip-172-18-1-167.ec2.internal client.4193
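Before removing a lock by hand like this, it helps to confirm that no client still has the image open; a quick sketch of such a check (no watchers listed means the lock is stale):

# Confirm the image has no watchers before removing its lock
rbd status kubernetes-dynamic-pvc-387acbcd-4757-11e7-8550-0e259545e72a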

Comment 23 Jianwei Hou 2017-06-15 08:12:21 UTC
Verified on openshift v3.6.106.

With the node down (I shut it down), the replication controller creates a Pod on another functional node and that Pod becomes Running. The rbd lock no longer prevents other nodes from mounting.

Comment 26 errata-xmlrpc 2017-08-10 05:15:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

Comment 27 hchen 2017-11-13 14:59:35 UTC
This is resolved through the rbd attach/detach refactoring.

Comment 29 Red Hat Bugzilla 2023-09-14 03:29:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

