Bug 1003958

Summary: cinder: lock on the cinder VG held by a stuck task prevents other tasks in the system from running
Product: Red Hat OpenStack Reporter: Dafna Ron <dron>
Component: openstack-cinder Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: Haim <hateya>
Severity: high Docs Contact:
Priority: unspecified    
Version: unspecified CC: abaron, eharney, hateya, yeylon
Target Milestone: ---   
Target Release: 4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2013-09-08 11:23:00 UTC Type: Bug
Attachments:
Description Flags
log none

Description Dafna Ron 2013-09-03 14:54:30 UTC
Created attachment 793232
log

Description of problem:

Because of a bug in delete snapshot, I had several stuck delete-snapshot tasks.
I then tried to run a create-volume task, which also got stuck, and even a simple vgs on the system gets stuck as well.
If I terminate the vgs, we can see that it is a lock issue with LVM:

[root@opens-vdsb lvm(keystone_admin)]# vgs
  /dev/cinder-volumes/_snapshot-69f4bd94-f7df-40bf-94ba-f9e1f6661a3f: read failed after 0 of 4096 at 10737352704: Input/output error
  /dev/cinder-volumes/_snapshot-69f4bd94-f7df-40bf-94ba-f9e1f6661a3f: read failed after 0 of 4096 at 10737410048: Input/output error
  /dev/cinder-volumes/_snapshot-69f4bd94-f7df-40bf-94ba-f9e1f6661a3f: read failed after 0 of 4096 at 0: Input/output error
  /dev/cinder-volumes/_snapshot-69f4bd94-f7df-40bf-94ba-f9e1f6661a3f: read failed after 0 of 4096 at 4096: Input/output error
  /dev/cinder-volumes/_snapshot-24d4e530-5eef-4f4c-8b3b-c1dd51f1b295: read failed after 0 of 4096 at 10737352704: Input/output error
  /dev/cinder-volumes/_snapshot-24d4e530-5eef-4f4c-8b3b-c1dd51f1b295: read failed after 0 of 4096 at 10737410048: Input/output error
  /dev/cinder-volumes/_snapshot-24d4e530-5eef-4f4c-8b3b-c1dd51f1b295: read failed after 0 of 4096 at 0: Input/output error
  /dev/cinder-volumes/_snapshot-24d4e530-5eef-4f4c-8b3b-c1dd51f1b295: read failed after 0 of 4096 at 4096: Input/output error
  /dev/cinder-volumes/_snapshot-0f48b93d-32bb-4078-89be-76ff6468ee97: read failed after 0 of 4096 at 10737352704: Input/output error
  /dev/cinder-volumes/_snapshot-0f48b93d-32bb-4078-89be-76ff6468ee97: read failed after 0 of 4096 at 10737410048: Input/output error
  /dev/cinder-volumes/_snapshot-0f48b93d-32bb-4078-89be-76ff6468ee97: read failed after 0 of 4096 at 0: Input/output error
  /dev/cinder-volumes/_snapshot-0f48b93d-32bb-4078-89be-76ff6468ee97: read failed after 0 of 4096 at 4096: Input/output error
  /dev/cinder-volumes/_snapshot-5a14b4c4-9a14-4de6-abd2-236208bc1429: read failed after 0 of 4096 at 10737352704: Input/output error
  /dev/cinder-volumes/_snapshot-5a14b4c4-9a14-4de6-abd2-236208bc1429: read failed after 0 of 4096 at 10737410048: Input/output error
  /dev/cinder-volumes/_snapshot-5a14b4c4-9a14-4de6-abd2-236208bc1429: read failed after 0 of 4096 at 0: Input/output error
  /dev/cinder-volumes/_snapshot-5a14b4c4-9a14-4de6-abd2-236208bc1429: read failed after 0 of 4096 at 4096: Input/output error
  /dev/cinder-volumes/_snapshot-9a68a5b8-781c-4661-be5c-6a791ff79370: read failed after 0 of 4096 at 10737352704: Input/output error
  /dev/cinder-volumes/_snapshot-9a68a5b8-781c-4661-be5c-6a791ff79370: read failed after 0 of 4096 at 10737410048: Input/output error
  /dev/cinder-volumes/_snapshot-9a68a5b8-781c-4661-be5c-6a791ff79370: read failed after 0 of 4096 at 0: Input/output error
  /dev/cinder-volumes/_snapshot-9a68a5b8-781c-4661-be5c-6a791ff79370: read failed after 0 of 4096 at 4096: Input/output error


^C  CTRL-c detected: giving up waiting for lock
  /var/lock/lvm/V_cinder-volumes:aux: flock failed: Interrupted system call
  Can't get lock for cinder-volumes
  VG   #PV #LV #SN Attr   VSize   VFree
  vg0    1   2   0 wz--n- 232.69g    0 

Version-Release number of selected component (if applicable):

openstack-cinder-2013.1.3-2.el6ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. install Grizzly/2013-08-29.1
2. create a volume + snapshots
3. delete the snapshots
4. create a new volume 
5. run vgs

Actual results:

Since the delete-snapshot tasks are stuck, all other tasks get stuck as well (create volume, or even a simple vgs). This looks like a lock issue on the VG during create/delete operations.

Expected results:

One stuck task should not cause all other tasks to get stuck as well.
We need to change the way we take the lock on the VG during delete/create, and add a timeout in case an exclusive lock is stuck.
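
A minimal sketch of the timeout idea, assuming Python 3 on Linux and a flock-style lock file like the one shown in the vgs output above. The names acquire_with_timeout and LockTimeout are hypothetical and are not part of cinder or LVM. This only illustrates polling for an exclusive lock and giving up after a deadline; it does not change how LVM itself takes the VG lock.

```python
import errno
import fcntl
import os
import time


class LockTimeout(Exception):
    """Raised when the lock could not be taken before the deadline."""


def acquire_with_timeout(path, timeout, poll_interval=0.1):
    """Take an exclusive flock on `path`, giving up after `timeout` seconds.

    Hypothetical helper for illustration. Returns the open file
    descriptor; closing it releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            # EAGAIN/EACCES mean "held by someone else"; anything else is fatal.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            os.close(fd)
            raise LockTimeout("could not lock %s within %.1fs" % (path, timeout))
        time.sleep(poll_interval)
```

A caller would take the lock with acquire_with_timeout(...), release it with os.close(fd), and treat LockTimeout as a failed task instead of waiting forever.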

Additional info: cinder volumes log

Comment 1 Dafna Ron 2013-09-03 14:55:20 UTC
https://bugs.launchpad.net/cinder/+bug/1220275

Comment 2 Ayal Baron 2013-09-08 11:23:00 UTC
The VG lock is taken automatically by LVM (which is correct).
lvcreate, lvremove, etc. all change the VG metadata and hence must take the VG lock.
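
Since LVM takes the lock inside the command itself, one place a caller could bound the wait is around the command invocation. A rough sketch in Python 3 (3.7 or later), assuming the caller shells out to the LVM tools; run_with_timeout is a hypothetical helper, not cinder code. Note that a process blocked in uninterruptible I/O may not exit promptly when killed, so this is not a complete answer to stuck tasks.

```python
import subprocess


def run_with_timeout(argv, timeout):
    """Run `argv`, returning (returncode, stdout), or None on timeout.

    Hypothetical helper for illustration. On timeout, subprocess.run
    kills the child before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.returncode, done.stdout


# Example call (not executed here): run_with_timeout(["vgs"], timeout=30)
```

A None result would let the caller report the VG as busy and fail the task, rather than hang behind a lock held by a stuck command.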