Bug 971291 - vdsm: cannot attach imported pre-existing iso domain and export domains
Summary: vdsm: cannot attach imported pre-existing iso domain and export domains
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assignee: Maor
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-06-06 08:26 UTC by Dafna Ron
Modified: 2016-02-10 16:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-01 08:15:54 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs (1.41 MB, application/x-gzip)
2013-06-06 08:26 UTC, Dafna Ron

Description Dafna Ron 2013-06-06 08:26:18 UTC
Created attachment 757551
logs

Description of problem:

I tried attaching an export domain that is unattached in my other setup, and the attach failed with the following error:

'Error while executing action Attach Storage Domain: Could not obtain lock'

I tried attaching an ISO domain that is also unattached in my other setup (although it should not matter whether it is attached or unattached). The first attempt failed with the following error; on the second and all later attempts, vdsm reports that the domain does not exist:

Error while executing action Attach Storage Domain: Could not obtain lock


Version-Release number of selected component (if applicable):

vdsm-4.10.3-0.416.git5358ed2.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create an export domain in 3.2 but do not attach it to a DC.
2. Create an ISO domain in 3.2 (it does not matter whether it is attached or not).
3. Try to import both into a 3.3 setup.

Actual results:

We fail to attach the domains.

Expected results:

We should be able to attach the domains.

Additional info: logs attached

Comment 2 Maor 2013-06-21 09:39:08 UTC
Fede, do you have any insight into the following error:
AcquireLockFailure: Cannot obtain lock: "id=72ec1321-a114-451f-bee1-6790cbca1bc6, rc=1, out=['error - lease file does not exist or is not writeable', 'usage: /usr/libexec/vdsm/spmprotect.sh COMMAND PARAMETERS', 'Commands:', '  start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries }', 'Parameters:', '  sdUUID -                domain uuid', '  hostId -                host id in pool', '  renewal_interval_sec -  intervals for lease renewals attempts', '  lease_path -            path to lease file/volume', '  offset -                offset of lease within file', '  lease_time_ms -         time limit within which lease must be renewed (at least 2*renewal_interval_sec)', '  io_op_timeout_ms -      I/O operation timeout', '  fail_retries -          Maximal number of attempts to retry to renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)'], err=[]"


It appears that the lease file does not exist or is not writeable.
Looking at the sanlock logs, I didn't find anything related to this issue.
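
The condition reported by spmprotect.sh ("lease file does not exist or is not writeable") can also be checked by hand on the host. The following is a minimal sketch, not vdsm code: `check_lease_file` is a hypothetical helper, and the path in the usage line is a placeholder for the lease path of the affected domain.

```python
import os


def check_lease_file(lease_path):
    """Return 'missing', 'not writable', or 'ok' for the given path.

    Hypothetical helper for illustration; it only mirrors the two
    conditions named in the error message.
    """
    if not os.path.exists(lease_path):
        return "missing"
    if not os.access(lease_path, os.W_OK):
        return "not writable"
    return "ok"


if __name__ == "__main__":
    # Placeholder path: substitute the lease path of the affected domain.
    print(check_lease_file("/path/to/lease"))
```

Run it as the user the vdsm daemon runs as. Root normally passes the writability check regardless of the file mode, so a check run as root can hide a permission problem.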


From the messages log it seems that during this time cougar12 encountered many I/O errors:

Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 1
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 26214399
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 26214399
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 2
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 3
Jun  6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 3
Jun  6 10:52:10 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:10 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:10 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:10 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:11 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun  6 10:52:11 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
Jun  6 10:52:11 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715072
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715184
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 0
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun  6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:11 cougar12 kernel: ata1: hard resetting link
Jun  6 10:52:11 cougar12 kernel: ata5: soft resetting link
Jun  6 10:52:11 cougar12 kernel: ata6: soft resetting link
Jun  6 10:52:12 cougar12 kernel: ata5: EH complete
Jun  6 10:52:12 cougar12 kernel: ata6: EH complete
Jun  6 10:52:12 cougar12 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  6 10:52:12 cougar12 kernel: ata1.00: configured for UDMA/133
Jun  6 10:52:12 cougar12 kernel: ata1: EH complete
Jun  6 10:52:13 cougar12 kernel: device-mapper: table: 253:25: multipath: error getting device
Jun  6 10:52:13 cougar12 kernel: device-mapper: ioctl: error adding target to table
Jun  6 10:52:13 cougar12 kernel: device-mapper: table: 253:25: multipath: error getting device
Jun  6 10:52:13 cougar12 kernel: device-mapper: ioctl: error adding target to table
Jun  6 10:52:13 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun  6 10:52:13 cougar12 kernel: device-mapper: multipath: Failing path 8:64.
Jun  6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715192
Jun  6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 24
Jun  6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun  6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715072
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715184
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 0
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun  6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun  6 10:52:14 cougar12 vdsm TaskManager.Task ERROR Task=`06858c48-9d6d-4577-91f3-436c028f79cb`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012    res = f(*args, **kwargs)#012  File "/usr/share/vdsm/storage/hsm.py", line 1099, in attachStorageDomain#012    pool.attachSD(sdUUID)#012  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper#012    return f(self, *args, **kwargs)#012  File "/usr/share/vdsm/storage/sp.py", line 989, in attachSD#012    dom.acquireClusterLock(self.id)#012  File "/usr/share/vdsm/storage/sd.py", line 487, in acquireClusterLock#012    self._clusterLock.acquire(hostID)#012  File "/usr/share/vdsm/storage/clusterlock.py", line 112, in acquire#012    raise se.AcquireLockFailure(self._sdUUID, rc, out, err)#012AcquireLockFailure: Cannot obtain lock: "id=72ec1321-a114-451f-bee1-6790cbca1bc6, rc=1, out=['error - lease file does not exist or is not writeable', 'usage: /usr/libexec/vdsm/spmprotect.sh COMMAND PARAMETERS', 'Commands:', '  start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries }', 'Parameters:', '  sdUUID -                domain uuid', '  hostId -                host id in pool', '  renewal_interval_sec -  intervals for lease renewals attempts', '  lease_path -            path to lease file/volume', '  offset -                offset of lease within file', '  lease_time_ms -         time limit within which lease must be renewed (at least 2*renewal_interval_sec)', '  io_op_timeout_ms -      I/O operation timeout', '  fail_retries -          Maximal number of attempts to retry to renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)'], err=[]"
Jun  6 10:52:15 cougar12 kernel: ata1: hard resetting link
Jun  6 10:52:16 cougar12 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  6 10:52:16 cougar12 kernel: ata1.00: configured for UDMA/133
Jun  6 10:52:16 cougar12 kernel: ata1: EH complete
Jun  6 10:52:16 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun  6 10:52:16 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
Jun  6 10:52:16 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun  6 10:52:16 cougar12 kernel: ata5: soft resetting link
Jun  6 10:52:16 cougar12 kernel: ata6: soft resetting link
Jun  6 10:52:16 cougar12 kernel: ata5: EH complete
Jun  6 10:52:16 cougar12 kernel: ata6: EH complete
Jun  6 10:52:21 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun  6 10:52:21 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
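
To see which devices were affected in an excerpt like the one above, the kernel I/O error lines can be tallied per device. This is a rough sketch that assumes only the two message shapes visible in this log ("Buffer I/O error on device ..." and "end_request: I/O error, dev ..."); `count_io_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the two kernel I/O error message shapes seen in the excerpt.
IO_ERROR_RE = re.compile(
    r"kernel: (?:Buffer I/O error on device|end_request: I/O error, dev) ([\w-]+)"
)


def count_io_errors(lines):
    """Count kernel I/O error lines per device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example usage against a syslog file:
#     with open("/var/log/messages") as log:
#         for device, n in count_io_errors(log).most_common():
#             print(device, n)
```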

Comment 3 Maor 2013-06-23 09:00:28 UTC
Dafna, I saw a similar bug, https://bugzilla.redhat.com/842146, with the same error: "lease file does not exist or is not writeable".

Looking at comment 6 (https://bugzilla.redhat.com/show_bug.cgi?id=842146#c6), it seems that the cause there was a permission issue.
Did your export domain have root permissions?
Do you think this is a duplicate of 842146?
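
The ownership question can be answered from the host with a few lines of Python. This is a sketch only: `describe_ownership` and `matches_owner` are hypothetical helpers, and the expected owner of 36:36 (vdsm:kvm) in the usage comment is an assumption that should be verified against the actual setup.

```python
import os


def describe_ownership(path):
    """Return the owner uid, group gid, and permission bits of a path."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": oct(st.st_mode & 0o777),
    }


def matches_owner(path, uid, gid):
    """Return True if the path is owned by the given uid and gid."""
    info = describe_ownership(path)
    return info["uid"] == uid and info["gid"] == gid


# Example usage (placeholder path; 36:36 is an assumed vdsm:kvm owner):
#     print(describe_ownership("/path/to/export/domain"))
#     print(matches_owner("/path/to/export/domain", 36, 36))
```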

Comment 5 Maor 2013-07-01 08:15:54 UTC
I tried to reproduce the issue with
vdsm-4.11.0-13.git44ecff5.el6.x86_64
sanlock-2.6-2.el6.x86_64

I was able to import and attach the EXPORT and ISO domains.
From the messages log, it seems there were I/O errors that could have caused the failure you encountered (see comment 2).
Closing the bug since it could not be reproduced; feel free to reopen it if the issue reproduces.

