Created attachment 757551 [details]
logs

Description of problem:

I tried attaching an export domain which is unattached in my other setup, and it failed with the following error:

'Error while executing action Attach Storage Domain: Could not obtain lock'

I also tried attaching an iso domain which is unattached in my other setup (although it should not matter whether it is attached or unattached). The first attach attempt fails with the same error; on the second and subsequent attempts vdsm reports that the domain does not exist:

Error while executing action Attach Storage Domain: Could not obtain lock

Version-Release number of selected component (if applicable):
vdsm-4.10.3-0.416.git5358ed2.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create an export domain in 3.2 but do not attach it to the DC
2. create an iso domain in 3.2 (not relevant whether it is attached or not)
3. try to import both into the 3.3 setup

Actual results:
we fail to attach the domains

Expected results:
we should be able to attach the domains

Additional info:
logs attached
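For reference, the failing attach can also be triggered directly on the host through vdsClient, which makes it easier to correlate with the vdsm log. This is just a sketch; the UUIDs are placeholders, not taken from this setup:

  vdsClient -s 0 attachStorageDomain <sdUUID> <spUUID>   # attach the domain to the pool
  vdsClient -s 0 getStorageDomainInfo <sdUUID>           # inspect the domain state afterwards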
Fede, do you have any insights about the following error?

AcquireLockFailure: Cannot obtain lock: "id=72ec1321-a114-451f-bee1-6790cbca1bc6, rc=1, out=['error - lease file does not exist or is not writeable', 'usage: /usr/libexec/vdsm/spmprotect.sh COMMAND PARAMETERS', 'Commands:', ' start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries }', 'Parameters:', ' sdUUID - domain uuid', ' hostId - host id in pool', ' renewal_interval_sec - intervals for lease renewals attempts', ' lease_path - path to lease file/volume', ' offset - offset of lease within file', ' lease_time_ms - time limit within which lease must be renewed (at least 2*renewal_interval_sec)', ' io_op_timeout_ms - I/O operation timeout', ' fail_retries - Maximal number of attempts to retry to renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)'], err=[]"

It appears that the lease file does not exist or is not writable. Looking at the sanlock logs, I didn't find anything related to this issue.

From the messages log it seems that during this time cougar12 encountered many I/O errors:

Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 1
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 26214399
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 26214399
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 0
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 2
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 3
Jun 6 10:52:10 cougar12 kernel: Buffer I/O error on device dm-18, logical block 3
Jun 6 10:52:10 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:10 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:10 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:10 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:11 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun 6 10:52:11 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
Jun 6 10:52:11 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715072
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715184
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-3, sector 0
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun 6 10:52:11 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:11 cougar12 kernel: ata1: hard resetting link
Jun 6 10:52:11 cougar12 kernel: ata5: soft resetting link
Jun 6 10:52:11 cougar12 kernel: ata6: soft resetting link
Jun 6 10:52:12 cougar12 kernel: ata5: EH complete
Jun 6 10:52:12 cougar12 kernel: ata6: EH complete
Jun 6 10:52:12 cougar12 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 6 10:52:12 cougar12 kernel: ata1.00: configured for UDMA/133
Jun 6 10:52:12 cougar12 kernel: ata1: EH complete
Jun 6 10:52:13 cougar12 kernel: device-mapper: table: 253:25: multipath: error getting device
Jun 6 10:52:13 cougar12 kernel: device-mapper: ioctl: error adding target to table
Jun 6 10:52:13 cougar12 kernel: device-mapper: table: 253:25: multipath: error getting device
Jun 6 10:52:13 cougar12 kernel: device-mapper: ioctl: error adding target to table
Jun 6 10:52:13 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun 6 10:52:13 cougar12 kernel: device-mapper: multipath: Failing path 8:64.
Jun 6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715192
Jun 6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:13 cougar12 kernel: end_request: I/O error, dev dm-18, sector 24
Jun 6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:13 cougar12 multipathd: dm-25: remove map (uevent)
Jun 6 10:52:13 cougar12 multipathd: dm-25: devmap not registered, can't remove
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715072
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 209715184
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-3, sector 0
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715072
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 209715184
Jun 6 10:52:14 cougar12 kernel: end_request: I/O error, dev dm-18, sector 0
Jun 6 10:52:14 cougar12 vdsm TaskManager.Task ERROR Task=`06858c48-9d6d-4577-91f3-436c028f79cb`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1099, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 989, in attachSD
    dom.acquireClusterLock(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 487, in acquireClusterLock
    self._clusterLock.acquire(hostID)
  File "/usr/share/vdsm/storage/clusterlock.py", line 112, in acquire
    raise se.AcquireLockFailure(self._sdUUID, rc, out, err)
AcquireLockFailure: Cannot obtain lock: "id=72ec1321-a114-451f-bee1-6790cbca1bc6, rc=1, out=['error - lease file does not exist or is not writeable', 'usage: /usr/libexec/vdsm/spmprotect.sh COMMAND PARAMETERS', 'Commands:', ' start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries }', 'Parameters:', ' sdUUID - domain uuid', ' hostId - host id in pool', ' renewal_interval_sec - intervals for lease renewals attempts', ' lease_path - path to lease file/volume', ' offset - offset of lease within file', ' lease_time_ms - time limit within which lease must be renewed (at least 2*renewal_interval_sec)', ' io_op_timeout_ms - I/O operation timeout', ' fail_retries - Maximal number of attempts to retry to renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)'], err=[]"
Jun 6 10:52:15 cougar12 kernel: ata1: hard resetting link
Jun 6 10:52:16 cougar12 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 6 10:52:16 cougar12 kernel: ata1.00: configured for UDMA/133
Jun 6 10:52:16 cougar12 kernel: ata1: EH complete
Jun 6 10:52:16 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun 6 10:52:16 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
Jun 6 10:52:16 cougar12 kernel: sd 11:0:0:1: rejecting I/O to offline device
Jun 6 10:52:16 cougar12 kernel: ata5: soft resetting link
Jun 6 10:52:16 cougar12 kernel: ata6: soft resetting link
Jun 6 10:52:16 cougar12 kernel: ata5: EH complete
Jun 6 10:52:16 cougar12 kernel: ata6: EH complete
Jun 6 10:52:21 cougar12 multipathd: overflow in attribute '/sys/devices/platform/host11/session6/target11:0:0/11:0:0:1/state'
Jun 6 10:52:21 cougar12 multipathd: 1Dafna-tiger-31367928: sde - directio checker reports path is down
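Since spmprotect.sh reports that the lease file is missing or not writable, a quick sanity check on the host is to inspect the domain's leases file directly. This is only a sketch assuming the standard VDSM file-domain layout (dom_md/leases under the mount point); the mount path and UUID below are placeholders:

  ls -l /rhev/data-center/mnt/<server:_export>/<sdUUID>/dom_md/leases
  # expected owner is vdsm:kvm (uid/gid 36:36); opening the file for append
  # as the vdsm user without writing anything is a non-destructive write test
  # (note: this creates the file if it is missing, so run the ls first)
  sudo -u vdsm sh -c ': >> /rhev/data-center/mnt/<server:_export>/<sdUUID>/dom_md/leases'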
Dafna, I saw a similar bug, https://bugzilla.redhat.com/842146, with the same error "lease file does not exist or is not writeable". Looking at comment 6 (https://bugzilla.redhat.com/show_bug.cgi?id=842146#c6), it seems the cause there was a permission issue. Was your export domain owned by root? Do you think this is a duplicate of bug 842146?
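If it does turn out to be the same permission problem, the usual remedy is to reset ownership on the NFS server so that the vdsm user can write the lease. This is a sketch assuming an NFS export domain; the export path is a placeholder:

  # run on the NFS server; 36:36 is the vdsm:kvm uid/gid pair vdsm expects
  chown -R 36:36 /exports/<export_domain_dir>
  # give the owner read/write on files and traverse rights on directories
  chmod -R u+rwX /exports/<export_domain_dir>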
I tried to reproduce the issue with:
vdsm-4.11.0-13.git44ecff5.el6.x86_64
sanlock-2.6-2.el6.x86_64

I succeeded in importing and attaching the EXPORT and ISO domains. From the messages log it seems there were I/O errors at the time, which could well have caused the failure you encountered (see comment 2).

Closing the bug since it could not be reproduced; feel free to reopen it if it reproduces again.
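If the failure does show up again, the quickest way to capture the domain state from the host (a sketch; the UUID is a placeholder) is:

  vdsClient -s 0 getStorageDomainsList          # confirm the domain is visible to vdsm
  vdsClient -s 0 getStorageDomainInfo <sdUUID>  # check its role, status and pool attachment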