Created attachment 609408 [details]
logs

Description of problem:
In a setup where the domains are made of LUNs from different storage servers (extended), I removed my hosts from one of the storage servers' access lists. After about two hours I restored the access list and the hosts can see the storage again, yet we do not recover - the vdsm log still shows that the domains are inaccessible. After a vdsm restart we cannot see some of the devices and the SPM becomes non-operational.

Version-Release number of selected component (if applicable):
si16
vdsm-4.9.6-31.0.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create domains whose LUNs come from different storage servers.
2. Remove the hosts from one of the storage servers' access lists.
3. Add the hosts back to the access list.

Actual results:
We do not recover when the hosts are added back to the storage access list, although running vgs shows that the LUNs are visible.

Expected results:
We should be able to recover.

Additional info:

  VG                                   #PV #LV #SN Attr   VSize   VFree
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 352.75g
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g
  vg0                                    1   3   0 wz--n- 136.24g       0

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e2
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e7
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e8
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002ea
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032d
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032e
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580030f
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c5695800310

Thread-9850::DEBUG::2012-09-03 17:25:24,204::vm::739::vm.Vm::(_lvExtend) vmId=`689e4e9b-19cf-44f1-b014-cdcf17687c4e`::b1f82e00-4647-4df0-8190-6ba4e811f5a5/f948e894-8631-4fdb-818a-4e9b618e90dc (hda): apparentsize 2048 req 3072
Dummy-119::DEBUG::2012-09-03 17:25:24,361::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Thread-50::ERROR::2012-09-03 17:25:24,408::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain bc7fde7a-4d43-4dd4-874a-bff5ca517bae monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('bc7fde7a-4d43-4dd4-874a-bff5ca517bae',)
Thread-48::ERROR::2012-09-03 17:25:24,410::domainMonitor::191::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 0dc1433f-72e6-4b62-9845-dc022a191f4f monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 169, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 714, in selftest
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('0dc1433f-72e6-4b62-9845-dc022a191f4f',)
Dummy-119::DEBUG::2012-09-03 17:25:24,510::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0778565 s, 13.2 MB/s\n'; <rc> = 0
Dummy-119::DEBUG::2012-09-03 17:25:24,511::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 1 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,515::storage_mailbox::580::Storage.MailBox.SpmMailMonitor::(_handleRequests) SPM_MailMonitor: Mailbox 2 validated, checking mail
Dummy-119::DEBUG::2012-09-03 17:25:24,525::__init__::1164::Storage.Misc.excCmd::(_log) 'dd of=/rhev/data-center/f570527f-004a-4cab-8bee-129fa589bec5/mastersd/dom_md/outbox oflag=direct iflag=fullblock conv=notrunc count=1 bs=1024000' (cwd None)
Dummy-119::DEBUG::2012-09-03 17:25:24,629::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0705315 s, 14.5 MB/s\n'; <rc> = 0

After a vdsm restart the hosts cannot see devices on some of the domains and the SPM becomes non-operational:

vgs -o+pv_name

  VG                                   #PV #LV #SN Attr   VSize   VFree   PV
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/1Dafna-si16-011346574
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800315
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800332
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g /dev/mapper/3514f0c5695800333
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz-pn- 352.38g 241.50g unknown device
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032a
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c569580032c
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/1Dafna-si16-021346574
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e3
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e6
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e4
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c56958002e5
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800330
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g /dev/mapper/3514f0c5695800331
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz-pn- 416.62g 412.75g unknown device
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ee
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ef
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f3
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f2
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f0
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f1
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800327
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c5695800328
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e2
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e7
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002e8
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c56958002ea
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032d
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g /dev/mapper/3514f0c569580032e
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz-pn- 416.62g 410.75g unknown device
  vg0                                    1   3   0 wz--n- 136.24g       0 /dev/sda2
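The partial state is encoded in the vg_attr column: the fourth attribute character is 'p' for the affected VGs. As a rough illustration, here is a hypothetical helper (not part of vdsm, and assuming output in the shape of `vgs --noheadings -o vg_name,vg_attr,pv_name`) that picks those VGs and their missing paths out of such output:

```python
# Hypothetical helper (not part of vdsm): scan vgs output for VGs whose attr
# string carries the LVM "partial" bit ('p' as the 4th attribute character),
# and collect any PVs that LVM can only report as "unknown device".

def find_partial_vgs(vgs_output):
    """Map each partial VG name to the list of its unresolvable PV entries."""
    partial = {}
    for line in vgs_output.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        vg_name, vg_attr, pv_name = fields
        if len(vg_attr) >= 4 and vg_attr[3] == "p":
            missing = partial.setdefault(vg_name, [])
            if pv_name.strip() == "unknown device":
                missing.append(pv_name.strip())
    return partial


# Trimmed-down sample in the same shape as the output above:
SAMPLE = """\
  0dc1433f-72e6-4b62-9845-dc022a191f4f wz-pn- /dev/mapper/3514f0c5695800315
  0dc1433f-72e6-4b62-9845-dc022a191f4f wz-pn- unknown device
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c wz--n- /dev/mapper/3514f0c56958002ec
"""

print(find_partial_vgs(SAMPLE))
```

Only the VG with the 'p' bit is reported; healthy VGs like 6a1f9f02 are skipped.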
Seven devices are listed as "unknown device" in the output above, so multipath has clearly not recovered these paths yet, leaving the domains partial (only a subset of the disks is accessible).

Closing as a duplicate of bug 854140.

*** This bug has been marked as a duplicate of bug 854140 ***
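For context, the monitoring flow behind the tracebacks above can be sketched roughly as follows. The class and method names mirror vdsm's log output, but the bodies here are illustrative only, not vdsm's actual code:

```python
# Illustrative sketch (not vdsm's implementation) of the domain-monitor
# pattern: selftest raises while the VG backing the domain is partial, so the
# domain keeps being reported inaccessible until every path is back.

class StorageDomainAccessError(Exception):
    def __init__(self, sd_uuid):
        super().__init__(
            "Domain is either partially accessible or entirely "
            "inaccessible: (%r,)" % sd_uuid)

class BlockDomain:
    def __init__(self, sd_uuid, partial):
        self.sdUUID = sd_uuid
        self.partial = partial  # would be derived from the LVM vg_attr 'p' bit

    def selftest(self):
        if self.partial:
            raise StorageDomainAccessError(self.sdUUID)

def monitor_once(domain):
    """One iteration of the monitor loop: return the error message, or None."""
    try:
        domain.selftest()
    except StorageDomainAccessError as e:
        return str(e)
    return None

print(monitor_once(BlockDomain("bc7fde7a-4d43-4dd4-874a-bff5ca517bae", partial=True)))
```

This matches the observed behavior: once multipath recovers the last path and the 'p' bit clears, the same selftest would start succeeding again.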
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Haim, can you retest this as well? After the changes in getDeviceList it may be fixed.
We need to retest. No extra info is needed here; removing the "needinfo" flag and taking it for verification.
After mapping the hosts back to the LUNs on the storage server, vdsm is able to activate the domain again.

Verified on RHEVM 3.3-IS8:
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev.x86_64
rhevm-3.3.0-0.13.master.el6ev.noarch
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore.')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0040.html