Description of problem:
LUN addition to an existing iSCSI storage domain (according to https://access.redhat.com/site/solutions/17434 [Option 2]) failed, and the web admin portal shows nonsense about the affected storage domain.

Version-Release number of selected component (if applicable):
rhevm-3.2.0-11.30.el6ev.x86_64
vdsm-4.10.2-22.0.el6ev.x86_64
(RHEV-M 3.2.0-11.30.el6ev and the most up-to-date RHEL hypervisors.)

How reproducible:
? I don't have another cluster to destroy.

Steps to Reproduce/What I did:
Follow https://access.redhat.com/site/solutions/17434 [Option 2] to expand storage by adding a new LUN to an existing storage domain:
1. Create a new LUN on the SAN
2. Edit the Storage Domain
3. Make sure the new LUN is discovered (in case of iSCSI, the Discovery and Login steps described in the Admin Guide should be followed)
4. Add the LUN to the existing Storage Domain. This will make the Storage Domain grow by the size of the added LUN.

Actual results:
Messages in the web admin portal were:
* Storage Domain vm1-belka was updated by admin@internal
* Data Center is being initialized, please wait for initialization to complete.
* Storage Pool Manager runs on Host laika (Address: laika...).
  (^^^ The SPM jumped to the other hypervisor. It was running on host "belka..." before the storage expansion.)
* Failed to extend Storage Domain vm1-belka.
(Timestamp for log correlation: 2013-Jun-12, 13:15)

Observations from the web admin portal:
- The storage domain values 'Size' and 'Available' are bigger.
- The storage domain properties in the web admin do not show the new LUN as used: there is still a LUN id with an unchecked box, see the attached image.

Observations from the hypervisor:
- A VG scan on one of the hypervisors shows this:

# vgs
  VG                                   #PV #LV #SN Attr   VSize   VFree
  bd06c2cb-3246-4e1b-8d73-737cc165e7a5   2  33   0 wz--n- 478.75g 311.62g

There are two PVs, but the web admin shows only one!
- pvdisplay shows the same information:

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/mapper/1belka-vm1
  VG Name               bd06c2cb-3246-4e1b-8d73-737cc165e7a5
  PV Size               200.00 GiB / not usable 384.00 MiB
  Allocatable           yes
  PE Size               128.00 MiB
  Total PE              1597
  Free PE               260
  Allocated PE          1337
  PV UUID               jSYina-zLmW-om8z-q761-LuVC-UZT7-Pg9CSV

  --- Physical volume ---
  PV Name               /dev/mapper/1belka-vm2
  VG Name               bd06c2cb-3246-4e1b-8d73-737cc165e7a5
  PV Size               279.40 GiB / not usable 278.26 MiB
  Allocatable           yes
  PE Size               128.00 MiB
  Total PE              2233
  Free PE               2233
  Allocated PE          0
  PV UUID               M9gWXH-yXvm-UgSt-TwVe-476y-nm5L-FHZcc4

Both PVs are used by VG bd06c2cb-3246-4e1b-8d73-737cc165e7a5.

Expected results:
The storage domain is extended properly :-)

Additional info:
Ping me if you want to access the machines directly.
Created attachment 760115 [details]
Used LUN is shown as unused in Web Admin Portal

The red arrow in the picture used_lun_shown_as_unused.png points to the problematic LUN. I attempted to add this LUN to the storage domain, but the Web Admin threw an error. The checkbox is unchecked, but LVM thinks that the PV representing the LUN is used.
It looks like vdsm doesn't give us a traceback for the unexpected error. It's a pity.

Thread-109::ERROR::2013-06-12 13:15:34,522::task::850::TaskManager.Task::(_setError) Task=`921429b7-2ee8-4275-91d0-7f4c6e4b8086`::Unexpected error
Thread-109::DEBUG::2013-06-12 13:15:34,525::task::869::TaskManager.Task::(_run) Task=`921429b7-2ee8-4275-91d0-7f4c6e4b8086`::Task._run: 921429b7-2ee8-4275-91d0-7f4c6e4b8086 ('bd06c2cb-3246-4e1b-8d73-737cc165e7a5', 'bb2b1888-ac09-41a8-ba53-5b86aa3417a2', ['1belka-vm2'], False) {} failed - stopping task
Human-readable dm names are used:

Thread-19::DEBUG::2013-06-12 13:15:41,400::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1belka-vm1|1belka-vm2|1laika-vm1|1laika-vm2|35000cca03ca52d20%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free bd06c2cb-3246-4e1b-8d73-737cc165e7a5' (cwd None)
Thread-19::DEBUG::2013-06-12 13:15:41,620::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ' Found duplicate PV M9gWXHyXvmUgStTwVe476ynm5LFHZcc4: using /dev/mapper/1belka-vm2 not /dev/mapper/35000cca03ca52d20p1\n'; <rc> = 0
Interesting. These 'human-readable names' were produced by the RHEL 6 iSCSI initiator in its default configuration, without any modification to /etc/multipath.conf etc.
Pavel, can you attach a LC from this setup? Out of curiosity, does this device '35000cca03ca52d20' have a partition ('....not /dev/mapper/35000cca03ca52d20p1')?
/dev/mapper/35000cca03ca52d20 is a physical disk and /dev/mapper/35000cca03ca52d20p1 was the only partition on that disk. This partition was exported via /etc/tgtd/targets.conf as LUN 2 on iSCSI target belka.idm.lab.eng.brq.redhat.com:belka-vm1.

Excerpt from /etc/tgtd/targets.conf:

<target belka.idm.lab.eng.brq.redhat.com:belka-vm1>
    <backing-store /dev/vg_belka/lv_vm1>
        lun 1
        scsi_sn 7614adbc6e528292
        scsi_id belka-vm1
    </backing-store>
    <backing-store /dev/disk/by-id/scsi-35000cca03ca52d20-part1>
        lun 1
        scsi_sn 5000cca03ca52d2001
        scsi_id belka-vm2
    </backing-store>
    initiator-address 10.34.47.1
    initiator-address 10.34.47.2
</target>

AFAIK the same partition was accessible via both /dev/mapper/35000cca03ca52d20p1 and /dev/mapper/1belka-vm2. Is this the root cause of the problem?
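The "Found duplicate PV" warnings in the logs boil down to one PV UUID being visible through two device paths. As a minimal sketch (not vdsm code — the UUID/path pairs are taken from the pvdisplay output in this report, and in practice would come from `pvs -o pv_uuid,pv_name`), duplicates can be spotted by grouping paths per PV UUID:

```python
from collections import defaultdict

# UUID/path pairs as seen in this report; illustrative data, not a live query.
seen = [
    ("M9gWXH-yXvm-UgSt-TwVe-476y-nm5L-FHZcc4", "/dev/mapper/1belka-vm2"),
    ("M9gWXH-yXvm-UgSt-TwVe-476y-nm5L-FHZcc4", "/dev/mapper/35000cca03ca52d20p1"),
    ("jSYina-zLmW-om8z-q761-LuVC-UZT7-Pg9CSV", "/dev/mapper/1belka-vm1"),
]

# Group device paths by PV UUID; any UUID with more than one path is the
# "duplicate PV" situation LVM warns about.
paths_by_uuid = defaultdict(list)
for uuid, path in seen:
    paths_by_uuid[uuid].append(path)

duplicates = {u: p for u, p in paths_by_uuid.items() if len(p) > 1}
for uuid, paths in duplicates.items():
    print("duplicate PV %s visible as: %s" % (uuid, ", ".join(paths)))
```

This matches the log line `Found duplicate PV ...: using /dev/mapper/1belka-vm2 not /dev/mapper/35000cca03ca52d20p1` — the same partition reached once via its multipath alias and once via the raw disk's partition mapping.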
Would it help if I forced LVM to use only /dev/sd.* names? For example, I can use this LVM filter (/etc/lvm/lvm.conf):

filter = [ "a|^/dev/sd.*$|", "r/.*/" ]

I.e., LVM will use iSCSI devices only via the names /dev/sdb, /dev/sdc, etc.
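For readers unfamiliar with LVM filter semantics, here is a small illustrative sketch (not LVM or vdsm code) of how the accept/reject rules above would classify the device names seen in this report — the first matching rule wins, and a device matching no rule is accepted by default:

```python
import re

def lvm_filter_accepts(dev_path, rules):
    """Mimic LVM's filter semantics for illustration only: the first
    matching rule decides; 'a' accepts, 'r' rejects; no match accepts."""
    for action, pattern in rules:
        if re.search(pattern, dev_path):
            return action == "a"
    return True  # LVM accepts devices that match no rule

# The filter proposed above: accept /dev/sd*, reject everything else.
rules = [("a", r"^/dev/sd.*$"), ("r", r".*")]

print(lvm_filter_accepts("/dev/sdb", rules))                          # True
print(lvm_filter_accepts("/dev/mapper/1belka-vm2", rules))            # False
print(lvm_filter_accepts("/dev/mapper/35000cca03ca52d20p1", rules))   # False
```

Under this filter the duplicate /dev/mapper/* paths would be invisible to LVM, at the cost of scanning each SCSI path individually.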
If I am referring to the correct timestamp: pvcreate was tried on /dev/mapper/1laika-vm1 /dev/mapper/1belka-vm2 /dev/mapper/1belka-vm1 when 2 of the PVs were already engaged?? :( (I am not sure why we want to try pvcreate like that.)

physical volume "/dev/mapper/1laika-vm1" of volume group "c67449cb-e5ec-4342-9c56-af1a67cf5868"
&&
physical volume "/dev/mapper/1belka-vm1" of volume group "bd06c2cb-3246-4e1b-8d73-737cc165e7a5"

Thread-107::DEBUG::2013-06-12 13:02:35,326::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm pvcreate --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1belka-vm1|1belka-vm2|1laika-vm1|35000cca03ca52d20%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --test --metadatasize 128m --metadatacopies 2 --metadataignore y /dev/mapper/1laika-vm1 /dev/mapper/1belka-vm2 /dev/mapper/1belka-vm1' (cwd None)

This produced a warning from LVM:

Thread-107::DEBUG::2013-06-12 13:02:35,537::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.\n Found duplicate PV pCGBIojdpiPc8tl7qwJZ2aqSPjY1Y3uw: using /dev/mapper/1belka-vm2 not /dev/mapper/35000cca03ca52d20p1\n Can\'t initialize physical volume "/dev/mapper/1laika-vm1" of volume group "c67449cb-e5ec-4342-9c56-af1a67cf5868" without -ff\n Can\'t initialize physical volume "/dev/mapper/1belka-vm1" of volume group "bd06c2cb-3246-4e1b-8d73-737cc165e7a5" without -ff\n'; <rc> = 5
Thread-107::DEBUG::2013-06-12 13:02:35,539::lvm::809::Storage.LVM::(testPVCreate) rc: 5, out: [' Physical volume "/dev/mapper/1belka-vm2" successfully created'], err: [' TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.', ' Found duplicate PV pCGBIojdpiPc8tl7qwJZ2aqSPjY1Y3uw: using /dev/mapper/1belka-vm2 not /dev/mapper/35000cca03ca52d20p1', ' Can\'t initialize physical volume "/dev/mapper/1laika-vm1" of volume group "c67449cb-e5ec-4342-9c56-af1a67cf5868" without -ff', ' Can\'t initialize physical volume "/dev/mapper/1belka-vm1" of volume group "bd06c2cb-3246-4e1b-8d73-737cc165e7a5" without -ff'], unusedDevs: set(['/dev/mapper/1belka-vm2']), usedDevs: set(['/dev/mapper/1belka-vm1', '/dev/mapper/1laika-vm1'])

However, it looks like vgextend worked fine:

Thread-109::DEBUG::2013-06-12 13:15:33,632::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm vgextend --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1belka-vm1|1belka-vm2|1laika-vm1|1laika-vm2|35000cca03ca52d20%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " bd06c2cb-3246-4e1b-8d73-737cc165e7a5 /dev/mapper/1belka-vm2' (cwd None)
Thread-109::DEBUG::2013-06-12 13:15:33,924::lvm::921::Storage.LVM::(extendVG) Cache after extending vg {'bd06c2cb-3246-4e1b-8d73-737cc165e7a5': Stub(name='bd06c2cb-3246-4e1b-8d73-737cc165e7a5', stale=True)}
Thread-109::DEBUG::2013-06-12 13:15:33,925::persistentDict::167::Storage.PersistentDict::(transaction) Starting transaction
Thread-109::INFO::2013-06-12 13:15:33,925::blockSD::779::Storage.StorageDomain::(readMetadataMapping) META MAPPING: {u'PV0': {u'uuid': u'jSYina-zLmW-om8z-q761-LuVC-UZT7-Pg9CSV', u'pestart': u'0', u'mapoffset': u'0', u'pecount': u'1597', 'guid': u'1belka-vm1'}}
Thread-109::DEBUG::2013-06-12 13:15:34,148::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/lvm pvs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1belka-vm1|1belka-vm2|1laika-vm1|1laika-vm2|35000cca03ca52d20%\', \'r%.*%\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/1belka-vm2' (cwd None)

However, later I can see a rollback because of an AttributeError in persistentDict:

Thread-109::WARNING::2013-06-12 13:15:34,340::persistentDict::178::Storage.PersistentDict::(transaction) Error in transaction, rolling back changes
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/persistentDict.py", line 170, in transaction
    yield
  File "/usr/share/vdsm/storage/blockSD.py", line 669, in updateMapping
    mapping = self.getMetaDataMapping(self.sdUUID, self.readMetadataMapping())
  File "/usr/share/vdsm/storage/blockSD.py", line 620, in getMetaDataMapping
    pvlist = lvm.listPVNames(vgName)
  File "/usr/share/vdsm/storage/lvm.py", line 1262, in listPVNames
    vgPVs = [pv.name for pv in pvs if pv.vg_name == vgName]
AttributeError: 'Stub' object has no attribute 'vg_name'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ***

Thread-109::DEBUG::2013-06-12 13:15:34,521::lvm::402::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-109::ERROR::2013-06-12 13:15:34,522::task::850::TaskManager.Task::(_setError) Task=`921429b7-2ee8-4275-91d0-7f4c6e4b8086`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 684, in extendStorageDomain
    pool.extendSD(sdUUID, devlist, force)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 2018, in extendSD
    sdCache.produce(sdUUID).extend(devlist, force)
  File "/usr/share/vdsm/storage/blockSD.py", line 696, in extend
    self.updateMapping()
  File "/usr/share/vdsm/storage/blockSD.py", line 675, in updateMapping
    del self._metadata[key]
  File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/share/vdsm/storage/persistentDict.py", line 170, in transaction
    yield
  File "/usr/share/vdsm/storage/blockSD.py", line 669, in updateMapping
    mapping = self.getMetaDataMapping(self.sdUUID, self.readMetadataMapping())
  File "/usr/share/vdsm/storage/blockSD.py", line 620, in getMetaDataMapping
    pvlist = lvm.listPVNames(vgName)
  File "/usr/share/vdsm/storage/lvm.py", line 1262, in listPVNames
    vgPVs = [pv.name for pv in pvs if pv.vg_name == vgName]
AttributeError: 'Stub' object has no attribute 'vg_name'

where PV_FIELDS is "uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size"
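As a self-contained sketch of the failure mode seen in the traceback above — the names mirror the logs (Stub, the listPVNames list comprehension), but the bodies are illustrative, not vdsm's actual implementation:

```python
from collections import namedtuple

# Fully-populated PV record, as listPVNames expects.
PV = namedtuple("PV", "name vg_name")

class Stub(object):
    """Placeholder a cache might keep for a stale entry after vgextend
    invalidates it; it carries only a name, no vg_name field."""
    def __init__(self, name):
        self.name = name

# After extendVG the log shows the cache holding Stub(..., stale=True),
# so iterating it yields a Stub instead of a real PV record.
cache = [PV("/dev/mapper/1belka-vm1", "bd06c2cb"),
         Stub("/dev/mapper/1belka-vm2")]

def list_pv_names(vg_name, pvs):
    # Same shape as lvm.py line 1262 in the traceback: it assumes every
    # entry has .vg_name and blows up on a Stub.
    return [pv.name for pv in pvs if pv.vg_name == vg_name]

try:
    list_pv_names("bd06c2cb", cache)
except AttributeError as e:
    print(e)  # 'Stub' object has no attribute 'vg_name'
```

Because this raises inside the persistentDict transaction, the metadata update is rolled back even though the vgextend itself already succeeded — which matches the observed mismatch between LVM (two PVs) and the web admin (one LUN).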
We introduced a change that updates the list of LUNs in the domain upon activation of the domain, so deactivating and reactivating the domain should solve the issue. Note, however, that this only works for V2 domains and up. This will not be fixed for older-version domains (which includes export domains): there, although the LUNs would be updated, the metadata mapping would not.
Hi all, I am probably missing something here, but executing the same steps [Option 2] using is12 works great even without reactivating. Everything was updated automatically with success.

[Option 2] - from https://access.redhat.com/site/solutions/17434
======
1. Create a new LUN on the SAN
2. Edit the Storage Domain
3. Make sure the new LUN is discovered (in case of iSCSI, the Discovery and Login steps described in the Admin Guide should be followed)
4. Add the LUN to the existing Storage Domain. This will make the Storage Domain grow by the size of the added LUN.

Please let me know if you think I did something wrong; otherwise I will mark this as verified.
You could make getMetaDataMapping fail (after vgextend) so that you'd get the same end result. The change that was introduced allows the user to get out of this situation.
Verified on is13 vs is12. Exact steps to reproduce:
* create an iSCSI non-master storage domain
* modify /usr/share/vdsm/storage/blockSD.py by adding the line:
      raise se.MetaDataMappingError("testing")
  right after 'pvlist = list(lvm.listPVNames(vgName))' in the getMetaDataMapping function
* restart vdsm
* try to extend the storage domain (it fails)
* revert the changes in vdsm
* restart vdsm

In is12, even after restarting vdsm and deactivating and reactivating the storage domain, it was left in an incorrect state (it was extended, but RHEV-M showed the added LUN as not used by any domain; however, when I wanted to add it to a domain, I got a warning that the LUN is used by a VG).

In is13, after deactivating and reactivating the storage domain, the information shown by RHEV-M was correct (the LUN was shown as used by the storage domain we extended).
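The fault injection in the verification steps above can be sketched generically: wrap a function so it raises right after its real work completes, simulating a failure immediately after listPVNames inside getMetaDataMapping. Everything below is a stand-in (MetaDataMappingError, get_metadata_mapping), not vdsm code:

```python
class MetaDataMappingError(Exception):
    """Stand-in for vdsm's se.MetaDataMappingError."""
    pass

def inject_failure_after(fn, exc):
    """Return a wrapper that runs fn, then raises exc mid-operation."""
    def wrapper(*args, **kwargs):
        fn(*args, **kwargs)  # the real step succeeds first...
        raise exc            # ...then we fail, as the test patch does
    return wrapper

def get_metadata_mapping(vg_name):
    # Hypothetical stand-in for the step being sabotaged.
    return ["/dev/mapper/1belka-vm1"]

patched = inject_failure_after(get_metadata_mapping,
                               MetaDataMappingError("testing"))

try:
    patched("bd06c2cb-3246-4e1b-8d73-737cc165e7a5")
except MetaDataMappingError as e:
    print("extend failed as intended:", e)
```

This reproduces the interesting property of the bug: the underlying work has already happened when the exception fires, leaving the domain half-updated until it is deactivated and reactivated.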
Closing - RHEV 3.3 Released