Bug 726949

Summary: VDSM: Prepare volume fails (after connection to iSCSI storage is blocked and unblocked)

Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: TestBlocker
Target Milestone: rc
Target Release: ---
Fixed In Version: vdsm-4.9-92.el6
Doc Type: Bug Fix
Reporter: Evgeniy German <egerman>
Assignee: Eduardo Warszawski <ewarszaw>
QA Contact: Tomas Dosek <tdosek>
CC: abaron, bazulay, danken, iheim, ilvovsky, jlibosva, smizrahi, tdosek, yeylon, ykaul
Last Closed: 2011-12-06 07:36:03 UTC
Attachments:
  - vdsm log (flags: none)
  - vdsm, spm-lock, metadata logs (flags: none)

Description Evgeniy German 2011-07-31 11:04:22 UTC
Description of problem:
After blocking the connection from the host to the storage for 10 minutes and then unblocking it, the prepareVolume action fails.


clientIFinit::INFO::2011-07-31 10:53:57,539::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareVolume, args: ( sdUUID=b07eb16c-66e3-4408-9e6e-71afd174edc7 spUUID=28c0b162-6c21-4165-9354-f6c4eb63333a imgUUID=d24ad0a9-edda-424d-9b7a-1bfbd531ba67 volUUID=1962ce6d-cf7c-44c5-bd14-9877d2b7ecb2)
clientIFinit::DEBUG::2011-07-31 10:53:57,539::task::492::TaskManager.Task::(_debug) Task efc04cca-719a-4c4d-b647-413d0d49da4e: moving from state init -> state preparing
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,593::lvm::371::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,595::lvm::463::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,595::persistentDict::204::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=[u'CLASS=Data', u'VERSION=2', u'SDUUID=95d9264e-2578-4c2e-ae17-f760f67a1261', u'TYPE=ISCSI', u'DESCRIPTION=ISCSIDataDomain2', u'PV0=pv:1987653345,uuid:HGlHMt-SOEr-lrJA-LzPv-SUHQ-2rFt-sAUzTK,pestart:0,pecount:398,mapoffset:0', u'LEASERETRIES=3', u'VGUUID=tbFpPV-ihCO-f3N6-6wbv-X6jn-uoaS-3dm1KE', u'LOCKRENEWALINTERVALSEC=5', u'LOCKPOLICY=', u'POOL_UUID=28c0b162-6c21-4165-9354-f6c4eb63333a', u'LEASETIMESEC=60', u'IOOPTIMEOUTSEC=10', u'POOL_SPM_ID=-1', u'MASTER_VERSION=3', u'ROLE=Master', u'POOL_SPM_LVER=-1', u'POOL_DESCRIPTION=TestDataCenter', u'POOL_DOMAINS=b07eb16c-66e3-4408-9e6e-71afd174edc7:Active,95d9264e-2578-4c2e-ae17-f760f67a1261:Active', u'_SHA_CKSUM=1b6efa75d9bae6bd4d2d2f028809536af569b064']
clientIFinit::DEBUG::2011-07-31 10:53:57,596::resourceManager::154::ResourceManager.Request::(__init__) ResName=`Storage.b07eb16c-66e3-4408-9e6e-71afd174edc7`ReqID=`8619a893-81c4-4f19-926b-35ebe32e054c`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1668' at 'public_prepareVolume'
clientIFinit::DEBUG::2011-07-31 10:53:57,597::resourceManager::467::ResourceManager::(registerResource) Trying to register resource 'Storage.b07eb16c-66e3-4408-9e6e-71afd174edc7' for lock type 'shared'
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,597::lvm::551::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,597::lvm::553::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
clientIFinit::DEBUG::2011-07-31 10:53:57,598::resourceManager::508::ResourceManager::(registerResource) Resource 'Storage.b07eb16c-66e3-4408-9e6e-71afd174edc7' is free. Now locking as 'shared' (1 active user)
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,598::lvm::562::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
clientIFinit::DEBUG::2011-07-31 10:53:57,598::resourceManager::191::ResourceManager.Request::(grant) ResName=`Storage.b07eb16c-66e3-4408-9e6e-71afd174edc7`ReqID=`8619a893-81c4-4f19-926b-35ebe32e054c`::Granted request
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,599::lvm::574::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
clientIFinit::DEBUG::2011-07-31 10:53:57,599::task::492::TaskManager.Task::(_debug) Task efc04cca-719a-4c4d-b647-413d0d49da4e: _resourcesAcquired: Storage.b07eb16c-66e3-4408-9e6e-71afd174edc7 (shared)
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,599::persistentDict::163::Storage.PersistentDict::(__init__) Created a persistant dict with LvMetadataRW backend
clientIFinit::DEBUG::2011-07-31 10:53:57,600::task::492::TaskManager.Task::(_debug) Task efc04cca-719a-4c4d-b647-413d0d49da4e: ref 1 aborting False
b19b1ddb-adda-477f-bfb9-e6423f332d6f::DEBUG::2011-07-31 10:53:57,600::blockSD::151::Storage.Misc.excCmd::(readlines) '/bin/dd iflag=direct skip=0 bs=2048 if=/dev/95d9264e-2578-4c2e-ae17-f760f67a1261/metadata count=1' (cwd None)
clientIFinit::DEBUG::2011-07-31 10:53:57,600::resourceManager::154::ResourceManager.Request::(__init__) ResName=`b07eb16c-66e3-4408-9e6e-71afd174edc7_imageNS.d24ad0a9-edda-424d-9b7a-1bfbd531ba67`ReqID=`ccc6755f-983d-4fd5-b688-dd0c385d727f`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1672' at 'public_prepareVolume'
clientIFinit::DEBUG::2011-07-31 10:53:57,604::resourceManager::467::ResourceManager::(registerResource) Trying to register resource 'b07eb16c-66e3-4408-9e6e-71afd174edc7_imageNS.d24ad0a9-edda-424d-9b7a-1bfbd531ba67' for lock type 'exclusive'
clientIFinit::WARNING::2011-07-31 10:53:57,605::resourceManager::500::ResourceManager::(registerResource) Resource factory failed to create resource 'b07eb16c-66e3-4408-9e6e-71afd174edc7_imageNS.d24ad0a9-edda-424d-9b7a-1bfbd531ba67'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 498, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 163, in createResource
    volResourcesList = self.__getResourceCandidatesList(resourceName, lockType)
  File "/usr/share/vdsm/storage/resourceFactories.py", line 111, in __getResourceCandidatesList
    chain = image.Image(repoPath).getChain(sdUUID=self.sdUUID, imgUUID=resourceName)
  File "/usr/share/vdsm/storage/image.py", line 312, in getChain
    uuidlist = volclass.getImageVolumes(self.repoPath, sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 502, in getImageVolumes
    return lvm.lvsByTag(sdUUID, "%s%s" % (TAG_PREFIX_IMAGE, imgUUID))
  File "/usr/share/vdsm/storage/lvm.py", line 1168, in lvsByTag
    return [lv.name for lv in getLV(vg) if tag in lv.tags]
  File "/usr/share/vdsm/storage/lvm.py", line 815, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))


Version-Release number of selected component (if applicable):

Setup:
1) Two iSCSI storage domains on the same server
2) One vdsm host

Steps to Reproduce:
1. Block the connection from the host to the storage (using iptables).
2. Wait 10 minutes.
3. Unblock the connection.
4. Activate the master storage domain.
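
The block/unblock step can be sketched as below. This is an assumption-laden illustration, not taken from the bug itself: the storage address is borrowed from the iscsiadm output in comment 7, and 3260 is the standard iSCSI target port. The commands are echoed rather than executed, since the real rules require root.

```shell
#!/bin/sh
# Dry-run sketch of the iptables block/unblock step (assumed address/port).
STORAGE=10.34.63.202   # assumption: storage server address from comment 7
PORT=3260              # standard iSCSI target port

# Block all host-to-storage iSCSI traffic:
echo "iptables -A OUTPUT -d $STORAGE -p tcp --dport $PORT -j DROP"
# ...wait ~10 minutes, then delete the rule to unblock:
echo "iptables -D OUTPUT -d $STORAGE -p tcp --dport $PORT -j DROP"
```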
  
Additional info:
The full log is attached.

Almost all of our negative storage tests use iptables, so this issue is a test blocker for them.

Comment 1 Evgeniy German 2011-07-31 11:05:21 UTC
Created attachment 516003 [details]
vdsm log

Comment 6 Tomas Dosek 2011-08-25 10:16:37 UTC
Verified with vdsm-4.9-95.el6 - the scenario specified above no longer reproduces on the current build. The master domain was properly activated and the data center's status was set to Up.

Comment 7 Jakub Libosvar 2011-09-05 12:58:10 UTC
Re-opening, reproducible in vdsm-4.9-96.el6.x86_64.

Scenario:
1. Have two hosts and one data domain.
2. Block the storage using iptables on both hosts.
3. After the SPM goes non-operational and the DC goes to maintenance, delete the blocking iptables rules on both hosts.
4. Activate the non-operational host and the master domain.

----
The former HSM cannot see the iSCSI LUN; the iSCSI session is active but the block device is not:

[root@srh-11 ~]# iscsiadm -m session
tcp: [11] 10.34.63.202:3260,1 iqn.2011-01.brq.str-01:jlibosva
[root@srh-11 ~]# multipath -ll
1jlibosva2 dm-3 RH-str01,VIRTUAL-DISK
size=200G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 14:0:0:2 sdd 8:48 failed faulty running
1jlibosva1 dm-4 RH-str01,VIRTUAL-DISK
size=200G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 14:0:0:1 sdb 8:16 failed faulty running


Note that this faulty host was SPM for a while even while its connection to the storage was blocked.


Running 'service vdsmd restart' solves the problem.

Not sure if this is the same bug or a regression, since there is no prepareVolume task in the logs, but the steps to reproduce seem very similar.

Comment 8 Jakub Libosvar 2011-09-05 12:58:34 UTC
Created attachment 521499 [details]
vdsm, spm-lock, metadata logs

Comment 9 Saggi Mizrahi 2011-09-06 12:39:00 UTC
You can clearly see in the log that LVM has read failures on all drives. This means that even though you unblocked the connection, something along the stack didn't pick that up.

I suggest you take this up with LVM and then continue down the stack until you find the culprit.

Log excerpt:

Thread-137::DEBUG::2011-09-05 14:50:42,110::lvm::374::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"a%1IET_0005000d|1jlibosva1|1jlibosva2%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 24bc5ef6-8a37-48f1-a6fc-6b0a4e3140b5' (cwd None)

Thread-137::DEBUG::2011-09-05 14:50:42,240::lvm::374::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  /dev/mapper/1jlibosva2: read failed after 0 of 4096 at 214748299264: Input/output error\n  /dev/mapper/1jlibosva2: read failed after 0 of 4096 at 214748356608: Input/output error\n  /dev/mapper/1jlibosva2: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1jlibosva2 was disabled\n  /dev/mapper/1jlibosva1: read failed after 0 of 4096 at 214748299264: Input/output error\n  /dev/mapper/1jlibosva1: read failed after 0 of 4096 at 214748356608: Input/output error\n  /dev/mapper/1jlibosva1: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1jlibosva1 was disabled\n  /dev/mapper/1IET_0005000d: read failed after 0 of 4096 at 16106061824: Input/output error\n  /dev/mapper/1IET_0005000d: read failed after 0 of 4096 at 16106119168: Input/output error\n  /dev/mapper/1IET_0005000d: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1IET_0005000d was disabled\n  Volume group "24bc5ef6-8a37-48f1-a6fc-6b0a4e3140b5" not found\n'; <rc> = 5

Comment 10 Jakub Libosvar 2011-09-06 19:17:19 UTC
After a few hours the devices were still faulty - I'd say iscsid doesn't refresh the LUNs. I manually ran 'iscsiadm -m session -R' to rescan, and that solved the problem. If you think this is iscsid's responsibility rather than vdsm's, I'm moving this bug back to VERIFIED.
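
The manual recovery can be sketched as the two-command sequence below. Both commands appear verbatim in this bug's comments; they are echoed rather than executed here because they need root and a live iSCSI session.

```shell
#!/bin/sh
# Recovery sketch: force a LUN rescan on all iSCSI sessions, then verify
# that the multipath paths have recovered.
RESCAN="iscsiadm -m session -R"   # rescan LUNs on every active session
VERIFY="multipath -ll"            # paths should no longer show 'failed faulty'
echo "$RESCAN"
echo "$VERIFY"
```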

Comment 11 errata-xmlrpc 2011-12-06 07:36:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html