Bug 968293 - vdsm: can't create a vm from a template as thin copy when the template also has a copy on an inactive domain
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Liron Aravot
QA Contact: Elad
Whiteboard: storage
Keywords: Regression, ZStream
Depends On:
Blocks: 973184
Reported: 2013-05-29 07:57 EDT by Dafna Ron
Modified: 2016-02-10 15:37 EST
CC List: 14 users

Fixed In Version: is1
Doc Type: Bug Fix
Clones: 973184
Type: Bug
oVirt Team: Storage

Attachments
logs (587.58 KB, application/x-gzip)
2013-05-29 07:57 EDT, Dafna Ron


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 13994 None None None Never

Description Dafna Ron 2013-05-29 07:57:15 EDT
Created attachment 754316: logs

Description of problem:

I have two domains, and both have a copy of the template.
After blocking connectivity to one of the domains from both of my hosts, the domain became inactive. I then tried creating a VM from the template on the active domain.
vdsm gets an error from getVolumeInfo saying the image does not exist.
The engine rolls back (the user is alerted that the VM was not created) and the image is stuck in the 'image locked' status (a separate bug will be opened for that).
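For illustration only, a minimal Python sketch of the selection logic that is missing here: pick the template copy that sits on a currently active domain before asking vdsm for volume info. The names (TemplateCopy, pick_usable_copy) are hypothetical and this is not the actual engine code (which is Java); it only shows the intended behavior.

from dataclasses import dataclass

@dataclass
class TemplateCopy:
    sd_uuid: str   # storage domain holding this copy of the template
    img_uuid: str
    vol_uuid: str

class NoUsableCopy(Exception):
    pass

def pick_usable_copy(copies, active_domain_uuids):
    """Return a template copy that lives on an active storage domain,
    so vdsm is never queried about a volume on an inactive domain."""
    for copy in copies:
        if copy.sd_uuid in active_domain_uuids:
            return copy
    raise NoUsableCopy("no active domain holds a copy of the template")

# Example: the first copy is on the blocked domain from this bug, the second
# on a made-up active domain; only the second should be used.
copies = [
    TemplateCopy("7414f930-bbdb-4ec6-8132-4640cbb3c722", "img-1", "vol-1"),
    TemplateCopy("11111111-1111-1111-1111-111111111111", "img-1", "vol-2"),
]
print(pick_usable_copy(copies, {"11111111-1111-1111-1111-111111111111"}).sd_uuid)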

Version-Release number of selected component (if applicable):

vdsm-4.10.2-22.0.el6ev.x86_64
sf17.2

How reproducible:

100%

Steps to Reproduce:
1. in iscsi storage create two domains located on two different storage servers
2. create a template located on both domains
3. from both hosts, block connectivity to the non-master domain (see the sketch after this list)
4. after the domain becomes inactive, try to create a vm from the template with thin provision copy on the active domain
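A rough sketch of step 3, assuming the script runs as root on each host with iptables available; 10.35.64.106 is the non-master domain's iSCSI portal taken from the engine log further down, so substitute your own address.

import subprocess

STORAGE_SERVER = "10.35.64.106"  # non-master domain's iSCSI portal (from the attached engine log)

def block(addr):
    # Drop all outgoing traffic to the storage server, simulating loss of
    # connectivity to the non-master domain.
    subprocess.check_call(["iptables", "-A", "OUTPUT", "-d", addr, "-j", "DROP"])

def unblock(addr):
    # Remove the rule once the scenario has been reproduced.
    subprocess.check_call(["iptables", "-D", "OUTPUT", "-d", addr, "-j", "DROP"])

if __name__ == "__main__":
    block(STORAGE_SERVER)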


Actual results:

We fail to create the VM.

Expected results:

The VM should be created successfully from the template copy on the active domain.

Logs:

Thread-44236::ERROR::2013-05-29 14:19:06,684::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1270, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1242, in findDomainPath
    raise se.StorageDomainDoesNotExist()
StorageDomainDoesNotExist: Storage domain does not exist: ()
Thread-44236::ERROR::2013-05-29 14:19:06,685::task::850::TaskManager.Task::(_setError) Task=`51c48dbd-018d-41dc-b732-976e35f6c129`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2980, in getVolumeInfo
    sdUUID=sdUUID).produceVolume(
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1270, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1242, in findDomainPath
    raise se.StorageDomainDoesNotExist()
StorageDomainDoesNotExist: Storage domain does not exist: ()
Thread-44236::DEBUG::2013-05-29 14:19:06,686::task::869::TaskManager.Task::(_run) Task=`51c48dbd-018d-41dc-b732-976e35f6c129`::Task._run: 51c48dbd-018d-41dc-b732-976e35f6c129 ('7414f930-bbdb-4ec6-8132-4640cbb3c722', '7fd33b43-a9f4-4eb7-a885-e9583a929ceb', 'cd5f0e01-8f73-4aa0-b837-b7f9bf4bc333', 'f82a0d58-0791-4137-b1e6-22a8794acd2a') {} failed - stopping task



Additional info:
Comment 4 Allon Mureinik 2013-05-30 10:56:44 EDT
Edu, is this related to the pre-fetch patch?
Comment 5 Eduardo Warszawski 2013-06-05 10:00:50 EDT
(In reply to Allon Mureinik from comment #4)
> Edu, is this related to the pre-fetch patch?

No. Why do you think so?
Comment 6 Eduardo Warszawski 2013-06-05 10:09:21 EDT
The VG 7414f930-bbdb can't be found because the LUN 1Dafna-tiger-21367928 is not found.
This is probably related to a disconnect operation, but the provided vdsm log is insufficient to determine this.

Despite that, _today_ (+7 days) we can reach this VG from the same host:

[root@cougar02 ~]# vgs -o vg_name,pv_name 7414f930-bbdb-4ec6-8132-4640cbb3c722
  VG                                   PV                               
  7414f930-bbdb-4ec6-8132-4640cbb3c722 /dev/mapper/1Dafna-tiger-21367928
[root@cougar02 ~]# date
Wed Jun  5 16:59:03 IDT 2013
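For context, a simplified sketch of the lookup that fails here (this is not the real blockSD.findDomainPath implementation; it assumes lvm is on PATH and sufficient privileges): when `lvm vgs <sdUUID>` cannot see the VG because its only LUN is unreachable, the domain is treated as missing, which is the StorageDomainDoesNotExist seen in the tracebacks.

import subprocess

class StorageDomainDoesNotExist(Exception):
    pass

def find_domain_vg(sd_uuid):
    # If the VG's PV (the LUN) is unreachable, vgs exits non-zero
    # ("Volume group ... not found", rc = 5 in the log excerpt below).
    result = subprocess.run(
        ["lvm", "vgs", "--noheadings", "-o", "vg_name", sd_uuid],
        capture_output=True, text=True)
    if result.returncode != 0:
        raise StorageDomainDoesNotExist(sd_uuid)
    return result.stdout.strip()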

#############################################################################
Thread-36779::DEBUG::2013-05-29 14:01:06,702::misc::83::Storage.Misc.excCmd::(<lambda>) u'/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \'a%1Dafna-32-021366808|1Dafna-32-031366808|1Dafna-tiger-11367928|1Dafna-tiger-21367928|1Dafna-tiger-31367928|1Dafna-upgrade-011368364%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 7414f930-bbdb-4ec6-8132-4640cbb3c722' (cwd None)

Thread-36779::DEBUG::2013-05-29 14:01:07,027::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = '  /dev/mapper/1Dafna-32-021366808: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-32-02136680
8: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-32-021366808: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-32-021366808 was disabled\n  /dev/mapper/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-upgrade-011368364 was disabled\n  /dev/mapper/1Dafna-tiger-21367928: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tiger-21367928: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-tiger-21367928: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-tiger-21367928 was disabled\n  /dev/mapper/1Dafna-tiger-11367928: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tiger-11367928: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-tiger-11367928: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-tiger-11367928 was disabled\n  /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-tiger-31367928 was disabled\n  Volume group "7414f930-bbdb-4ec6-8132-4640cbb3c722" not found\n'; <rc> = 5


Thread-36779::DEBUG::2013-05-29 14:19:18,521::misc::83::Storage.Misc.excCmd::(<lambda>) u'/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_
after_error_count=3 filter = [ \'a%1Dafna-32-021366808|1Dafna-32-031366808|1Dafna-tiger-11367928|1Dafna-tiger-21367928|1Dafna-tiger-31367928|1Dafna-upgrade-011368364%\', \'r%.*%\' ] }  global {  locking_type=1  prioritise_write_locks=1  
wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 7414f930-bbdb-4ec6-8132-4640c
bb3c722' (cwd None)

Thread-36779::DEBUG::2013-05-29 14:19:18,844::misc::83::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = '  /dev/mapper/1Dafna-32-021366808: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-32-02136680
8: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-32-021366808: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-32-02136
6808 was disabled\n  /dev/mapper/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/map
per/1Dafna-upgrade-011368364: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-upgrade-011368364 was disabled\n  /dev/mapper/1Dafna-tiger-21367928: read failed 
after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tiger-21367928: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-tiger-21367928: read failed after 0 of 4096 at 0: Input/outpu
t error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-tiger-21367928 was disabled\n  /dev/mapper/1Dafna-tiger-11367928: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tige
r-11367928: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/1Dafna-tiger-11367928: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Daf
na-tiger-11367928 was disabled\n  /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 107374174208: Input/output error\n 
 /dev/mapper/1Dafna-tiger-31367928: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/1Dafna-tiger-31367928 was disabled\n  Volume group "7414f930-bbdb-4ec6-8132-4640cb
b3c722" not found\n'; <rc> = 5

Thread-36779::ERROR::2013-05-29 14:19:18,849::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1270, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1242, in findDomainPath
    raise se.StorageDomainDoesNotExist()
StorageDomainDoesNotExist: Storage domain does not exist: ()
Thread-36779::ERROR::2013-05-29 14:19:18,849::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1270, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1242, in findDomainPath
    raise se.StorageDomainDoesNotExist()
StorageDomainDoesNotExist: Storage domain does not exist: ()
Comment 7 Eduardo Warszawski 2013-06-05 10:25:26 EDT
From the engine log we can see that the domain was reported as "in problem" but was not reconnected for cougar02.

##############################################################################
2013-05-29 11:20:00,007 INFO  [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (QuartzScheduler_Worker-52) [5fc4a867] Running command: ConnectDomainToStorageCommand internal: true. Entities affected :  ID: 7414f930-bbdb-
4ec6-8132-4640cbb3c722 Type: Storage
2013-05-29 11:20:00,007 INFO  [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (QuartzScheduler_Worker-52) [5fc4a867] ConnectDomainToStorage. Before Connect all hosts to pool. Time:5/29/13 11:20 AM
2013-05-29 11:20:00,040 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-4-thread-50) [5fc4a867] START, ConnectStorageServerVDSCommand(HostName = cougar01, HostId = 4497d431-7c5e-4924-96e0-3f9cdbf826
e5, storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, storageType = ISCSI, connectionList = [{ id: 812e5036-b4ab-4038-9bfd-9c8d1d163724, connection: 10.35.64.106, iqn: Dafna-tiger-2, vfsType: null, mountOptions: null, nfsVersion: nul
l, nfsRetrans: null, nfsTimeo: null };]), log id: d58d796
2013-05-29 11:20:00,064 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-4-thread-47) [5fc4a867] START, ConnectStorageServerVDSCommand(HostName = cougar02, HostId = a4af81ea-c847-4c9c-885e-cb763d5bf9
ea, storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, storageType = ISCSI, connectionList = [{ id: 812e5036-b4ab-4038-9bfd-9c8d1d163724, connection: 10.35.64.106, iqn: Dafna-tiger-2, vfsType: null, mountOptions: null, nfsVersion: nul
l, nfsRetrans: null, nfsTimeo: null };]), log id: d6e4f53
2013-05-29 11:20:00,158 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-4-thread-50) [5fc4a867] FINISH, ConnectStorageServerVDSCommand, return: {812e5036-b4ab-4038-9bfd-9c8d1d163724=0}, log id: d58d
796
2013-05-29 11:20:00,171 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-4-thread-47) [5fc4a867] FINISH, ConnectStorageServerVDSCommand, return: {812e5036-b4ab-4038-9bfd-9c8d1d163724=0}, log id: d6e4
f53
2013-05-29 11:20:00,171 INFO  [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (QuartzScheduler_Worker-52) [5fc4a867] ConnectDomainToStorage. After Connect all hosts to pool. Time:5/29/13 11:20 AM
Comment 8 Eduardo Warszawski 2013-06-05 10:38:57 EDT
Moving to the engine component in order to investigate whether there is a better way to deal with this situation.
Comment 11 Liron Aravot 2013-06-09 09:44:56 EDT
BTW, forgot to add in my last comment - the exception that occurred here is actually an NPE, caused by the fact that GetImageInfo may return null as its return value rather than throwing an exception; in that case the calling method (which was removed in the provided patch) had no null check.
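The affected code lives in ovirt-engine and is Java; purely as an illustration of the missing guard (hypothetical names, Python used for consistency with the sketches above), the fix pattern is to treat a null/None result from GetImageInfo as a failure instead of dereferencing it.

def get_image_info(sd_uuid, img_uuid, vol_uuid):
    # Stand-in for the VDS GetImageInfo call, which may return null/None
    # rather than raising when the volume cannot be queried.
    return None

def resolve_template_volume(sd_uuid, img_uuid, vol_uuid):
    info = get_image_info(sd_uuid, img_uuid, vol_uuid)
    if info is None:
        # Without this check the caller dereferenced the result and hit the NPE.
        raise RuntimeError("GetImageInfo returned no data for volume %s" % vol_uuid)
    return info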
Comment 14 Elad 2013-07-17 10:21:12 EDT
Creating a VM from a template (thin copy) completed successfully when there was a copy of the template on a blocked data domain.

Verified on RHEVM 3.3 - is5:
vdsm-4.11.0-121.git082925a.el6.x86_64
rhevm-3.3.0-0.7.master.el6ev.noarch
Comment 15 Itamar Heim 2014-01-21 17:22:13 EST
Closing - RHEV 3.3 Released
Comment 16 Itamar Heim 2014-01-21 17:27:19 EST
Closing - RHEV 3.3 Released
