Created attachment 899074 [details] vdsm+engine logs Description of problem: Creating a template always prompts a failure concerning cache memory, it seems that one of the threads clears the cache right before vdsm tries to find it in order to complete the operation. Thread-76::ERROR::2014-05-25 15:07:35,030::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain ceade227-c2df-4f82-b77d-30ce8075dacc Thread-76::ERROR::2014-05-25 15:07:35,030::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain ceade227-c2df-4f82-b77d-30ce8075dacc Thread-76::DEBUG::2014-05-25 15:07:35,030::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-76::DEBUG::2014-05-25 15:07:35,031::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/360060160f4a030007aeed85291dbe311|/dev/mapper/360060160f4a030007beed85291dbe311|/dev/mapper/360060160f4a030007ceed85291dbe311|/dev/mapper/360060160f4a030007deed85291dbe311|/dev/mapper/360060160f4a030007eeed85291dbe311|/dev/mapper/360060160f4a03000fa65675991dbe311|/dev/mapper/360060160f4a03000fb65675991dbe311|/dev/mapper/360060160f4a03000fc65675991dbe311|/dev/mapper/360060160f4a03000fd65675991dbe311|/dev/mapper/360060160f4a03000fe65675991dbe311|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name ceade227-c2df-4f82-b77d-30ce8075dacc' (cwd None) Thread-76::DEBUG::2014-05-25 15:07:35,454::lvm::296::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' Volume group "ceade227-c2df-4f82-b77d-30ce8075dacc" not found\n Skipping volume group ceade227-c2df-4f82-b77d-30ce8075dacc\n'; <rc> = 5 Thread-76::WARNING::2014-05-25 15:07:35,459::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' Volume group "ceade227-c2df-4f82-b77d-30ce8075dacc" not found', ' Skipping volume group ceade227-c2df-4f82-b77d-30ce8075dacc'] Thread-76::DEBUG::2014-05-25 15:07:35,459::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-76::ERROR::2014-05-25 15:07:35,466::sdc::143::Storage.StorageDomainCache::(_findDomain) domain ceade227-c2df-4f82-b77d-30ce8075dacc not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: ('ceade227-c2df-4f82-b77d-30ce8075dacc',) Thread-76::ERROR::2014-05-25 15:07:35,474::task::866::TaskManager.Task::(_setError) Task=`efae12d0-1fcb-4d73-8e1e-c073a13fcd89`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 873, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 45, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 3092, in getVolumeInfo sdUUID=sdUUID).produceVolume(imgUUID=imgUUID, File "/usr/share/vdsm/storage/sdc.py", line 98, in produce domain.getRealDomain() File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce domain = self._findDomain(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: ('ceade227-c2df-4f82-b77d-30ce8075dacc',) Thread-76::DEBUG::2014-05-25 15:07:35,476::task::885::TaskManager.Task::(_run) Task=`efae12d0-1fcb-4d73-8e1e-c073a13fcd89`::Task._run: efae12d0-1fcb-4d73-8e1e-c073a13fcd89 ('ceade227-c2df-4f82-b77d-30ce8075dacc', 'Error', 'using', 'command:') {} failed - stopping task ** in the end operation succeeds.** Version-Release number of selected component (if applicable): vdsm-4.14.7-3.el6ev.x86_64 rhevm-3.4.0-0.21.el6ev.noarch How reproducible: 25% approximately 1 out of 4 Steps to Reproduce: 1.create a template Actual results: An Error appears on logs, operation is a success Expected results: The Error shouldn't appear Additional info:
This seems like the SD cache oddity again. Fede - can we finally get rid of this?
Engine address was 10.35.163.66. The request mentioned in the bug description came from the same ip of the vdsm host: 10.35.116.2 Thread-76::DEBUG::2014-05-25 15:07:35,026::BindingXMLRPC::251::vds::(wrapper) client [10.35.116.2] Thread-76::DEBUG::2014-05-25 15:07:35,027::task::595::TaskManager.Task::(_updateState) Task=`efae12d0-1fcb-4d73-8e1e-c073a13fcd89`::moving from state init -> state preparing Thread-76::INFO::2014-05-25 15:07:35,027::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeInfo(sdUUID='ceade227-c2df-4f82-b77d-30ce8075dacc', spUUID='Error', imgUUID='using', volUUID='command:', options=None) the four parameters: sdUUID='ceade227-c2df-4f82-b77d-30ce8075dacc' spUUID='Error' imgUUID='using' volUUID='command:' look really suspicious. Beside the obviously wrong spUUID, imgUUID, volUUID, also the sdUUID was wrong (in fact ceade227-c2df-4f82-b77d-30ce8075dacc is an image, not a storage domain), and that's what triggered the StorageDomainDoesNotExist error. Were you running manual getVolumeInfo commands? Any automated script? This seems completely unrelated to VDSM and engine.
Federico Simoncelli,This is a whole different issue which was resolved at BZ #1112722, sorry about this mix-up,this has nothing to do with the current bug
Ori, can we get CLEAN logs for this scenario please?
Created attachment 916059 [details] vdsm+engine logs sure reproduced on av3.6 Thread-300::ERROR::2014-07-07 15:28:45,514::sdc::143::Storage.StorageDomainCache::(_findDomain) domain ce6b5c90-72f4-4a7e-b079-715b268bf706 not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID)
(In reply to Ori from comment #5) > Created attachment 916059 [details] > vdsm+engine logs > > sure reproduced on av3.6 > > Thread-300::ERROR::2014-07-07 > 15:28:45,514::sdc::143::Storage.StorageDomainCache::(_findDomain) domain > ce6b5c90-72f4-4a7e-b079-715b268bf706 not found > Traceback (most recent call last): > File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain > dom = findMethod(sdUUID) > File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain > raise se.StorageDomainDoesNotExist(sdUUID) BZ title needed an update. The warning is part of the createStorageDomain flow: Thread-300::DEBUG::2014-07-07 15:28:44,481::BindingXMLRPC::251::vds::(wrapper) client [10.35.161.37] flowID [761f2c8d] Thread-300::DEBUG::2014-07-07 15:28:44,482::task::595::TaskManager.Task::(_updateState) Task=`84a8cda5-de74-4502-b77a-00199b48f8d5`::moving from state init -> state preparing Thread-300::INFO::2014-07-07 15:28:44,482::logUtils::44::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=1, sdUUID='ce6b5c90-72f4-4a7e-b079-715b268bf706', domainName='nfs_3', typeSpecificArg='10.35.160.108:/RHEV/ogofen/8', domClass=1, domVersion='3', options=None) ... Thread-300::ERROR::2014-07-07 15:28:45,514::sdc::143::Storage.StorageDomainCache::(_findDomain) domain ce6b5c90-72f4-4a7e-b079-715b268bf706 not found Traceback (most recent call last): File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain dom = findMethod(sdUUID) File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: ('ce6b5c90-72f4-4a7e-b079-715b268bf706',) and it's making sure that the new domain we're creating doesn't exists already. In general logging the lookup for a domain that doesn't exists is interesting.
(In reply to Federico Simoncelli from comment #6) > In general logging the lookup for a domain that doesn't exists is > interesting. Sure, but why is this an ERROR? Seems to me it will needlessly alarm customers.
(In reply to Allon Mureinik from comment #7) > (In reply to Federico Simoncelli from comment #6) > > In general logging the lookup for a domain that doesn't exists is > > interesting. > > Sure, but why is this an ERROR? > Seems to me it will needlessly alarm customers. Looking for a domain that cannot be found (StorageDomainDoesNotExist) generates an exception, and that is an error. The question is if it should be always logged. At the moment we are logging it in all cases because there's only one situation where we expect it to fail (storage domain creation).
(In reply to Federico Simoncelli from comment #8) > (In reply to Allon Mureinik from comment #7) > > (In reply to Federico Simoncelli from comment #6) > > > In general logging the lookup for a domain that doesn't exists is > > > interesting. > > > > Sure, but why is this an ERROR? > > Seems to me it will needlessly alarm customers. > > Looking for a domain that cannot be found (StorageDomainDoesNotExist) > generates an exception, and that is an error. > > The question is if it should be always logged. At the moment we are logging > it in all cases because there's only one situation where we expect it to > fail (storage domain creation). Sean - I suggest closing this as WONTFIX. Any objection?
(In reply to Federico Simoncelli from comment #8) > (In reply to Allon Mureinik from comment #7) > > (In reply to Federico Simoncelli from comment #6) > > > In general logging the lookup for a domain that doesn't exists is > > > interesting. > > > > Sure, but why is this an ERROR? > > Seems to me it will needlessly alarm customers. > > Looking for a domain that cannot be found (StorageDomainDoesNotExist) > generates an exception, and that is an error. > > The question is if it should be always logged. At the moment we are logging > it in all cases because there's only one situation where we expect it to > fail (storage domain creation). for some reason I have filtered this discussion,it is important to mark that this bug happens also when creating a TEMPLATE as described in the description,not only on domain creation.
(In reply to Ori from comment #11) > (In reply to Federico Simoncelli from comment #8) > > (In reply to Allon Mureinik from comment #7) > > > (In reply to Federico Simoncelli from comment #6) > > > > In general logging the lookup for a domain that doesn't exists is > > > > interesting. > > > > > > Sure, but why is this an ERROR? > > > Seems to me it will needlessly alarm customers. > > > > Looking for a domain that cannot be found (StorageDomainDoesNotExist) > > generates an exception, and that is an error. > > > > The question is if it should be always logged. At the moment we are logging > > it in all cases because there's only one situation where we expect it to > > fail (storage domain creation). > > for some reason I have filtered this discussion,it is important to mark that > this bug happens also when creating a TEMPLATE as described in the > description,not only on domain creation. No, as per comment 2 and comment 3 the error during template creation was because of bug 1112722. If not please attach more logs.
*** Bug 1203296 has been marked as a duplicate of this bug. ***
*** Bug 1256036 has been marked as a duplicate of this bug. ***