Created attachment 1238577
Full suite logs from Jenkins
Description of problem:
The suite fails in the "hotplug_nic" phase of the basic_sanity scenario.
According to Roy Golan, this is a bug in the message handling.
Attached are the logs from Jenkins.
Version-Release number of selected component (if applicable): 4.1
Steps to Reproduce:
1) Run the following Jenkins job manually: http://jenkins.ovirt.org/job/ovirt_4.1_system-tests_manual/
2) Clone the ovirt-system-tests repo and run 'basic_suite_4.1' locally (a sketch follows these steps).
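A minimal sketch of step 2 as a Python script, assuming the repo is hosted at gerrit.ovirt.org and that the run_suite.sh wrapper in the repo root is the entry point (both are assumptions, not taken from this bug; adjust for your checkout):

import subprocess

# Clone the ovirt-system-tests repository (URL is an assumption).
subprocess.check_call(
    ["git", "clone", "https://gerrit.ovirt.org/ovirt-system-tests"])

# Run the 4.1 basic suite through the run_suite.sh wrapper (name assumed).
subprocess.check_call(
    ["./run_suite.sh", "basic_suite_4.1"], cwd="ovirt-system-tests")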
Actual results:
The build fails at a specific stage.
Expected results:
The build should succeed and all the tests should pass.
Please rephrase the title and provide more information.
Piotr, can you take a look?
Sure, I will try to reproduce it.
Restoring needinfo on Daniel.
I ran the system tests twice and was unable to reproduce it. I hit a storage issue instead:
2017-01-09 09:34:26,791-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler1) [287f4e8e] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error = Volume does not exist: (u'cfb05ab4-2873-4d18-a97e-c1bc6e2db4ec',), code = 201
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/28131fcf-e6ad-46da-8814-35b254d7c5cc', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041'], ecode=1, stdout=, stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': No such file or directory"], message=None, code = 100'
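For context, a hedged sketch of the failing step above: vdsm shells out to qemu-img rebase with the arguments shown in the log, and the command aborts because the top volume path is already gone. The helper below only mirrors that command line; the paths passed to it would be placeholders, not the ones from this run:

import os
import subprocess

def rebase_volume(top_volume, new_backing):
    # Mirror the command line from the log: no caching (-t/-T none),
    # unsafe mode (-u) so the old backing file is not read, and qcow2 on
    # both ends. qemu-img fails with "No such file or directory" when
    # top_volume has already been removed, as seen above.
    if not os.path.exists(top_volume):
        raise RuntimeError("volume path missing: %s" % top_volume)
    subprocess.check_call([
        "/usr/bin/qemu-img", "rebase",
        "-t", "none", "-T", "none",
        "-u",
        "-f", "qcow2", "-F", "qcow2",
        "-b", new_backing,
        top_volume,
    ])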
2017-01-09 09:21:21,572 WARN (jsonrpc/6) [storage.HSM] getPV failed for guid: 360014052b16e136e2784209bdc213013 (hsm:1970)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014052b16e136e2784209bdc213013',)"
2017-01-09 09:23:11,046 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for unfetched domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:151)
2017-01-09 09:23:11,047 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:168)
2017-01-09 09:23:11,088 WARN (jsonrpc/0) [storage.LVM] lvm vgs failed: 5  [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Volume group "e72271a3-806c-4d8b-8141-a4fe22c63fe6" not found', ' Cannot process volume group e72271a3-806c-4d8b-8141-a4fe22c63fe6'] (lvm:377)
2017-01-09 09:23:11,090 ERROR (jsonrpc/0) [storage.StorageDomainCache] domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
StorageDomainDoesNotExist: Storage domain does not exist: (u'e72271a3-806c-4d8b-8141-a4fe22c63fe6',)
Tal - can someone take a look?
Nir, you're the QE contact; please have a look at what might be a storage issue in comment #5.
This might be related to Bug 1410120; firstname.lastname@example.org saw it on his env, and that triggered the mishandling. According to him, the steps to reproduce are:
- have 1 iSCSI SD + 1 host
- add 1 more iSCSI SD
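For anyone scripting that reproducer, a minimal sketch of the "add 1 more iSCSI SD" step using the Python SDK (ovirtsdk4); the engine URL, credentials, host name, LUN id, address, and target below are all placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connection details are placeholders; point them at your engine.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

# Add a second iSCSI data storage domain on an existing host.
sds_service = connection.system_service().storage_domains_service()
sds_service.add(
    types.StorageDomain(
        name='iscsi2',
        type=types.StorageDomainType.DATA,
        host=types.Host(name='host1'),
        storage=types.HostStorage(
            type=types.StorageType.ISCSI,
            volume_group=types.VolumeGroup(
                logical_units=[
                    types.LogicalUnit(
                        id='36001405000000000000000000000001',  # placeholder LUN
                        address='10.0.0.1',
                        port=3260,
                        target='iqn.2017-01.com.example:storage',
                    ),
                ],
            ),
        ),
    ),
)
connection.close()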
(In reply to Piotr Kliczewski from comment #5)
> I ran the system tests twice and was unable to reproduce it. I hit a storage
> issue instead:
Piotr, please attach full vdsm logs from your environment; we cannot do anything
with the data in comment 5, as there is no context.
(In reply to Roy Golan from comment #8)
> This might be related to Bug 1410120; email@example.com saw it on his env, and
> that triggered the mishandling. According to him, the steps to reproduce are:
> - have 1 iSCSI SD + 1 host
> - add 1 more iSCSI SD
Roy, bug 1410120 looks like a scale issue: starting many VMs at the same time
with slow storage overloads the request queue in ioprocess. I don't think it is
related to the system tests flows.
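To illustrate what "overloading the request queue" means here, a toy model only (the queue bound and request payloads are made up, not ioprocess internals):

import queue

# A bounded request queue: when many VM starts arrive at once and slow
# storage keeps earlier requests in flight, new submissions are rejected
# once the queue fills up. Size and payloads are hypothetical.
requests = queue.Queue(maxsize=10)

for vm_id in range(100):
    try:
        requests.put_nowait(("getVolumeInfo", vm_id))
    except queue.Full:
        print("request for vm %d rejected: queue overloaded" % vm_id)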
Nir, I don't have the logs in my env. I saw a similar issue in CI. Please see devel@ovirt, where there is a link.
(In reply to Tal Nisan from comment #7)
> Nir, you're the QE contact; please have a look at what might be a storage
> issue in comment #5.
We don't have the required data related to comment 5, so we can only look at the
attached logs.
I did not look into the attached logs, but I don't see any indication that this
is related to storage.
This should be handled by the owner of the failing test (hotplug_nic). If Roy
has more information about the message handling, let's add it to the bug.
Roy, can you add the information about message handling mentioned in comment 0?
Please see this for logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4658/
I think this is now fixed by Piotr. Please CLOSE-UPSTREAM if it indeed works now.