Created attachment 1238577
Full suite logs from Jenkins
Description of problem:
The suite fails in the "hotplug_nic" phase of the basic_sanity scenario.
According to Roy Golan, this is a bug in the message handling.
Attached are the logs from Jenkins.
Version-Release number of selected component (if applicable): 4.1
Steps to Reproduce:
1) Run the following Jenkins job manually: http://jenkins.ovirt.org/job/ovirt_4.1_system-tests_manual/
2) Clone the ovirt-system-tests repo and run 'basic_suite_4.1' locally (a sketch follows these steps).
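A minimal sketch of step 2 as a Python script, assuming the repo is hosted at gerrit.ovirt.org and that the run_suite.sh wrapper in the repo root is the entry point (both are assumptions, not taken from this bug; adjust for your checkout):

import subprocess

# Clone the ovirt-system-tests repository (URL is an assumption).
subprocess.check_call(
    ["git", "clone", "https://gerrit.ovirt.org/ovirt-system-tests"])

# Run the 4.1 basic suite through the run_suite.sh wrapper (name assumed).
subprocess.check_call(
    ["./run_suite.sh", "basic_suite_4.1"], cwd="ovirt-system-tests")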
Actual results:
The build fails at a specific stage.
Expected results:
The build should succeed and all the tests should pass.
Please rephrase the title and provide more information.
Piotr, can you take a look?
Sure, I will try to reproduce it.
Restoring needinfo on Daniel.
I ran the system tests twice and was unable to reproduce it. I hit a storage issue instead:
2017-01-09 09:34:26,791-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler1) [287f4e8e] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error = Volume does not exist: (u'cfb05ab4-2873-4d18-a97e-c1bc6e2db4ec',), code = 201
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/28131fcf-e6ad-46da-8814-35b254d7c5cc', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041'], ecode=1, stdout=, stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': No such file or directory"], message=None, code = 100'
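For context, a hedged sketch of the failing step above: vdsm shells out to qemu-img rebase with the arguments shown in the log, and the command aborts because the top volume path is already gone. The helper below only mirrors that command line; the paths passed to it would be placeholders, not the ones from this run:

import os
import subprocess

def rebase_volume(top_volume, new_backing):
    # Mirror the command line from the log: no caching (-t/-T none),
    # unsafe mode (-u) so the old backing file is not read, and qcow2 on
    # both ends. qemu-img fails with "No such file or directory" when
    # top_volume has already been removed, as seen above.
    if not os.path.exists(top_volume):
        raise RuntimeError("volume path missing: %s" % top_volume)
    subprocess.check_call([
        "/usr/bin/qemu-img", "rebase",
        "-t", "none", "-T", "none",
        "-u",
        "-f", "qcow2", "-F", "qcow2",
        "-b", new_backing,
        top_volume,
    ])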
2017-01-09 09:21:21,572 WARN (jsonrpc/6) [storage.HSM] getPV failed for guid: 360014052b16e136e2784209bdc213013 (hsm:1970)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014052b16e136e2784209bdc213013',)"
2017-01-09 09:23:11,046 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for unfetched domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:151)
2017-01-09 09:23:11,047 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:168)
2017-01-09 09:23:11,088 WARN (jsonrpc/0) [storage.LVM] lvm vgs failed: 5  [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Volume group "e72271a3-806c-4d8b-8141-a4fe22c63fe6" not found', ' Cannot process volume group e72271a3-806c-4d8b-8141-a4fe22c63fe6'] (lvm:377)
2017-01-09 09:23:11,090 ERROR (jsonrpc/0) [storage.StorageDomainCache] domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
StorageDomainDoesNotExist: Storage domain does not exist: (u'e72271a3-806c-4d8b-8141-a4fe22c63fe6',)
Tal - can someone take a look?
Nir, you're the QE contact; please have a look at what might be a storage issue in comment #5.
This might be related to Bug 1410120; firstname.lastname@example.org saw it on his env, and that triggered the mishandling. According to him, the steps to reproduce are:
- have 1 iSCSI SD + 1 host
- add 1 more iSCSI SD
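For anyone scripting that reproducer, a minimal sketch of the "add 1 more iSCSI SD" step using the Python SDK (ovirtsdk4); the engine URL, credentials, host name, LUN id, address, and target below are all placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Connection details are placeholders; point them at your engine.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

# Add a second iSCSI data storage domain on an existing host.
sds_service = connection.system_service().storage_domains_service()
sds_service.add(
    types.StorageDomain(
        name='iscsi2',
        type=types.StorageDomainType.DATA,
        host=types.Host(name='host1'),
        storage=types.HostStorage(
            type=types.StorageType.ISCSI,
            volume_group=types.VolumeGroup(
                logical_units=[
                    types.LogicalUnit(
                        id='36001405000000000000000000000001',  # placeholder LUN
                        address='10.0.0.1',
                        port=3260,
                        target='iqn.2017-01.com.example:storage',
                    ),
                ],
            ),
        ),
    ),
)
connection.close()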
(In reply to Piotr Kliczewski from comment #5)
> I ran the system tests twice and was unable to reproduce it. I hit a storage
> issue instead:
Piotr, please attach full vdsm logs from your environment; we cannot do anything
with the data in comment 5, as there is no context.
(In reply to Roy Golan from comment #8)
> This might be related to Bug 1410120; email@example.com saw it on his env, and
> that triggered the mishandling. According to him, the steps to reproduce are:
> - have 1 iSCSI SD + 1 host
> - add 1 more iSCSI SD
Roy, bug 1410120 looks like a scale issue: starting many VMs at the same time
with slow storage overloads the request queue in ioprocess. I don't think it is
related to the system tests flows.
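To illustrate what "overloading the request queue" means here, a toy model only (the queue bound and request payloads are made up, not ioprocess internals):

import queue

# A bounded request queue: when many VM starts arrive at once and slow
# storage keeps earlier requests in flight, new submissions are rejected
# once the queue fills up. Size and payloads are hypothetical.
requests = queue.Queue(maxsize=10)

for vm_id in range(100):
    try:
        requests.put_nowait(("getVolumeInfo", vm_id))
    except queue.Full:
        print("request for vm %d rejected: queue overloaded" % vm_id)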
Nir, I don't have the logs in my env. I saw a similar issue in CI. Please see devel@ovirt, where there is a link.
(In reply to Tal Nisan from comment #7)
> Nir, you're the QE contact; please have a look at what might be a storage
> issue in comment #5.
We don't have the required data related to comment 5, so we can only look at the
attached logs.
I did not look into the attached logs, but I don't see any indication that this
is related to storage.
This should be handled by the owner of the failing test (hotplug_nic). If Roy
has more information about the message handling, let's add it to the bug.
Roy, can you add the information about message handling mentioned in comment 0?
Please see this for logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4658/
I think this is now fixed by Piotr. Please CLOSE-UPSTREAM if it indeed works now.