Bug 1411221 - Basic sanity test fail on basic_sanity scenario
Summary: Basic sanity test fail on basic_sanity scenario
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Oved Ourfali
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-09 07:48 UTC by Daniel Belenky
Modified: 2017-02-27 13:11 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-19 07:01:26 UTC
oVirt Team: Infra


Attachments
Full suite logs from Jenkins (1.51 MB, application/zip)
2017-01-09 07:48 UTC, Daniel Belenky

Description Daniel Belenky 2017-01-09 07:48:38 UTC
Created attachment 1238577 [details]
Full suite logs from Jenkins

Description of problem:
The suite fails in the "hotplug_nic" phase of the basic_sanity scenario.
According to Roy Golan, this is a bug in the message handling.

Attached are the full suite logs from Jenkins.

Version-Release number of selected component (if applicable): 4.1

How reproducible:
Two options:
1) Run the following Jenkins job manually: http://jenkins.ovirt.org/job/ovirt_4.1_system-tests_manual/
2) Clone the ovirt-system-tests repo:
https://gerrit.ovirt.org/#/admin/projects/ovirt-system-tests
and run 'basic_suite_4.1' locally.
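For reference, option 2 amounts to roughly the following commands. The run_suite.sh entry point is an assumption based on the ovirt-system-tests repository layout at the time; verify it against the repo's README before running.

```shell
# Sketch of the local reproduction (option 2 above).
# NOTE: run_suite.sh is an assumed entry point; check the
# ovirt-system-tests README for the actual script name.
repro_steps='git clone https://gerrit.ovirt.org/ovirt-system-tests
cd ovirt-system-tests
./run_suite.sh basic_suite_4.1'
echo "$repro_steps"
```

Running the suite locally requires the lago tooling that ovirt-system-tests is built on, so expect additional setup beyond these three commands.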


Actual results:
The build fails at a specific stage (hotplug_nic).

Expected results:
The build should succeed, and all the tests should pass.

Comment 1 Yaniv Kaul 2017-01-09 08:52:58 UTC
Please rephrase the title and provide more information.

Comment 2 Oved Ourfali 2017-01-09 13:05:50 UTC
Piotr, can you take a look?

Comment 3 Piotr Kliczewski 2017-01-09 13:38:57 UTC
Sure, I will try to reproduce it.

Comment 4 Oved Ourfali 2017-01-09 13:42:11 UTC
Restoring needinfo on Daniel.

Comment 5 Piotr Kliczewski 2017-01-09 15:31:45 UTC
Daniel,

I ran the system tests 2 times and was unable to reproduce it. I hit a storage issue instead:

2017-01-09 09:34:26,791-05 DEBUG [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler1) [287f4e8e] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVolumeInfoVDS, error = Volume does not exist: (u'cfb05ab4-2873-4d18-a97e-c1bc6e2db4ec',), code = 201


-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/28131fcf-e6ad-46da-8814-35b254d7c5cc', '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': Could not open '/rhev/data-center/mnt/blockSD/367a1527-1b4f-49a4-9339-2d124daf7719/images/1296f956-f482-46b6-896d-222ce27d81c8/3d23987f-603c-408e-9e1f-b27941fba041': No such file or directory"], message=None, code = 100'

2017-01-09 09:21:21,572 WARN  (jsonrpc/6) [storage.HSM] getPV failed for guid: 360014052b16e136e2784209bdc213013 (hsm:1970)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
    pv = lvm.getPV(guid)
  File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
    raise se.InaccessiblePhysDev((pvName,))
InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014052b16e136e2784209bdc213013',)"

2017-01-09 09:23:11,046 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for unfetched domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:151)
2017-01-09 09:23:11,047 ERROR (jsonrpc/0) [storage.StorageDomainCache] looking for domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 (sdc:168)
2017-01-09 09:23:11,088 WARN  (jsonrpc/0) [storage.LVM] lvm vgs failed: 5 [] ['  WARNING: Not using lvmetad because config setting use_lvmetad=0.', '  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', '  Volume group "e72271a3-806c-4d8b-8141-a4fe22c63fe6" not found', '  Cannot process volume group e72271a3-806c-4d8b-8141-a4fe22c63fe6'] (lvm:377)
2017-01-09 09:23:11,090 ERROR (jsonrpc/0) [storage.StorageDomainCache] domain e72271a3-806c-4d8b-8141-a4fe22c63fe6 not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'e72271a3-806c-4d8b-8141-a4fe22c63fe6',)

Comment 6 Oved Ourfali 2017-01-10 09:49:14 UTC
Tal - can someone take a look?

Comment 7 Tal Nisan 2017-01-10 13:56:39 UTC
Nir, you're the QE contact; please have a look at what might be a storage issue in comment #5.

Comment 8 Roy Golan 2017-01-11 10:26:48 UTC
Might be related to Bug 1410120. guchen@redhat.com saw it on his environment, and that triggered the mishandling. According to him, the steps to reproduce are:
- have 1 iSCSI SD + 1 host
- add 1 more iSCSI SD

Comment 9 Nir Soffer 2017-01-11 10:55:05 UTC
(In reply to Piotr Kliczewski from comment #5)
> Daniel,
> 
> I run system tests 2 time and I was unable to reproduce it. I hit storage
> issue instead:

Piotr, please attach full vdsm logs from your environment, we cannot do anything 
with the data in comment 5, there is no context.

Comment 10 Nir Soffer 2017-01-11 10:57:51 UTC
(In reply to Roy Golan from comment #8)
> Might be related to Bug 1410120, guchen@redhat.com saw that on his env and
> that triggered the miss handling - according to him step to reproduce
> - have 1 sd, iscsi + 1 host
> - add 1 sd iscsi

Roy, bug 1410120 seems like a scale issue, trying to start many vms in the same
time with slow storage, overloading the request queue in ioprocess. I don't think
it is related to system tests flows.

Comment 11 Piotr Kliczewski 2017-01-11 11:38:32 UTC
Nir, I don't have the logs in my env. I saw a similar issue in CI. Please see devel@ovirt where there is a link.

Comment 12 Nir Soffer 2017-01-11 12:19:42 UTC
(In reply to Tal Nisan from comment #7)
> Nir, you're this QE contact, please have a look at what might be a storage
> issue in comment #5

We don't have the required data related to comment 5, so we can only look at the
attached logs.

I did not look into the attached logs but I don't see any indication that this is 
related to storage.

This should be handled by the owner of the failing test (hotplug_nic). If Roy
has more information about message handling, let's add the information to the bug.

Roy, can you add the information about message handling mentioned in comment 0?

Comment 13 Piotr Kliczewski 2017-01-11 12:32:28 UTC
Nir,

Please see this for logs: http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4658/

Comment 14 Yaniv Kaul 2017-01-18 07:20:31 UTC
I think this is fixed now by Piotr. Please CLOSE-UPSTREAM if indeed this works now.

