Bug 1260409
Summary: Migration 7.2->7.1 failed since libvirtError in 'virDomainMigrateToURI2'

Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.4
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Israel Pinto <ipinto>
Assignee: Francesco Romani <fromani>
QA Contact: Israel Pinto <ipinto>
CC: bazulay, fjin, fromani, gklein, ipinto, istein, lsurette, mavital, mgoldboi, michal.skrivanek, ycui, yeylon, ykaul
Keywords: AutomationBlocker, Regression
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.17-11.el7
Doc Type: Bug Fix
Type: Bug
oVirt Team: Virt
Bug Depends On: 1265111
Last Closed: 2016-03-09 19:45:03 UTC
Description (Israel Pinto, 2015-09-06 13:54:57 UTC)

Created attachment 1070746 [details]: source_host_log
Created attachment 1070747 [details]: host_in_same_cluster_logs
Created attachment 1070748 [details]: engine_log
---

Comment (Francesco Romani):

It seems this and https://bugzilla.redhat.com/show_bug.cgi?id=1260177 share the same root cause. Not sure it is a VDSM bug, but I am taking the bug to make sure to give it enough bandwidth.

---

Comment #6 (Francesco Romani):

(In reply to Israel Pinto from comment #0)
> Description of problem:
> Migration of a VM in automation testing, in the case of putting a host with one VM into maintenance.
>
> Version-Release number of selected component (if applicable):
> Red Hat Enterprise Virtualization Manager Version: 3.5.4.2-1.3.el6ev
> VDSM (RHEL 7.2): vdsm-4.16.26-1.el7ev
>
> How reproducible:
> All the time.
>
> Steps to Reproduce:
> 1. Create VM, run VM
> 2. Put host to maintenance
> 3. Check host and VM status
>
> Actual results:
> Host did not switch to maintenance
>
> Expected results:
> Host switches to maintenance and the VM is up and running on the second host
>
> Additional info, from the vdsm log:
>
> Thread-905::DEBUG::2015-09-04 01:21:01,846::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 27 edom: 20 level: 2 message: XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)
> Thread-905::DEBUG::2015-09-04 01:21:01,847::migration::386::vm.Vm::(cancel) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::canceling migration downtime thread
> Thread-906::DEBUG::2015-09-04 01:21:01,847::migration::383::vm.Vm::(run) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::migration downtime thread exiting
> Thread-905::DEBUG::2015-09-04 01:21:01,847::migration::480::vm.Vm::(stop) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::stopping migration monitor thread
> Thread-905::ERROR::2015-09-04 01:21:01,848::migration::161::vm.Vm::(_recover) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)
> Thread-905::ERROR::2015-09-04 01:21:02,170::migration::260::vm.Vm::(run) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::Failed to migrate
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/migration.py", line 246, in run
>     self._startUnderlyingMigration(time.time())
>   File "/usr/share/vdsm/virt/migration.py", line 335, in _startUnderlyingMigration
>     None, maxBandwidth)
>   File "/usr/share/vdsm/virt/vm.py", line 702, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1825, in migrateToURI2
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
> libvirtError: XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)

OK, so both the source and destination host are running VDSM (RHEL 7.2) vdsm-4.16.26-1.el7ev, right?

I suppose libvirt is 1.2.17-5.el7 on both hosts, can you confirm?

Does the plain, user-triggered migration between the two hosts work?

---

Comment #7 (Israel Pinto):

(In reply to Francesco Romani from comment #6)
> [...]
> Does the plain, user-triggered migration between the two hosts work?

1. Both hosts are RHEL 7.2 with vdsm-4.16.26-1.el7ev
2. I tested (manually):
   - user-triggered migration: works fine
   - maintenance with one VM: works fine

But in automation those cases failed, and not only in virt testing.

---

Comment (Francesco Romani):

(In reply to Israel Pinto from comment #7)
> [...]
> But in automation those cases failed, and not only in virt testing.

Israel, please answer the question: please confirm the libvirt version on those automation machines.

---

Comment (Francesco Romani):

I fully understand the urgency and I am working to reproduce the issue and find the root cause. However, what I would like to ask is: according to https://bugzilla.redhat.com/show_bug.cgi?id=1260409#c0, automatic migration with just one VM fails, and it always fails. But in VDSM all the flows use the same verb, so this means that every migration must fail, whatever the way it is triggered, automatically or by user request. Is migration completely broken, then? I surely can't reproduce this.
---

Comment (Israel Pinto):

libvirt version: libvirt-1.2.17-6.el7

---

Comment (Francesco Romani):

(In reply to Israel Pinto from comment #7)
> 1. Both hosts are RHEL 7.2 with vdsm-4.16.26-1.el7ev
> 2. I tested (manually):
>    - user-triggered migration: works fine
>    - maintenance with one VM: works fine
> But in automation those cases failed, and not only in virt testing.

OK, so I need to see the full automation logs. If both maintenance and manual migration work, it is something else, and much less critical. Please point me to any recent automation failure, with full logs (engine, source, destination). A link to the jenkins job is fine.

---

Comment #14 (Yaniv Kaul):

Has anyone looked at the specific issue libvirt is complaining about?

XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)

---

Comment (Francesco Romani):

(In reply to Yaniv Kaul from comment #14)
> [...]

Yes. We have already seen this error, but not in this context. This is indeed worrisome, if confirmed, so I tried to reproduce it a few times, without luck. Please note that in the very same environment the issue does not happen 100% of the time.

---

Comment:

I see a few more runs now... so it seems we have:

#92 - fails
#93 - fails
#94 - fixed
#95 - bogus, env error
#96 - network and storage failures; this particular test was skipped
#97 - bogus, env error

So... is it actually still failing? Any chance it is related to bug 1261007?
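For context on the error quoted above: in the libvirt domain XML, a `<graphics>` element may carry a legacy `listen` attribute as well as `<listen>` sub-elements, and libvirt requires the attribute to agree with the address of the first `<listen>` element. The fragment below is a hypothetical sketch (the address is taken from the logs in this bug, the rest is illustrative) of a consistent configuration; the "found none" variant of the error occurs when the attribute is present but no matching `<listen>` element exists.

```xml
<!-- Hypothetical, consistent configuration: the legacy listen attribute
     on <graphics> matches the address of the first <listen> sub-element.
     If the attribute were present with no matching <listen> element,
     libvirt would reject the XML with the error seen in this bug. -->
<graphics type='spice' port='5900' autoport='yes' listen='10.35.160.55'>
  <listen type='address' address='10.35.160.55'/>
</graphics>
```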
---

Comment #18 (Ilanit Stein):

Error, same as in this bug description, seen on rhevm-3.5.4.2-1.3.el6ev, for VM migration.

From (RHEL 7.2):
- vdsm-4.16.27-1.el7ev.x86_64
- libvirt-1.2.17-9.el7.x86_64

To (RHEV-H 7.1 for RHEV 3.5.4-1 ASYNC, rhev-hypervisor7-7.1-20150911.0):
- vdsm-4.16.26-1.el7ev.x86_64
- libvirt-1.2.8-16.el7_1.3.x86_64

---

Comment:

Removing needinfo from ipinto, following the Depends On: 1265111, by fromani.

---

Comment:

(In reply to Ilanit Stein from comment #18)
> [...]

That is a great finding and helped identify a real compatibility issue, but we need confirmation that this is what was going on in these automated tests as well.

---

Comment #21 (Michal Skrivanek):

#98 - user aborted
#99 - bogus, env error
#100 - 18 other errors (storage and sla); the one in question worked OK
#101 - ended abruptly, test skipped
#102 - ended abruptly, test skipped

We have 2 successes and no failures recently. I suggest closing the bug and getting the environment stable, and focusing on bug 1265111 instead.

---

Comment (Michal Skrivanek):

(In reply to Michal Skrivanek from comment #21)
> [...]

Finally, #103 fails clearly again, with the same error.

---

Comment:

So, I finally dug into all the jenkins logs and, contrary to what the jenkins page says, the migration actually happened from 7.1 -> 7.2 -> 7.1, and the last one failed. That is consistent with the manual testing findings (e.g. in https://bugzilla.redhat.com/show_bug.cgi?id=1265111#c5).

Keeping open for a .spec bump.

---

Comment:

After all, the libvirt change will be in 7.2 GA, hence no need for a spec bump.

Note you need libvirt-1.2.17-11.el7 to test this, regardless of the RHEV/oVirt version.

---

Comment #26:

I can reproduce this bug with builds:
- rhel7.2: libvirt-1.2.17-10.el7.x86_64
- rhel7.1: libvirt-1.2.8-16.el7_1.3.x86_64

Steps:
1. Register a rhel7.1 host and a rhel7.2 host to rhevm
2. Create a guest on the rhel7.2 host via rhevm
3. Migrate the guest to the rhel7.1 host; migration failed with this error in vdsm.log:

Thread-336::ERROR::2015-09-28 18:59:55,284::migration::161::vm.Vm::(_recover) vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::XML error: graphics listen attribute 10.66.106.26 must match address attribute of first listen element (found none)
Thread-336::ERROR::2015-09-28 18:59:55,310::migration::260::vm.Vm::(run) vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 246, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 325, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1701, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: XML error: graphics listen attribute 10.66.106.26 must match address attribute of first listen element (found none)

---

Comment:

Verify pass on build libvirt-1.2.17-11.el7.x86_64. Steps are the same as comment 26; migration succeeded.

---

Comment:

libvirt issue; no need to mention this in the RHEV docs.

---

Comment:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
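For reference, the consistency rule behind the recurring error message in this bug can be sketched as a small standalone check. This is an illustration of the rule only, not libvirt's actual implementation, and the helper name `check_graphics_listen` is hypothetical; the addresses come from the logs above.

```python
# Illustrative sketch (NOT libvirt's actual code) of the rule behind
# "graphics listen attribute ... must match address attribute of first
# listen element (found none)".
import xml.etree.ElementTree as ET

def check_graphics_listen(domain_xml):
    """Return an error string for each <graphics> element whose legacy
    'listen' attribute disagrees with its first <listen> sub-element."""
    errors = []
    root = ET.fromstring(domain_xml)
    for graphics in root.iter('graphics'):
        attr = graphics.get('listen')
        if attr is None:
            continue  # no legacy attribute, nothing to cross-check
        first = graphics.find('listen')  # first <listen> sub-element, if any
        addr = first.get('address') if first is not None else None
        if addr != attr:
            errors.append(
                "graphics listen attribute %s must match address attribute "
                "of first listen element (found %s)" % (attr, addr or "none"))
    return errors

# The attribute with no <listen> element fails the check ("found none"):
bad = ("<domain><devices>"
       "<graphics type='spice' listen='10.35.160.55'/>"
       "</devices></domain>")
print(check_graphics_listen(bad))

# A matching <listen> element passes:
good = ("<domain><devices>"
        "<graphics type='spice' listen='10.35.160.55'>"
        "<listen type='address' address='10.35.160.55'/>"
        "</graphics></devices></domain>")
print(check_graphics_listen(good))
```

This also matches the shape of the failure seen here: the domain XML sent by the 7.2 source carried the legacy `listen` attribute in a form the 7.1 destination's libvirt could not reconcile with the `<listen>` element, which is why only the cross-version direction failed.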