Bug 1260409

Summary: Migration 7.2->7.1 failed since libvirtError in 'virDomainMigrateToURI2'
Product: Red Hat Enterprise Virtualization Manager
Reporter: Israel Pinto <ipinto>
Component: vdsm
Assignee: Francesco Romani <fromani>
Status: CLOSED ERRATA
QA Contact: Israel Pinto <ipinto>
Severity: urgent
Docs Contact:
Priority: high
Version: 3.5.4
CC: bazulay, fjin, fromani, gklein, ipinto, istein, lsurette, mavital, mgoldboi, michal.skrivanek, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc3
Keywords: AutomationBlocker, Regression
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-1.2.17-11.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 19:45:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1265111
Bug Blocks:
Attachments:
Description                  Flags
source_host_log              none
host_in_same_cluster_logs    none
engine_log                   none

Description Israel Pinto 2015-09-06 13:54:57 UTC
Description of problem:
VM migration fails in automation testing
when a host running one VM is put into maintenance.
 

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: 3.5.4.2-1.3.el6ev 
VDSM (RHEL 7.2): vdsm-4.16.26-1.el7ev

How reproducible:
All the time.


Steps to Reproduce:
1. Create a VM and run it
2. Put the host into maintenance
3. Check the host and VM status

Actual results:
The host did not switch to maintenance.

Expected results:
The host switches to maintenance and the VM is up and running on the second host.


Additional info:

from vdsm log:
Thread-905::DEBUG::2015-09-04 01:21:01,846::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 27 edom: 20 level: 2 message: XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)
Thread-905::DEBUG::2015-09-04 01:21:01,847::migration::386::vm.Vm::(cancel) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::canceling migration downtime thread
Thread-906::DEBUG::2015-09-04 01:21:01,847::migration::383::vm.Vm::(run) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::migration downtime thread exiting
Thread-905::DEBUG::2015-09-04 01:21:01,847::migration::480::vm.Vm::(stop) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::stopping migration monitor thread
Thread-905::ERROR::2015-09-04 01:21:01,848::migration::161::vm.Vm::(_recover) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)
Thread-905::ERROR::2015-09-04 01:21:02,170::migration::260::vm.Vm::(run) vmId=`96aa6da9-b0c0-4047-a001-7df31f288f91`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 246, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 335, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 702, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1825, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)
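
The rule stated in the error message can be illustrated with a small check. This is only a sketch in Python 3, not libvirt's actual parser: it assumes the rule is exactly what the message says (the `listen` attribute of `<graphics>` must equal the `address` attribute of the first `<listen>` child). The function name `check_graphics_listen`, the sample XML fragments, and the network name `vdsmdisplay` are hypothetical and are not taken from the attached logs.

```python
import xml.etree.ElementTree as ET


def check_graphics_listen(domain_xml):
    """Return one message per <graphics> element that breaks the rule.

    Simplified illustration of the rule quoted in the error message;
    this is NOT libvirt's real parsing code.
    """
    problems = []
    root = ET.fromstring(domain_xml)
    for graphics in root.iter("graphics"):
        listen_attr = graphics.get("listen")
        if listen_attr is None:
            continue  # no legacy attribute, nothing to compare
        first = graphics.find("listen")  # first <listen> child, if any
        address = first.get("address") if first is not None else None
        if address != listen_attr:
            problems.append(
                "graphics listen attribute %s must match address attribute "
                "of first listen element (found %s)"
                % (listen_attr, address if address is not None else "none")
            )
    return problems


# Hypothetical fragment: the first <listen> element carries no address.
BAD_XML = """
<domain type='kvm'>
  <devices>
    <graphics type='spice' listen='10.35.160.55'>
      <listen type='network' network='vdsmdisplay'/>
    </graphics>
  </devices>
</domain>
"""

# Hypothetical fragment: attribute and first <listen> address agree.
GOOD_XML = """
<domain type='kvm'>
  <devices>
    <graphics type='spice' listen='10.35.160.55'>
      <listen type='address' address='10.35.160.55'/>
    </graphics>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for label, xml_text in (("bad", BAD_XML), ("good", GOOD_XML)):
        print(label, check_graphics_listen(xml_text))
```

Running the check against a dump of the domain XML from the source host may help tell whether the XML itself is inconsistent or whether the two libvirt versions interpret it differently.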

Comment 1 Israel Pinto 2015-09-06 14:00:04 UTC
Created attachment 1070746 [details]
source_host_log

Comment 2 Israel Pinto 2015-09-06 14:02:15 UTC
Created attachment 1070747 [details]
host_in_same_cluster_logs

Comment 3 Israel Pinto 2015-09-06 14:05:04 UTC
Created attachment 1070748 [details]
engine_log

Comment 4 Francesco Romani 2015-09-07 11:50:22 UTC
it seems this and https://bugzilla.redhat.com/show_bug.cgi?id=1260177 share the same root cause.

Comment 5 Francesco Romani 2015-09-07 14:46:23 UTC
Not sure this is a VDSM bug, but taking it to make sure it gets enough bandwidth.

Comment 6 Francesco Romani 2015-09-07 20:13:04 UTC
(In reply to Israel Pinto from comment #0)

OK, so both the source and destination host are running VDSM (RHEL 7.2): vdsm-4.16.26-1.el7ev, right?

I suppose libvirt is 1.2.17-5.el7 on both hosts, can you confirm?

Does plain, user-triggered migration between the two hosts work?

Comment 7 Israel Pinto 2015-09-08 05:44:12 UTC
(In reply to Francesco Romani from comment #6)
> OK, so both the source and destination host are running VDSM (RHEL 7.2):
> vdsm-4.16.26-1.el7ev, right?
> 
> I suppose libvirt is 1.2.17-5.el7 on both hosts, can you confirm?
> 
> Does plain, user-triggered migration between the two hosts work?

1. Both hosts are rhel 7.2 with vdsm-4.16.26-1.el7ev
2. I tested (manually):
    user triggered migration - it works fine
    maintenance with one VM - works fine
But in automation those cases failed, and not only in virt testing.

Comment 8 Michal Skrivanek 2015-09-08 13:33:47 UTC
(In reply to Israel Pinto from comment #7)
> > 
> > OK, so both the source and destination host are running VDSM (RHEL 7.2):
> > vdsm-4.16.26-1.el7ev, right?
> > 
> > I suppose libvirt is 1.2.17-5.el7 on both hosts, can you confirm?
> > 
> > Does plain, user-triggered migration between the two hosts work?
> 
> 1. Both hosts are rhel 7.2 with vdsm-4.16.26-1.el7ev
> 2. I tested (manually):
>     user triggered migration - it works fine
>     maintenance with one VM - works fine
> But in automation those cases failed, and not only in virt testing.

Israel, please answer the question: please confirm the libvirt version on those automation machines.

Comment 9 Francesco Romani 2015-09-08 14:54:46 UTC
I fully understand the urgency and I'm working to reproduce the issue and find the root cause.

However, what I'd like to ask is:
According to https://bugzilla.redhat.com/show_bug.cgi?id=1260409#c0 
- automatic migration with just one VM fails
- it always fails

But in VDSM all the flows use the same verb, so this means that every migration must fail, whatever triggers it, automatic or user-requested.

Is migration completely broken, then? I surely can't reproduce this.

Comment 10 Israel Pinto 2015-09-08 15:17:19 UTC
libvirt version:
libvirt-1.2.17-6.el7

Comment 11 Francesco Romani 2015-09-08 16:15:36 UTC
(In reply to Israel Pinto from comment #7)

> > I suppose libvirt is 1.2.17-5.el7 on both hosts, can you confirm?
> > 
> > Does plain, user-triggered migration between the two hosts work?
> 
> 1. Both hosts are rhel 7.2 with vdsm-4.16.26-1.el7ev
> 2. I tested (manually):
>     user triggered migration - it works fine
>     maintenance with one VM - works fine
> But in automation those cases failed, and not only in virt testing.

OK, so I need to see the full automation logs. If both maintenance and manual migration work, it's something else, and much less critical.

Please point me to any recent automation failure, with full logs (engine, source, destination). A link to jenkins job is fine.

Comment 14 Yaniv Kaul 2015-09-16 10:52:24 UTC
Has anyone looked at the specific issue libvirt is complaining about: 
XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none) 

?

Comment 15 Francesco Romani 2015-09-16 11:01:10 UTC
(In reply to Yaniv Kaul from comment #14)
> Has anyone looked at the specific issue libvirt is complaining about: 
> XML error: graphics listen attribute 10.35.160.55 must match address
> attribute of first listen element (found none) 
> 
> ?

Yes. We have already seen this error, but not in this context.
This is indeed worrisome, if confirmed, so I tried to reproduce it a few times, without luck. Please note that in the very same environment the issue does not happen 100% of the time.

Comment 16 Michal Skrivanek 2015-09-17 09:35:36 UTC
I see a few more runs now, so it seems we have:
#92 - fails 
#93 - fails
#94 - fixed
#95 bogus - env error
#96 network and storage failures, this particular test was skipped
#97 bogus - env error

so..is it still failing actually?

Comment 17 Yaniv Kaul 2015-09-20 14:45:07 UTC
Any chance it's related to bug 1261007 ?

Comment 18 Ilanit Stein 2015-09-22 08:42:29 UTC
Error, same as in this bug description, seen on rhevm-3.5.4.2-1.3.el6ev,
for VM migration 

From:
-RHEL 7.2:
vdsm-4.16.27-1.el7ev.x86_64
libvirt-1.2.17-9.el7.x86_64

To:
 -RHEV-H 7.1 for RHEV 3.5.4-1 ASYNC (rhev-hypervisor7-7.1-20150911.0):
vdsm-4.16.26-1.el7ev.x86_64
libvirt-1.2.8-16.el7_1.3.x86_64

Comment 19 Ilanit Stein 2015-09-22 08:44:12 UTC
Removing need info from ipinto, following the Depends On: 1265111, by fromani.

Comment 20 Michal Skrivanek 2015-09-22 10:42:23 UTC
(In reply to Ilanit Stein from comment #18)
> Error, same as in this bug description, seen on rhevm-3.5.4.2-1.3.el6ev,
> for VM migration 
> 
> From:
> -RHEL 7.2:
> vdsm-4.16.27-1.el7ev.x86_64
> libvirt-1.2.17-9.el7.x86_64
> 
> To:
>  -RHEV-H 7.1 for RHEV 3.5.4-1 ASYNC (rhev-hypervisor7-7.1-20150911.0):
> vdsm-4.16.26-1.el7ev.x86_64
> libvirt-1.2.8-16.el7_1.3.x86_64

That is a great finding and it helped identify a real compatibility issue, but we need confirmation that this is what was going on in these automated tests as well.

Comment 21 Michal Skrivanek 2015-09-22 10:50:29 UTC
#98 user aborted
#99 bogus  - env error
#100 - 18 other errors (storage and sla), the one in question worked ok
#101 - ended abruptly, test skipped
#102 - ended abruptly, test skipped

We have 2 successes and no failures recently. I suggest closing the bug, getting the environment stable, and focusing on 1265111 instead.

Comment 22 Michal Skrivanek 2015-09-23 12:53:55 UTC
(In reply to Michal Skrivanek from comment #21)
> #98 user aborted
> #99 bogus  - env error
> #100 - 18 other errors (storage and sla), the one in question worked ok
> #101 - ended abruptly, test skipped
> #102 - ended abruptly, test skipped
> 
> We have 2 successes and no failures recently. I suggest closing the bug,
> getting the environment stable, and focusing on 1265111 instead.

finally, #103 fails clearly again, same error

Comment 23 Michal Skrivanek 2015-09-23 13:28:40 UTC
So, I finally dug into all the jenkins logs and, contrary to what the jenkins page says, the migration actually happened from 7.1 -> 7.2 -> 7.1, and the last one failed. That is consistent with the manual testing findings (e.g. in https://bugzilla.redhat.com/show_bug.cgi?id=1265111#c5)

Comment 24 Michal Skrivanek 2015-09-24 07:53:26 UTC
keeping open for .spec bump up

Comment 25 Michal Skrivanek 2015-09-25 10:33:40 UTC
After all, the libvirt change will be in 7.2 GA, hence no need for a spec bump.

Note that you need libvirt-1.2.17-11.el7 to test this, regardless of the RHEV/oVirt version.
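
As a rough aid for checking whether a RHEL 7.2 host already carries the fixed build, the package string reported by the host can be compared with libvirt-1.2.17-11.el7. The following is a minimal Python 3 sketch under loud assumptions: it compares only the dotted numeric version and the leading integer of the release, so it is not a full RPM version comparison, and the helper names `parse_nvr` and `has_fixed_libvirt` are hypothetical.

```python
import re

# Fixed build named in this bug report.
FIXED_BUILD = "libvirt-1.2.17-11.el7"

# name-version-release, e.g. libvirt-1.2.17-11.el7.x86_64
_NVR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)")


def parse_nvr(nvr):
    """Split a package string into (name, version tuple, release number).

    Simplified: everything after the leading integer of the release
    (dist tag, architecture) is ignored.
    """
    match = _NVR_RE.match(nvr)
    if match is None:
        raise ValueError("unrecognised package string: %r" % nvr)
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, int(match.group("release"))


def has_fixed_libvirt(nvr, minimum=FIXED_BUILD):
    """Return True if nvr is at least the minimum build (simplified order)."""
    name, version, release = parse_nvr(nvr)
    min_name, min_version, min_release = parse_nvr(minimum)
    if name != min_name:
        raise ValueError("different packages: %s vs %s" % (name, min_name))
    return (version, release) >= (min_version, min_release)


if __name__ == "__main__":
    # Builds mentioned in the comments of this bug.
    for build in ("libvirt-1.2.17-6.el7",
                  "libvirt-1.2.17-9.el7.x86_64",
                  "libvirt-1.2.17-10.el7.x86_64",
                  "libvirt-1.2.17-11.el7.x86_64"):
        print(build, has_fixed_libvirt(build))
```

The string to feed in would typically come from the package manager on the host; the comparison is meant for the 7.2 side only, since the 7.1 destination stays on its own libvirt build.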

Comment 26 Fangge Jin 2015-09-28 11:14:38 UTC
I can reproduce this bug with build:
rhel7.2: libvirt-1.2.17-10.el7.x86_64
rhel7.1: libvirt-1.2.8-16.el7_1.3.x86_64

Steps:
1. Register the rhel7.1 host and the rhel7.2 host to rhevm
2. Create a guest on the rhel7.2 host via rhevm
3. Migrate the guest to the rhel7.1 host; migration fails with this error in vdsm.log:

Thread-336::ERROR::2015-09-28 18:59:55,284::migration::161::vm.Vm::(_recover) vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::XML error: graphics listen attribute 10.66.106.26 must match address attribute of first listen element (found none)
Thread-336::ERROR::2015-09-28 18:59:55,310::migration::260::vm.Vm::(run) vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 246, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 325, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1701, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: XML error: graphics listen attribute 10.66.106.26 must match address attribute of first listen element (found none)
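
When scanning long vdsm.log files for this failure, the ERROR lines can be picked out mechanically. The sketch below (Python 3) assumes the `::`-separated layout seen in the excerpts in this report: thread, level, timestamp, module, line number, logger, then the function name in parentheses and an optional vmId. The regular expression and the helper name `migration_errors` are hypothetical and have not been checked against other vdsm versions.

```python
import re

# Layout inferred from the log excerpts in this bug report (an assumption).
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>[^:]+:\d+:\d+,\d+)::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>\w+)\) (?:vmId=`(?P<vm_id>[^`]+)`::)?(?P<message>.*)$"
)


def migration_errors(lines):
    """Return (vm_id, function, message) for ERROR lines from the migration module."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # traceback lines and anything else that does not fit
        if match.group("level") == "ERROR" and match.group("module") == "migration":
            found.append(
                (match.group("vm_id"), match.group("func"), match.group("message"))
            )
    return found


# Lines copied from the log excerpt above, plus one traceback line.
SAMPLE = [
    "Thread-336::ERROR::2015-09-28 18:59:55,284::migration::161::vm.Vm::(_recover) "
    "vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::XML error: graphics listen "
    "attribute 10.66.106.26 must match address attribute of first listen "
    "element (found none)",
    "Thread-336::ERROR::2015-09-28 18:59:55,310::migration::260::vm.Vm::(run) "
    "vmId=`6cf1a976-bd6f-4208-9793-fd0cb2b90188`::Failed to migrate",
    "Traceback (most recent call last):",
]

if __name__ == "__main__":
    for vm_id, func, message in migration_errors(SAMPLE):
        print(vm_id, func, message)
```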

Comment 27 Fangge Jin 2015-09-28 11:16:16 UTC
Verification passed on build libvirt-1.2.17-11.el7.x86_64

Steps are the same as in comment 26; migration succeeds.

Comment 29 Francesco Romani 2016-01-19 15:27:18 UTC
libvirt issue, no need to mention this in RHEV docs.

Comment 31 errata-xmlrpc 2016-03-09 19:45:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html