
Bug 1169831

Summary: [rhev-upgrade] migration fail with libvirt error after vdsm upgrade
Product: Red Hat Enterprise Virtualization Manager
Reporter: Michael Burman <mburman>
Component: ovirt-engine
Assignee: Nobody <nobody>
Status: CLOSED WORKSFORME
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: danken, ecohen, gklein, iheim, lpeer, lsurette, lvernia, mavital, mburman, michal.skrivanek, ofrenkel, rbalakri, Rhev-m-bugs, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-12-15 09:27:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: migration fail logs (Flags: none)

Description Michael Burman 2014-12-02 14:32:51 UTC
Created attachment 963768: migration fail logs

Description of problem:
[rhev-upgrade] Migration fails with a libvirt error after a vdsm upgrade.
In the upgrade environment I was performing migrations over the migration network.
After the host was updated from vdsm-4.16.7.4-1.el6ev to vdsm-4.16.7.5-1.el6ev,
migrations failed with the error: 'Migration failed, No available host found (VM: Students_, Source: orange-vdsd.qa.lab.tlv.redhat.com).'
There was connectivity between the source and the destination.
The issue was resolved by shutting the VM down and then running it again.

Version-Release number of selected component (if applicable):
3.5.0-0.22.el6ev
vdsm-4.16.7.5-1.el6ev
libvirt-0.10.2-46.el6_6.2

How reproducible:
Not sure if this is reproducible.

Additional info:
libvirt and vdsm logs from source and destination are attached, along with the engine log.

Comment 1 Omer Frenkel 2014-12-03 07:28:30 UTC
well it just looks like your other host doesn't have enough memory to run the vm:

2014-12-02 15:41:03,273 INFO  [org.ovirt.engine.core.bll.MigrateVmCommand] (org.ovirt.thread.pool-7-thread-46) [1ba74e91] Running command: MigrateVmCommand internal: false. Entities affected :  ID: d4e7dcb2-1bc7-4df2-be72-9dbfbce24b0f Type: VMAction group MIGRATE_VM with role type USER,  ID: d4e7dcb2-1bc7-4df2-be72-9dbfbce24b0f Type: VMAction group EDIT_VM_PROPERTIES with role type USER,  ID: fc27297c-1b31-4a51-9f63-9b900346ef38 Type: VdsGroupsAction group CREATE_VM with role type USER
2014-12-02 15:41:03,280 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (org.ovirt.thread.pool-7-thread-46) [1ba74e91] Candidate host orange-vdsc.qa.lab.tlv.redhat.com (ff17c2c4-fa88-4dee-8846-21c494a9a16b) was filtered out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 1ba74e91)
2014-12-02 15:41:03,288 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-46) [1ba74e91] Correlation ID: 1ba74e91, Job ID: 75d6cfcd-c0b3-41b6-8416-bb35b6e68b2b, Call Stack: null, Custom Event ID: -1, Message: Migration failed, No available host found (VM: Students_, Source: orange-vdsd.qa.lab.tlv.redhat.com).

are you sure orange-vdsc has enough memory to run the vm?
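
To find this kind of rejection quickly in a long engine log, something like the following sketch could be used. The regular expression is modeled only on the single SchedulingManager line quoted above, so it is an assumption that other engine versions word the message the same way; the function name `filtered_hosts` is made up for illustration.

```python
import re

# Pattern modeled on the one SchedulingManager line quoted in this comment.
# Assumption: other engine builds may phrase this message differently.
FILTERED_RE = re.compile(
    r"Candidate host (?P<host>\S+) \((?P<host_id>[0-9a-f-]+)\) "
    r"was filtered out by \S+ filter (?P<filter>.+?) "
    r"\(correlation id: (?P<corr>\w+)\)"
)


def filtered_hosts(lines, correlation_id=None):
    """Return (host, filter name, correlation id) for each rejected candidate host."""
    hits = []
    for line in lines:
        match = FILTERED_RE.search(line)
        if match and correlation_id in (None, match.group("corr")):
            hits.append((match.group("host"), match.group("filter"), match.group("corr")))
    return hits


# The sample line is copied from the engine log excerpt above.
sample = [
    "2014-12-02 15:41:03,280 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] "
    "(org.ovirt.thread.pool-7-thread-46) [1ba74e91] Candidate host "
    "orange-vdsc.qa.lab.tlv.redhat.com (ff17c2c4-fa88-4dee-8846-21c494a9a16b) was filtered "
    "out by VAR__FILTERTYPE__INTERNAL filter Memory (correlation id: 1ba74e91)",
]

for host, filter_name, corr in filtered_hosts(sample, "1ba74e91"):
    print(f"{host} rejected by filter '{filter_name}' (correlation id {corr})")
```

Passing the correlation ID of the failed MigrateVmCommand narrows the output to the scheduling decision for that one migration attempt.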

Comment 2 Michael Burman 2014-12-03 07:36:45 UTC
Yes indeed.
Max free Memory for scheduling new VMs:2355 MB

Please connect to my setup at 10.35.161.37; it is still happening.

Comment 3 Michal Skrivanek 2014-12-05 10:16:56 UTC
(In reply to Michael Burman from comment #2)
> Yes indeed.
> Max free Memory for scheduling new VMs:2355 MB
> 
> Please connect my setup 10.35.161.37, it still happening.

Sorry, but you didn't provide any further access information: which cluster, which VM...
That said, it's most likely the memory. Since there are no other reports, I'm removing the blocker request.

Please check the scheduling error about memory when it happens to you.

Comment 7 Michael Burman 2014-12-07 08:30:26 UTC
Michal, 

Please log in to the setup. Migration still fails and I'm not sure what the problem is. It has been in this state since last week, and I didn't change anything.

Comment 8 Michal Skrivanek 2014-12-08 10:40:35 UTC
I see a host installation is going on right now... so please try to find some stable time, or gather the relevant logs when it reproduces. Thanks

Comment 9 Michael Burman 2014-12-08 10:50:45 UTC
All relevant logs are attached.
The setup is already upgraded to 3.5 and I can't reproduce this issue at the moment.

I left the setup as it was for a whole week for investigation, until yesterday.

Comment 10 Michal Skrivanek 2014-12-08 11:25:17 UTC
well, too bad the logs are wrong, the libvirt log is 10 days older than vdsm. 

so...it doesn't reproduce anymore? Do you have any better logs from yesterday's attempt?
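
A quick way to check whether two attached logs cover the same period is to compare the first and last timestamps in each. The sketch below assumes both files contain timestamps shaped like `2014-12-02 15:41:03` (the format seen in the engine log excerpt in this bug) and it ignores time zones, which may differ between vdsm and libvirt logs. The sample lines are invented for illustration and are not taken from the attachment.

```python
import re
from datetime import datetime

# Matches timestamps such as "2014-12-02 15:41:03"; fractional seconds are ignored.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def time_span(lines):
    """Return (earliest, latest) timestamp found in the lines, or None if there are none."""
    stamps = []
    for line in lines:
        match = TS_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S"))
    return (min(stamps), max(stamps)) if stamps else None


def spans_overlap(a, b):
    """True if two (start, end) spans share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented sample lines; only the timestamp shape matters here.
vdsm_lines = ["Thread-1::DEBUG::2014-12-02 15:41:03,273::example vdsm line"]
libvirt_lines = ["2014-11-22 10:00:00.000+0000: 1234: error : example libvirt line"]

vdsm_span = time_span(vdsm_lines)
libvirt_span = time_span(libvirt_lines)
print("logs overlap in time:", spans_overlap(vdsm_span, libvirt_span))
```

If the spans do not overlap, the logs cannot describe the same migration attempt and a fresh set is needed.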

Comment 11 Michael Burman 2014-12-08 11:51:47 UTC
Please connect to the setup and the hosts; the logs from yesterday's migration attempt should be there.

Comment 13 Michal Skrivanek 2014-12-08 20:58:15 UTC
Seems to be related to networking. There is some error about not being able to set the MTU on an interface on the destination. Looks like a broken setup...

Comment 14 Michal Skrivanek 2014-12-09 09:14:12 UTC
failing on destination on:
libvirtError: Cannot get interface MTU on 'qbr7a9364a8-67': No such device

I would suspect some network setup discrepancy or bug. Lior?
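
To confirm on the destination host whether the device named in the error exists at all, its MTU can be read from sysfs. This is a minimal sketch, assuming a Linux host where interfaces appear under `/sys/class/net`; the helper name `read_mtu` and its `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU of iface as an int, or None if the device does not exist."""
    mtu_file = Path(sysfs_root) / iface / "mtu"
    try:
        return int(mtu_file.read_text().strip())
    except (FileNotFoundError, NotADirectoryError):
        # No such device, which is the condition libvirt reported above.
        return None


mtu = read_mtu("qbr7a9364a8-67")
if mtu is None:
    print("qbr7a9364a8-67: no such device")
else:
    print(f"qbr7a9364a8-67: MTU {mtu}")
```

A `None` result on the destination would match the "No such device" part of the libvirt error, regardless of whether MTU itself is the real problem.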

Comment 15 Lior Vernia 2014-12-09 09:43:18 UTC
I've asked ibarkan to look into it. Not necessarily related to MTU, might be a misleading error message.

Comment 16 Lior Vernia 2014-12-09 11:21:31 UTC
I'm having trouble correlating what you're describing to the right log entries. Please reproduce, and point us EXACTLY to the right log entries when the failure occurs.

Comment 17 Michael Burman 2014-12-09 11:41:15 UTC
Hi Lior,

As I wrote in the description, I'm not sure this issue is reproducible.
This issue occurred in a mixed upgrade setup, after updating vdsm on my 2 hosts from vdsm-4.16.7.4-1.el6ev to vdsm-4.16.7.5-1.el6ev.
The setup stayed in this state for almost a week before I restarted the engine, updated it to the latest build, and updated vdsm on the hosts to the latest version.

When we restore this setup from a snapshot, I will try to reproduce.

Comment 18 Dan Kenigsberg 2014-12-09 15:10:46 UTC
(In reply to Michal Skrivanek from comment #14)
> failing on destination on:
> libvirtError: Cannot get interface MTU on 'qbr7a9364a8-67': No such device

qbr7a9364a8-67 stinks of openstack's neutron. Has this host been used by the openstack-net hook? By openstack directly?
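
The naming hint can be turned into a simple check over the device names referenced by a VM. The prefix list below is an assumption based on the names commonly associated with Neutron's Open vSwitch hybrid plugging (`qbr`, `qvb`, `qvo`); it is not taken from this bug, and the helper name is made up for illustration.

```python
# Assumed prefixes of devices created for Neutron ports with Open vSwitch
# hybrid plugging. This list is illustrative, not exhaustive.
NEUTRON_PREFIXES = ("qbr", "qvb", "qvo")


def neutron_like(names):
    """Return the device names that start with one of the assumed Neutron prefixes."""
    return [name for name in names if name.startswith(NEUTRON_PREFIXES)]


# "qbr7a9364a8-67" is from the libvirt error; the other names are invented examples.
devices = ["rhevm", "eth0", "qbr7a9364a8-67"]
print(neutron_like(devices))
```

A match only suggests where the device name came from; it does not show whether the openstack-net hook or OpenStack itself created it.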

Comment 19 Dan Kenigsberg 2014-12-09 15:12:43 UTC
Michal, where did you see that libvirt error? grepping the attached logs produced nothing.

Comment 21 Michal Skrivanek 2014-12-09 15:16:54 UTC
(In reply to Dan Kenigsberg from comment #19)
> Michal, where did you see that libvirt error? grepping the attached logs
> produced nothing.

yes, "thanks" to the IOProcess verbose logging it's not so easy to dig it out from the logs:-) Better to see it live.
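
When the log is dominated by verbose lines, a small filter can help pull out the error. The sketch below simply skips lines containing a noise marker and reports lines containing the search string; treating the literal text `IOProcess` as the noise marker is an assumption about how those lines look in vdsm.log, and the sample lines other than the libvirtError one are invented.

```python
def find_errors(lines, needle="libvirtError", noise=("IOProcess",)):
    """Yield (line number, text) for lines containing needle, skipping noisy lines."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in noise):
            continue
        if needle in line:
            yield number, line.rstrip("\n")


# Only the libvirtError line comes from this bug; the rest are invented filler.
sample_log = [
    "example IOProcess debug line\n",
    "example unrelated line\n",
    "libvirtError: Cannot get interface MTU on 'qbr7a9364a8-67': No such device\n",
]

for number, text in find_errors(sample_log):
    print(f"{number}: {text}")
```

With a real file, the same function can be fed an open file object, since it only iterates over lines.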

Comment 22 Michael Burman 2014-12-10 06:46:16 UTC
(In reply to Dan Kenigsberg from comment #18)
> (In reply to Michal Skrivanek from comment #14)
> > failing on destination on:
> > libvirtError: Cannot get interface MTU on 'qbr7a9364a8-67': No such device
> 
> qbr7a9364a8-67 stinks of openstack's neutron. Has this host been used by the
> openstack-net hook? By openstack directly?

Hi Dan,

These hosts have openstack packages installed:
openstack-neutron-openvswitch-2014.1.3-12.el6ost.noarch
vdsm-hook-openstacknet-4.16.8.1-2.el6ev.noarch
openstack-utils-2014.1-3.2.el6ost.noarch
openstack-neutron-2014.1.3-12.el6ost.noarch

- But at the stage when this migration issue happened, no Neutron appliance was running on these hosts; only the openstack packages were installed.

Comment 23 Dan Kenigsberg 2014-12-10 11:27:07 UTC
Michael, given the log rotation, we must have a reproduction in order to continue.

Please make sure that http://gerrit.ovirt.org/35930 is applied to /etc/vdsm/logger.conf so logs are not rotated as vigorously.

Comment 24 Michael Burman 2014-12-14 09:58:58 UTC
Dan, when we restore this mixed upgrade setup from the snapshot, I will try to reproduce this one and apply the change to /etc/vdsm/logger.conf.

Comment 25 Michael Burman 2014-12-15 09:09:57 UTC
Didn't manage to reproduce this migration issue.

Comment 26 Lior Vernia 2014-12-15 09:27:56 UTC
So let's close this for now, please reopen if it happens again! Or just open a new bug with the accurate scenario.