Bug 1104733
| Summary: | VDSM failure on migration destination causes stuck migration task | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jake Hunsaker <jhunsaker> |
| Component: | vdsm | Assignee: | Francesco Romani <fromani> |
| Status: | CLOSED ERRATA | QA Contact: | Artyom <alukiano> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.3.0 | CC: | bazulay, dsulliva, fromani, gklein, iheim, jhunsaker, lpeer, mavital, michal.skrivanek, mkalinin, ofrenkel, sherold, vfeenstr, yeylon |
| Target Milestone: | --- | Keywords: | UseCase |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | virt | | |
| Fixed In Version: | vt2.2 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-11 21:11:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1142923, 1156165 | | |
Description
Jake Hunsaker
2014-06-04 14:48:23 UTC
Taking the bug.

One confirmed issue is that VDSM can go out of sync if it is restarted, or down for any reason, when a migration completes. The sequence of events is:

- a migration is in progress
- VDSM goes down
- the migration completes, so the VM is up on the destination host according to libvirt
- VDSM comes back up, runs recovery, and possibly does not properly recognize what happened in the meantime

In that case VDSM will diligently wait for the full migration timeout to expire before reporting the VM as UP; the default value for the timeout is 21600 s, so 6 hours. I'll make a patch to make sure VDSM handles this case correctly.

Posted a tentative patch. Needs careful testing, in progress.

Jake, after deeper investigation I think I have narrowed down the issue, and your last report confirms that this is also a matter of a specific, and unfortunate, sequence of events. The logs are no longer required, thanks.

An easier way to reproduce and test:

- start a migration;
- stop VDSM on the destination host; the migration will continue to run as long as libvirt and qemu are up and running;
- once the migration is done, restart VDSM on the destination host;
- the VM should now be in an unknown state for the said 6 hours despite actually being up and running.

@Michal, this bug doesn't have a DEV ack yet. QE will ack/nack based on the target release and time frames, in the regular Bugzilla workflow.

@Gil, the question is more about 3.5 vs 3.4 vs 3.3 considerations. The missing dev_ack is due to me not agreeing with backports to 3.3 or 3.4. I'm fine with a 3.5 fix. (Adding back the original needinfo on Dave.)

Patches merged to oVirt 3.5 (see http://gerrit.ovirt.org/#/c/31671/ and its dependencies); they will be included in the next RC. Moving to MODIFIED.

Verified on rhevm-3.5.0-0.10.master.el6ev.noarch. Instead of stopping vdsm I stopped the network (because of Soft Fencing); the migration failed and the VM stayed on the source host.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0159.html
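The recovery behavior at the heart of this bug can be sketched as follows. This is a minimal, hypothetical simulation, not VDSM's actual code: `LibvirtStub`, `recover_vm_state`, and the state names are invented for illustration. The broken behavior keeps a migration-destination VM in a waiting state for up to the full migration timeout; the fix is, during recovery, to consult libvirt for the domain's real state first.

```python
# Hypothetical sketch of the recovery decision after a VDSM restart.
# All names here are invented for illustration, not taken from VDSM.

MIGRATION_TIMEOUT = 21600  # default migration timeout in seconds (6 hours)

# Subset of libvirt-style domain states; 1 matches libvirt's
# VIR_DOMAIN_RUNNING in the virDomainState enum.
VIR_DOMAIN_RUNNING = 1


class LibvirtStub:
    """Stands in for a libvirt connection. In this bug's scenario the
    migration finished while VDSM was down, so libvirt already reports
    the domain as running on the destination host."""

    def domain_state(self, vm_id):
        return VIR_DOMAIN_RUNNING


def recover_vm_state(conn, vm_id, was_migration_destination):
    """Decide a VM's state when VDSM comes back up.

    Buggy behavior: a VM that was a migration destination stays in a
    waiting state until MIGRATION_TIMEOUT expires, even if the
    migration already completed.

    Fixed behavior (sketched here): ask libvirt first; if the domain
    is running, report the VM as up immediately.
    """
    if conn.domain_state(vm_id) == VIR_DOMAIN_RUNNING:
        return "Up"  # migration completed while VDSM was down
    if was_migration_destination:
        return "Migration Destination"  # keep waiting, up to the timeout
    return "Down"


state = recover_vm_state(LibvirtStub(), "vm-1", was_migration_destination=True)
print(state)  # with the fix, "Up" immediately instead of after 6 hours
```

The key design point is that libvirt, not VDSM's own stale in-memory state, is the source of truth after a restart: qemu and libvirt keep running (and can finish the migration) while vdsmd is down, so recovery has to re-read reality rather than resume the pre-crash bookkeeping.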