Bug 1594793
| Summary: | Unexpected behaviour of HA VM when host VM was running ended up Non-responsive. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | vdsm | Assignee: | Milan Zamazal <mzamazal> |
| Status: | CLOSED ERRATA | QA Contact: | Israel Pinto <ipinto> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.2.3 | CC: | amarchuk, dfediuck, lsurette, mgoldboi, michal.skrivanek, mzamazal, rabraham, rbalakri, Rhev-m-bugs, rhodain, srevivo, ycui, ylavi |
| Target Milestone: | ovirt-4.2.4-1 | Keywords: | ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1593568 | Environment: | |
| Last Closed: | 2018-07-02 18:58:51 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1593568 | ||
| Bug Blocks: | |||
|
Description
RHV bug bot
2018-06-25 12:31:54 UTC
please take a look at rh04 log 2018-06-18 11:57:13,281+1000
Seems the incoming migration VM disk has no "source" attribute at all in _srcDomXML, though there is one in regular params with empty file=''
(Originally by michal.skrivanek)

rh01:
09:45:03 VDSM restart apparently not recovered correctly (failed due to same issue as in comment #5)
09:48:40 shutdown, but VM possibly not correctly undefined
start on 09:49:11 fails with VM machine already exists
09:49:21 again a destroy attempt
10:55:05,220 hmm? - INFO (periodic/1) [vds] Recovered new external domain"
11:49:16 VDSM restart
11:49:18 VM recovered and detected as "Changed state to Down: VM terminated with error (code=1)"
11:52:28 start VM succeeds
11:52:43 VM guest reboot

rh04
09:49:22 VM start succeeded
10:33:27 VM guest reboot
10:37:15 VM guest reboot
11:44:57 shutdown, succeeded
13:10:59 VM start
13:40:25 shutdown

engine.log starts a day later at 2018-06-19 03:37:01
Please attach a correct log.
(Originally by michal.skrivanek)

please also attach an earlier log from rh01 capturing the VM start prior to 2018-06-18 07:01:01, same for engine.log
(Originally by michal.skrivanek)

(In reply to Michal Skrivanek from comment #5)
> please take a look at rh04 log 2018-06-18 11:57:13,281+1000
> Seems the incoming migration VM disk has no "source" attribute at all in
> _srcDomXML, though there is one in regular params with empty file=''

@mzamazal: the one on rh01 at 09:45:03 is the first occurrence, perhaps rather look there

worth noting there were a lot of snapshot manipulations in previous days (live storage merge)
(Originally by michal.skrivanek)

Recovery fails due to missing `file' attribute. The failed recovery means the VM startup domain wrapper is never replaced with a running VM wrapper and most libvirt operations are rejected, while the VM is running.

I can't inspect why `file' attribute is missing until Vdsm logs since the VM start are provided.
(Originally by Milan Zamazal)

Actually <source> element of the CD-ROM drive is missing. This happens after CD-ROM ejection and is not handled in Vdsm. I'm looking for a fix.
(Originally by Milan Zamazal)

Verify with:
Engine:
Software Version:4.2.4.5-0.1 (rhv-release-4.2.4-7-001.noarch)
Hosts:
OS Version:RHEL - 7.5 - 8.el7
Kernel Version:3.10.0 - 862.6.3.el7.x86_64
KVM Version:2.10.0 - 21.el7_5.4
LIBVIRT Version:libvirt-3.9.0-14.el7_5.6
VDSM Version:vdsm-4.20.32-1.el7ev

Steps:
1. Create 2 HA VMs, attach CD to each VM:
   VM_1 is with lease and resume behavior "KILL"
   VM_2 is without lease and resume behavior "KILL"
   Both VM running with ISCSI disk
   Did not test with NFS since: https://bugzilla.redhat.com/show_bug.cgi?id=1481022
2. Start VMs on Host_1 and eject CD
3. Block connection to ISCSI storage with iptables on Host_1
4. Both VMs switch to pause
5. VMs started on Host_2
6. Check on Host_1 that no running VM
   # virsh -r list --all
   Id Name State
   ----------------------------------------------------

Results:
PASS

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2118

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |
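The root cause identified in the comments (a CD-ROM drive whose `<source>` element is absent from the domain XML after ejection) can be illustrated with a minimal Python sketch. This is not the Vdsm fix itself: the function name `disk_paths`, the sample XML, and the device path in it are all hypothetical, and the sketch only shows one way to read disk paths defensively so that a missing `<source>` yields an empty path instead of an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML fragment: one regular disk with a <source>,
# and one CD-ROM whose <source> element is absent (as after ejection).
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/mapper/example-lun'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
  </devices>
</domain>
"""


def disk_paths(domain_xml):
    """Map each disk target name to its source path ('' if no <source>)."""
    paths = {}
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None:
            continue  # nothing to key the entry on; skip this disk
        source = disk.find("source")
        if source is None:
            # No <source> element at all: treat the drive as empty
            # rather than assuming a 'file' attribute exists.
            path = ""
        else:
            # The attribute holding the path depends on the disk type.
            path = (source.get("file") or source.get("dev")
                    or source.get("name") or "")
        paths[target.get("dev")] = path
    return paths


if __name__ == "__main__":
    for name, path in disk_paths(DOMAIN_XML).items():
        print(name, repr(path))
```

The point of the sketch is the `source is None` branch: code that unconditionally reads the `file` attribute of `<source>` fails on an ejected CD-ROM, which matches the failed recovery described in the comments.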