Bug 1357468
Summary: | Cross migration fails with error qemu-kvm: ... 'pci@800000020000000:00.0/ohci' | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Zheng <dzheng> |
Component: | libvirt | Assignee: | Andrea Bolognani <abologna> |
Status: | CLOSED CANTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | Jiri Herrmann <jherrman> |
Priority: | unspecified | ||
Version: | 7.3 | CC: | abologna, bugproxy, dgilbert, dyuan, dzheng, fjin, gsun, hannsj_uhl, jdenemar, jsuchane, mdeng, michal.skrivanek, mkolaja, mzhan, qzhang, rbalakri, zpeng |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | 7.3 | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
Migration of certain guests from Red Hat Enterprise Linux 7.2 to 7.3 hosts is not possible
Prior to this update, the PCI address of any USB controller that did not have an explicitly specified `model` value was ignored on IBM Power guest virtual machines. This bug has been fixed, but as a consequence of the fix, it is not possible to perform a live migration of guests that use the described USB controllers from a Red Hat Enterprise Linux 7.2 host to a Red Hat Enterprise Linux 7.3 host, due to the different PCI addresses of the USB controller.
To work around this problem, edit the guest XML file and add a `model` attribute with the `pci-ohci` value to the USB <controller> element, for example as follows:
<controller type='usb' model='pci-ohci' index='0'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</controller>
Afterwards, shut down the guest and start it again for the changes to take effect. As a result, the guest can be migrated from Red Hat Enterprise Linux 7.2 to 7.3.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-09-07 15:57:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1230910, 1287890, 1289202, 1299681, 1359843, 1362179, 1369086 | ||
Attachments: |
Adding Jirka to CC.

Fix proposed upstream: https://www.redhat.com/archives/libvir-list/2016-July/msg00727.html

Michal, can you please comment on how RHEV uses the USB controller model? Is it specified in the guest configuration, or left blank? Can you please estimate what the impact of this bug on RHEV could be? Thanks.

(In reply to Jaroslav Suchanek from comment #4)
> Michal, can you please comment on how RHEV uses the USB controller model? Is it
> specified in the guest configuration, or left blank? Can you please estimate
> what the impact of this bug on RHEV could be? Thanks.

We let libvirt pick a model.

So, just to confirm, the XML for a RHEV guest looks like

<controller type='usb' index='0'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>

with no 'model' attribute for the <controller> element, right?

Right.

After discussing this upstream[1], we have concluded that making live migration between RHEL 7.2 and 7.3 work when the guest is using the default USB controller is not possible without reintroducing a broken behavior. Closing as CANTFIX.

As a workaround, it is possible to edit the guest XML and add a 'model' attribute with value 'pci-ohci' to the relevant <controller> element, so that it looks like

<controller type='usb' model='pci-ohci' index='0'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>

The PCI address might of course be different. After a full power cycle (shutdown followed by start), the guest will be live-migratable to RHEL 7.3. Please note that this will cause a change in guest ABI, because the USB controller will now be at, e.g., PCI address 00:03.0 instead of 00:00.0.

[1] https://www.redhat.com/archives/libvir-list/2016-July/msg00727.html

Andrea: Wouldn't applying that fix downstream only solve the problem?

*** Bug 1368972 has been marked as a duplicate of this bug. ***

(In reply to Dr. David Alan Gilbert from comment #9)
> Andrea: Wouldn't applying that fix downstream only solve the problem?

It would. But

1) we would have to keep that patch around forever, because reverting it at any time would again break migration, and it's clear that getting it upstream is just not happening

2) most importantly, doing so would *undo* the fix that allows users to pick whatever PCI address they want for the USB controller: once again, the address would be ignored by libvirt

(In reply to Andrea Bolognani from comment #11)
> (In reply to Dr. David Alan Gilbert from comment #9)
> > Andrea: Wouldn't applying that fix downstream only solve the problem?
>
> It would. But
>
> 1) we would have to keep that patch around forever, because
> reverting it at any time would again break migration, and
> it's clear that getting it upstream is just not happening

Yep, we've got a bunch of those in qemu. It's not pretty, but it does mean that users can keep their migration compatibility. We can drop old ones when we no longer support migration from old VMs.

> 2) most importantly, doing so would *undo* the fix that
> allows users to pick whatever PCI address they want for
> the USB controller: once again, the address would be
> ignored by libvirt

That is the nice thing about machine types at the qemu level: we can tie broken things to machine type versions and say that something is only fixed on new machine types.

> > (In reply to Dr. David Alan Gilbert from comment #9)
> > > Andrea: Wouldn't applying that fix downstream only solve the problem?
> >
> > It would. But
> >
> > 1) we would have to keep that patch around forever, because
> > reverting it at any time would again break migration, and
> > it's clear that getting it upstream is just not happening
>
> Yep, we've got a bunch of those in qemu. It's not pretty, but it does mean
> that users can keep their migration compatibility. We can drop old ones
> when we no longer support migration from old VMs.
>
> > 2) most importantly, doing so would *undo* the fix that
> > allows users to pick whatever PCI address they want for
> > the USB controller: once again, the address would be
> > ignored by libvirt
>
> That is the nice thing about machine types at the qemu level: we can tie
> broken things to machine type versions and say that something is only fixed
> on new machine types.

We don't usually key stuff off machine type versions in upstream libvirt, because it's simply impossible to implement in a way that works reliably for both upstream *and* downstream versioned machine types. That said, I think it's okay to perform such a check in a downstream-only patch.

I've posted an implementation of the approach you suggested, which doesn't involve reverting the fix for Bug 1297020, for downstream review.

Tested the 6 scenarios below; all PASS. No product bug was found. The 7.3->7.2 scenarios will be added once they are done.

1. 7.2->7.3, ppc64le, guest OS 7.2, without model
2. 7.2->7.3, ppc64, guest OS 7.2, without model
3. 7.2->7.3, ppc64, guest OS 6.8, without model
4. 7.2->7.3, ppc64le, guest OS 7.2, with model
5. 7.2->7.3, ppc64, guest OS 7.2, with model
6. 7.2->7.3, ppc64, guest OS 6.8, with model

Details:

Case 1: 7.2->7.3, ppc64le, guest OS 7.2, without model
21:36:16 (1/12) virsh.migrate_vm.positive_testing.live_migration.pause_vm Result: PASS 74.41 s
21:37:31 (2/12) virsh.migrate_vm.positive_testing.live_migration.cpuset Result: PASS 100.50 s
21:39:13 (3/12) virsh.migrate_vm.positive_testing.live_migration.with_hugepages Result: FAIL 119.79 s
21:41:13 (4/12) virsh.migrate_vm.positive_testing.p2p_migration.listen_address.with_tcp Result: PASS 75.83 s
21:42:30 (5/12) virsh.migrate_vm.positive_testing.migration_with_ipv6.with_tls Result: PASS 102.56 s
21:44:13 (6/12) virsh.migrate_vm.positive_testing.migration_with_devices.attach_virtual_nic Result: PASS 76.87 s
21:45:30 (7/12) virsh.migrate_vm.positive_testing.cross_rhel_platform_migration.with_io_throttling.total_bytes_sec Result: PASS 73.22 s
21:46:44 (8/12) virsh.migrate_vm.positive_testing.live_storage_migration.backing_file_with_copy_storage_inc Result: SKIP 68.16 s
21:47:53 (9/12) virsh.migrate_vm.negative_testing.live_migration.noexist_xml Result: PASS 60.88 s
21:48:55 (10/12) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 72.91 s
21:50:09 (11/12) virsh.migrate_vm.negative_testing.p2p_migration.unreachable_destenation.with_tcp Result: PASS 187.12 s
21:53:16 (12/12) virsh.migrate_vm.negative_testing.p2p_migration.invalid_listen_address.with_ssh Result: PASS 65.86 s

Case 2: 7.2->7.3, ppc64, guest OS 7.2, without model
22:33:03 (1/7) virsh.migrate_vm.positive_testing.p2p_migration.basic.with_tls Result: PASS 100.06 s
22:34:44 (2/7) virsh.migrate_vm.positive_testing.cross_rhel_platform_migration.with_watchdog.i6300esb Result: PASS 98.84 s
22:36:23 (3/7) virsh.migrate_vm.negative_testing.live_migration.stop_libvirtd_remotely Result: PASS 66.73 s
22:37:31 (4/7) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 70.38 s
22:38:42 (5/7) virsh.migrate_vm.negative_testing.live_migration.cancel_migration Result: PASS 65.67 s
22:39:48 (6/7) virsh.migrate_vm.negative_testing.p2p_migration.invalid_listen_address.with_tls Result: PASS 54.40 s
22:40:43 (7/7) virsh.migrate_vm.negative_testing.rdma_migration.no_rdma_env_rdma_pin_all Result: PASS 43.59 s

Case 3: 7.2->7.3, ppc64, guest OS 6.8, without model
00:59:12 (1/6) virsh.migrate_vm.positive_testing.live_migration.track_statistics Result: FAIL 84.03 s
01:00:37 (2/6) virsh.migrate_vm.positive_testing.p2p_migration.with_keepalive_protocol.default_conf_less_than_keepalive_time Result: PASS 661.27 s
01:11:39 (3/6) virsh.migrate_vm.positive_testing.tunnelled_migration.basic.with_ssh Result: PASS 65.04 s
01:12:45 (4/6) virsh.migrate_vm.positive_testing.tunnelled_migration.basic.with_tls Result: PASS 86.59 s
01:14:12 (5/6) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 67.75 s
01:15:20 (6/6) virsh.migrate_vm.negative_testing.live_storage_migration.no_create_target_image.simple Result: PASS 37.20 s

Case 4: 7.2->7.3, ppc64le, guest OS 7.2, with model
01:30:16 (1/5) virsh.migrate_vm.positive_testing.live_migration.listen_address Result: PASS 72.19 s
01:31:29 (2/5) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 70.76 s
01:32:41 (3/5) virsh.migrate_vm.negative_testing.live_migration.restart_local_libvirtd Result: PASS 199.96 s
01:36:01 (4/5) virsh.migrate_vm.negative_testing.p2p_migration.unreachable_destenation.with_ssh Result: PASS 184.35 s
01:39:07 (5/5) virsh.migrate_vm.negative_testing.live_storage_migration.mutually_exclusive_options Result: PASS 49.03 s

Case 5: 7.2->7.3, ppc64, guest OS 7.2, with model
02:13:21 (1/4) virsh.migrate_vm.positive_testing.live_migration.timeout Result: PASS 79.19 s
02:14:40 (2/4) virsh.migrate_vm.negative_testing.live_migration.unprivileged_user Result: PASS 60.97 s
02:15:42 (3/4) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 70.87 s
02:16:54 (4/4) virsh.migrate_vm.negative_testing.rdma_migration.no_rdma_env_turn_off_rdma_pin_all Result: PASS 42.04 s

Case 6: 7.2->7.3, ppc64, guest OS 6.8, with model
02:25:48 (1/5) virsh.migrate_vm.positive_testing.live_migration.reboot_vm Result: PASS 98.78 s
02:27:27 (2/5) virsh.migrate_vm.positive_testing.p2p_migration.listen_address.with_ssh Result: PASS 64.10 s
02:28:32 (3/5) virsh.migrate_vm.positive_testing.tunnelled_migration.basic.with_tcp Result: PASS 65.57 s
02:29:39 (4/5) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 63.54 s
02:30:43 (5/5) virsh.migrate_vm.negative_testing.tunnelled_migration.restart_local_libvirtd Result: PASS 164.42 s

The tests in comment 15 and comment 16 used the following environment.

[7.2.z]
OS tree: RHEL7.2-20151030.0 + updated with z repo
kernel-3.10.0-327.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.21.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch
libvirt-1.2.17-13.el7_2.5.ppc64le

[7.3] Host 2: (RHEL7.3)
OS: RHEL-7.3-20160901.1
kernel-3.10.0-495.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
libvirt-2.0.0-6.el7.abologna.bz1357468.ppc64le

Tested the remaining 6 scenarios.

7. 7.3->7.2, ppc64le, guest OS 7.2, without model: FAIL
8. 7.3->7.2, ppc64, guest OS 7.2, without model: FAIL
9. 7.3->7.2, ppc64, guest OS 6.8, without model: FAIL
10. 7.3->7.2, ppc64le, guest OS 7.2, with model: PASS
11. 7.3->7.2, ppc64, guest OS 7.2, with model: PASS
12. 7.3->7.2, ppc64, guest OS 6.8, with model: PASS

Details:

Cases 7-9: FAIL
With a USB controller that has no model set, the VM cannot start up.

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2016-09-07T07:15:42.174828Z qemu-kvm: -device usb-kbd,id=input0,bus=usb.0,port=1: Bus 'usb.0' not found

See the guest XML in the attachment. The VM migrated from 7.2 to 7.3 uses '-usb' on the QEMU command line:
# ps -ef|grep qemu
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 ***-usb*** -drive file=/usr/share/avo

Case 10: 7.3->7.2, ppc64le, guest OS 7.2, with model
02:20:33 (1/4) virsh.migrate_vm.positive_testing.p2p_migration.basic.with_ssh Result: PASS 39.01 s
02:21:14 (2/4) virsh.migrate_vm.positive_testing.cross_rhel_platform_migration.with_io_throttling.read_bytes_sec Result: PASS 38.62 s
02:21:54 (3/4) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 42.37 s
02:22:38 (4/4) virsh.migrate_vm.negative_testing.live_storage_migration.no_create_target_image.basic Result: PASS 108.01 s

Case 11: 7.3->7.2, ppc64, guest OS 7.2, with model
02:38:11 (1/4) virsh.migrate_vm.positive_testing.p2p_migration.listen_address.with_tls Result: PASS 62.68 s
02:39:16 (2/4) virsh.migrate_vm.positive_testing.cross_rhel_platform_migration.with_io_throttling.total_iops_sec Result: PASS 38.92 s
02:39:56 (3/4) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 41.76 s
02:40:39 (4/4) virsh.migrate_vm.negative_testing.rdma_migration.no_rdma_env Result: PASS 14.34 s

Case 12: 7.3->7.2, ppc64, guest OS 6.8, with model
02:53:14 (1/3) virsh.migrate_vm.positive_testing.live_migration.iscsi.ipv6 Result: PASS 40.22 s
02:53:55 (2/3) virsh.migrate_vm.positive_testing.migration_with_devices.attach_virtual_disk Result: PASS 117.70 s
02:55:54 (3/3) virsh.migrate_vm.negative_testing.live_migration.abort_job Result: PASS 52.06 s

Added for the failure of Cases 7-9: the VM cannot start on a RHEL 7.3 machine without a USB controller model configured. See the attachment for the guest XML.

Created attachment 1198542
guest cannot start without USB controller model in libvirt-2.0.0-6.el7.abologna.bz1357468.ppc64le

The proposed approach didn't survive a round of testing, so I'm moving the bug back to CLOSED CANTFIX.
|
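The CANTFIX workaround (adding model='pci-ohci' to USB controllers that have no model) can also be scripted. The Python sketch below is illustrative only. `add_default_usb_model` is a hypothetical helper and is not part of libvirt or virsh. It assumes the guest XML was saved with `virsh dumpxml --inactive` and will be loaded back with `virsh define`. The helper touches only `<controller type='usb'>` elements without a `model` attribute and leaves their `<address>` child unchanged. The guest still needs a full shutdown and start afterwards.

```python
import xml.etree.ElementTree as ET


def add_default_usb_model(domain_xml: str, model: str = "pci-ohci") -> tuple[str, int]:
    """Hypothetical helper: give every model-less USB controller an explicit model.

    Returns the rewritten XML and the number of <controller> elements changed.
    The <address> child is not modified, so the configured PCI address is kept.
    """
    root = ET.fromstring(domain_xml)
    changed = 0
    for ctrl in root.iter("controller"):
        # Only USB controllers with no 'model' attribute are affected by this bug;
        # controllers that already have a model (including 'none') are skipped.
        if ctrl.get("type") == "usb" and "model" not in ctrl.attrib:
            ctrl.set("model", model)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Example input modeled on the RHEV guest XML quoted in this bug.
sample = """<domain type='kvm'>
  <devices>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
  </devices>
</domain>"""

updated, count = add_default_usb_model(sample)
print(count)  # 1
print(updated)
```

Running the helper a second time on its own output changes nothing, so it is safe to apply more than once.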
Created attachment 1181007
qemu command line log for local host (above star line) and remote host (under star line)

Description of problem:
Migrating a ppc64le RHEL 7 guest from a RHEL 7.2 host to a RHEL 7.3 host fails with a QEMU error.

Version-Release number of selected component (if applicable):
Host 1: [RHEL7.2.z]
OS tree: RHEL7.2-20151030.0
kernel-3.10.0-327.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7_2.18.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch
libvirt-1.2.17-13.el7_2.5.ppc64le

Host 2: (RHEL7.3)
OS: RHEL-7.3-20160707.2
kernel-3.10.0-461.el7.ppc64le
qemu-kvm-rhev-2.6.0-11.el7.ppc64le
SLOF-20160223-4.gitdbbfda4.el7.noarch
libvirt-2.0.0-1.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Set up NFS and start the guest.
2. virsh migrate avocado-vt-vm1 --live --verbose --unsafe qemu+ssh://10.19.112.45:22/system
root.112.45's password:
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2016-07-18T09:34:03.327713Z qemu-kvm: Unknown savevm section or instance 'pci@800000020000000:00.0/ohci' 0
2016-07-18T09:34:03.328136Z qemu-kvm: load of migration failed: Invalid argument

Actual results:
See above. The guest is still running on the local host; it was not migrated to the remote host.

Expected results:
Migration succeeds.

Additional info:
qemu command line on local host after starting guest:
qemu command line on remote host after migration:
See attachment qemu_command_line.log
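The migration error above can be illustrated with a toy model. This is not QEMU code. The model rests on two assumptions, inferred only from the error message 'pci@800000020000000:00.0/ohci':

- The section ID encodes the PCI host bridge and the controller's slot.function.
- The destination rejects any incoming section ID that it did not register itself.

The names `ohci_section_id` and `load_migration_stream` are hypothetical. Under these assumptions, a RHEL 7.2 source that ignored the XML address (controller at 00.0) sends an ID that a RHEL 7.3 destination honoring slot 0x03 does not recognize. Once the pci-ohci workaround is applied and the guest is power-cycled, both sides use the XML address and the IDs match.

```python
# Toy model (not QEMU code) of why the 7.2 -> 7.3 migration fails.


class UnknownSection(Exception):
    """Raised when the incoming stream contains a section the destination lacks."""


def ohci_section_id(phb_buid: str, slot: int, function: int) -> str:
    """Build an ID shaped like 'pci@<BUID>:<slot>.<function>/ohci'.

    The layout is inferred from the error message in this bug and is an assumption.
    """
    return f"pci@{phb_buid}:{slot:02x}.{function:x}/ohci"


def load_migration_stream(incoming: list[str], registered: set[str]) -> None:
    """Hypothetical loader: every incoming section must already exist on the destination."""
    for section in incoming:
        if section not in registered:
            raise UnknownSection(f"Unknown savevm section or instance '{section}'")


BUID = "800000020000000"  # PCI host bridge identifier taken from the error message

# RHEL 7.2 source, no model: the XML address is ignored, controller sits at 00.0.
source = [ohci_section_id(BUID, 0x00, 0)]
# RHEL 7.3 destination: the XML address (slot 0x03 in this example) is honored.
destination = {ohci_section_id(BUID, 0x03, 0)}

try:
    load_migration_stream(source, destination)
    default_result = "ok"
except UnknownSection as err:
    default_result = str(err)

print(default_result)

# Workaround: with model='pci-ohci' and a full power cycle on 7.2, the source also
# places the controller at the XML address, so the section IDs agree.
workaround_source = [ohci_section_id(BUID, 0x03, 0)]
load_migration_stream(workaround_source, destination)  # no exception raised
```

The model also reflects the ABI change noted in the bug: the workaround moves the controller from 00.0 to the configured slot, so the section ID changes too.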