| Summary: | Guest hangs when restored from a saved image using virt-manager | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | zhe peng <zpeng> |
| Component: | kvm | Assignee: | Amit Shah <amit.shah> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.8 | CC: | asias, bgollahe, bsarathy, chayang, dallan, dyuan, juzhang, jwu, michen, mkenneth, mzhan, rhod, rwu, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-07-23 11:13:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 807971 | | |
| Attachments: | | | |
Description
zhe peng 2012-02-07 07:58:03 UTC

Created attachment 559862 [details]: libvirt.log
Created attachment 559863 [details]: virt-manager log
I'm assuming this is a libvirt issue, since virt-manager isn't doing much except calling libvirt APIs. Reassigning.

Please verify that you can reproduce using 'virsh save' and 'virsh restore' (or managed-save and start, if that's what virt-manager supports; if virt-manager is asking you for a path to save the file, you need to use 'virsh save').

I can reproduce this issue using 'virsh save' and 'virsh restore'.

Steps:
1. # virsh start rhel5.8rc3
2. Log in to the guest and run 'modprobe acpiphp'.
3. On the host:
   # virsh attach-disk rhel5.8rc3 /dev/sda2 --target vdb --driver qemu
   Disk attached successfully
   Log in to the guest; vdb was hotplugged successfully.
4. # virsh save rhel5.8rc3 /tmp/rhel5.save
   Domain rhel5.8rc3 saved to /tmp/rhel5.save
5. # virsh restore /tmp/rhel5.save
   Domain restored from /tmp/rhel5.save
6. # virt-viewer rhel5.8rc3
   In the guest, the mouse and keyboard do not work.

Created attachment 560132 [details]
/var/log/message
Created attachment 560133 [details]
guest xml file
Is the guest actually responsive, or is the guest OS hung?

I thought the guest OS is hung; I can't ssh to the guest after the restore, while before the save ssh worked well.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

I couldn't reproduce the problem on my test machine. I will try to reproduce it on the reporter's machine.

The problem can be reproduced 100% of the time in this situation:
1. The guest is RHEL5u8.
2. The host is RHEL5u8.
3. Before running "virsh save", attach a hotpluggable virtio disk to the guest. (For a RHEL5 guest it is necessary to first load the acpiphp kernel module, with 'modprobe acpiphp' in the guest, for the attach to succeed.)
4. Save the guest to a file.
5. Run 'virsh restore <file.save>' to restore the guest.

After the above 5 steps, the guest hangs forever.

The problem is not encountered in any of the following situations:
- The guest is RHEL6 or newer.
- The guest is RHEL5u8 but no disk was attached before saving.
- The host is RHEL6 or newer rather than RHEL5u8.

During a restore, libvirt first forks a child process that runs the qemu-kvm command line with stdin set to the file descriptor of the opened file.save. The parent process then moves on to other work; one of the things it does is set the balloon memory via the qemu monitor. In this bug, after libvirt sends 'balloon 1024' to the qemu monitor, the guest hangs forever; even though libvirt later sends 'cont' to start the qemu process, the VM stays hung. If we add a usleep() to pause the parent process for a while just before setting the balloon memory, the restore succeeds. So I think this is a bug in qemu-kvm on RHEL5u8.
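The usleep() described above is a fixed delay; a more robust variant would poll the monitor until it answers, on the assumption that the human monitor does not reply until the incoming migration stream has been loaded. This is a sketch only: the `wait_monitor_ready` helper and the polling approach are hypothetical (not libvirt's actual code), and it assumes the classic human-monitor reply format `VM status: running` / `VM status: paused`.

```python
import time


def parse_vm_status(reply):
    """Parse a human-monitor 'info status' reply such as
    'VM status: paused' and return the status word, or None
    if the reply is not in that form."""
    reply = (reply or "").strip()
    prefix = "VM status: "
    if not reply.startswith(prefix):
        return None
    return reply[len(prefix):] or None


def wait_monitor_ready(monitor, timeout=10.0, interval=0.1):
    """Poll 'info status' until the monitor answers with a parseable
    status, instead of sleeping for a fixed time.  `monitor` is any
    callable that sends one monitor command and returns the reply text."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = parse_vm_status(monitor("info status"))
        if status is not None:
            return status
        time.sleep(interval)
    raise TimeoutError("monitor did not report a VM status in time")
```

A restore driver could call `wait_monitor_ready(send_cmd)` before issuing 'balloon 1024'; whether this actually avoids the hang on RHEL5's kvm-83 is untested here.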
Probably a race happened. I used the newest RHEL5 kvm version, kvm-83-254.el5 from brew; the issue still exists.

How to reproduce by hand using the qemu command line:

1. Start a guest with the following command line:
/usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name rhel5u8 -monitor unix:/var/lib/libvirt/qemu/rhel5u8.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/var/lib/libvirt/images/rhel5u8.img,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio
2. Run 'modprobe acpiphp' in the rhel5u8 guest. Then hotplug a virtio disk via the qemu monitor console:
pci_add pci_addr=auto storage file=/var/lib/libvirt/images/attachdisk,if=virtio
3. Save the guest to a file via the qemu monitor console:
migrate exec:cat>/tmp/rhel5u8.save.hand
4. Kill the previous qemu process and start a new qemu process with the following command line:
/usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name rhel5u8 -monitor unix:/var/lib/libvirt/qemu/rhel5u8.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/var/lib/libvirt/images/rhel5u8.img,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -drive file=/var/lib/libvirt/images/attachdisk,if=virtio,format=raw -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -incoming "exec:cat /tmp/rhel5u8.save.hand" -balloon virtio
5. Quickly connect to the qemu monitor console and run "balloon 1024":
nc -U /var/lib/libvirt/qemu/rhel5u8.monitor
(qemu) balloon 1024

Then the guest hangs forever; even if we send 'cont' to qemu later, it doesn't work anymore. This problem doesn't exist with the RHEL6 kvm version.

According to the reproduction procedure via the qemu-kvm command line in comment 12, this looks like a bug in qemu-kvm that may have been fixed in RHEL6, so I am changing the component to qemu-kvm for help.

Tested more on kvm-83-256.el5, kernel 2.6.18-322.el5. Here is what I get:

1. With '-S' and '-balloon virtio' on the command line, I can reproduce this issue with the same steps as in comment 12. If I send only 'cont' (no balloon) to the monitor, it is not reproducible.
2. With '-S' and '-balloon none', it is reproducible with the same steps as in comment 12.
3. Without '-S', with '-balloon virtio', I only get a call trace in the guest when issuing 'balloon' to the monitor.
4. Without '-S', with '-balloon virtio', the guest works fine if I don't issue 'balloon' to the monitor.
5. Without '-S', with '-balloon none', the guest restores fine with or without issuing 'balloon'.

In any case, after doing hot plug/unplug and then migrating the guest, '(monitor) info pci' on the source and destination shows a difference. Also, from bz652146#c4: "In RHEL5.x you can't do hot-plug/unplug and then migrate. It is known not to work."

SRC host:
--------
(qemu) pci_add pci_addr=auto storage file=/root/test.img,if=virtio
pci_add pci_addr=auto storage file=/root/test.img,if=virtio
OK domain 0, bus 0, slot 5, function 0
(qemu) info pci
info pci
Bus 0, device 0, function 0:
Host bridge: PCI device 8086:1237
Bus 0, device 1, function 0:
ISA bridge: PCI device 8086:7000
Bus 0, device 1, function 1:
IDE controller: PCI device 8086:7010
BAR4: I/O at 0xc000 [0xc00f].
Bus 0, device 1, function 2:
USB controller: PCI device 8086:7020
IRQ 11.
BAR4: I/O at 0xc020 [0xc03f].
Bus 0, device 1, function 3:
Bridge: PCI device 8086:7113
IRQ 9.
Bus 0, device 2, function 0:
VGA controller: PCI device 1013:00b8
BAR0: 32 bit memory at 0xc2000000 [0xc3ffffff].
BAR1: 32 bit memory at 0xc4000000 [0xc4000fff].
Bus 0, device 3, function 0:
Ethernet controller: PCI device 10ec:8139
IRQ 11.
BAR0: I/O at 0xc100 [0xc1ff].
BAR1: 32 bit memory at 0xc4001000 [0xc40010ff].
Bus 0, device 4, function 0:
RAM controller: PCI device 1af4:1002
IRQ 11.
BAR0: I/O at 0xc200 [0xc21f].
Bus 0, device 5, function 0:
SCSI controller: PCI device 1af4:1001
IRQ 0.
BAR0: I/O at 0x1000 [0x103f].
DST host:
--------
(qemu) info pci
info pci
Bus 0, device 0, function 0:
Host bridge: PCI device 8086:1237
Bus 0, device 1, function 0:
ISA bridge: PCI device 8086:7000
Bus 0, device 1, function 1:
IDE controller: PCI device 8086:7010
BAR4: I/O at 0xc000 [0xc00f].
Bus 0, device 1, function 2:
USB controller: PCI device 8086:7020
IRQ 11.
BAR4: I/O at 0xc020 [0xc03f].
Bus 0, device 1, function 3:
Bridge: PCI device 8086:7113
IRQ 9.
Bus 0, device 2, function 0:
VGA controller: PCI device 1013:00b8
BAR0: 32 bit memory at 0xc2000000 [0xc3ffffff].
BAR1: 32 bit memory at 0xc4000000 [0xc4000fff].
Bus 0, device 3, function 0:
Ethernet controller: PCI device 10ec:8139
IRQ 11.
BAR0: I/O at 0xc100 [0xc1ff].
BAR1: 32 bit memory at 0xc4001000 [0xc40010ff].
Bus 0, device 4, function 0:
SCSI controller: PCI device 1af4:1001
IRQ 0.
BAR0: I/O at 0x1000 [0x103f].
Bus 0, device 5, function 0:
RAM controller: PCI device 1af4:1002
IRQ 11.
BAR0: I/O at 0xc200 [0xc21f].
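The mismatch in the two dumps above (devices 00:04.0 and 00:05.0 trade places) can be spotted mechanically. A minimal sketch of doing so — a hypothetical helper, not a qemu or libvirt tool — that parses 'info pci' output into slot-to-device-id pairs and reports the slots that differ:

```python
import re


def parse_info_pci(text):
    """Map (bus, device, function) -> PCI vendor:device id, parsed from
    the human monitor's 'info pci' output."""
    devices = {}
    slot = None
    for line in text.splitlines():
        m = re.match(r"\s*Bus\s+(\d+), device\s+(\d+), function\s+(\d+):", line)
        if m:
            slot = tuple(int(g) for g in m.groups())
            continue
        m = re.match(r"\s*.*: PCI device ([0-9a-f]{4}:[0-9a-f]{4})", line)
        if m and slot is not None:
            devices[slot] = m.group(1)
            slot = None  # one device line per slot header
    return devices


def diff_pci(src_text, dst_text):
    """Return {slot: (src_id, dst_id)} for slots whose device differs."""
    src, dst = parse_info_pci(src_text), parse_info_pci(dst_text)
    return {s: (src.get(s), dst.get(s))
            for s in set(src) | set(dst)
            if src.get(s) != dst.get(s)}
```

For the two dumps above it reports that slots 00:04.0 and 00:05.0 swapped 1af4:1002 (the virtio balloon "RAM controller") and 1af4:1001 (the virtio block "SCSI controller"), which is exactly the address change suspected below.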
Please paste guest kernel output from the serial console when the guest hangs. Are there any panic/oops messages? From comment 14: what is the call trace you get in the guest?

(In reply to comment #16)
> Please paste guest kernel output from serial console when the guest hangs up.

I don't see any output on the serial port when the guest hangs.

> Are there any panic/oops messages?
> From comment 14: what is the call trace you get in the guest?

irq 10: nobody cared (try booting with the "irqpoll" option)
Call Trace:
<IRQ> [<ffffffff800be9d2>] __report_bad_irq+0x30/0x7d
[<ffffffff800bec10>] note_interrupt+0x1f1/0x232
[<ffffffff800be110>] __do_IRQ+0x114/0x15b
[<ffffffff8006d4d1>] do_IRQ+0xe9/0xf7
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff80012519>] __do_softirq+0x51/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d646>] do_softirq+0x2c/0x7d
[<ffffffff8006d4d6>] do_IRQ+0xee/0xf7
[<ffffffff8006be03>] default_idle+0x0/0x50
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8006be2c>] default_idle+0x29/0x50
[<ffffffff80048f92>] cpu_idle+0x95/0xb8
[<ffffffff8046d809>] start_kernel+0x220/0x225
[<ffffffff8046d22f>] _sinittext+0x22f/0x236
handlers:
[<ffffffff8819b5a0>] (cp_interrupt+0x0/0x360 [8139cp])
[<ffffffff8824515f>] (vp_interrupt+0x0/0xc1 [virtio_pci])
Disabling IRQ #10

OK, so this happens only if a disk is hotplugged before saving and restoring the guest. Saving and restoring the guest using migrate-to-file is the same as migrating the guest to a different host, and that scenario isn't supposed to work, as also mentioned in bug 652146. The only suspicious thing could be that a RHEL6 guest works fine on a RHEL5 host with disk hot-plug/unplug. Can you confirm that this works fine (or not)? If yes, we may have to look at the problem differently. Please try a few times, since if it indeed is a race, it may not trigger immediately.

(In reply to comment #17)
> > From comment 14: what is the call trace you get in the guest?
> > irq 10: nobody cared (try booting with the "irqpoll" option)
> > Call Trace:
> > <IRQ> [<ffffffff800be9d2>] __report_bad_irq+0x30/0x7d

I think what's happening is that the devices may not be at the exact same PCI address before and after save/resume. This obviously confuses the guest. Since with RHEL5 we don't have a mechanism to place devices at specific addresses, this may very well end up not being a supported workflow at all.

Closing; the infrastructure change is too big for RHEL5.
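For reference, the missing mechanism is what newer stacks provide: on RHEL6 and later, libvirt lets the domain XML pin a device to a fixed PCI slot, so a hotplugged disk comes back at the same address after a save/restore. A hypothetical fragment (the slot number is chosen for illustration only):

```xml
<disk type='block' device='disk'>
  <driver name='qemu'/>
  <source dev='/dev/sda2'/>
  <target dev='vdb' bus='virtio'/>
  <!-- explicit PCI address: keeps the device at 00:05.0 across restores -->
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
```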