Bug 961200

Summary: guest is shut down before libvirt-guests can suspend it
Product: Red Hat Enterprise Linux 7 Reporter: Sibiao Luo <sluo>
Component: libvirt Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: acathrow, chayang, flang, hhuang, jdenemar, juzhang, lnykryn, michen, mrezanin, pbonzini, qzhang, sluo, systemd-maint-list, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-09 16:22:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sibiao Luo 2013-05-09 05:54:58 UTC
Description of problem:
The libvirt-guests service (configured under /etc/sysconfig/) is currently enabled by default to suspend guests to disk when the host shuts down, but the VM fails to be resumed correctly on a rhel7 host. By the way, I tested a rhel6 host and it has no such issue.

Version-Release number of selected component (if applicable):
host info:
kernel-3.9.0-0.55.el7.x86_64
qemu-kvm-1.4.0-4.el7.x86_64
guest info:
kernel-3.9.0-0.55.el7.x86_64

# cat /etc/sysconfig/libvirt-guests 
...
# action taken on host shutdown
# - suspend   all running guests are suspended using virsh managedsave
# - shutdown  all running guests are asked to shutdown. Please be careful with
#             this settings since there is no way to distinguish between a
#             guest which is stuck or ignores shutdown requests and a guest
#             which just needs a long time to shutdown. When setting
#             ON_SHUTDOWN=shutdown, you must also set SHUTDOWN_TIMEOUT to a
#             value suitable for your guests.
#ON_SHUTDOWN=suspend
...
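
Since ON_SHUTDOWN is commented out above, the default is what applies. A minimal Python sketch for reading the effective value from sysconfig-style text follows. The helper name `on_shutdown_action` is hypothetical, and the fallback of "suspend" is an assumption taken from this report's statement that suspend is currently the default; other libvirt versions may differ.

```python
import shlex

# Assumption: "suspend" is the default, as stated in this report.
ASSUMED_DEFAULT = "suspend"


def on_shutdown_action(config_text, default=ASSUMED_DEFAULT):
    """Return the effective ON_SHUTDOWN value from sysconfig-style text."""
    action = default
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # a commented-out setting leaves the default in force
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ON_SHUTDOWN":
            # shlex strips shell-style quotes and trailing comments
            parts = shlex.split(value, comments=True)
            action = parts[0] if parts else ""
    return action


sample = "# action taken on host shutdown\n#ON_SHUTDOWN=suspend\n"
print(on_shutdown_action(sample))
```

On a real host the text would come from reading /etc/sysconfig/libvirt-guests; the sketch only inspects the file and does not tell whether the service actually ran at shutdown.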

How reproducible:
100%

Steps to Reproduce:
1. Configure the libvirt-guests service (under /etc/sysconfig/) to suspend guests to disk on host shutdown (this is currently the default).
2. Start a virtual machine through libvirt.
3. Reboot the host while the VM is running.
4. Check whether the VM is resumed when the host comes back up.
  
Actual results:
After step 4, the VM fails to resume.

Expected results:
The VM should be resumed correctly.

Additional info:
# cat /var/log/libvirt/qemu/name.log 
2013-05-09 05:28:45.792+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name name -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 1024 -smp 2,sockets=2,cores=1,threads=1 -uuid b6378a0a-43a2-3036-1f22-63d468c2836c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/name.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/RHEL-Server-7.0-64-virtio.raw,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:48:5a:16,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
char device redirected to /dev/pts/6 (label charserial0)
qemu: terminating on signal 15 from pid 1

Comment 1 Sibiao Luo 2013-05-09 06:01:39 UTC
(In reply to comment #0)
> Steps to Reproduce:
> 1.configure libvirt-guests (under /etc/sysconfig/)service to suspend guests
> to disk on host shutdown.(this is currently the default)
> 2.start a virtual machine by libvirt.
We need to chmod 666 /dev/kvm on the rhel7 host (bug 911644) before running the virt-manager tools; otherwise starting the VM fails with:
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 101, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 123, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1174, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 692, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error process exited while connecting to monitor: char device redirected to /dev/pts/1 (label charserial0)
Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied
# ls -l /dev/kvm
crw-------. 1 root root 10, 232 May  9 13:36 /dev/kvm
# chmod 666 /dev/kvm
# ls -l /dev/kvm
crw-rw-rw-. 1 root root 10, 232 May  9 13:36 /dev/kvm
> 3.reboot host while VM is running.
> 4.check whether VM would be resumed when host reboot up.
We can use 'ps aux | grep qemu-kvm' to check for the qemu-kvm process.
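
For scripted test cases, the same process check could be sketched in Python as below. This is an illustration only: the function names are hypothetical, and the parsing assumes the usual 11-column `ps aux` layout with COMMAND as the last column. Unlike a plain grep, it skips the grep process itself.

```python
import subprocess


def qemu_processes(ps_output, needle="qemu-kvm"):
    """Return the `ps aux` lines whose command mentions qemu-kvm."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 10)  # 11th field is the full command line
        if len(fields) < 11:
            continue
        command = fields[10]
        if needle in command and not command.startswith("grep"):
            hits.append(line)
    return hits


def running_qemu():
    """Run `ps aux` on the local host and filter its output."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return qemu_processes(result.stdout)
```

After the host reboot in step 4, an empty list from `running_qemu()` would correspond to the failure described here (the guest was not resumed).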

Comment 2 Miroslav Rezanina 2013-07-22 08:31:37 UTC
Can this be retested with the latest qemu-kvm? The problem was probably related to incorrect /dev/kvm permissions, which have already been fixed.

Comment 3 Sibiao Luo 2013-07-23 06:08:01 UTC
(In reply to Miroslav Rezanina from comment #2)
> Can this be retest with latest qemu-kvm? Problem was probably related to
> incorrect /dev/kvm permissions that has been already fixed.
Yes, the /dev/kvm permissions have been fixed.
# ls -lh /dev/kvm
crw-rw-rw-. 1 root kvm 10, 232 Jul 23 13:41 /dev/kvm
I retried it on qemu-kvm-1.5.1-2.el7.x86_64 but still hit this issue. After rebooting the host, the VM fails to be resumed when the host comes back up. The log is as follows:
# cat /var/log/libvirt/qemu/bug961200.log
2013-07-23 05:39:28.606+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name bug961200 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 0af39310-75f6-cd43-caa5-d5d13be54c3b -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/bug961200.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=c,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/home/RHEL-7.0-20130628.0-Server-x86_64.qcow3,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:2a:f4:fc,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
char device redirected to /dev/pts/3 (label charserial0)
((null):1357): SpiceWorker-Warning **: red_worker.c:11477:dev_destroy_primary_surface: double destroy of primary surface
((null):1357): SpiceWorker-Warning **: red_worker.c:9663:red_create_surface: condition `surface->context.canvas' reached
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 0.170000 ms, bitrate 37925925925 bps (36168.981481 Mbps)
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
red_channel_client_disconnect: 0x7fe9f183f7e0 (channel 0x7fe9f16780f0 type 3 id 0)
red_peer_receive: Connection reset by peer
red_channel_client_disconnect: 0x7fe94c249c70 (channel 0x7fe94c21f360 type 2 id 0)
red_channel_client_disconnect: 0x7fe9f18399a0 (channel 0x7fe9f166d250 type 1 id 0)
main_channel_client_on_disconnect: rcc=0x7fe9f18399a0
red_client_destroy: destroy client with #channels 4
red_dispatcher_disconnect_cursor_peer: 
red_channel_client_disconnect: 0x7fe94c2a6c90 (channel 0x7fe94c21f920 type 4 id 0)
red_channel_client_disconnect: 0x7fe94c2a6c90 (channel 0x7fe94c21f920 type 4 id 0)
red_channel_client_disconnect: 0x7fe9f183f7e0 (channel 0x7fe9f16780f0 type 3 id 0)
red_channel_client_disconnect: 0x7fe9f183f7e0 (channel 0x7fe9f16780f0 type 3 id 0)
red_dispatcher_disconnect_display_peer: 
red_channel_client_disconnect: 0x7fe94c249c70 (channel 0x7fe94c21f360 type 2 id 0)
red_channel_client_disconnect: 0x7fe94c249c70 (channel 0x7fe94c21f360 type 2 id 0)
red_channel_client_disconnect: 0x7fe9f18399a0 (channel 0x7fe9f166d250 type 1 id 0)
red_channel_client_disconnect: 0x7fe9f18399a0 (channel 0x7fe9f166d250 type 1 id 0)
qemu: terminating on signal 15 from pid 1

Best Regards,
sluo

Comment 4 Sibiao Luo 2013-07-23 06:10:50 UTC
Additional info:
host info:
3.10.0-0.rc7.64.el7.x86_64
qemu-kvm-1.5.1-2.el7.x86_64
libvirt-1.1.0-2.el7.x86_64
virt-manager-0.10.0-1.el7.noarch
seabios-1.7.2.2-2.el7.x86_64
seabios-bin-1.7.2.2-2.el7.noarch

guest info:
3.10.0-0.rc7.64.el7.x86_64

Comment 5 Miroslav Rezanina 2013-12-05 08:27:09 UTC
The problem is that the guest is shut down before libvirt-guests can suspend it,
so it is not restored on boot.

Comment 6 Sibiao Luo 2013-12-05 08:56:58 UTC
(In reply to Miroslav Rezanina from comment #5)
> Problem is that guest is shutdown before libvirt-guests can suspend it so it
> is
> not restored on boot.
Maybe that is what caused the suspend to fail on the host. IIRC the rhel6 host has no such issue, so I modified the bug title to be clearer. Could you help fix it, as we have a test plan/case covering it? Thanks.

Best Regards,
sluo

Comment 7 Miroslav Rezanina 2013-12-05 10:50:05 UTC
OK, it's systemd that kills the qemu process directly (see the last line of the log in comment #3). Because this happens before libvirt-guests stops, the guest can't be suspended.

I do not know why the guest is killed so soon. Reassigning to systemd to investigate/solve the issue.
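
The "killed by pid 1" observation can be checked mechanically. Below is a minimal Python sketch that looks for the last "terminating on signal" line in a per-guest qemu log. The function names are hypothetical, and the line format is assumed to be exactly the one shown in the logs in this report; other qemu versions may print it differently.

```python
import re

# Format taken from the logs in this report, e.g.
# "qemu: terminating on signal 15 from pid 1"
SIGNAL_LINE = re.compile(r"^qemu: terminating on signal (\d+) from pid (\d+)")


def last_termination(log_text):
    """Return (signal, pid) from the last matching line, or None."""
    result = None
    for line in log_text.splitlines():
        match = SIGNAL_LINE.match(line)
        if match:
            result = (int(match.group(1)), int(match.group(2)))
    return result


def killed_by_init(log_text):
    """True if the last termination signal came from pid 1."""
    hit = last_termination(log_text)
    return hit is not None and hit[1] == 1
```

A True result only shows that pid 1 sent the signal; it does not by itself explain why the guest was killed before libvirt-guests had finished.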

Comment 8 Lukáš Nykrýn 2013-12-05 13:22:44 UTC
Can you please attach the unit files involved in this case (or at least their names)?

Comment 9 Paolo Bonzini 2013-12-09 15:02:38 UTC
The involved unit files are libvirt-guests.service and libvirtd.service.

I think the problem is a too-small TimeoutStopSec= in libvirt-guests.service.

Would it be enough to make it larger in libvirt-guests.service, or do we need a larger timeout on libvirtd.service too? libvirtd.service has a "Before" dependency on libvirt-guests.service.
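
To see which stop timeout a unit file sets explicitly, a small Python sketch like the following could be used. It is illustrative only: `stop_timeout` is a hypothetical helper, and the sample unit text is made up for the example, since the shipped libvirt-guests.service is not attached to this report.

```python
import configparser


def stop_timeout(unit_text):
    """Return the TimeoutStopSec= value from [Service], or None if unset."""
    parser = configparser.ConfigParser(
        interpolation=None,   # unit files use % specifiers, not interpolation
        strict=False,         # tolerate repeated keys and sections
        delimiters=("=",),
    )
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    return parser.get("Service", "TimeoutStopSec", fallback=None)


# Hypothetical unit content, NOT the real libvirt-guests.service.
sample_unit = """\
[Unit]
Description=hypothetical example unit
After=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStopSec=30
"""

print(stop_timeout(sample_unit))
```

A None result means the file does not set the option itself. The sketch ignores drop-in directories and manager-wide defaults, so `systemctl show` on the host remains the authoritative way to see the value systemd actually uses.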

Comment 10 Jiri Denemark 2013-12-09 16:22:25 UTC

*** This bug has been marked as a duplicate of bug 1032695 ***