Bug 694801
Summary: | Guest fail to resume from S4 if guest using kvmclock
---|---
Product: | Red Hat Enterprise Linux 6
Component: | kernel
Version: | 6.1
Hardware: | x86_64
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Target Milestone: | rc
Keywords: | Reopened
Reporter: | Chao Yang <chayang>
Assignee: | Red Hat Kernel Manager <kernel-mgr>
QA Contact: | Virtualization Bugs <virt-bugs>
CC: | amit.shah, bcao, drjones, ehabkost, juzhang, michen, mkenneth, qzhang, shuang, shu, sluo, tburke, virt-maint
Fixed In Version: | kernel-2.6.32-251.el6
Doc Type: | Bug Fix
Last Closed: | 2012-06-20 07:39:52 UTC
Bug Blocks: | 720669, 748534, 753024, 756082, 767187, 786141

Description
Chao Yang
2011-04-08 13:18:35 UTC
And another issue: suspend a guest with clock=pmtmr to mem; after resuming from S4, networking is unavailable. The guest cannot ping remote hosts, and the network has to be restarted to restore networking.

Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Hit the same issue on a rhel6.1 guest with/without kvmclock on an AMD host.

Guest kernel: 2.6.32-128.el6.x86_64

CLI:
    /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 6144 -smp 4 \
        -name rhel5.6-64 -uuid `uuidgen` \
        -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot c \
        -drive file=/root/rhel6.1-ide.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none \
        -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt0-0-0 \
        -netdev tap,id=hostnet1 \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 \
        -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none \
        -serial unix:/home/seri,nowait,server

In guest:
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    kvm-clock

Steps:
1. Do S4 in the guest:
    # echo disk>/sys/power/state
2. After suspend, boot again to resume from S4.

Actual Result:
Only got a black screen, and the monitor exited with:
    (qemu) Guest moved used index from 36574 to 0

    # nc -U /home/seri
    �could not read byte from child: Success

Since kvmclock is our default clocksource, I suggest fixing this in RHEL 6.1. Mark rhel-6.1.0? and blocker?

(In reply to comment #4)
> Hit same issue on rhel6.1 guest with/without kvmclock on AMD host.

Hi chayang,

Would you please retest it without virtio-related devices? The command line in comment 0 has no virtio-related devices; however, the one in this comment includes a virtio NIC and a virtio device.

(In reply to comment #8)
> Would you please retest it without virtio related devices.

Hi Junyi, Dor,

It's my fault that I didn't keep my eye on the CLI. I tried 3 times without virtio devices: VNC still shows a black screen when resuming from S4 with kvm-clock, but qemu no longer complains "(qemu) Guest moved used index from 36574 to 0". This image is a freshly installed one with kernel 2.6.32-130.el6.x86_64.

Steps:
1. Boot the rhel6.1-64 image:
    /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 4 \
        -name rhel6.1-64 -uuid `uuidgen` \
        -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot nc \
        -drive file=/root/RHEL-Server-6.1-64.qcow2,if=none,id=drive-ide0-0-0,media=disk,format=qcow2,cache=none \
        -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0 \
        -netdev tap,id=hostnet1 \
        -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 \
        -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none \
        -serial unix:/home/seri,nowait,server
2. Connect with nc:
    # nc -U /home/seri
3. echo disk > /sys/power/state
4. Resume from S4.

Actual result:
The guest can suspend to S4, but shows a black screen when resuming from S4.
    # nc -U /home/seri
    �

Host info:
    # rpm -qa|grep qemu
    qemu-kvm-debuginfo-0.12.1.2-2.156.el6.x86_64
    qemu-kvm-0.12.1.2-2.156.el6.x86_64
    qemu-img-0.12.1.2-2.156.el6.x86_64
    gpxe-roms-qemu-0.9.7-6.7.el6.noarch
    qemu-kvm-tools-0.12.1.2-2.156.el6.x86_64
    # uname -r
    2.6.32-130.el6.x86_64

    processor       : 3
    vendor_id       : AuthenticAMD
    cpu family      : 16
    model           : 2
    model name      : AMD Phenom(tm) 9600B Quad-Core Processor
    stepping        : 3
    cpu MHz         : 1150.000
    cache size      : 512 KB
    physical id     : 0
    siblings        : 4
    core id         : 3
    cpu cores       : 4
    apicid          : 3
    initial apicid  : 3
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 5
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
    bogomips        : 4587.42
    TLB size        : 1024 4K pages
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

----------------S4 without kvm-clock------------------------
Booted the same guest with the same CLI, except adding "-cpu cpu64-rhel6,-kvmclock" to the CLI and clock=pmtmr to the guest kernel line. The guest seems to suspend and resume from S4 successfully.

CLI:
    /usr/libexec/qemu-kvm -M rhel6.1.0 -enable-kvm -m 4096 -smp 4 \
        -cpu cpu64-rhel6,-kvmclock -name rhel6.1-64 -uuid `uuidgen` \
        -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -boot nc \
        -drive file=/root/RHEL-Server-6.1-64.qcow2,if=none,id=drive-ide0-0-0,media=disk,format=qcow2,cache=none \
        -device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0 \
        -netdev tap,id=hostnet1 \
        -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:40:81:11:53 \
        -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none \
        -serial unix:/home/seri,nowait,server

Based on comment 7 and comment 9, reopening this issue temporarily and proposing it for RHEL 6.2. If this is a mistake, please correct it.

Moving to NEW; bugs assigned to virt-maint shouldn't be on ASSIGNED.

I totally misunderstood this bug. The guest needs to be able to survive an S4 suspend of the HOST, but here we are doing an S4 suspend of the GUEST. Not sure why that is being done, but I'm pretty sure that has never been tested before with KVM clock. There are a number of possible things that could go wrong here; we may be failing to re-register the kvm clock on boot and then end up using stale data. In all probability, this is a guest bug and not going to be fixable from the hypervisor (which can't detect that the guest is doing an S4 resume).

Time to figure out how exactly suspend & resume interact with the clock code. Chances are the clock code is not designed to deal with the clock source changing address between S4 suspend and S4 resume...

I just fixed some potentially related issues in bug 751742. Might as well look at this one next (after Jarod has built some official test kernels with the up-to-date clock code).

Well, it reproduces here with the very latest RHEL6 kernel. I caught a core dump and will be extracting serial console debugging info too, to see what is going on.

*** Bug 716706 has been marked as a duplicate of this bug. ***

So I've been trying a few things to fix this, and it turns out the guest doesn't hang on resume; it merely stalls for a while. In my testing it takes about 15 seconds for the guest to resume after the hibernation image is loaded. The guest's timekeeping does go off after resume, though. Also, a second S4 attempt freezes the resumed guest for a very long period. I gave up waiting for the guest to unstall after about 20 minutes. However, the stall vs. hang distinction clarifies things a lot.

Amit, Marcelo said that the page that the guest & host share for the pvclock is not being synced post resume, so it's a basic issue we need to start with first.

(In reply to comment #20)
> Amit, Marcelo said that the page that the guest & host share for the pvclock is
> not being synced post resume

I just saw Marcelo's patch upstream, but that's not enough; I have had a similar patch cooked up for a while, but it still doesn't solve the issue for me completely. I'll take this discussion upstream.

Created attachment 562544
rhel6-kvmclock-s4.patch
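
The comments above point at stale data in the shared pvclock page after an S4 resume. As a rough illustration of why that matters, the sketch below models how a guest derives nanoseconds from a pvclock snapshot plus the current TSC, following the general pvclock scaling scheme (TSC delta, shift, 32.32 fixed-point multiply). It is not the attached patch and not the kernel code; the class name, function name, and all numbers are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


@dataclass
class PvclockSnapshot:
    """Hypothetical model of the fields the host publishes in the shared page."""
    tsc_timestamp: int      # TSC value when the snapshot was written
    system_time: int        # time in nanoseconds at tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point multiplier (TSC ticks -> ns)
    tsc_shift: int          # shift applied to the TSC delta before multiplying


def pvclock_read_ns(snap: PvclockSnapshot, tsc_now: int) -> int:
    """Compute guest time in ns from a snapshot and the current TSC value."""
    delta = (tsc_now - snap.tsc_timestamp) & MASK64
    if snap.tsc_shift >= 0:
        delta = (delta << snap.tsc_shift) & MASK64
    else:
        delta >>= -snap.tsc_shift
    scaled = (delta * snap.tsc_to_system_mul) >> 32
    return (snap.system_time + scaled) & MASK64


# Made-up numbers: a 2 GHz TSC means 0.5 ns per tick, i.e. mul = 2**31, shift = 0.
before_s4 = PvclockSnapshot(1_000, 5_000_000, 1 << 31, 0)
after_s4 = PvclockSnapshot(9_000_000, 60_000_000_000, 1 << 31, 0)

tsc_now = 9_002_000
fresh_ns = pvclock_read_ns(after_s4, tsc_now)   # page re-published after resume
stale_ns = pvclock_read_ns(before_s4, tsc_now)  # page contents from before S4
print("fresh:", fresh_ns, "stale:", stale_ns, "difference:", fresh_ns - stale_ns)
```

If the guest keeps reading the pre-hibernation snapshot, the computed time diverges from what the host would publish, which is at least consistent with the timekeeping drift and stalls reported in this bug.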
Patch(es) available on kernel-2.6.32-251.el6

Tested on kernel-2.6.32-262.el6, repeating more than 20 times with RHEL 6.3 32-bit and 64-bit guests using kvmclock. The guest can hibernate and resume successfully.

Related packages:
    kernel-2.6.32-262.el6.x86_64 (for both host and guest)
    qemu-kvm-0.12.1.2-2.285.el6.x86_64
    seabios-0.6.1.2-19.el6.x86_64

CLI:
    /usr/libexec/qemu-kvm -M rhel6.3.0 -cpu Conroe -enable-kvm \
        -uuid d782bf5c-e817-411b-a9cf-545ae7c0f101 \
        -rtc base=localtime,driftfix=slew -m 8G \
        -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3-64 \
        -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 \
        -drive file=/home/rhel6.3-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native \
        -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,scsi=off \
        -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
        -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
        -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
        -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 \
        -chardev socket,id=charchannel0,path=/tmp/socket-1,server,nowait \
        -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm \
        -chardev spicevmc,id=charchannel1,name=vdagent \
        -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
        -monitor stdio -boot c -qmp tcp:0:5555,server,nowait \
        -spice port=5930,disable-ticketing \
        -global qxl-vga.vram_size=67108864 -vga qxl \
        -device qxl,id=video1,vram_size=67108864,bus=pci.0,addr=0x7 \
        -device sga \
        -device virtio-balloon-pci,id=balloon0,bus=pci.0,id=0x6 \
        -bios /usr/share/seabios/bios-pm.bin \
        -chardev socket,path=/tmp/qzhang-test,server,nowait,id=isa1 \
        -device isa-serial,chardev=isa1,id=isa-serial1

So this bug is fixed.

Based on comment 27 above, setting to VERIFIED.
Setting this issue against the kernel component, since it is fixed in the kernel.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html