Red Hat Bugzilla – Bug 835023
win7.64 keep on rebooting at the end of installation
Last modified: 2013-01-09 20:02:15 EST
Created attachment 594145 [details]
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
/usr/libexec/qemu-kvm -monitor stdio \
    -drive file='/home/Win7-64-virtio.qcow2',if=virtio,media=disk,cache=none,boot=on,snapshot=off,format=qcow2 \
    -net nic,vlan=0,model=virtio,macaddr='9a:a2:3b:f5:41:d5' \
    -net tap,vlan=0,script=/home/scripts/qemu-ifup-switch \
    -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu 'qemu64' \
    -drive file='/home/images/win7-64-virtio-error.qcow2',media=cdrom \
    -drive file='/home/windows/winutils.iso',media=cdrom \
    -drive file='/home/windows/virtio-win.iso.el5',media=cdrom \
    -soundhw ac97 \
    -fda '/home/autotest-devel/client/tests/kvm/images/win7-64/answer.vfd' \
    -redir tcp:5000::10023 -vnc :0 -vga std -rtc-td-hack -M rhel5.6.0 \
    -boot c -usbdevice tablet
Created attachment 594146 [details]
keep on rebooting at this step
Is it different from
https://bugzilla.redhat.com/show_bug.cgi?id=804888 ?
(In reply to comment #3)
> Is it different from
> https://bugzilla.redhat.com/show_bug.cgi?id=804888 ?
It looks different: bz804888 is a missing-file error, but I can't see anything like 'file missing' in this one.
shuang, can we try with an IDE drive?
(In reply to comment #5)
> shuang, can we have a try with ide drive?
Repeated 150 times with IDE; could not reproduce it.
Is it worth testing with the RHEL6 virtio-win blk driver?
(In reply to comment #7)
> Is it worth to test w/ the rhel6 virtio-win blk driver?
It's always good to test with the latest drivers.
Please check the latest drivers. We want to use them in 5.9 anyhow.
Tried 25 times with a raw image and virtio-win-1.5.3-1.el6; did not reproduce it.
Created attachment 602324 [details]
Created attachment 602325 [details]
Created attachment 602326 [details]
1. Add the werror=stop parameter.
2. If you exit qemu after it enters the reboot loop and start it again with the same qcow image, can it boot, or does it enter the reboot loop again?
With werror=stop added, it enters the reboot loop again.
(In reply to comment #17)
> Add werror=stop, it enters reboot loop again.
I made a mistake; I will reproduce it with werror=stop.
With werror=stop, the guest now hangs on the 'windows error recovery' screen.
I checked 'info blockstats' in the monitor many times, and the output stays constant:
(qemu) info blockstats
virtio0: rd_bytes=1334828032 wr_bytes=8491436544 rd_operations=43199 wr_operations=198296
ide0-cd0: rd_bytes=3409486336 wr_bytes=0 rd_operations=1664786 wr_operations=0
ide0-cd1: rd_bytes=195072 wr_bytes=0 rd_operations=70 wr_operations=0
ide1-cd0: rd_bytes=66048 wr_bytes=0 rd_operations=32 wr_operations=0
floppy0: rd_bytes=252416 wr_bytes=7680 rd_operations=493 wr_operations=15
sd0: rd_bytes=0 wr_bytes=0 rd_operations=0 wr_operations=0
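Comparing repeated 'info blockstats' samples by eye is error-prone, so here is a minimal Python sketch that diffs two pasted snapshots. The function names are hypothetical, the parser assumes the `device: key=value ...` layout shown above, and the sample text is a shortened copy of that output rather than a new measurement.

```python
import re


def parse_blockstats(text):
    """Parse pasted `info blockstats` output into {device: {counter: int}}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the monitor prompt and anything that is not "device: k=v ..."
        if ":" not in line or line.startswith("(qemu)"):
            continue
        device, _, rest = line.partition(":")
        counters = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", rest)}
        if counters:
            stats[device.strip()] = counters
    return stats


def changed_counters(before, after):
    """Return {device: {counter: delta}} for counters that moved between samples."""
    moved = {}
    for device, counters in after.items():
        old = before.get(device, {})
        deltas = {k: v - old.get(k, 0)
                  for k, v in counters.items() if v != old.get(k, 0)}
        if deltas:
            moved[device] = deltas
    return moved


# Shortened copy of the output above, used only as sample input.
SNAPSHOT = """\
(qemu) info blockstats
virtio0: rd_bytes=1334828032 wr_bytes=8491436544 rd_operations=43199 wr_operations=198296
floppy0: rd_bytes=252416 wr_bytes=7680 rd_operations=493 wr_operations=15
"""

if __name__ == "__main__":
    first = parse_blockstats(SNAPSHOT)
    second = parse_blockstats(SNAPSHOT)  # the same text stands in for a later sample
    print(changed_counters(first, second) or "no block I/O progress between samples")
```

An empty result means no counter moved between the two samples, which matches the constant output reported here.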
and here's the top info:
# top -Hp 19197 -n 1
Tasks: 7 total, 1 running, 6 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 2.5%sy, 0.0%ni, 94.2%id, 2.9%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 12159136k total, 12042996k used, 116140k free, 153608k buffers
Swap: 16777208k total, 232k used, 16776976k free, 7407704k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19207 root 25 0 4364m 4.0g 2536 R 100.0 34.8 216:52.39 qemu-kvm
19197 root 15 0 4364m 4.0g 2536 S 0.0 34.8 0:29.28 qemu-kvm
19206 root 15 0 4364m 4.0g 2536 S 0.0 34.8 0:01.94 qemu-kvm
19208 root 15 0 4364m 4.0g 2536 S 0.0 34.8 2:24.53 qemu-kvm
19209 root 15 0 4364m 4.0g 2536 S 0.0 34.8 3:13.35 qemu-kvm
19210 root 15 0 4364m 4.0g 2536 S 0.0 34.8 3:11.66 qemu-kvm
19211 root 15 0 4364m 4.0g 2536 S 0.0 34.8 0:00.00 qemu-kvm
(In reply to comment #22)
# kvm_stat -1
efer_reload 10414764 0
exits 3031402419 225296
fpu_reload 9913755 51
halt_exits 1583689 0
halt_wakeup 1219003 0
host_state_reload 41045646 51
hypercalls 0 0
insn_emulation 2571736213 196444
insn_emulation_fail 0 0
invlpg 0 0
io_exits 8695918 19
irq_exits 16310318 758
irq_injections 2642275 19
irq_window 580946 11
kvm_request_irq 0 0
largepages 0 0
mmio_exits 103618 0
mmu_cache_miss 10284 0
mmu_flooded 0 0
mmu_pde_zapped 0 0
mmu_pte_updated 0 0
mmu_pte_write 4856685 0
mmu_recycled 0 0
mmu_shadow_zapped 8471 0
mmu_unsync 0 0
mmu_unsync_global 0 0
nmi_injections 7 0
nmi_window 0 0
pf_fixed 3625172 0
pf_guest 0 0
remote_tlb_flush 2772 0
request_nmi 0 0
signal_exits 305 0
tlb_flush 1110364617 84190
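To see which counters dominate the current exit rate, the kvm_stat output can be reduced to ratios. This is a rough Python sketch, assuming each line is `<name> <total> <rate>` with the last column being the change since the previous sample; the function names are hypothetical and the sample is a subset of the output above.

```python
def parse_kvm_stat(text):
    """Parse `kvm_stat -1` lines of the assumed form `<name> <total> <rate>`."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Ignore prompts, headers, and anything without two numeric columns.
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        rows[parts[0]] = (int(parts[1]), int(parts[2]))
    return rows


def share_of_exits(rows, counter):
    """Fraction of the current exit rate attributed to `counter` (0.0 if idle)."""
    exit_rate = rows["exits"][1]
    return rows[counter][1] / exit_rate if exit_rate else 0.0


# Subset of the counters above, used only as sample input.
SAMPLE = """\
exits 3031402419 225296
insn_emulation 2571736213 196444
io_exits 8695918 19
irq_exits 16310318 758
tlb_flush 1110364617 84190
"""

if __name__ == "__main__":
    rows = parse_kvm_stat(SAMPLE)
    for name in ("insn_emulation", "io_exits", "irq_exits"):
        print(f"{name}: {share_of_exits(rows, name):.1%} of exits")
```

With these numbers, instruction emulation accounts for most of the exit rate while io_exits is close to zero. That fits the constant blockstats and the single thread at 100% CPU, but it does not by itself identify the cause.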
Created attachment 602674 [details]
trying 835023 but 804888 happens
(In reply to comment #22)
> with werror=stop, now hit guest hanging on the 'windows error recovery'
The guest is not completely dead; it responds to my keyboard events a few minutes later.
I hit the Enter key via VNC, and the guest rebooted a few minutes later (the screen did not change before the reboot).
IRC message following comment 25
<xwei/#kvm> rhod, ping
<xwei/#kvm> rhod, about Bug 835023 - win7.64 keep on rebooting at the end of installation: so far it has not reproduced over many rounds of attempts, but bz804888 happens; I submitted some logs to bz835023.
<xwei/#kvm> rhod, and, with the 'werror=stop', the guest hangs after rebooting.
It sounds unlikely that this has never been tested before, so is this a regression?
Created attachment 603253 [details]
On 2012-7-30 Xiaoqing wrote in a mail regarding regression test of this bug and Bug 804888
I tried a RHEL.5.6 host with virtio-win-1.0.3-52454.iso and
 a RHEL.5.9 host with virtio-win-1.0.0-45801.iso.
Both combinations can reproduce the above two bugs, but the reproduction
ratio looks  >  for now (I set autotest to repeat 200 rounds,
but only ~150 rounds have run so far). I will send another email when all
rounds have finished and summarize the ratio.
Thanks and Best Regards,
Please try qcow2 metadata preallocation: add "-o preallocation=metadata" to the qemu-img command line which creates the image. If it still reproduces, that points the finger at virtio rather than qcow2 (though it does not rule it out completely).
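For an automated run, the suggested image creation could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, the image path is taken from the command line in the description, and the 30G size is an assumed placeholder because the thread does not state the real image size.

```python
import shlex


def qemu_img_create_argv(path, size, preallocation="metadata"):
    """Build the argv for creating a qcow2 image with the given preallocation mode."""
    return [
        "qemu-img", "create",
        "-f", "qcow2",
        "-o", f"preallocation={preallocation}",
        path, size,
    ]


if __name__ == "__main__":
    # "30G" is an assumed placeholder size, not a value from this bug.
    argv = qemu_img_create_argv("/home/Win7-64-virtio.qcow2", "30G")
    print(" ".join(shlex.quote(a) for a in argv))
    # To actually create the image: subprocess.run(argv, check=True)
```

The sketch only prints the command, so it can be reviewed before anything overwrites an existing image.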
Regarding bug 835023, bug 840715, and bug 804888
We are late in RHEL5, and we also do not want to put too many resources into RHEL5 bugs, so I am closing these bugs.
- These bugs are difficult to reproduce; both Vadim and Kevin tried for days and failed to reproduce them.
- These bugs are related to the initial installation, so no real harm done.
- Reproducing them requires a full installation cycle, which takes at least half an hour, and longer if we want to try it under high load.
- This is not a regression.
- They were not reported by customers, which probably means that customers rarely install Windows guests on RHEL5.9. They probably install once and use a template (if at all). Since we are talking about RHEL5 host, this probably will not change.
- The simplest workaround is usually to retry. Other workarounds are to use a raw file (instead of QCOW), or to add the virtio-stor driver only after the initial installation.
Once any of these bugs is reported by a customer, we will reconsider. In most cases the customer should be able to retry and forget about it.
With uncomfortable feelings, Ronen.
An afterthought:
Did we try rebooting an installed system repeatedly (without quitting QEMU)?
Is there a difference between the first boot (during the installation) and the following boots?
(In reply to comment #38)
> An after thought,
> Did we try rebooting an installed system repeatedly (without quitting QEMU).
Yes, we have this test case automated (reboot the guest 25 times) and run it as a regular test for every build; we have not hit bz#804888 or bz#835023 so far.
> Is there a difference between the first boot (during the installation) and
> the following boots?
I don't know exactly, but I think there is at least one difference [though it looks irrelevant to this bug :) ]
During installation, the OS has less RAM available, as it needs to keep all of its files in RAM (because it does not have a 'swap' partition/file yet).
I just want to say there could be more differences between booting an installed OS and one that is still installing.
(In reply to comment #39)
> (In reply to comment #38)
> > An after thought,
> > Did we try rebooting an installed system repeatedly (without quitting QEMU).
> Yes, we have this testcase automated(reboot guest 25 times), run it as an
> regular test for every build, and didn't hit bz#804888 and bz#835023 so far.
Thanks, this is good. So we stay with the current decision for now.
> > Is there a difference between the first boot (during the installation) and
> > the following boots?
> I dont know the exactly, but at least I think there's one thing diff [though
> this looks irrelevant to this bug :) ]
> During installation, OS has lesser RAM resource as it need to put all of
> its files in RAM(caused it dont have a 'swap' partition/file yet)
> Just wanna to say there could be more diff between booting a
> installed/installing OS
> Xiaoqing Wei.