| Summary: | guest hangs in bios after s3 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Gerd Hoffmann <kraxel> |
| Component: | seabios | Assignee: | Gleb Natapov <gleb> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2 | CC: | acathrow, bcao, bsarathy, ehabkost, juzhang, knoel, lcapitulino, lersek, mkenneth, qzhang, syeghiay, tburke, virt-maint, xigao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | seabios-0.6.1.2-9.el6 | Doc Type: | Bug Fix |
| Doc Text: | No documentation needed. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 12:54:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Description
Gerd Hoffmann
2012-01-05 13:56:16 UTC
Created attachment 550906 [details]
libvirt config for the guest
(qemu) info registers
EAX=000003c5 EBX=0000359e ECX=0000ff89 EDX=00000500
ESI=00000002 EDI=0000d1b2 EBP=00000500 ESP=00000f58
EIP=00000f90 EFL=00010297 [--S-APC] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =7d50 0007d500 0000ffff 0000f300
CS =c000 000c0000 0000ffff 0000f300
SS =0000 00000000 0000ffff 0000f300
DS =0000 00000000 0000ffff 0000f300
FS =0000 00000000 0000ffff 0000f300
GS =0000 00000000 0000ffff 0000f300
LDT=0000 00000000 0000ffff 00008200
TR =0000 feffd000 00002088 00008b00
GDT= 000fcd78 00000037
IDT= 00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

It doesn't always hang, sometimes it resumes successfully, but that is like once or twice out of ten times.

I'm also testing S3 for qemu-ga related work, but instead of a hang I get a black screen. I'm sure the guest is still running, as explained in bug 772614.

Three things about the bug:
1. In the xml you are using your own version of bios.bin. Why?
2. The hang is at c000:0f90, which corresponds to the vga option rom.
3. You are using QXL. Looks like a QXL related bug to me.

(In reply to comment #6)
> Three things about the bug:
>
> 1. In xml you are using your version of bios.bin. Why?
Probably because I disabled S3 in the rhel6 one :)

Created attachment 555802 [details]
libvirt config, stripped down
It's not QXL. I've stripped the config down meanwhile, see new attachment. No virtio any more, no spice any more, no sound any more. Still see the hang.
Yes, the custom bios is just for enabling S3.
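For readers who want to check the "c000:0f90 is in the vga option rom" reasoning, here is a minimal Python sketch of the real-mode address arithmetic. It only uses the CS selector, CS base and EIP values from the register dump above; the function name `real_mode_linear` is made up for illustration and is not part of any tool mentioned in this report.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in x86 real mode (segment * 16 + offset)."""
    return (segment << 4) + offset


# Values taken from the register dump: CS=c000 (base 000c0000), EIP=00000f90.
CS_SELECTOR = 0xC000
CS_BASE_FROM_DUMP = 0x000C0000
EIP = 0x0F90

hang_addr = real_mode_linear(CS_SELECTOR, EIP)

# The segment base reported by the monitor should agree with selector << 4.
assert real_mode_linear(CS_SELECTOR, 0) == CS_BASE_FROM_DUMP

print(hex(hang_addr))                       # linear address of the hang
print(hex(hang_addr - CS_BASE_FROM_DUMP))   # offset into the rom mapped at c000:0000
```

The linear address is what gdb or the qemu monitor expects when disassembling around the hang, and the offset is what to look up in a listing of the rom image.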
The hang is still in the same place? At c000:0f90?

Same place, yes.

Are you using an unmodified vga bios? Can you disassemble around rip?

Yes, unmodified vga bios. Looking at vgabios-stdvga.txt, which is generated by the build, the rip address seems to be somewhere in the vga font data ...

I noticed that too. Can you check the disassembly? Maybe vgabios rearranges things in memory?

No, it's unmodified:

[ vgabios-stdvga.txt ]
04551 0F90 FF .byte $FF
04552 0F91 DB .byte $DB
04553 0F92 FF .byte $FF
04554 0F93 C3 .byte $C3
04555 0F94 E7 .byte $E7
04556 0F95 FF .byte $FF
04557 0F96 7E .byte $7E

[ gdb ]
(gdb) disas /r 0xc0f90,+10
Dump of assembler code from 0xc0f90 to 0xc0f9a:
0x000c0f90: ff db lcall *<internal disassembler error>
0x000c0f92: ff c3 inc %ebx
0x000c0f94: e7 ff out %eax,$0xff
0x000c0f96: 7e 6c jle 0xc1004
0x000c0f98: fe (bad)
0x000c0f99: fe (bad)
End of assembler dump.
(gdb)

I think gdb doesn't disassemble correctly (looks like 32bit whereas the code actually is 16bit). The byte sequence matches though.

You can disassemble in the qemu monitor with "x/20i 0xf90-10". So something bad happened to the vga rom? Can you compile seabios with debug support and capture the debug output during the hang?

seabios log (default debug level which is 1 IIRC):

In resume (status=254)
In 32bit resume
Running option rom at c000:0003

(In reply to comment #16)
> seabios log (default debug level which is 1 IIRC):
>
> In resume (status=254)
> In 32bit resume
> Running option rom at c000:0003
And can you disassemble there?

Raising the debug level to 99 doesn't give much more info:

In resume (status=254)
In 32bit resume
init smm
Checking rom 0x000c0000 (sig aa55 size 79)
Running option rom at c000:0003

c000:0003 is the vgabios entry point, which looks ok too:

(qemu) x /10i 0xc0003
0x00000000000c0003: jmp 0xc0127
[ ... ]
(qemu) x /10i 0xc0127
0x00000000000c0127: call 0xc3581
0x00000000000c012a: call 0xc35e0
0x00000000000c012d: call 0xc939d
0x00000000000c0130: push %ds
0x00000000000c0131: xor %ax,%ax
0x00000000000c0133: mov %ax,%ds
0x00000000000c0135: mov $0x151,%ax
0x00000000000c0138: mov %ax,0x40
0x00000000000c013b: mov $0xc000,%ax
0x00000000000c013e: mov %ax,0x42

Doesn't look like the vgabios is corrupted.

Sneaking '-vga none' into the qemu command line makes resume work. seabios then prints:

In resume (status=254)
In 32bit resume
Found option rom with bad checksum: loc=0x000c0000 len=4096 sum=ea
Jump to resume vector (10000)

The rom with the bad checksum is sgabios I guess.

One more try: re-enabled vga, disabled sgabios: Hangs, same place. So it isn't sgabios.

What is your host HW? Kernel version? Are you sure you are not loading the kvm-intel module with the "emulate_invalid_guest_state=1" option? Are you sure kvm is enabled during the qemu run?

It's my lenovo T500 laptop running RHEL-6.2 (kernel 2.6.32-220.el6.x86_64).
model name : Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz

Created attachment 556259 [details]
change vgabios debug log port
Patch makes the debug builds log to the seabios debug port (0x402) too.
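The "Found option rom with bad checksum" message quoted earlier follows the usual legacy option rom convention: a 0xAA55 signature, a length byte counted in 512-byte blocks, and all bytes of the image summing to zero modulo 256. Below is a minimal Python sketch of that check. It is not SeaBIOS code; `check_option_rom` and `make_rom` are hypothetical helpers, and the rom images are synthetic test data, not the roms from this report.

```python
def check_option_rom(image: bytes) -> str:
    """Validate a legacy option rom header and 8-bit checksum."""
    # Signature 0xAA55 is stored little-endian: byte 0 is 0x55, byte 1 is 0xAA.
    if len(image) < 3 or image[0] != 0x55 or image[1] != 0xAA:
        return "no signature"
    length = image[2] * 512  # byte 2 holds the size in 512-byte blocks
    if length == 0 or length > len(image):
        return "bad size"
    total = sum(image[:length]) & 0xFF  # must be zero for a valid rom
    return "ok" if total == 0 else f"bad checksum sum={total:x}"


def make_rom(blocks: int) -> bytearray:
    """Build a synthetic, otherwise empty rom whose last byte fixes the checksum."""
    rom = bytearray(blocks * 512)
    rom[0], rom[1], rom[2] = 0x55, 0xAA, blocks
    rom[-1] = (-sum(rom)) & 0xFF
    return rom


good = make_rom(8)      # 8 * 512 = 4096 bytes
damaged = bytearray(good)
damaged[100] ^= 0xEA    # flip some bits in one byte of the body

print(check_option_rom(bytes(good)))
print(check_option_rom(bytes(damaged)))
```

A rom that fails this check is simply skipped, which fits the log above where resume continues to "Jump to resume vector" after the message.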
vgabios stops in the middle of the id string printing. no fixed place.

kvm_stat output while hanging:

kvm statistics
efer_reload 0 0
exits 195103390 588008
fpu_reload 532 0
halt_exits 126745 0
halt_wakeup 64863 0
host_state_reload 1767420 565
hypercalls 0 0
insn_emulation 1149528 0
insn_emulation_fail 0 0
invlpg 95419 0
io_exits 1651772 0
irq_exits 395158 999
irq_injections 501674 1002
irq_window 324296 961
largepages 0 0
mmio_exits 48804 0
mmu_cache_miss 55239 0
mmu_flooded 35662 0
mmu_pde_zapped 64762 0
mmu_pte_updated 239045 0
mmu_pte_write 358970 0
mmu_recycled 0 0
mmu_shadow_zapped 66397 0
mmu_unsync 0 0
nmi_injections 0 0
nmi_window 0 0
pf_fixed 1072890 0
pf_guest 379453 0
remote_tlb_flush 287 0
request_irq 0 0
signal_exits 1 0
tlb_flush 435238 0

1000 irq injections. Hmm. timer interrupt still running? Does RHEL-5 use 1000 Hz by default?

Created attachment 556261 [details]
trace
When it hangs, this shows up now and then (I guess this is where the 1000 irq injections are coming from).
qemu-system-x86-6799 [000] 762634.226603: kvm_entry: vcpu 0
qemu-system-x86-6799 [000] 762634.226604: kvm_exit: [FAILED TO PARSE] exit_reason=0 guest_rip=0xf26
qemu-system-x86-6799 [000] 762634.226605: kvm_inj_exception: [FAILED TO PARSE] exception=6 has_error=0 error_code=0
qemu-system-x86-6799 [000] 762634.226605: kvm_entry: vcpu 0
kvm-pit-wq-6798 [001] 762634.226642: kvm_set_irq: gsi 0 level 1 source 1
kvm-pit-wq-6798 [001] 762634.226643: kvm_pic_set_irq: chip 0 pin 0 (edge)
kvm-pit-wq-6798 [001] 762634.226643: kvm_ioapic_set_irq: pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
kvm-pit-wq-6798 [001] 762634.226643: kvm_set_irq: gsi 0 level 0 source 1
kvm-pit-wq-6798 [001] 762634.226644: kvm_pic_set_irq: chip 0 pin 0 (edge)
kvm-pit-wq-6798 [001] 762634.226644: kvm_ioapic_set_irq: pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
kvm-pit-wq-6798 [001] 762634.227678: kvm_set_irq: gsi 0 level 1 source 1
kvm-pit-wq-6798 [001] 762634.227679: kvm_pic_set_irq: chip 0 pin 0 (edge)
kvm-pit-wq-6798 [001] 762634.227679: kvm_ioapic_set_irq: pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
kvm-pit-wq-6798 [001] 762634.227679: kvm_set_irq: gsi 0 level 0 source 1
kvm-pit-wq-6798 [001] 762634.227680: kvm_pic_set_irq: chip 0 pin 0 (edge)
kvm-pit-wq-6798 [001] 762634.227680: kvm_ioapic_set_irq: pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
qemu-system-x86-6799 [000] 762634.227816: kvm_exit: [FAILED TO PARSE] exit_reason=0 guest_rip=0xf26
qemu-system-x86-6799 [000] 762634.227817: kvm_inj_exception: [FAILED TO PARSE] exception=6 has_error=0 error_code=0
qemu-system-x86-6799 [000] 762634.227817: kvm_entry: vcpu 0
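As a rough way to read the excerpt above, the following Python sketch parses a few of the trace lines, measures the spacing between rising edges of the PIT line (gsi 0 level 1), and counts exception and interrupt injection events. The regular expression and helper names are assumptions made for this illustration, and it assumes six-digit microsecond timestamps as in the excerpt; it is not a general ftrace parser.

```python
import re

# A subset of the lines from the trace excerpt in this comment.
TRACE = """\
qemu-system-x86-6799 [000] 762634.226605: kvm_inj_exception: [FAILED TO PARSE] exception=6 has_error=0 error_code=0
kvm-pit-wq-6798 [001] 762634.226642: kvm_set_irq: gsi 0 level 1 source 1
kvm-pit-wq-6798 [001] 762634.226643: kvm_set_irq: gsi 0 level 0 source 1
kvm-pit-wq-6798 [001] 762634.227678: kvm_set_irq: gsi 0 level 1 source 1
kvm-pit-wq-6798 [001] 762634.227679: kvm_set_irq: gsi 0 level 0 source 1
qemu-system-x86-6799 [000] 762634.227817: kvm_inj_exception: [FAILED TO PARSE] exception=6 has_error=0 error_code=0
"""

# task-pid [cpu] seconds.microseconds: event: arguments
LINE = re.compile(r"^\S+\s+\[\d+\]\s+(\d+)\.(\d{6}):\s+(\w+):\s*(.*)$")


def parse(trace: str):
    """Return (timestamp in microseconds, event name, argument string) tuples."""
    events = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            usec = int(m.group(1)) * 1_000_000 + int(m.group(2))
            events.append((usec, m.group(3), m.group(4)))
    return events


events = parse(TRACE)

# Rising edges of the timer line and the gaps between them, in microseconds.
rising = [t for t, name, args in events
          if name == "kvm_set_irq" and "gsi 0 level 1" in args]
intervals = [b - a for a, b in zip(rising, rising[1:])]

ud_count = sum(1 for _, name, args in events
               if name == "kvm_inj_exception" and "exception=6" in args)
virq_count = sum(1 for _, name, _ in events if name == "kvm_inj_virq")

print("PIT rising-edge intervals (us):", intervals)
print("exception=6 injections:", ud_count, "kvm_inj_virq events:", virq_count)
```

An interval of roughly one millisecond corresponds to a timer running at about 1000 Hz, which is in line with the irq_injections rate in the kvm_stat output. Exception vector 6 is the x86 invalid opcode exception (#UD).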
There are no irq injections actually in the trace. There is one, but before the hang. IRQ injection is "kvm_inj_virq". There is a #UD exception for some reason though.

Created attachment 556273 [details]
trace, second try
[15:14] <gleb> kraxel, can you check with upstream kernel?
[15:14] <gleb> kraxel, need to go now
[15:15] <kraxel> gleb: did a quick test on fedora 16 this morning and saw a hang too
[15:15] <kraxel> (upstream seabios+qemu).
[15:15] <kraxel> again not investigated in detail.
[15:16] <kraxel> just tried because lots of emulation fixes went upstream last months ...

One more data point: guest kernel plays a role too. RHEL-5 guest (32bit) fails (see original report). RHEL-6 guest (64bit) works without trouble.

I tried with rhel5 32bit pae (I had the image handy) and was not able to reproduce.

Created attachment 556305 [details]
upstream kernel trace
Can you attach the bios.bin and bios.bin.elf that were used to get the trace?

Don't have elf, it is the binary shipped with upstream/master, which is at rel-1.6.3.1 right now.

Created attachment 556795 [details]
init pic on resume
Can you try the attached patch? But only with the userspace irq chip. The kernel one has a bug that prevents the patch from working.
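The patch itself is only an attachment and is not reproduced in this report. Going by its title alone ("init pic on resume"), the sketch below lists the conventional 8259A initialization sequence that PC firmware programs at boot, expressed in Python as a list of (port, value) writes. The vector bases 0x08 and 0x70 are the traditional BIOS defaults and are an assumption here, as is the function name `pic_init_writes`; this is not the code from the attachment.

```python
PIC1_CMD, PIC1_DATA = 0x20, 0x21   # master 8259A ports
PIC2_CMD, PIC2_DATA = 0xA0, 0xA1   # slave 8259A ports


def pic_init_writes(master_base: int = 0x08, slave_base: int = 0x70):
    """Return the (port, value) writes of a conventional dual-8259A setup."""
    return [
        (PIC1_CMD, 0x11), (PIC2_CMD, 0x11),      # ICW1: edge triggered, cascade, ICW4 follows
        (PIC1_DATA, master_base),                # ICW2: vector base for IRQ0-7
        (PIC2_DATA, slave_base),                 # ICW2: vector base for IRQ8-15
        (PIC1_DATA, 0x04), (PIC2_DATA, 0x02),    # ICW3: slave on IRQ2 / slave identity 2
        (PIC1_DATA, 0x01), (PIC2_DATA, 0x01),    # ICW4: 8086 mode
        (PIC1_DATA, 0xFB), (PIC2_DATA, 0xFF),    # OCW1: mask everything except the cascade
    ]


for port, value in pic_init_writes():
    print(f"outb(0x{value:02x}, 0x{port:02x})")
```

The point of redoing this on resume would be that a protected-mode guest normally moves the vector bases away from 0x08, so real-mode firmware code running with the guest's settings can see timer interrupts arrive on vectors it does not expect.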
Works (tested upstream seabios + upstream qemu).

Gleb, are you posting the patch?

(In reply to comment #37)
> Gleb, are you posting the patch?
There are two. One for seabios (upstream already) another is for kernel (waits for review for a week now). When the kernel one hits upstream I will post them.

Hi, all
This bug can be reproduced with RHEL5.8-32 guest (but can not be reproduced with rhel5.8-64) with seabios-0.6.1.2-4.el6.
And verified with the build Luiz provided in Comment 48, guests do not hang during S3. I tested rhel6.3-64, rhel5.8-32, rhel6.8-64 for more than 20 times and it passed.

Steps:
1. Boot a rhel5.3-32 guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -drive file=/home/RHEL-Server-5.8-32.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -vnc :10 -monitor stdio -boot c
2. Inside guest:
#pm-suspend

So, this bug is fixed in the build provided in Comment 48.

Reproduced on seabios-0.6.1.2-4.el6.x86_64 and verified pass on seabios-0.6.1.2-16.el6.x86_64.

seabios-0.6.1.2-4.el6.x86_64:
rhel5.8-32: failed. (reproduced the hang issue).
rhel5.8-64: pass.
rhel6.3-32: pass
rhel6.3-64: pass

seabios-0.6.1.2-16.el6.x86_64:
rhel5.8-32: pass
rhel5.8-64: pass
rhel6.3-32: pass
rhel6.3-64: pass.

(pass means this bug is not reproduced, but have other bug like Bug 808391)

Steps:
1. Boot a rhel5.3-32 guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -drive file=/home/RHEL-Server-5.8-32.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -vnc :10 -monitor stdio -boot c
2. Inside guest:
#pm-suspend

So this issue is fixed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
No documentation needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-0802.html