771946 – guest hangs in bios after s3

Summary: guest hangs in bios after s3

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	seabios
Sub Component:
Version:	6.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Gleb Natapov
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-01-05 13:56 UTC by Gerd Hoffmann
Modified:	2013-12-09 00:56 UTC (History)
CC List:	14 users
Fixed In Version:	seabios-0.6.1.2-9.el6
Doc Type:	Bug Fix
Doc Text:	No documentation needed.
Clone Of:
Environment:
Last Closed:	2012-06-20 12:54:46 UTC
Target Upstream Version:
Embargoed:

Attachments
libvirt config for the guest (2.49 KB, text/plain) 2012-01-05 13:57 UTC, Gerd Hoffmann
libvirt config, stripped down (1.89 KB, text/plain) 2012-01-17 14:48 UTC, Gerd Hoffmann
change vgabios debug log port (652 bytes, application/octet-stream) 2012-01-19 12:27 UTC, Gerd Hoffmann
trace (39.06 KB, text/plain) 2012-01-19 12:54 UTC, Gerd Hoffmann
trace, second try (675.73 KB, text/plain) 2012-01-19 13:43 UTC, Gerd Hoffmann
upstream kernel trace (36.42 KB, text/plain) 2012-01-19 15:38 UTC, Gerd Hoffmann
init pic on resume (297 bytes, patch) 2012-01-22 15:54 UTC, Gleb Natapov

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2012:0802	0	normal	SHIPPED_LIVE	seabios bug fix and enhancement update	2012-06-19 19:51:36 UTC

Description Gerd Hoffmann 2012-01-05 13:56:16 UTC

Description of problem:

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.213.el6.x86_64
rhel5 (32bit) guest

How reproducible:
(1) boot guest
(2) echo mem > /sys/power/state
  
Actual results:
guest hangs.

Expected results:
guest resumes successfully.

Additional info:
Ran into this while investigating bug 736012 ...

Comment 1 Gerd Hoffmann 2012-01-05 13:57:22 UTC

Created attachment 550906
libvirt config for the guest

Comment 2 Gerd Hoffmann 2012-01-05 13:58:38 UTC

(qemu) info registers
EAX=000003c5 EBX=0000359e ECX=0000ff89 EDX=00000500
ESI=00000002 EDI=0000d1b2 EBP=00000500 ESP=00000f58
EIP=00000f90 EFL=00010297 [--S-APC] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =7d50 0007d500 0000ffff 0000f300
CS =c000 000c0000 0000ffff 0000f300
SS =0000 00000000 0000ffff 0000f300
DS =0000 00000000 0000ffff 0000f300
FS =0000 00000000 0000ffff 0000f300
GS =0000 00000000 0000ffff 0000f300
LDT=0000 00000000 0000ffff 00008200
TR =0000 feffd000 00002088 00008b00
GDT=     000fcd78 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

Comment 3 Gerd Hoffmann 2012-01-05 14:01:31 UTC

It doesn't always hang; sometimes it resumes successfully,
but only about once or twice out of ten tries.

Comment 5 Luiz Capitulino 2012-01-09 12:59:48 UTC

I'm also testing S3 for qemu-ga related work. Instead of a hang I get a black screen, but I'm sure the guest is still running, as explained in bug 772614.

Comment 6 Gleb Natapov 2012-01-17 10:55:34 UTC

Three things about the bug:

1. In xml you are using your version of bios.bin. Why?
2. The hang is at c000:0f90, which corresponds to the vga option rom.
3. You are using QXL.

Looks like a QXL-related bug to me.
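
For readers following the address arithmetic: a real-mode segment:offset pair maps to a linear address as segment * 16 + offset. The sketch below is illustrative only; the function name is made up for this example, and the 0xC0000 base is the conventional legacy VGA option ROM location, which matches the CS base shown in the register dump in comment 2.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset in x86 real mode (A20 wrap ignored)."""
    return (segment << 4) + offset


# CS:EIP from the register dump in comment 2
addr = real_mode_linear(0xC000, 0x0F90)
print(hex(addr))

# Conventional start of the legacy VGA option ROM window (assumption
# consistent with the CS base 000c0000 in the dump)
VGA_ROM_BASE = 0xC0000
print(hex(addr - VGA_ROM_BASE))
```

The resulting linear address is the one to pass to a debugger or to the qemu monitor when inspecting memory around the hang.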

Comment 7 Gleb Natapov 2012-01-17 11:12:42 UTC

(In reply to comment #6)
> Three things about the bug:
> 
> 1. In xml you are using your version of bios.bin. Why?
Probably because I disabled S3 in the rhel6 one :)

Comment 8 Gerd Hoffmann 2012-01-17 14:48:40 UTC

Created attachment 555802
libvirt config, stripped down

It's not QXL.  I've stripped the config down meanwhile, see new attachment.  No virtio any more, no spice any more, no sound any more.  Still see the hang.

Yes, the custom bios is just for enabling S3.

Comment 9 Gleb Natapov 2012-01-17 15:17:05 UTC

The hang is still in the same place? At c000:0f90?

Comment 10 Gerd Hoffmann 2012-01-17 16:05:40 UTC

Same place, yes.

Comment 11 Gleb Natapov 2012-01-17 16:07:43 UTC

Are you using an unmodified vga bios? Can you disassemble around rip?

Comment 12 Gerd Hoffmann 2012-01-17 17:03:08 UTC

Yes, unmodified vga bios.  Looking at vgabios-stdvga.txt, which is generated by the build, the rip address seems to be somewhere in the vga font data ...

Comment 13 Gleb Natapov 2012-01-17 17:10:30 UTC

I noticed that too. Can you check the disassembly? Maybe vgabios rearranges things in memory?

Comment 14 Gerd Hoffmann 2012-01-18 16:00:57 UTC

No, it's unmodified:

[ vgabios-stdvga.txt ]
04551 0F90                        FF            .byte   $FF
04552 0F91                        DB            .byte   $DB
04553 0F92                        FF            .byte   $FF
04554 0F93                        C3            .byte   $C3
04555 0F94                        E7            .byte   $E7
04556 0F95                        FF            .byte   $FF
04557 0F96                        7E            .byte   $7E

[ gdb ]
(gdb) disas /r 0xc0f90,+10
Dump of assembler code from 0xc0f90 to 0xc0f9a:
   0x000c0f90:   ff db  lcall  *<internal disassembler error>
   0x000c0f92:   ff c3  inc    %ebx
   0x000c0f94:   e7 ff  out    %eax,$0xff
   0x000c0f96:   7e 6c  jle    0xc1004
   0x000c0f98:   fe     (bad)  
   0x000c0f99:   fe     (bad)  
End of assembler dump.
(gdb) 

I think gdb doesn't disassemble correctly (looks like 32bit whereas the code actually is 16bit).  The byte sequence matches though.
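
The byte comparison above can be done mechanically. This is a minimal sketch with the bytes copied by hand from the two dumps in this comment; the variable names are chosen for illustration.

```python
# Bytes at offset 0x0F90 from the vgabios-stdvga.txt listing
listing = bytes.fromhex("FF DB FF C3 E7 FF 7E")

# Raw bytes shown by "disas /r 0xc0f90,+10" in gdb
gdb_dump = bytes.fromhex("ff db ff c3 e7 ff 7e 6c fe fe")

# The in-memory ROM is unmodified at this spot if the listing bytes
# are a prefix of what gdb read back from guest memory.
matches = gdb_dump.startswith(listing)
print(matches)
```

A prefix match only covers the seven bytes quoted from the listing, so it says nothing about the rest of the ROM.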

Comment 15 Gleb Natapov 2012-01-18 16:18:24 UTC

You can disassemble in the qemu monitor with "x/20i 0xf90-10".

So something bad happened to the vga rom? Can you compile seabios with debug support and capture the debug output during the hang?

Comment 16 Gerd Hoffmann 2012-01-18 16:30:05 UTC

seabios log (default debug level which is 1 IIRC):

In resume (status=254)
In 32bit resume
Running option rom at c000:0003

Comment 17 Gleb Natapov 2012-01-18 16:38:36 UTC

(In reply to comment #16)
> seabios log (default debug level which is 1 IIRC):
> 
> In resume (status=254)
> In 32bit resume
> Running option rom at c000:0003

And can you disassemble there?

Comment 18 Gerd Hoffmann 2012-01-18 16:52:47 UTC

Raising the debug level to 99 doesn't give much more info:

In resume (status=254)
In 32bit resume
init smm
Checking rom 0x000c0000 (sig aa55 size 79)
Running option rom at c000:0003

c000:0003 is the vgabios entry point which looks ok too:

(qemu) x /10i 0xc0003
0x00000000000c0003:  jmp    0xc0127
[ ... ]
(qemu) x /10i 0xc0127
0x00000000000c0127:  call   0xc3581
0x00000000000c012a:  call   0xc35e0
0x00000000000c012d:  call   0xc939d
0x00000000000c0130:  push   %ds
0x00000000000c0131:  xor    %ax,%ax
0x00000000000c0133:  mov    %ax,%ds
0x00000000000c0135:  mov    $0x151,%ax
0x00000000000c0138:  mov    %ax,0x40
0x00000000000c013b:  mov    $0xc000,%ax
0x00000000000c013e:  mov    %ax,0x42

Doesn't look like the vgabios is corrupted.
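
As a side note on the "sig aa55 size 79" line: a legacy option ROM starts with the bytes 0x55 0xAA, followed by one byte giving the image size in 512-byte blocks, and the init entry point sits at offset 3. The sketch below decodes such a header. The function name and the three-byte sample are made up for illustration, and it assumes the "size" seabios prints is that raw block count.

```python
def parse_option_rom_header(header: bytes) -> dict:
    """Decode the first three bytes of a legacy option ROM image."""
    if len(header) < 3:
        raise ValueError("need at least 3 header bytes")
    return {
        # 0x55 0xAA read as a little-endian 16-bit value is 0xaa55
        "signature": int.from_bytes(header[0:2], "little"),
        # size byte counts 512-byte blocks
        "size_bytes": header[2] * 512,
        # the BIOS calls the ROM at offset 3
        "entry_offset": 3,
    }


# Hypothetical header matching the values in the seabios log above
info = parse_option_rom_header(bytes([0x55, 0xAA, 79]))
print(hex(info["signature"]), info["size_bytes"], hex(0xC0000 + info["size_bytes"]))
```

Under that assumption the ROM would cover 0xc0000 up to 0xc9e00, which contains all three call targets in the disassembly above.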

Comment 19 Gerd Hoffmann 2012-01-18 17:06:14 UTC

Sneaking '-vga none' into the qemu command line makes resume work.

seabios then prints:

In resume (status=254)
In 32bit resume
Found option rom with bad checksum: loc=0x000c0000 len=4096 sum=ea
Jump to resume vector (10000)

The rom with the bad checksum is sgabios I guess.
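
For context on "bad checksum ... sum=ea": an option ROM image is valid only if all of its bytes add up to zero modulo 256. The sketch below shows that check on a synthetic 4096-byte image (not the actual ROM from this guest); the helper names are made up for illustration.

```python
def rom_checksum(image: bytes) -> int:
    """8-bit sum of all bytes; a valid option ROM yields 0."""
    return sum(image) & 0xFF


def with_fixed_checksum(image: bytes) -> bytes:
    """Return a copy whose last byte is adjusted so the sum becomes 0."""
    body = image[:-1]
    return body + bytes([(-sum(body)) & 0xFF])


# Synthetic image: signature 0x55 0xAA, size 8 blocks (8 * 512 = 4096), rest zero
image = bytes([0x55, 0xAA, 0x08]) + bytes(4096 - 3)
print(hex(rom_checksum(image)))                       # non-zero, so rejected
print(hex(rom_checksum(with_fixed_checksum(image))))  # zero, so accepted
```

A non-zero sum such as the ea in the log means the BIOS skips the ROM instead of running it, which is why resume proceeds straight to the resume vector here.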

Comment 20 Gerd Hoffmann 2012-01-18 17:08:53 UTC

One more try: re-enabled vga, disabled sgabios. Still hangs, same place.  So it isn't sgabios.

Comment 21 Gleb Natapov 2012-01-19 07:10:13 UTC

What is your host HW? Kernel version? Are you sure you are not loading the kvm-intel module with the "emulate_invalid_guest_state=1" option? Are you sure kvm is enabled during the qemu run?

Comment 22 Gerd Hoffmann 2012-01-19 08:40:38 UTC

It's my Lenovo T500 laptop running RHEL-6.2 (kernel 2.6.32-220.el6.x86_64).

model name      : Intel(R) Core(TM)2 Duo CPU     T9600  @ 2.80GHz

Comment 23 Gerd Hoffmann 2012-01-19 12:27:33 UTC

Created attachment 556259
change vgabios debug log port

Patch makes the debug builds log to the seabios debug port (0x402) too.

Comment 24 Gerd Hoffmann 2012-01-19 12:34:06 UTC

vgabios stops in the middle of printing the id string,
at no fixed place.

kvm_stat output while hanging:

kvm statistics

 efer_reload                  0       0
 exits                195103390  588008
 fpu_reload                 532       0
 halt_exits              126745       0
 halt_wakeup              64863       0
 host_state_reload      1767420     565
 hypercalls                   0       0
 insn_emulation         1149528       0
 insn_emulation_fail          0       0
 invlpg                   95419       0
 io_exits               1651772       0
 irq_exits               395158     999
 irq_injections          501674    1002
 irq_window              324296     961
 largepages                   0       0
 mmio_exits               48804       0
 mmu_cache_miss           55239       0
 mmu_flooded              35662       0
 mmu_pde_zapped           64762       0
 mmu_pte_updated         239045       0
 mmu_pte_write           358970       0
 mmu_recycled                 0       0
 mmu_shadow_zapped        66397       0
 mmu_unsync                   0       0
 nmi_injections               0       0
 nmi_window                   0       0
 pf_fixed               1072890       0
 pf_guest                379453       0
 remote_tlb_flush           287       0
 request_irq                  0       0
 signal_exits                 1       0
 tlb_flush               435238       0

1000 irq injections.  Hmm.  timer interrupt still running?  Does RHEL-5 use 1000 Hz by default?
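
To pick the active counters out of a kvm_stat dump like the one above, something along these lines can help. It is a rough sketch that assumes the first number is the running total and the second is the change over the last sampling interval; only a subset of the rows is pasted in.

```python
KVM_STAT = """\
kvm statistics

 exits                195103390  588008
 halt_exits              126745       0
 host_state_reload      1767420     565
 io_exits               1651772       0
 irq_exits               395158     999
 irq_injections          501674    1002
 irq_window              324296     961
"""


def parse_kvm_stat(text: str) -> dict:
    """Map counter name -> (total, delta); lines without 3 fields are skipped."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, total, delta = fields
        counters[name] = (int(total), int(delta))
    return counters


counters = parse_kvm_stat(KVM_STAT)
# Counters that are still moving while the guest hangs, busiest first
active = sorted(
    ((name, delta) for name, (_, delta) in counters.items() if delta > 0),
    key=lambda item: item[1],
    reverse=True,
)
for name, delta in active:
    print(f"{name:20} {delta}")
```

In this dump only exits, host_state_reload and the three irq counters are non-zero, with the irq counters all close to 1000.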

Comment 25 Gerd Hoffmann 2012-01-19 12:54:06 UTC

Created attachment 556261
trace

Comment 26 Gerd Hoffmann 2012-01-19 12:59:08 UTC

When it hangs, the trace shows this every now and then (I guess this is where the 1000 irq injections are coming from).


 qemu-system-x86-6799  [000] 762634.226603: kvm_entry:            vcpu 0
 qemu-system-x86-6799  [000] 762634.226604: kvm_exit:             [FAILED TO PARSE] exit_reason=0 guest_rip=0xf26
 qemu-system-x86-6799  [000] 762634.226605: kvm_inj_exception:    [FAILED TO PARSE] exception=6 has_error=0 error_code=0
 qemu-system-x86-6799  [000] 762634.226605: kvm_entry:            vcpu 0
      kvm-pit-wq-6798  [001] 762634.226642: kvm_set_irq:          gsi 0 level 1 source 1
      kvm-pit-wq-6798  [001] 762634.226643: kvm_pic_set_irq:      chip 0 pin 0 (edge)
      kvm-pit-wq-6798  [001] 762634.226643: kvm_ioapic_set_irq:   pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
      kvm-pit-wq-6798  [001] 762634.226643: kvm_set_irq:          gsi 0 level 0 source 1
      kvm-pit-wq-6798  [001] 762634.226644: kvm_pic_set_irq:      chip 0 pin 0 (edge)
      kvm-pit-wq-6798  [001] 762634.226644: kvm_ioapic_set_irq:   pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
      kvm-pit-wq-6798  [001] 762634.227678: kvm_set_irq:          gsi 0 level 1 source 1
      kvm-pit-wq-6798  [001] 762634.227679: kvm_pic_set_irq:      chip 0 pin 0 (edge)
      kvm-pit-wq-6798  [001] 762634.227679: kvm_ioapic_set_irq:   pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
      kvm-pit-wq-6798  [001] 762634.227679: kvm_set_irq:          gsi 0 level 0 source 1
      kvm-pit-wq-6798  [001] 762634.227680: kvm_pic_set_irq:      chip 0 pin 0 (edge)
      kvm-pit-wq-6798  [001] 762634.227680: kvm_ioapic_set_irq:   pin 2 dst 0 vec=0 (Fixed|physical|edge|masked)
 qemu-system-x86-6799  [000] 762634.227816: kvm_exit:             [FAILED TO PARSE] exit_reason=0 guest_rip=0xf26
 qemu-system-x86-6799  [000] 762634.227817: kvm_inj_exception:    [FAILED TO PARSE] exception=6 has_error=0 error_code=0
 qemu-system-x86-6799  [000] 762634.227817: kvm_entry:            vcpu 0
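
The timer rate can be estimated from the timestamps of the PIT rising edges (gsi 0 level 1) in the excerpt above. This is a rough sketch based on only the two edges visible there; the names are chosen for illustration.

```python
import re
from decimal import Decimal

TRACE = """\
      kvm-pit-wq-6798  [001] 762634.226642: kvm_set_irq:          gsi 0 level 1 source 1
      kvm-pit-wq-6798  [001] 762634.226643: kvm_set_irq:          gsi 0 level 0 source 1
      kvm-pit-wq-6798  [001] 762634.227678: kvm_set_irq:          gsi 0 level 1 source 1
      kvm-pit-wq-6798  [001] 762634.227679: kvm_set_irq:          gsi 0 level 0 source 1
"""

# Timestamp of every rising edge on gsi 0 (the PIT line)
RISING_EDGE = re.compile(r"(\d+\.\d+): kvm_set_irq:\s+gsi 0 level 1\b")


def rising_edges(trace: str) -> list:
    """Timestamps (seconds) of PIT rising edges, as exact decimals."""
    return [Decimal(m.group(1)) for m in RISING_EDGE.finditer(trace)]


stamps = rising_edges(TRACE)
intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
interval = intervals[0]
frequency_hz = 1 / interval
print(f"interval {interval} s, about {frequency_hz:.0f} Hz")
```

The two edges are roughly 1 ms apart, so the PIT is ticking at close to 1000 Hz, in line with the rate seen in kvm_stat.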

Comment 27 Gleb Natapov 2012-01-19 13:10:58 UTC

There are actually no irq injections in the trace. There is one, but before the hang.
An IRQ injection shows up as "kvm_inj_virq". There is a #UD exception for some reason, though.
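
One way to check this against a trace is to count events by name and decode the injected exception vector. The sketch below uses a few lines copied from comment 26; the small vector table lists standard x86 exception numbers, and the variable names are made up for illustration.

```python
import re
from collections import Counter

TRACE = """\
 qemu-system-x86-6799  [000] 762634.226603: kvm_entry:            vcpu 0
 qemu-system-x86-6799  [000] 762634.226604: kvm_exit:             [FAILED TO PARSE] exit_reason=0 guest_rip=0xf26
 qemu-system-x86-6799  [000] 762634.226605: kvm_inj_exception:    [FAILED TO PARSE] exception=6 has_error=0 error_code=0
 qemu-system-x86-6799  [000] 762634.226605: kvm_entry:            vcpu 0
      kvm-pit-wq-6798  [001] 762634.226642: kvm_set_irq:          gsi 0 level 1 source 1
"""

# A few standard x86 exception vectors
EXCEPTION_NAMES = {
    0: "#DE (divide error)",
    6: "#UD (invalid opcode)",
    13: "#GP (general protection)",
    14: "#PF (page fault)",
}

# Event name is the word between the timestamp and the next colon
EVENT = re.compile(r"\d+\.\d+: (\w+):")
events = Counter(EVENT.findall(TRACE))

vectors = [int(n) for n in re.findall(r"exception=(\d+)", TRACE)]
names = [EXCEPTION_NAMES.get(n, f"vector {n}") for n in vectors]

print("kvm_inj_virq:", events["kvm_inj_virq"])
print("kvm_inj_exception:", events["kvm_inj_exception"], names)
```

On this excerpt the kvm_inj_virq count is zero while the only injected exception is vector 6, i.e. #UD.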

Comment 28 Gerd Hoffmann 2012-01-19 13:43:40 UTC

Created attachment 556273
trace, second try

Comment 29 Gerd Hoffmann 2012-01-19 14:20:32 UTC

[15:14] <gleb> kraxel, can you check with upstream kernel?
[15:14] <gleb> kraxel, need to go now
[15:15] <kraxel> gleb: did a quick test on fedora 16 this morning and saw a hang too
[15:15] <kraxel> (upstream seabios+qemu).
[15:15] <kraxel> again not investigated in detail.
[15:16] <kraxel> just tried because lots of emulation fixes went upstream last months ...

Comment 30 Gerd Hoffmann 2012-01-19 14:46:18 UTC

One more data point: guest kernel plays a role too.
RHEL-5 guest (32bit) fails (see original report).
RHEL-6 guest (64bit) works without trouble.

Comment 31 Gleb Natapov 2012-01-19 14:55:15 UTC

I tried with rhel5 32bit pae (I had the image handy) and was not able to reproduce.

Comment 32 Gerd Hoffmann 2012-01-19 15:38:05 UTC

Created attachment 556305
upstream kernel trace

Comment 33 Gleb Natapov 2012-01-20 11:53:53 UTC

Can you attach the bios.bin and bios.bin.elf that were used to get the trace?

Comment 34 Gerd Hoffmann 2012-01-20 13:05:26 UTC

Don't have elf, it is the binary shipped with upstream/master, which is at rel-1.6.3.1 right now.

Comment 35 Gleb Natapov 2012-01-22 15:54:54 UTC

Created attachment 556795
init pic on resume

Can you try the attached patch? But only with the userspace irq chip. The kernel one has a bug that prevents the patch from working.
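
The patch itself is not reproduced in this report. For orientation only, the sketch below lists the conventional initialisation sequence for the two cascaded 8259A PICs on a PC, which is what "init pic" usually refers to. It is not taken from the attachment: the function name is hypothetical, and the vector bases 0x08 and 0x70 are the traditional BIOS real-mode defaults, assumed here.

```python
PIC1_CMD, PIC1_DATA = 0x20, 0x21   # master 8259A ports
PIC2_CMD, PIC2_DATA = 0xA0, 0xA1   # slave 8259A ports


def pic_init_sequence(vector_base1: int = 0x08, vector_base2: int = 0x70) -> list:
    """Return the (port, value) writes of a conventional 8259A setup."""
    return [
        (PIC1_CMD, 0x11),           # ICW1: start init, cascade mode, ICW4 follows
        (PIC2_CMD, 0x11),
        (PIC1_DATA, vector_base1),  # ICW2: IRQ0-7 -> vectors base1..base1+7
        (PIC2_DATA, vector_base2),  # ICW2: IRQ8-15 -> vectors base2..base2+7
        (PIC1_DATA, 0x04),          # ICW3: slave attached to IRQ2
        (PIC2_DATA, 0x02),          # ICW3: slave cascade identity 2
        (PIC1_DATA, 0x01),          # ICW4: 8086 mode
        (PIC2_DATA, 0x01),
        (PIC1_DATA, 0xFB),          # OCW1: mask everything except the cascade line
        (PIC2_DATA, 0xFF),          # OCW1: mask all slave inputs
    ]


for port, value in pic_init_sequence():
    print(f"outb(0x{value:02x}, 0x{port:02x})")
```

A guest OS normally reprograms the vector bases for its own use, so firmware that runs real-mode code on resume has a reason to put the PIC back into a known state first. Whether the attached patch does exactly this is an assumption.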

Comment 36 Gerd Hoffmann 2012-01-23 11:14:22 UTC

Works (tested upstream seabios + upstream qemu).

Comment 37 Dor Laor 2012-02-02 15:21:51 UTC

Gleb, are you posting the patch?

Comment 38 Gleb Natapov 2012-02-02 15:41:37 UTC

(In reply to comment #37)
> Gleb, are you posting the patch?

There are two. One is for seabios (already upstream); the other is for the kernel (it has been waiting for review for a week now). When the kernel one hits upstream I will post them.

Comment 51 Qunfang Zhang 2012-02-20 05:01:20 UTC

Hi, all
This bug can be reproduced with a RHEL5.8-32 guest (but cannot be reproduced with rhel5.8-64) with seabios-0.6.1.2-4.el6.  With the build Luiz provided in Comment 48, guests do not hang during S3. I tested rhel6.3-64, rhel5.8-32 and rhel6.8-64 more than 20 times each and they passed.

Steps:
1. Boot a rhel5.8-32 guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -drive file=/home/RHEL-Server-5.8-32.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -vnc :10 -monitor stdio -boot c

2. Inside guest:
# pm-suspend

So, this bug is fixed in the build provided in Comment 48.

Comment 54 Qunfang Zhang 2012-04-05 08:04:56 UTC

Reproduced on seabios-0.6.1.2-4.el6.x86_64 and verified as passing on seabios-0.6.1.2-16.el6.x86_64.

seabios-0.6.1.2-4.el6.x86_64:
rhel5.8-32: fail (reproduced the hang)
rhel5.8-64: pass
rhel6.3-32: pass
rhel6.3-64: pass

seabios-0.6.1.2-16.el6.x86_64:
rhel5.8-32: pass
rhel5.8-64: pass
rhel6.3-32: pass
rhel6.3-64: pass (pass means this bug is not reproduced, but there are other bugs, such as Bug 808391)

Steps:

1. Boot a rhel5.8-32 guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -enable-kvm -m 2048 -smp 2,sockets=1,cores=2,threads=1 -name rhel6.3 -drive file=/home/RHEL-Server-5.8-32.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=zhang,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:42:0b:00,bus=pci.0,addr=0x3 -vnc :10 -monitor stdio -boot c

2. Inside guest:
# pm-suspend

So this issue is fixed.

Comment 56 Eduardo Habkost 2012-04-23 18:39:41 UTC

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
No documentation needed.

Comment 58 errata-xmlrpc 2012-06-20 12:54:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0802.html
