Bug 1035099

| Field | Value |
|---|---|
| Summary | "KVM internal error. Suberror: 3" when boot rhel6.5 guest with more than 42(7 AHCI controller) AHCI disks |
| Product | Red Hat Enterprise Linux 7 |
| Component | seabios |
| Version | 7.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Sibiao Luo <sluo> |
| Assignee | Gerd Hoffmann <kraxel> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | apetrova, chayang, hhuang, juzhang, knoel, kraxel, michen, pbonzini, qzhang, rbalakri, sluo, virt-maint, xfu, xuhan |
| Target Milestone | rc |
| Keywords | TestOnly |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | seabios-1.7.5-1.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2015-03-05 08:14:58 UTC |
| Bug Depends On | 1101500 |
| Bug Blocks | 1113520 |

Description
Sibiao Luo
2013-11-27 05:24:56 UTC
Created attachment 829572 [details]
ahci-multi-disks-cli.sh
Created attachment 829573 [details]
Screenshot for AHCI guest with Probing EDD (edd=off to disable)...
My host cpu info:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping : 7
microcode : 0x29
cpu MHz : 1598.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 6782.70
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

The CS:IP points to the address of the vector for INT 13h (the BIOS disk I/O services). Note how the hex dump includes addresses in the ROM area:

0000:0040 (INT 10h) 06 0a 00 c9 c900:0a06 (in sgabios)
0000:0044 (INT 11h) 4d f8 00 f0 f000:f84d (in SeaBIOS)
0000:0048 (INT 12h) 41 f8 00 f0 f000:f841 (in SeaBIOS)
0000:004C (INT 13h) <fe> e3 00 f0 f000:e3fe (in SeaBIOS)
0000:0050 (INT 14h) 9f 07 00 c9 c900:079f (in sgabios)
0000:0054 (INT 15h) 59 f8 00 f0 f000:f859 (in SeaBIOS)
0000:0058 (INT 16h) f7 07 00 c9 c900:07f7 (in sgabios)
0000:005C (INT 17h) d2 ef 00 f0 f000:efd2 (in SeaBIOS)
0000:0060 (INT 18h) 7b c7 00 f0 f000:c77b (in SeaBIOS)
0000:0064 (INT 19h) f2 e6 00 f0 f000:e6f2 (in SeaBIOS)

So KVM is really executing data, and the internal error is justified. Changing component to seabios.

Created attachment 846919 [details]
command line(hang issue)

Tested this issue with the components below:
qemu-kvm-1.5.3-31.el7.x86_64
kernel-debug-3.10.0-65.el7.x86_64
seabios-1.7.2.2-7.el7.x86_64

Guests:
RHEL7
Win2012R2

Steps:
1.
Boot the guest with the qemu-kvm command line attached to this comment.

Results:
While the guest was booting, the "KVM internal error. Suberror: 3" from comment 0 did not appear. However, the guest hung during kernel loading. Both the RHEL7 and Win2012R2 guests hit this issue. If the last AHCI controller and disk are removed, the guest boots successfully.

Any change when booting the guest kernel with "edd=off" ?

(In reply to Gerd Hoffmann from comment #6)
> Any change when booting the guest kernel with "edd=off" ?

Whether or not "edd=off" is appended to the rhel6.5 guest kernel line, the guest hits this issue with the same qemu-kvm command line (attachment 829572 [details]).

host info:
# uname -r && rpm -q qemu-kvm && rpm -qa | grep seabios
3.10.0-66.el7.x86_64.debug
qemu-kvm-1.5.3-31.el7.x86_64
seabios-1.7.2.2-7.el7.x86_64
seabios-bin-1.7.2.2-7.el7.x86_64

guest info:
rhel6.5_64bit
kernel-2.6.32-424.el6.x86_64

# sh ahci-multi-disks-cli.sh
QEMU 1.5.3 monitor - type 'help' for more information
(qemu)
(qemu) c
(qemu) KVM internal error.
Suberror: 3
extra data[0]: 80000306
extra data[1]: 31
EAX=00000500 EBX=000fe000 ECX=000062d6 EDX=00000000
ESI=0000fff0 EDI=00009000 EBP=0000feff ESP=00000001
EIP=0000004c EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =e000 000e0000 ffffffff 00809300
CS =0000 00000000 ffffffff 00809b00
SS =9000 00090000 ffffffff 00809300
DS =9000 00090000 ffffffff 00809300
FS =9900 00099000 ffffffff 00809300
GS =9000 00090000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00009180 00000027
IDT= 00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 06 0a 00 c9 4d f8 00 f0 41 f8 00 f0 <fe> e3 00 f0 9f 07 00 c9 59 f8 00 f0 f7 07 00 c9 d2 ef 00 f0 7b c7 00 f0 f2 e6 00 f0 6e fe
(qemu) info status
VM status: paused (internal-error)
(qemu) c
Resetting the Virtual Machine is required
(qemu)

Do you get more guest kernel messages when booting without "quiet"?

(In reply to Gerd Hoffmann from comment #8)
> Do you get more guest kernel messages when booting without "quiet"?

No guest kernel messages are displayed at all; it did not go to read the seabios before QEMU quit.

Something touches seabios data structures (ahci driver structures to be exact). They get filled with zeros, which can have -- depending on the exact memory layout -- all sorts of funky effects. Jumping to address zero (as seen in this report) by following a cleared function pointer certainly is in the cards.

On my machine seabios just hangs. A cleared memory pointer makes seabios and ahci disagree where the cmd block is, therefore ahci never sees the command seabios intended to submit ...

Tried to boot rhel7 kernel on the rhel6.5 guest. Hangs too. Given that a rhel7 guest boots fine (see initial report) this points to the rhel6 grub as most likely culprit for the memory corruption.
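As a cross-check of the register dump in this report, the following Python sketch decodes the Code= bytes as real-mode interrupt vector table entries (4 bytes each: 16-bit offset, then 16-bit segment, little-endian). It assumes, as the analysis in this thread does, that the byte marked <..> is the one at CS:EIP and that the CS base is 0, so EIP=0x4c is the slot for vector 0x4c / 4 = 0x13. The helper name decode_ivt is hypothetical.

```python
def decode_ivt(code_dump, eip):
    """Decode IVT entries from a QEMU 'Code=' byte string.

    Assumes CS base 0, so the <..>-marked byte lives at linear address eip.
    Returns {vector_number: (segment, offset)}.
    """
    tokens = code_dump.split()
    # Position of the byte that QEMU marks as the current instruction byte.
    marked = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = bytes(int(t.strip("<>"), 16) for t in tokens)
    base = eip - marked              # linear address of the first dumped byte
    start = (-base) % 4              # skip to the first 4-byte aligned slot
    entries = {}
    for off in range(start, len(data) - 3, 4):
        offset = int.from_bytes(data[off:off + 2], "little")
        segment = int.from_bytes(data[off + 2:off + 4], "little")
        entries[(base + off) // 4] = (segment, offset)
    return entries


# Code= line copied from the register dump in this report.
CODE = ("00 00 00 00 00 00 00 00 06 0a 00 c9 4d f8 00 f0 41 f8 00 f0 "
        "<fe> e3 00 f0 9f 07 00 c9 59 f8 00 f0 f7 07 00 c9 d2 ef 00 f0 "
        "7b c7 00 f0 f2 e6 00 f0 6e fe")

ivt = decode_ivt(CODE, eip=0x4C)
for vector, (segment, offset) in sorted(ivt.items()):
    print(f"INT {vector:02X}h -> {segment:04x}:{offset:04x}")
```

The entry for vector 0x13 decodes to f000:e3fe, matching the INT 13h row of the hex dump analysis: the CPU was fetching instructions from the vector table itself rather than from the handler it points to.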
Hi Gerd,

According to https://bugzilla.redhat.com/show_bug.cgi?id=1035099#c5, rhel7.0 and Win2012R2 hit this issue as well. Do you mean it's a different bz? If yes, QE will open a new one. Feel free to add your suggestions.

Best Regards,
Junyi

(In reply to juzhang from comment #13)
> Hi Gerd,
>
> According to https://bugzilla.redhat.com/show_bug.cgi?id=1035099#c5, rhel7.0
> and Win2012R2 hit this issue as well. You mean it's a different bz? If yes,
> QE will open new one. Free to add your suggestions?

Oops, I haven't read comment #5 carefully enough. So, the initial comment and #5 disagree on whether rhel7 works or not. Hard to say whether that is a different issue. Certainly could be the same root cause, but maybe not. Windows being affected too pretty much rules out bootloader / kernel though.

Can you retest with the 1.7.5 rebase builds please?
http://people.redhat.com/ghoffman/bz1101500/

(In reply to Gerd Hoffmann from comment #18)
> Can you retest with the 1.7.5 rebase builds please?
> http://people.redhat.com/ghoffman/bz1101500/

Ping

(In reply to Gerd Hoffmann from comment #19)
> (In reply to Gerd Hoffmann from comment #18)
> > Can you retest with the 1.7.5 rebase builds please?
> > http://people.redhat.com/ghoffman/bz1101500/
>

Retried it with this private build, which no longer hits the issue.

host info:
3.10.0-128.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7ev.x86_64
seabios-1.7.5-1.el7_0.bz1101500.3.x86_64

guest info:
2.6.32-452.el6.x86_64

Steps:
the same as comment #0.

Results:
QEMU and the KVM guest work well without quitting, all the disks are detected correctly in the guest, and there are no errors in the guest dmesg.
# ls /dev/sd* | wc -l
43

Best Regards,
sluo

(In reply to Sibiao Luo from comment #20)
> (In reply to Gerd Hoffmann from comment #19)
> > (In reply to Gerd Hoffmann from comment #18)
> > > Can you retest with the 1.7.5 rebase builds please?
> > > http://people.redhat.com/ghoffman/bz1101500/
> >
> Retried it with this private build which did not hit such issue any more.

Cool.
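For anyone retesting, the disk counts in this report are consistent with six ports per AHCI controller (42 disks on 7 controllers, per the summary). The Python sketch below shows, under that assumption, how a command line with more than 42 AHCI disks might be generated. It is not the attached ahci-multi-disks-cli.sh, whose contents are not shown here; the image paths, device IDs, and helper names are hypothetical, and the reporter's exact arguments may differ.

```python
PORTS_PER_AHCI = 6  # assumption: 42 disks / 7 controllers, as in the bug summary


def controllers_needed(n_disks):
    """Number of AHCI controllers required for n_disks (ceiling division)."""
    return -(-n_disks // PORTS_PER_AHCI)


def ahci_args(n_disks, image_pattern="/tmp/disk{idx}.qcow2"):
    """Build qemu-kvm arguments attaching n_disks to AHCI controllers.

    image_pattern is a hypothetical path template; adjust to real images.
    """
    args = []
    for idx in range(n_disks):
        ctrl, port = divmod(idx, PORTS_PER_AHCI)
        if port == 0:
            # First disk on a new controller: create the controller itself.
            args += ["-device", f"ahci,id=ahci{ctrl}"]
        image = image_pattern.format(idx=idx)
        args += ["-drive", f"file={image},if=none,id=drive{idx}"]
        args += ["-device", f"ide-hd,drive=drive{idx},bus=ahci{ctrl}.{port}"]
    return args


if __name__ == "__main__":
    # 43 disks is the smallest count that needs an eighth controller.
    print(" ".join(ahci_args(43)))
```

With these assumptions, 43 disks is the first configuration that spills onto an eighth controller, which matches the "more than 42" threshold in the summary.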
Reproduced this bug with seabios-1.7.2.2-10.el7.x86_64 & qemu-kvm-rhev-2.1.0-3.el7ev.preview.x86_64 & RHEL6.5 guest.

(qemu) KVM internal error. Suberror: 1
emulation failure
EAX=00000500 EBX=000fe000 ECX=000062d6 EDX=00000000
ESI=0000fff0 EDI=00009000 EBP=0000feff ESP=00000001
EIP=0000004c EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =e000 000e0000 ffffffff 00809300
CS =0000 00000000 ffffffff 00809b00
SS =9000 00090000 ffffffff 00809300
DS =9000 00090000 ffffffff 00809300
FS =9900 00099000 ffffffff 00809300
GS =9000 00090000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00009180 00000027
IDT= 00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 06 0a 00 c9 4d f8 00 f0 41 f8 00 f0 <fe> e3 00 f0 9f 07 00 c9 59 f8 00 f0 f7 07 00 c9 d2 ef 00 f0 7b c7 00 f0 f2 e6 00 f0 6e fe

Verified this bug with seabios-1.7.5-1.el7 & qemu-kvm-rhev-2.1.0-3.el7ev.preview.x86_64, for RHEL6.5, RHEL7.0 and win2012r2-64 guests. 48 AHCI disks are detected inside each guest, and the guest and all disks work well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0345.html