Bug 1411275
Summary: | Migration from RHEL7-2-z to RHEL7-3-z on amd host, guest cannot be reboot normally | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | huiqingding <huding> |
Component: | seabios | Assignee: | Virtualization Maintenance <virt-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | amit.shah, chayang, dgilbert, hhuang, huding, jinzhao, knoel, virt-maint, xfu |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: |
A boot (or more commonly reboot) would hang at a point the BIOS was waiting for user input, the BIOS should have timed out and continued.
An error in initialisation order meant that under some circumstances (complex guests typically) the timer could time out before the interrupt could be delivered. The initialisation order has now been changed so that can't happen.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-04-04 17:01:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
huiqingding
2017-01-09 10:26:06 UTC
Hi, Some questions: a) Does 7.2.z->7.2.z hit this bug? b) Does 7.3.z->7.3.z hit this bug? c) Does 7.2 -> 7.3 (not z) hit this bug? d) Does it only happen on opteron? Dave I can reproduce this 7.3.z->7.3.z; boot a 6.9 nightly image migrate it to the same host inside the guest reboot inside the guest reboot a 2nd time <black screen> Hi Huiqing, Can you try a simpler test for me; on the 7.3.z host, just try rebooting the guest 4 or 5 times (log in to the guest and type reboot) - no migration involved. I have a case here where the reboot hangs even without the migration. using a rhel7 machine type seems to make this go away, and the ACPI DSDT= and Copying ACPI RSDP lines go away in the debug. just doing 7.3.z- reboots without migrate, the failures I get seems to vary a bit with different seabios builds. Originally I was using 1.5.1-5.el7 and I was seeing just hangs after that 'Copying ACPI RSDP' line. Then with 1.5.1-5.el7_3.1 I occasionally saw a stream of 'CPU Mhz=....' lines, but the change between those versions seems to be irrelevant and I don't see how we can get that message multiple times anyway. Just rebuilding seabios with a different level of debug gets me a different failure mode. (A reliable KVM error - probably because it's a bad PC) I'm not yet sure what's going on, but it feels very much like a memory corruption - the very smallest changes to the BIOS change the behaviour. Turning upto debug level 9 isn't possible with the smaller rom sizes for rhel6, and so I have a build at debug level 4, where I've added CONFIG_CDROM_BOOT=n,CONFIG_CDROM_EMU=n,CONFIG_MOUSE=n to the config.base and added the following debug to it: diff -urN a/src/fw/biostables.c b/src/fw/biostables.c --- a/src/fw/biostables.c 2017-01-11 12:32:00.069748380 -0500 +++ b/src/fw/biostables.c 2016-01-18 05:13:20.000000000 -0500 @@ -116,7 +116,7 @@ warn_noalloc(); return; } - dprintf(1, "Copying ACPI RSDP from %p to %p\n", pos, newpos); + dprintf(1, "Copying ACPI RSDP from %p to %p (len=0x%x)\n", pos, newpos, length); memcpy(newpos, pos, length); RsdpAddr = newpos; } diff -urN a/src/hw/timer.c b/src/hw/timer.c --- a/src/hw/timer.c 2017-01-11 14:37:51.570684382 -0500 +++ b/src/hw/timer.c 2016-01-18 05:13:20.000000000 -0500 @@ -101,8 +101,10 @@ void timer_setup(void) { - if (!CONFIG_TSC_TIMER || (CONFIG_PMTIMER && TimerPort != PORT_PIT_COUNTER0)) + if (!CONFIG_TSC_TIMER || (CONFIG_PMTIMER && TimerPort != PORT_PIT_COUNTER0)) { + dprintf(1, "timer_setup early exit: TimerPort=0x%x\n", TimerPort); return; + } // Check if CPU has a timestamp counter u32 eax, ebx, ecx, edx, cpuid_features = 0; @@ -120,7 +118,7 @@ { if (!CONFIG_PMTIMER) return; - dprintf(1, "Using pmtimer, ioport 0x%x\n", ioport); + dprintf(1, "Using pmtimer, ioport 0x%x &TimerPort=%p\n", ioport, &TimerPort); TimerPort = ioport; TimerKHz = DIV_ROUND_UP(PMTIMER_HZ, 1000); } and that fails almost every time on my test; (I'd added that debug because the previous symptom was that tsctimer_setup was printing debug even when my understanding was it was using pm timers) from the logs it looked like it was failing just before printing the UUID, so I add: diff -ur /root/src-withmymods/fw/biostables.c src/fw/biostables.c --- a/src-withmymods/fw/biostables.c 2017-01-11 12:32:00.069748380 -0500 +++ b/src/fw/biostables.c 2017-01-13 11:19:06.923168822 -0500 @@ -297,9 +297,12 @@ void display_uuid(void) { + dprintf(1, "display_uuid entry\n"); struct smbios_type_1 *tbl = smbios_next(SMBiosAddr, NULL); int minlen = offsetof(struct smbios_type_1, uuid) + sizeof(tbl->uuid); + dprintf(1, "display_uuid preloop tbl=%p minlen=%d\n", tbl, minlen); for (; tbl; tbl = smbios_next(SMBiosAddr, tbl)) + dprintf(1, "display_uuid loop top tbl=%p\n", tbl); if (tbl->header.type == 1 && tbl->header.length >= minlen) { u8 *uuid = tbl->uuid; u8 empty_uuid[sizeof(tbl->uuid)] = { 0 }; and it's now solid as a rock. Note my current minimal failure line is: /usr/libexec/qemu-kvm \ -name rhel7 \ -S -machine rhel6.6.0,accel=kvm,usb=off,smm=off \ -realtime mlock=off \ -cpu Opteron_G4,check \ -sandbox off \ -m 7680 \ -smp 1, \ -uuid 49a3438a-70a3-4ba8-92ce-3a05e0934608 \ -no-user-config \ -nodefaults \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=discard \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot order=c,menu=on,splash-time=3000,strict=on \ -drive file=rhel-guest-image-6.9-28.x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline --device isa-serial,chardev=mon \ -k en-us \ -device VGA \ -chardev file,id=bioslog,path=bioslog \ -device isa-debugcon,iobase=0x402,chardev=bioslog \ $* as pointed out by Laszlo, may be related to David's bug 1382906. There's lots of different hangs I get as I add different debug - but by far the commonest is it coming back to hanging in that display_uuid loop so still digging around display_uuid and I objdump the ROM, and I find this code: from disassembling src/fw/biostables.o: 73: f3 ab rep stos %eax,%es:(%edi) 75: 8d 43 08 lea 0x8(%ebx),%eax 78: b1 10 mov $0x10,%cl 7a: 8d 54 24 44 lea 0x44(%esp),%edx 7e: e8 fc ff ff ff call 7f <display_uuid+0x7f> 83: 85 c0 test %eax,%eax 85: 0f 84 33 01 00 00 je 1be <display_uuid+0x1be> 8b: 8b 15 00 00 00 00 mov 0x0,%edx 91: 80 7a 06 02 cmpb $0x2,0x6(%edx) <-- major version 95: 0f b6 43 17 movzbl 0x17(%ebx),%eax 99: 77 0c ja a7 <display_uuid+0xa7> 9b: 0f 85 80 00 00 00 jne 121 <display_uuid+0x121> a1: 80 7a 07 05 cmpb $0x5,0x7(%edx) <-- minor version? a5: 76 7a jbe 121 <display_uuid+0x121> <--- old format display now working up as it gets linked, here is from code32flat.o: 10be: f3 ab rep stos %eax,%es:(%edi) 10c0: 8d 43 08 lea 0x8(%ebx),%eax 10c3: b1 10 mov $0x10,%cl 10c5: 8d 54 24 74 lea 0x74(%esp),%edx 10c9: e8 fc ff ff ff call 10ca <maininit+0x10ca> 10ce: 85 c0 test %eax,%eax 10d0: 0f 84 33 01 00 00 je 1209 <maininit+0x1209> 10d6: 8b 15 00 00 00 00 mov 0x0,%edx 10dc: 80 7a 06 02 cmpb $0x2,0x6(%edx) 10e0: 0f b6 43 17 movzbl 0x17(%ebx),%eax 10e4: 77 0c ja 10f2 <maininit+0x10f2> 10e6: 0f 85 80 00 00 00 jne 116c <maininit+0x116c> 10ec: 80 7a 07 05 cmpb $0x5,0x7(%edx) 10f0: 76 7a jbe 116c <maininit+0x116c> ok, so it looks like it got combined with maininit - fine. Now objdump from rom.o: ef79d: f3 ab rep stos %eax,%es:(%edi) ef79f: 8d 43 08 lea 0x8(%ebx),%eax ef7a2: b1 10 mov $0x10,%cl ef7a4: 8d 54 24 74 lea 0x74(%esp),%edx ef7a8: e8 50 46 ff ff call e3dfd <memcmp> ef7ad: 85 c0 test %eax,%eax ef7af: 0f 84 33 01 00 00 je ef8e8 <ExtraStack+0x110> ef7b5: 8b 15 0c 23 0f 00 mov 0xf230c,%edx ef7bb: 80 7a 06 02 cmpb $0x2,0x6(%edx) ef7bf: 0f b6 43 17 movzbl 0x17(%ebx),%eax ef7c3: 77 0c ja ef7d1 <StackPos+0x1> ef7c5: 0f 85 80 00 00 00 jne ef84b <ExtraStack+0x73> ef7cb: 80 7a 07 05 cmpb $0x5,0x7(%edx) ef7cf: 76 7a jbe ef84b <ExtraStack+0x73> Hang on - why are those symbols in 'ExtraStack'; jumps from just before that point are in maininit; but we seem to have run into the '"low" memory' segment?? OK, backup a bit. Turning on debug on the seabios build can cause a completely different bug which is what I think I've been chasing for the last few days - I can end up with ExtraStack/data overlaying the code in maininit and that's why I get crashes in display_uuid; so that's different from the original bug I think where the layout looks more sane. <goes back a few days to figure out what happened> I added some tracing of reboot types (on upstream); it's a bit odd. The kernel normally reboots via the keyboard controller. That lands in the bios and that relatively quickly reboots again with the keyboard controller. That pair happens irrespective of if I tell the guest kernel to do a cold or hot reboot. Sometimes, we also get KVM exiting with KVM_EXIT_SHUTDOWN which I think means we hit a triple fault - it's mostly random when we hit it, except that is it's pretty reliable that the first few reboots follows the sequence of keyboard controller keyboard controller KVM_EXIT_SHUTDOWN keyboard controller so I think that means the first time the guest does a reboot it ends up causing a triple fault after the bios does it's reboot. Normally we recover from the triple fault and carry on; so maybe the triple fault happens more often that we really know and we survivie it. The problem with the reboot seems to be understood now. QEMU doesn't reset the PAM registers on reset, this means the ROM mapping at reset is wrong, and ends up running out of RAM instead. SeaBIOS has a workaround that detects this and reboots again having recopied the image, but the workaround ends up corrupting 3 bytes of memory (TimerLast) and overwriting part of the new copy - if you're unlucky with the memory layout and timer value then it might hang. I have a fix for qemu's reset and Kevin is doing a seabios fix for the workaround. Note I've split this into two bugs; bz 1428347 is now the 'reboot' on rhel6 machine types failing occasionally without the migration. I can trigger this without the migration, but by using the 7.2 bios on 7.3 qemu. e.g. taking a copy of /usr/share/qemu-kvm/bios-256k.bin onto my 7.3 box and using firmware=/root/bios-256k-copied-from.72.bin it hangs at the ESC in the same way as a VM migrated from 7.2; so the bug isn't actually the migration process, it's just the 7.2 bios doesn't like running on 7.3 Similarly if I copy the rhel6 rtl8139 rom and use that as the 8139 ROM with a 7.3 qemu, it hangs at the 'Press Ctrl-B' point. Running with the old bios is 'fun' I've already compiled in a fix for the TimerLast issue, but even so there's a repeatable hang - but at the point it's waiting for input. It's very hard to bisect, because it seems to survive for quite a lot of reboots; initially I thought it failed after 1-2 reboots, but I've got cases of failure after 29. The earliest failure I've seen so far is on 2.4.0, but 2.3.0 I've not seen it fail (after a pair of runs, each with ~30 reboots). The behaviour is basically non-bisectable on qemu version - whenever you think you've got one that fails it decides to work again. I've got a current setup where running with the -debug build of the host kernel it hangs at the same place at boot - not at reboot. What I think is happening there is that it gets the first PIT interrupt at a point when the APIC is masked, so it never delivers it. However the PIT still has the IRR=1, so I think any future PIT interrupt doesn't decide to get it delivered either. I can get it to work using no-kernel-irqchip (sp?) but who knows if that's just changing the timing. What I've not figured out yet is if there's a state where it should be delivering the interrupt and the kernel isn't, or if there is something that the BIOS has setup wrong/hasn't cleared. ok, I believe I understand this (with some help from Paolo), and it's a bug in the 7.2 bios that's fixed in later bioses. In seabios prior to ~1.8.0 - i.e. prior to upstream change bbb3fbac the initialisation order was approximately: timers PCI probe APIC setup (in smp_scan) if the PCI probe was long enough (i.e you had lots of PCI devices), then the main PIT timer could fire before the APIC setup code, that interrupt was never delivered and since it never got to the PIT handler it never got cleared and so no new interrupts were ever delivered. After that change the order is: PCI probe APIC setup timers so that there's no risk of the timer firing early. 7.2 had 1.7.5-11.el7 7.3 had 1.9.1-5.el7 So I'll mark this as 'fixed in current release' - 7.3 and newer should be fine. *** Bug 1571171 has been marked as a duplicate of this bug. *** |