Created attachment 951519 [details]
console output

Description of problem:

Booting on an AMD machine, I see the freshly installed non-PAE kernel crash:

4
[ 0.587285] Unpacking initramfs...
[ 0.861285] BUG: unable to handle kernel paging request at 35d4e304
[ 0.862015] IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
[ 0.862015] *pde = 00000000
[ 0.862015] Oops: 0000 [#1] SMP
[ 0.862015] Modules linked in:
[ 0.862015] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
[ 0.862015] Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
[ 0.862015] task: f5098000 ti: f50d0000 task.ti: f50d0000
[ 0.862015] EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
[ 0.862015] EIP is at load_microcode_amd+0x25/0x4a0
[ 0.862015] EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
[ 0.862015] ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
[ 0.862015] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 0.862015] CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
[ 0.862015] Stack:
[ 0.862015] 00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
[ 0.862015] f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
[ 0.862015] c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
[ 0.862015] Call Trace:
[ 0.862015] [<c0d7735a>] ? unpack_to_rootfs+0x27a/0x27a
[ 0.862015] [<c0d7735a>] ? unpack_to_rootfs+0x27a/0x27a
[ 0.862015] [<c0d80861>] save_microcode_in_initrd_amd+0x95/0xbf
[ 0.862015] [<c0d80429>] save_microcode_in_initrd+0x30/0x34
[ 0.862015] [<c0d889a9>] free_initrd_mem+0xe/0x2a
[ 0.862015] [<c0d77425>] populate_rootfs+0xcb/0xee
[ 0.862015] [<c0d7735a>] ? unpack_to_rootfs+0x27a/0x27a
[ 0.862015] [<c0400496>] do_one_initcall+0xc6/0x200
[ 0.862015] [<c0d7735a>] ? unpack_to_rootfs+0x27a/0x27a
[ 0.862015] [<c0d75503>] ? repair_env_string+0x12/0x54
[ 0.862015] [<c05e6400>] ? proc_mkdir+0x20/0x20
[ 0.862015] [<c0d75c8e>] kernel_init_freeable+0x15b/0x1e9
[ 0.862015] [<c0a3afb0>] kernel_init+0x10/0xe0
[ 0.862015] [<c0a44341>] ret_from_kernel_thread+0x21/0x30
[ 0.862015] [<c0a3afa0>] ? rest_init+0x70/0x70

Version-Release number of selected component (if applicable):

How reproducible: 100%

Steps to Reproduce:
1. Install Fedora 21 on a machine, reboot
2. Install Xen (yum install xen) on said machine, reboot in Xen
3. Install the F21 LiveISO 32-bit HVM guest using virt-install or virt-manager.
4. See it boot and crash

Actual results: See attachment for full serial output

Expected results: Boot into a nice graphical screen.

Additional info:
Specs of the AMD machine:

processor : 7
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-8320 Eight-Core Processor
stepping : 0
microcode : 0x6000822
cpu MHz : 3511.804
cache size : 2048 KB

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A97 LE R2.0
Version: Rev 1.xx
Pass 'dis_ucode_ldr' on the kernel command line and see if that makes the issue go away. This isn't a proper solution, but it might suffice as a workaround. I'd suggest taking the problem report upstream as well.
(In reply to Josh Boyer from comment #2)
> Pass 'dis_ucode_ldr' on the command line and see if that makes the issue go
> away. This isn't a proper solution but it might suffice as a workaround.
> I'd suggest taking the problem report upstream as well.

That did it. Trying different builds to see what CONFIG option exposes this, as I don't seem to be triggering it with my normal builds.
It could just be a case of the initramfs being too large for the memory allocated to your guest. We enabled early microcode loading a while ago, which means the ucode gets prepended to the initramfs image. Does increasing the memory to the guest also make it boot?
Here's what the asm looks like after Josh pointed me at the kernel in question (btw, there's another bug - 1157157 - which has the same RIP). Annotations are mine, reconstructed from System.map:

c042e8e0 <load_microcode_amd>:
c042e8e0: 55                    push %ebp
c042e8e1: 89 e5                 mov %esp,%ebp
c042e8e3: 57                    push %edi
c042e8e4: 56                    push %esi
c042e8e5: 53                    push %ebx                 # ... callee-saved
c042e8e6: 83 e4 f8              and $0xfffffff8,%esp      # align stack ptr
c042e8e9: 83 ec 2c              sub $0x2c,%esp            # grow stack
c042e8ec: e8 4b 68 61 00        call 0xc0a4513c           # mcount
c042e8f1: 88 44 24 1f           mov %al,0x1f(%esp)
c042e8f5: a1 cc d8 e3 c0        mov 0xc0e3d8cc,%eax       # equiv_cpu_table
c042e8fa: 89 d7                 mov %edx,%edi             # data
c042e8fc: 89 4c 24 28           mov %ecx,0x28(%esp)
c042e900: e8 4b 7d 13 00        call 0xc0566650           # vfree()
c042e905: 8b 77 08              mov 0x8(%edi),%esi        <--- faulting insn
c042e908: c7 05 cc d8 e3 c0 00  movl $0x0,0xc0e3d8cc
c042e90f: 00 00 00

%edi (copied from %edx) contains the second arg to load_microcode_amd(), which is that const u8 *data pointer, pointing to the microcode container coming from the initrd. And that %edi looks funny: 0x35d4e2fc, which causes the fault when we dereference it at offset 8 (CR2 is 35d4e304).

And since save_microcode_in_initrd_amd() checks the container for being 0, it is probably that relocated_ramdisk fun we do which gets the container pointer wrong.

Konrad, can you dump those values participating in the computation? It might tell us what is going wrong:

	if (relocated_ramdisk)
		container = (u8 *)(__va(relocated_ramdisk) +
				   (cont - boot_params.hdr.ramdisk_image));

Thanks.
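For anyone who wants to play with the arithmetic outside the kernel, here is a rough user-space sketch of the relocation computation quoted above. It is not the kernel code: sketch_va() is a stand-in for __va() that assumes the default 3G/1G split (PAGE_OFFSET 0xC0000000), and all function names are made up for illustration. The relocated address used in the usage note is a hypothetical value, and taking 0x35d4e000 as the ramdisk start is only a guess based on ESI in the oops.

```c
#include <stdint.h>

/* Assumption: default 32-bit x86 3G/1G split, PAGE_OFFSET == 0xC0000000. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* User-space stand-in for the kernel's __va() on a lowmem physical address. */
static inline uint32_t sketch_va(uint32_t phys)
{
    return phys + SKETCH_PAGE_OFFSET;
}

/*
 * Mirrors the quoted computation: keep the container's offset inside the
 * original ramdisk and rebase it onto the virtual address of the relocated
 * ramdisk.
 */
static inline uint32_t sketch_relocated_container(uint32_t relocated_ramdisk,
                                                  uint32_t cont,
                                                  uint32_t ramdisk_image)
{
    return sketch_va(relocated_ramdisk) + (cont - ramdisk_image);
}

/* Address touched by "mov 0x8(%edi),%esi" for a given container pointer. */
static inline uint32_t sketch_field_addr(uint32_t container)
{
    return container + 8u;
}
```

With the EDI value from the oops, sketch_field_addr(0x35d4e2fc) gives 0x35d4e304, which matches CR2. With a hypothetical relocated_ramdisk of 0x10000000 and the guessed ramdisk start of 0x35d4e000, sketch_relocated_container() gives 0xd00002fc, i.e. a kernel virtual address rather than something down at 0x35xxxxxx.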
Created attachment 952289 [details] serial console with the crash
Created attachment 952290 [details] The patch I used.
Ugh. Time to add more printks..
Created attachment 952526 [details]
serial console with the crash

The issue appears to be in:

	ret = load_microcode_amd(eax, container, container_size);
Created attachment 952527 [details] Debug patch
(In reply to Konrad Rzeszutek Wilk from comment #9)
> The issue appears to be in:
>
> ret = load_microcode_amd(eax, container, container_size);

Yeah, I thought this was clear from comment #5...

In any case, staring at this more, it looks like this happens because we're using the *physical* address of the container *after* we have enabled paging and thus the #PF. Because the ramdisk is exactly there:

[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]

and we fault at 0x35e04304.

And since this guest doesn't relocate the ramdisk, we don't do the computation which will give us the correct virtual address and we end up with the PA.

So, we should actually be using virtual addresses on 32-bit by the time we're freeing the initrd. How about the attached debug patch?

Thanks.
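A quick user-space sanity check of the reasoning above, offered only as a sketch: it checks that the faulting address lies inside the physical RAMDISK range from the log, and shows what the corresponding lowmem virtual address would be. The helper names are invented for this example, and the 0xC0000000 direct-map offset is an assumption (the default 3G/1G split), not something read out of this guest's config.

```c
#include <stdint.h>

/* Assumption: default 3G/1G split, so lowmem is mapped at phys + 0xC0000000. */
#define SKETCH_LOWMEM_OFFSET 0xC0000000u

/* Returns 1 if addr lies within the inclusive range [start, end]. */
static inline int sketch_in_range(uint32_t addr, uint32_t start, uint32_t end)
{
    return addr >= start && addr <= end;
}

/* Byte offset of addr from the start of the ramdisk. */
static inline uint32_t sketch_offset_in_ramdisk(uint32_t addr, uint32_t start)
{
    return addr - start;
}

/* What __va() would produce for a lowmem physical address under the assumption above. */
static inline uint32_t sketch_lowmem_va(uint32_t phys)
{
    return phys + SKETCH_LOWMEM_OFFSET;
}
```

Plugging in the numbers from the log: 0x35e04304 is inside 0x35e04000-0x36ef9fff at offset 0x304, and under the assumed split the address we would want to touch instead is 0xf5e04304.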
Created attachment 952535 [details] test patch
Created attachment 952587 [details] console with your patch
(In reply to Konrad Rzeszutek Wilk from comment #13)
> Created attachment 952587 [details]
> console with your patch

Which of course boots!
Thanks for testing, Konrad, much appreciated. I'll clean it up and send it to the tip guys soon. @Josh: you can close this one now. Thanks.
Will be in today's 3.18.0-0.rc3.git0.1 build. Thanks Borislav!