Description of problem:

I must admit I cannot reproduce this one locally yet, still trying. However it has happened twice in Koji. It only happens under virtualization on i686. It happens fairly early in the kernel boot process.

[ 0.036000] BUG: unable to handle kernel paging request at 55501e06
[ 0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
[ 0.036000] *pde = 00000000
[ 0.036000] Oops: 0000 [#1] SMP
[ 0.036000] Modules linked in:
[ 0.036000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-0.rc8.git3.1.fc24.i686 #1
[ 0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[ 0.036000] task: c0d49ac0 ti: c0d42000 task.ti: c0d42000
[ 0.036000] EIP: 0060:[<c0aae48b>] EFLAGS: 00200046 CPU: 0
[ 0.036000] EIP is at common_interrupt+0xb/0x38
[ 0.036000] EAX: c0aae480 EBX: 0000008d ECX: c0ab1c83 EDX: e4af6810
[ 0.036000] ESI: 029a7802 EDI: 00000003 EBP: c0d43e68 ESP: c0d43e44
[ 0.036000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 0.036000] CR0: 8005003b CR2: 55501e06 CR3: 00ebd000 CR4: 00000690
[ 0.036000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.036000] DR6: 00000000 DR7: 00000000
[ 0.036000] Stack:
[ 0.036000] 0000004f c0409c80 00000060 00200202 00200046 c0d43e60 c0ea150c 029a7802
[ 0.036000] 00000000 c0d43fb8 c040a054 c07f1cf0 6c0a1000 ffff0006 00200046 00000043
[ 0.036000] c0ed0bc0 00000000 c0d43e98 c071a6fc c0d43ea8 c0d43ec4 c0ea4c73 c0ea4c7f
[ 0.036000] Call Trace:
[ 0.036000] [<c0409c80>] ? add_nops+0x90/0xa0
[ 0.036000] [<c040a054>] apply_alternatives+0x274/0x630
[ 0.036000] [<c07f1cf0>] ? wait_for_xmitr+0xa0/0xa0
[ 0.036000] [<c071a6fc>] ? sprintf+0x1c/0x20
[ 0.036000] [<c0aae480>] ? irq_entries_start+0x698/0x698
[ 0.036000] [<c071be4b>] ? memcpy+0xb/0x30
[ 0.036000] [<c07f3950>] ? serial8250_set_termios+0x20/0x20
[ 0.036000] [<c0aad4e3>] ? _raw_write_unlock_irqrestore+0x13/0x20
[ 0.036000] [<c0aad4e3>] ? _raw_write_unlock_irqrestore+0x13/0x20
[ 0.036000] [<c0aad4fd>] ? _raw_spin_unlock_irqrestore+0xd/0x10
[ 0.036000] [<c04b17b9>] ? console_unlock+0x2e9/0x610
[ 0.036000] [<c04b03cd>] ? log_store+0x1cd/0x210
[ 0.036000] [<c04b1d7e>] ? vprintk_emit+0x29e/0x570
[ 0.036000] [<c04b21e1>] ? vprintk_default+0x41/0x60
[ 0.036000] [<c0aa7725>] ? printk+0x17/0x19
[ 0.036000] [<c0dfdd48>] ? identify_boot_cpu+0x7b/0x80
[ 0.036000] [<c0dfca47>] alternative_instructions+0x17/0xc1
[ 0.036000] [<c0dfdda9>] check_bugs+0x32/0x39
[ 0.036000] [<c0df6b57>] start_kernel+0x3ca/0x40a
[ 0.036000] [<c0df62e3>] i386_start_kernel+0x91/0x95
[ 0.036000] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 8d 90 90 83 04 24 80 fc 0f a8 0f <a0> 06 1e 50 55 57 56 52 51 53 ba 7b 00 00 00 8e da 8e c2 ba d8
[ 0.036000] EIP: [<c0aae48b>] common_interrupt+0xb/0x38 SS:ESP 0068:c0d43e44
[ 0.036000] CR2: 0000000055501e06
[ 0.036000] ---[ end trace 13b552f26f1b4480 ]---

Version-Release number of selected component (if applicable):

kernel-4.2.0-0.rc8.git3.1.fc24.i686

How reproducible:

Unknown - happened twice.

Steps to Reproduce:
1. Run 'libguestfs-test-tool' or 'qemu-sanity-check'

Additional info:

https://kojipkgs.fedoraproject.org//work/tasks/5097/10885097/build.log
https://kojipkgs.fedoraproject.org//work/tasks/9329/10879329/build.log
OK, I did reproduce it! It took a long time though. My set up is: 32 bit i686 Fedora Rawhide VM running on AMD hardware *Inside* this VM, I'm running: $ while libguestfs-test-tool >/tmp/log 2>&1; do echo -n .; done .................................................................................................................................................................................................................................................................................................................................................................................................................. and boom it fails after ~ 400 iterations with the same backtrace as above. I don't have any convenient 32 bit baremetal machine to test this on.
   0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
   6:   80 fc 0f                cmp    $0xf,%ah
   9:   a8 0f                   test   $0xf,%al
>> b:   a0 06 1e 50 55          mov    0x55501e06,%al
  10:   57                      push   %edi
  11:   56                      push   %esi

Interrupt 0x30 occurred while the alternatives code was replacing the initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the optimized version, 0x8d,0x76,0x00. Only the first byte had been replaced at that point, which makes a mess of the insn decoding: the lone 0x8d swallows the following five bytes as a 6-byte lea, every later instruction boundary shifts, and the bytes at offset 0xb decode as a load from 0x55501e06, which is exactly the faulting address in CR2.

Reported on lkml:

http://marc.info/?l=linux-kernel&m=144098871818266&w=4
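To make the mis-decoding concrete, here is a rough sketch in Python that walks instruction boundaries over the bytes from the "Code:" line in three states: before patching, half patched (only the first byte written), and fully patched. The helpers `modrm_len`, `insn_len` and `boundaries` are hypothetical toys written for this comment; they understand only the handful of opcodes that appear at the start of common_interrupt here and are in no way a general x86 decoder. The "before" bytes are inferred from the statement above that the original NOPs are 0x90,0x90,0x90.

```python
# Toy 32-bit x86 length decoder covering ONLY the opcodes seen in this oops.
# All function names are hypothetical and exist just for this illustration.

def modrm_len(code, i):
    """Bytes used by the ModRM byte at code[i] plus any SIB/displacement."""
    mod, rm = code[i] >> 6, code[i] & 7
    n = 1
    if mod != 3 and rm == 4:                 # a SIB byte follows
        n += 1
        if mod == 0 and (code[i + 1] & 7) == 5:
            n += 4                           # SIB with no base: disp32
    if mod == 1:
        n += 1                               # disp8
    elif mod == 2 or (mod == 0 and rm == 5):
        n += 4                               # disp32
    return n

def insn_len(code, i):
    """Length of the instruction starting at code[i]."""
    op = code[i]
    if op in (0x90, 0xFC, 0x06, 0x1E) or 0x50 <= op <= 0x57:
        return 1                             # nop, cld, push %es/%ds, push reg
    if op == 0x0F and code[i + 1] in (0xA0, 0xA8):
        return 2                             # push %fs / push %gs
    if op == 0xA8:
        return 2                             # test $imm8,%al
    if op == 0xA0:
        return 5                             # mov moffs32,%al
    if op == 0x8D:
        return 1 + modrm_len(code, i + 1)    # lea
    if op in (0x80, 0x83):
        return 1 + modrm_len(code, i + 1) + 1  # group 1 op with imm8
    raise ValueError("opcode %#x not handled by this toy decoder" % op)

def boundaries(code, limit):
    """Offsets at which instructions start, decoding linearly from offset 0."""
    offs, i = [], 0
    while i < limit:
        offs.append(i)
        i += insn_len(code, i)
    return offs

TAIL = "83 04 24 80 fc 0f a8 0f a0 06 1e 50 55 57 56 52 51 53"
ORIGINAL = bytes.fromhex("90 90 90 " + TAIL)  # three NOPs (inferred)
PARTIAL = bytes.fromhex("8d 90 90 " + TAIL)   # only first byte patched (as in the oops)
PATCHED = bytes.fromhex("8d 76 00 " + TAIL)   # complete 3-byte replacement

for name, code in (("original", ORIGINAL), ("partial", PARTIAL), ("patched", PATCHED)):
    print(name, [hex(o) for o in boundaries(code, 0x12)])

# In the half-patched state, the "instruction" at 0xb is a mov from the
# absolute address formed by the next four bytes (little-endian).
print(hex(int.from_bytes(PARTIAL[0xC:0x10], "little")))
```

In the original and fully patched states, offset 0xb falls in the middle of the two-byte `0f a0` (push %fs), so nothing ever starts executing there. Only the half-patched state has an instruction boundary at 0xb, which is where EIP points in the oops.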
There seems to be a generally accepted solution upstream, which I have tested extensively and it works for me:

https://marc.info/?l=linux-kernel&m=144127697521764&w=4
Added in Fedora git. It will be in the 4.3.0-0.rc0.git8.1 build (whenever that happens). Thanks for chasing this down everyone.