Bug 1258223 - [i686] BUG: unable to handle kernel paging request at ... in add_nops
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2015-08-30 11:51 UTC by Richard W.M. Jones
Modified: 2015-09-08 16:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-09-08 16:11:01 UTC
Type: Bug
Embargoed:



Description Richard W.M. Jones 2015-08-30 11:51:34 UTC
Description of problem:

I have not been able to reproduce this locally yet, but I am still
trying.  However, it has happened twice in Koji.  It only happens
under virtualization on i686, and it happens fairly early in the
kernel boot process.

[    0.036000] BUG: unable to handle kernel paging request at 55501e06
[    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
[    0.036000] *pde = 00000000 
[    0.036000] Oops: 0000 [#1] SMP 
[    0.036000] Modules linked in:
[    0.036000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-0.rc8.git3.1.fc24.i686 #1
[    0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[    0.036000] task: c0d49ac0 ti: c0d42000 task.ti: c0d42000
[    0.036000] EIP: 0060:[<c0aae48b>] EFLAGS: 00200046 CPU: 0
[    0.036000] EIP is at common_interrupt+0xb/0x38
[    0.036000] EAX: c0aae480 EBX: 0000008d ECX: c0ab1c83 EDX: e4af6810
[    0.036000] ESI: 029a7802 EDI: 00000003 EBP: c0d43e68 ESP: c0d43e44
[    0.036000]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[    0.036000] CR0: 8005003b CR2: 55501e06 CR3: 00ebd000 CR4: 00000690
[    0.036000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.036000] DR6: 00000000 DR7: 00000000
[    0.036000] Stack:
[    0.036000]  0000004f c0409c80 00000060 00200202 00200046 c0d43e60 c0ea150c 029a7802
[    0.036000]  00000000 c0d43fb8 c040a054 c07f1cf0 6c0a1000 ffff0006 00200046 00000043
[    0.036000]  c0ed0bc0 00000000 c0d43e98 c071a6fc c0d43ea8 c0d43ec4 c0ea4c73 c0ea4c7f
[    0.036000] Call Trace:
[    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
[    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
[    0.036000]  [<c07f1cf0>] ? wait_for_xmitr+0xa0/0xa0
[    0.036000]  [<c071a6fc>] ? sprintf+0x1c/0x20
[    0.036000]  [<c0aae480>] ? irq_entries_start+0x698/0x698
[    0.036000]  [<c071be4b>] ? memcpy+0xb/0x30
[    0.036000]  [<c07f3950>] ? serial8250_set_termios+0x20/0x20
[    0.036000]  [<c0aad4e3>] ? _raw_write_unlock_irqrestore+0x13/0x20
[    0.036000]  [<c0aad4e3>] ? _raw_write_unlock_irqrestore+0x13/0x20
[    0.036000]  [<c0aad4fd>] ? _raw_spin_unlock_irqrestore+0xd/0x10
[    0.036000]  [<c04b17b9>] ? console_unlock+0x2e9/0x610
[    0.036000]  [<c04b03cd>] ? log_store+0x1cd/0x210
[    0.036000]  [<c04b1d7e>] ? vprintk_emit+0x29e/0x570
[    0.036000]  [<c04b21e1>] ? vprintk_default+0x41/0x60
[    0.036000]  [<c0aa7725>] ? printk+0x17/0x19
[    0.036000]  [<c0dfdd48>] ? identify_boot_cpu+0x7b/0x80
[    0.036000]  [<c0dfca47>] alternative_instructions+0x17/0xc1
[    0.036000]  [<c0dfdda9>] check_bugs+0x32/0x39
[    0.036000]  [<c0df6b57>] start_kernel+0x3ca/0x40a
[    0.036000]  [<c0df62e3>] i386_start_kernel+0x91/0x95
[    0.036000] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 8d 90 90 83 04 24 80 fc 0f a8 0f <a0> 06 1e 50 55 57 56 52 51 53 ba 7b 00 00 00 8e da 8e c2 ba d8
[    0.036000] EIP: [<c0aae48b>] common_interrupt+0xb/0x38 SS:ESP 0068:c0d43e44
[    0.036000] CR2: 0000000055501e06
[    0.036000] ---[ end trace 13b552f26f1b4480 ]---

Version-Release number of selected component (if applicable):

kernel-4.2.0-0.rc8.git3.1.fc24.i686

How reproducible:

Unknown - happened twice.

Steps to Reproduce:
1. Run 'libguestfs-test-tool' or 'qemu-sanity-check'

Additional info:

https://kojipkgs.fedoraproject.org//work/tasks/5097/10885097/build.log
https://kojipkgs.fedoraproject.org//work/tasks/9329/10879329/build.log

Comment 1 Richard W.M. Jones 2015-08-30 18:07:49 UTC
OK, I did reproduce it!  It took a long time though.

My setup is:

32 bit i686 Fedora Rawhide VM running on AMD hardware

*Inside* this VM, I'm running:

$ while libguestfs-test-tool >/tmp/log 2>&1; do echo -n .; done
..................................................................................................................................................................................................................................................................................................................................................................................................................

and boom, it fails after ~400 iterations with the same backtrace
as above.
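
For anyone repeating this, a minimal sketch of the same loop that also
reports which iteration failed.  It is only an illustration:
run_until_failure and its parameters are hypothetical names, not part of
libguestfs, and it assumes a non-zero exit status is the failure signal,
as in the shell loop above.

```python
import subprocess
import sys


def run_until_failure(cmd, log_path, max_iterations=10000):
    """Run cmd repeatedly; return the 1-based failing iteration, or None.

    The log file is overwritten on every run, so after a failure it
    holds only the output of the failing run.
    """
    for iteration in range(1, max_iterations + 1):
        with open(log_path, "wb") as log:
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            return iteration
        # Progress marker, like the "echo -n ." in the shell loop.
        print(".", end="", flush=True)
    return None
```

It would be called as run_until_failure(["libguestfs-test-tool"], "/tmp/log");
the failing boot log is then left in /tmp/log.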

I don't have a convenient 32-bit bare-metal machine to test this on.

Comment 2 Chuck Ebbert 2015-08-31 19:37:42 UTC
   0:	8d 90 90 83 04 24    	lea    0x24048390(%eax),%edx
   6:	80 fc 0f             	cmp    $0xf,%ah
   9:	a8 0f                	test   $0xf,%al
>> b:	a0 06 1e 50 55       	mov    0x55501e06,%al  
  10:	57                   	push   %edi
  11:	56                   	push   %esi

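Interrupt 0x30 occurred while the alternatives code was replacing the
initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the optimized
version, 0x8d,0x76,0x00. Only the first byte had been replaced at that
point, so the CPU decoded 0x8d,0x90,0x90,... as a six-byte lea and every
instruction after it was misaligned. The mov at offset 0xb then read
from 0x55501e06, which is the faulting address in CR2.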


Reported on lkml:
http://marc.info/?l=linux-kernel&m=144098871818266&w=4

Comment 3 Richard W.M. Jones 2015-09-04 13:06:46 UTC
There seems to be a generally accepted fix upstream.  I have tested
it extensively and it works for me.

https://marc.info/?l=linux-kernel&m=144127697521764&w=4

Comment 4 Josh Boyer 2015-09-04 17:20:25 UTC
Added in Fedora git.  It will be in the 4.3.0-0.rc0.git8.1 build (whenever that happens).  Thanks for chasing this down, everyone.

