Bug 1194366 - ftrace writes to random memory when loading a module
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2015-02-19 16:28 UTC by Richard W.M. Jones
Modified: 2015-10-08 08:10 UTC

Fixed In Version: kernel-4.0.0-0.rc1.git0.2.fc23
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-25 12:24:59 UTC
Type: Bug
Embargoed:


Attachments
crc32-arm64.ko (8.50 KB, application/x-object)
2015-02-19 17:14 UTC, Richard W.M. Jones
crc32.ko (6.34 KB, application/x-object)
2015-02-19 17:15 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2015-02-19 16:28:23 UTC
Description of problem:

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64 cannot run KVM
guests (at least some of the time).  It looks as if kvm.ko is
missing some instruction emulation:

[   66.722766] kvm [1301]: load/store instruction decoding not implemented

It happens when the guest inserts a kernel module.  It may be
`crc32-arm64.ko' but I cannot be completely certain about that,
since it may be inserting another module but not getting the
debug message out.

Version-Release number of selected component (if applicable):

kernel-3.20.0-0.rc0.git7.3.bz1193875.fc23.aarch64
qemu-2.2.0-5.fc22.aarch64

How reproducible:

100%

Steps to Reproduce:
1. Run: libguestfs-test-tool

Comment 1 Richard W.M. Jones 2015-02-19 16:31:25 UTC
Specifically, the guest starts up and then suddenly aborts
when it inserts a kernel module.

Comment 2 Richard W.M. Jones 2015-02-19 16:46:55 UTC
In the test above, I was using guest kernel == host kernel ==
3.20.0-0.rc0.git7.3.bz1193875.fc23.

I tried this again, with:
guest kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23
host kernel == 3.19.0-0.rc7.git1.1.fc22
This *also* crashes when loading the kernel module in the
guest.

So it seems as if the problem is that some new processor
instruction is used in a kernel module (possibly crc32-arm64.ko),
which KVM is unable to emulate.

Comment 3 Richard W.M. Jones 2015-02-19 16:57:55 UTC
A couple of other observations:

(1) Doesn't fail with host == guest == 3.19.0-0.rc7.git1.1.fc22.aarch64

(2) I'm very certain the troublesome kernel module is either
`crc32-arm64.ko' or `crc32.ko', and I'm about 90% certain it is
`crc32-arm64.ko'.

Comment 4 Richard W.M. Jones 2015-02-19 17:14:56 UTC
Created attachment 993700
crc32-arm64.ko

It occurred to me that people might not be able to download
the suspected modules, so I'll attach them here.

Comment 5 Richard W.M. Jones 2015-02-19 17:15:14 UTC
Created attachment 993701
crc32.ko

Comment 6 Richard W.M. Jones 2015-02-19 17:24:47 UTC
Here is a diff of the instructions used in crc32-arm64 3.19 vs 3.20.

 ldrh
 mov
 mvn
+nop
 orr
 ret

Seems strange.  I checked the 3.20 module and there is a nop instruction
inserted after every ret or unconditional branch.  I have no idea if
nop would be a problem on aarch64.  Maybe this is a wild goose chase.
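
A comparison like the one above can be reproduced by collecting the set
of mnemonics from each module's disassembly and taking the difference.
The following is a minimal sketch, assuming `objdump -d`-style text as
input.  The function names and the short sample listings are
illustrative only; they are not the real disassembly of either module.

```python
import re

# One instruction line of `objdump -d` output looks like:
#    "   4:   d65f03c0    ret"
# i.e. offset, colon, 8 hex digits of encoding, then the mnemonic.
INSN_RE = re.compile(
    r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+([a-z][a-z0-9.]*)", re.MULTILINE
)


def mnemonics(disassembly):
    """Return the set of instruction mnemonics found in a disassembly."""
    return set(INSN_RE.findall(disassembly))


def added_mnemonics(old, new):
    """Return mnemonics that appear in `new` but not in `old`, sorted."""
    return sorted(mnemonics(new) - mnemonics(old))


# Illustrative sample listings (hypothetical, not the attached modules).
OLD_SAMPLE = """
   0:   78402423    ldrh    w3, [x1], #2
   4:   d65f03c0    ret
"""
NEW_SAMPLE = """
   0:   78402423    ldrh    w3, [x1], #2
   4:   d65f03c0    ret
   8:   d503201f    nop
"""

print(added_mnemonics(OLD_SAMPLE, NEW_SAMPLE))  # prints ['nop']
```

In practice the two inputs would be the captured output of objdump run
on the 3.19 and 3.20 builds of the module.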

Comment 7 Richard W.M. Jones 2015-02-20 09:45:34 UTC
I noticed that qemu dumps the registers on stderr before exiting:

error: kvm run failed Function not implemented
PC=fffffe000046cf4c  SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
X04=fffffdfffaa20024 X05=ffffffffffffffff X06=0000000000000bb0 X07=fffffe0001a3c3b8
X08=fffffe0028380000 X09=fffffe0000f91000 X10=fffffe0001cfc000 X11=fffffe000123b000
X12=0000000000000000 X13=fffffe0001a3b808 X14=ffff000000000000 X15=ffffffffffffffff
X16=fffffe0000165898 X17=0000000000000001 X18=0000000000000d71 X19=0000040000000000
X20=fffffdfffc000020 X21=0000000000000140 X22=fffffe0029471180 X23=fffffe0000f218e8
X24=0000000000000000 X25=0000000000000000 X26=fffffe0001d6d000 X27=fffffdfffc0007a8
X28=fffffe0029660000 X29=fffffe0028383ba0 X30=fffffe00001e1bb0 PSTATE=600001c5 (flags -ZC-)

Not very helpful without knowing the address space layout of
the guest kernel.
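
Two values in the dump can be interpreted even without the guest's
address space layout: X03 holds 0xd503201f, which is the A64 encoding
of the NOP instruction, and X02 is -4 when read as a signed 64-bit
value.  The following is a minimal sketch for extracting these from the
dump.  The helper names are illustrative, and only the first two lines
of the dump are reproduced here.

```python
import re

A64_NOP = 0xD503201F  # encoding of the A64 NOP instruction


def parse_registers(dump):
    """Parse NAME=hex pairs (PC, SP, X00.., PSTATE) from a qemu dump."""
    pairs = re.findall(r"\b([A-Z]+\d*)=([0-9a-fA-F]+)\b", dump)
    return {name: int(value, 16) for name, value in pairs}


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's complement."""
    return value - (1 << 64) if value & (1 << 63) else value


DUMP = """\
PC=fffffe000046cf4c  SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
"""

regs = parse_registers(DUMP)
print(regs["X03"] == A64_NOP)     # prints True
print(to_signed64(regs["X02"]))   # prints -4
```

A NOP encoding sitting in the register that is about to be stored would
be consistent with something patching instructions in memory, although
the dump alone does not show what is doing the patching.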

Comment 8 Richard W.M. Jones 2015-02-24 12:17:00 UTC
I resolved PC against the symbol table, and it happens in the
guest kernel function '__copy_to_user', at the place marked
with <<< below:

fffffe00003e3040 <__copy_to_user>:
fffffe00003e3040:       8b020004        add     x4, x0, x2
fffffe00003e3044:       f1002042        subs    x2, x2, #0x8
fffffe00003e3048:       540000a4        b.mi    fffffe00003e305c <__copy_to_user+0x1c>
fffffe00003e304c:       f8408423        ldr     x3, [x1],#8
fffffe00003e3050:       f1002042        subs    x2, x2, #0x8
fffffe00003e3054:       f8008403        str     x3, [x0],#8
fffffe00003e3058:       54ffffa5        b.pl    fffffe00003e304c <__copy_to_user+0xc>
fffffe00003e305c:       b1001042        adds    x2, x2, #0x4
fffffe00003e3060:       54000084        b.mi    fffffe00003e3070 <__copy_to_user+0x30>
fffffe00003e3064:       b8404423        ldr     w3, [x1],#4
fffffe00003e3068:       d1001042        sub     x2, x2, #0x4
fffffe00003e306c:       b8004403        str     w3, [x0],#4   <<<<<<<
fffffe00003e3070:       b1000842        adds    x2, x2, #0x2
fffffe00003e3074:       54000084        b.mi    fffffe00003e3084 <__copy_to_user+0x44>
fffffe00003e3078:       78402423        ldrh    w3, [x1],#2
fffffe00003e307c:       d1000842        sub     x2, x2, #0x2
fffffe00003e3080:       78002403        strh    w3, [x0],#2
fffffe00003e3084:       b1000442        adds    x2, x2, #0x1
fffffe00003e3088:       54000064        b.mi    fffffe00003e3094 <__copy_to_user+0x54>
fffffe00003e308c:       39400023        ldrb    w3, [x1]
fffffe00003e3090:       39000003        strb    w3, [x0]
fffffe00003e3094:       d2800000        mov     x0, #0x0                        // #0
fffffe00003e3098:       d65f03c0        ret


Unfortunately qemu doesn't dump a stack trace before it exits.  I
will try to attach gdb to see if that gives any extra information.
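
The symbol lookup described at the top of this comment amounts to
finding the nearest symbol at or below the address of interest.  The
following is a minimal sketch, assuming a sorted table of
(address, name) pairs such as one read from System.map.  The address of
__copy_to_user and the address of the marked instruction are taken from
the listing above; the two neighbouring table entries are hypothetical
placeholders.

```python
import bisect


def resolve(pc, symbols):
    """Map an address to (symbol name, offset into symbol).

    `symbols` is a list of (address, name) tuples sorted by address.
    Returns None if `pc` is below the first symbol.
    """
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, pc) - 1
    if index < 0:
        return None
    addr, name = symbols[index]
    return name, pc - addr


SYMBOLS = [
    (0xFFFFFE00003E3000, "hypothetical_prev_symbol"),  # placeholder
    (0xFFFFFE00003E3040, "__copy_to_user"),            # from the listing
    (0xFFFFFE00003E309C, "hypothetical_next_symbol"),  # placeholder
]

name, offset = resolve(0xFFFFFE00003E306C, SYMBOLS)
print(f"{name}+{offset:#x}")  # prints __copy_to_user+0x2c
```

The offset 0x2c is 44 in decimal, which is the position of the marked
`str w3, [x0],#4` instruction within the function.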

Comment 9 Richard W.M. Jones 2015-02-24 12:24:54 UTC
gdb gives this stack trace, which looks bogus to me:

Program received signal SIGABRT, Aborted.
__copy_to_user () at arch/arm64/lib/copy_to_user.S:43
43	USER(9f, str	w3, [x0], #4	)
(gdb) bt
#0  __copy_to_user () at arch/arm64/lib/copy_to_user.S:43
#1  0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, 
    src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
#2  0x0000000000000000 in ?? ()

More gdb information:

(gdb) info registers 
x0             0xfffffdfffaa20020	-2199113301984
x1             0xfffffe0028343c20	-2198348743648
x2             0xfffffffffffffffc	-4
x3             0xd503201f	3573751839
x4             0xfffffdfffaa20024	-2199113301980
x5             0xffffffffffffffff	-1
x6             0xfffffe0000a1b588	-2199012657784
x7             0xfffffe0000a1b570	-2199012657808
x8             0xfffffe0000a1b558	-2199012657832
x9             0xfffffdfee01a4480	-2203853372288
x10            0x101010101010101	72340172838076673
x11            0x6	6
x12            0x0	0
x13            0xffffffffffffffff	-1
x14            0xffff000000000000	-281474976710656
x15            0xffffffffffffffff	-1
x16            0xfffffe000013a5e0	-2199021967904
x17            0x1	1
x18            0x0	0
x19            0x40000000000	4398046511104
x20            0xfffffdfffc000020	-2199090364384
x21            0x140	320
x22            0x0	0
x23            0xfffffe0000dc17d8	-2199008831528
x24            0xfffffe000009c5b0	-2199022615120
x25            0xfffffe0000f65000	-2199007113216
x26            0x0	0
x27            0x0	0
x28            0xfffffe0029120000	-2198334210048
x29            0xfffffe0028343bc0	-2198348743744
x30            0xfffffe00001a6558	-2199021525672
sp             0xfffffe0028343bc0	0xfffffe0028343bc0
pc             0xfffffe00003e306c	0xfffffe00003e306c <__copy_to_user+44>
cpsr           0x600001c5	1610613189
fpsr           0x0	0
fpcr           0x0	0

(gdb) frame 1
#1  0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>, 
    src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
56		ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);

Comment 10 Richard W.M. Jones 2015-02-24 14:14:32 UTC
Thread on kvmarm:
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/thread.html#13632

Comment 11 Richard W.M. Jones 2015-02-24 16:22:43 UTC
ftrace is implicated:
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/013652.html

Comment 12 Richard W.M. Jones 2015-02-24 18:12:21 UTC
Marc Zyngier posted a patch here which works for me:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/325445.html

I intend to add this to the kernel package in Rawhide unless
someone gets there first.

Comment 13 Richard W.M. Jones 2015-10-08 08:10:00 UTC
Similar new bug in 4.2.0:
https://bugzilla.redhat.com/show_bug.cgi?id=1269779

