Bug 1194366
| Summary: | ftrace writes to random memory when loading a module | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | drjones, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, msalter |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | aarch64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-4.0.0-0.rc1.git0.2.fc23 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-25 12:24:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 910269 | | |
| Attachments: | | | |
Description
Richard W.M. Jones
2015-02-19 16:28:23 UTC
I should note that what specifically happens is the guest starts up, and then suddenly aborts (when inserting the guest kernel module). In the test above, I was using guest kernel == host kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23.

I tried this again, with:

guest kernel == 3.20.0-0.rc0.git7.3.bz1193875.fc23
host kernel == 3.19.0-0.rc7.git1.1.fc22

This *also* crashes when loading the kernel module in the guest. So it seems as if the problem is some new processor instruction is used in a kernel module (possibly crc32-arm64.ko), which KVM is unable to emulate.

A couple of other observations:

(1) Doesn't fail with host == guest == 3.19.0-0.rc7.git1.1.fc22.aarch64

(2) I'm very certain the troublesome kernel module is either `crc32-arm64.ko' or `crc32.ko', and I'm about 90% certain it is `crc32-arm64.ko'.

Created attachment 993700 [details]
crc32-arm64.ko
It occurred to me that maybe people wouldn't be able to
download the suspected modules, so I've attached them here.
Created attachment 993701 [details]
crc32.ko
Here is a diff of the instructions used in crc32-arm64 3.19 vs 3.20.

 ldrh
 mov
 mvn
+nop
 orr
 ret

Seems strange. I checked the 3.20 module and there is a nop instruction inserted after every ret or unconditional branch. I have no idea if nop would be a problem on aarch64. Maybe this is a wild goose chase.

I noticed that qemu dumps the registers on stderr before exiting:

error: kvm run failed Function not implemented
PC=fffffe000046cf4c SP=fffffe0028383ba0
X00=fffffdfffaa20020 X01=fffffe0028383c00 X02=fffffffffffffffc X03=00000000d503201f
X04=fffffdfffaa20024 X05=ffffffffffffffff X06=0000000000000bb0 X07=fffffe0001a3c3b8
X08=fffffe0028380000 X09=fffffe0000f91000 X10=fffffe0001cfc000 X11=fffffe000123b000
X12=0000000000000000 X13=fffffe0001a3b808 X14=ffff000000000000 X15=ffffffffffffffff
X16=fffffe0000165898 X17=0000000000000001 X18=0000000000000d71 X19=0000040000000000
X20=fffffdfffc000020 X21=0000000000000140 X22=fffffe0029471180 X23=fffffe0000f218e8
X24=0000000000000000 X25=0000000000000000 X26=fffffe0001d6d000 X27=fffffdfffc0007a8
X28=fffffe0029660000 X29=fffffe0028383ba0 X30=fffffe00001e1bb0
PSTATE=600001c5 (flags -ZC-)

Not very helpful without knowing the address space layout of the guest kernel.
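Resolving an address against a kernel symbol table can be sketched roughly as follows. This is only an illustration, not the procedure actually used in this report: it assumes `nm`/System.map style lines (`<hex address> <type> <name>`), and the helper names `parse_symbols` and `resolve` are made up for the example. The address of `__copy_to_user` and the faulting instruction address are taken from the disassembly in this report; the neighbouring symbol entries and the `T` type letters are hypothetical filler.

```python
import bisect


def parse_symbols(text):
    """Parse lines of the form '<hex address> <type> <name>' into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Only the __copy_to_user address comes from the report; the other two
# entries are hypothetical neighbours added so the lookup has something
# to search through.
SAMPLE = """\
fffffe00003e3000 T __copy_from_user
fffffe00003e3040 T __copy_to_user
fffffe00003e30a0 T __copy_in_user
"""

symbols = parse_symbols(SAMPLE)
name, offset = resolve(symbols, 0xFFFFFE00003E306C)
print(f"{name}+{offset:#x}")
```

The lookup is a plain binary search for the nearest symbol at or below the address, so it cannot tell whether the address actually lies inside that function; a real tool would also consult symbol sizes.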
I resolved PC against the symbol table, and it happens in the guest kernel function '__copy_to_user', at the place marked with <<< below:

fffffe00003e3040 <__copy_to_user>:
fffffe00003e3040: 8b020004 add x4, x0, x2
fffffe00003e3044: f1002042 subs x2, x2, #0x8
fffffe00003e3048: 540000a4 b.mi fffffe00003e305c <__copy_to_user+0x1c>
fffffe00003e304c: f8408423 ldr x3, [x1],#8
fffffe00003e3050: f1002042 subs x2, x2, #0x8
fffffe00003e3054: f8008403 str x3, [x0],#8
fffffe00003e3058: 54ffffa5 b.pl fffffe00003e304c <__copy_to_user+0xc>
fffffe00003e305c: b1001042 adds x2, x2, #0x4
fffffe00003e3060: 54000084 b.mi fffffe00003e3070 <__copy_to_user+0x30>
fffffe00003e3064: b8404423 ldr w3, [x1],#4
fffffe00003e3068: d1001042 sub x2, x2, #0x4
fffffe00003e306c: b8004403 str w3, [x0],#4 <<<<<<<
fffffe00003e3070: b1000842 adds x2, x2, #0x2
fffffe00003e3074: 54000084 b.mi fffffe00003e3084 <__copy_to_user+0x44>
fffffe00003e3078: 78402423 ldrh w3, [x1],#2
fffffe00003e307c: d1000842 sub x2, x2, #0x2
fffffe00003e3080: 78002403 strh w3, [x0],#2
fffffe00003e3084: b1000442 adds x2, x2, #0x1
fffffe00003e3088: 54000064 b.mi fffffe00003e3094 <__copy_to_user+0x54>
fffffe00003e308c: 39400023 ldrb w3, [x1]
fffffe00003e3090: 39000003 strb w3, [x0]
fffffe00003e3094: d2800000 mov x0, #0x0 // #0
fffffe00003e3098: d65f03c0 ret

Unfortunately qemu doesn't dump a stack trace before it exits. I will try to attach gdb to see if that gives any extra information.

gdb gives this stack trace, which looks bogus to me:
Program received signal SIGABRT, Aborted.
__copy_to_user () at arch/arm64/lib/copy_to_user.S:43
43 USER(9f, str w3, [x0], #4 )
(gdb) bt
#0 __copy_to_user () at arch/arm64/lib/copy_to_user.S:43
#1 0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>,
src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
#2 0x0000000000000000 in ?? ()
More gdb information:
(gdb) info registers
x0 0xfffffdfffaa20020 -2199113301984
x1 0xfffffe0028343c20 -2198348743648
x2 0xfffffffffffffffc -4
x3 0xd503201f 3573751839
x4 0xfffffdfffaa20024 -2199113301980
x5 0xffffffffffffffff -1
x6 0xfffffe0000a1b588 -2199012657784
x7 0xfffffe0000a1b570 -2199012657808
x8 0xfffffe0000a1b558 -2199012657832
x9 0xfffffdfee01a4480 -2203853372288
x10 0x101010101010101 72340172838076673
x11 0x6 6
x12 0x0 0
x13 0xffffffffffffffff -1
x14 0xffff000000000000 -281474976710656
x15 0xffffffffffffffff -1
x16 0xfffffe000013a5e0 -2199021967904
x17 0x1 1
x18 0x0 0
x19 0x40000000000 4398046511104
x20 0xfffffdfffc000020 -2199090364384
x21 0x140 320
x22 0x0 0
x23 0xfffffe0000dc17d8 -2199008831528
x24 0xfffffe000009c5b0 -2199022615120
x25 0xfffffe0000f65000 -2199007113216
x26 0x0 0
x27 0x0 0
x28 0xfffffe0029120000 -2198334210048
x29 0xfffffe0028343bc0 -2198348743744
x30 0xfffffe00001a6558 -2199021525672
sp 0xfffffe0028343bc0 0xfffffe0028343bc0
pc 0xfffffe00003e306c 0xfffffe00003e306c <__copy_to_user+44>
cpsr 0x600001c5 1610613189
fpsr 0x0 0
fpcr 0x0 0
(gdb) frame 1
#1 0xfffffe00001a6558 in __probe_kernel_write (dst=<optimized out>,
src=<optimized out>, size=<optimized out>) at mm/maccess.c:56
56 ret = __copy_to_user_inatomic((__force void __user *)dst, src, size);
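One detail in the register dump may be worth a closer look: the faulting instruction is `str w3, [x0],#4`, and x3 holds 0xd503201f, which is the A64 encoding of the NOP instruction. The following is a minimal sketch that pulls the values out of a `info registers` listing and checks this; the helper name `parse_gdb_registers` is made up for the example, and only a few of the register lines from above are reproduced.

```python
import re

AARCH64_NOP = 0xD503201F  # A64 encoding of NOP


def parse_gdb_registers(text):
    """Map register name -> integer value from gdb 'info registers' lines."""
    regs = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s+(0x[0-9a-fA-F]+)", line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs


# Excerpt of the gdb output quoted in this report.
DUMP = """\
x0 0xfffffdfffaa20020 -2199113301984
x2 0xfffffffffffffffc -4
x3 0xd503201f 3573751839
pc 0xfffffe00003e306c 0xfffffe00003e306c <__copy_to_user+44>
"""

regs = parse_gdb_registers(DUMP)
if regs["x3"] == AARCH64_NOP:
    print(f"store of a NOP instruction to {regs['x0']:#x}")
```

On its own this proves nothing, but a kernel writing a NOP into memory via `__probe_kernel_write` while a module is being loaded looks consistent with code patching of the kind ftrace performs.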
ftrace is implicated: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/013652.html

Marc Zyngier posted a patch here which works for me: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/325445.html

I intend to add this to the kernel package in Rawhide unless someone gets there first.

Similar new bug in 4.2.0: https://bugzilla.redhat.com/show_bug.cgi?id=1269779